Data Systems Engineering

A curated portfolio of data systems projects focused on performance-aware design, scalability, and real-world data processing.
These projects explore how modern data platforms are built, from relational query optimization to distributed analytics and storage-engine internals.

The repository demonstrates hands-on work in database design, query execution, distributed computation, spatial analytics, and NoSQL storage, using PostgreSQL, Apache Spark, and RocksDB.

Projects

Relational Query Engineering

Design and optimization of a relational database using PostgreSQL, emphasizing schema modeling, relationships, constraints, and efficient query execution. The project highlights practical aspects of SQL performance tuning and data organization.
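As a rough illustration of the tuning loop described above (declare constraints, inspect the query plan, add an index, inspect again), here is a minimal sketch. It is not taken from the project: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server, and the `customers`/`orders` schema, the index name, and the `query_plan` helper are all hypothetical. In PostgreSQL the equivalent inspection step would be `EXPLAIN` or `EXPLAIN ANALYZE`.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail text

# Hypothetical two-table schema with key, NOT NULL, and CHECK constraints.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL CHECK (total >= 0)
);
""")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100 + 1, float(i)) for i in range(1000)],
)

lookup = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index on customer_id, the planner has to scan the whole table.
plan_before = query_plan(conn, lookup, (42,))

# With the index, the same lookup becomes an index search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))

print(plan_before)
print(plan_after)
```

The exact plan text differs between engines and versions, so the useful signal is the change from a full scan to an index search rather than the literal wording.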

Spark Spatial Analytics

Implementation of distributed spatial queries using Apache Spark and SparkSQL. Includes range queries, distance queries, and spatial joins implemented through custom UDFs, showcasing scalable geospatial data processing.
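The predicates behind range, distance, and join queries can be sketched in a few lines. This is an illustrative sketch only, not the project's UDFs: the function names, the comma-separated string encoding of points (`"x,y"`) and rectangles (`"x1,y1,x2,y2"`), and the use of Euclidean distance are assumptions. The sketch is plain Python with a nested-loop join; in Spark, functions like these would typically be registered with `spark.udf.register` and called from SparkSQL.

```python
import math

def parse_point(text):
    """Parse a point encoded as 'x,y' (an assumed encoding)."""
    x, y = (float(v) for v in text.split(","))
    return x, y

def st_contains(rectangle, point):
    """Range predicate: is the point inside the rectangle 'x1,y1,x2,y2'?

    The two corners may be given in either order; the boundary counts as inside.
    """
    x1, y1, x2, y2 = (float(v) for v in rectangle.split(","))
    px, py = parse_point(point)
    return min(x1, x2) <= px <= max(x1, x2) and min(y1, y2) <= py <= max(y1, y2)

def st_within(point_a, point_b, distance):
    """Distance predicate: are the two points at most `distance` apart?"""
    ax, ay = parse_point(point_a)
    bx, by = parse_point(point_b)
    return math.hypot(ax - bx, ay - by) <= distance

def range_join(rectangles, points):
    """Naive spatial join: every (rectangle, point) pair where the point is inside."""
    return [(r, p) for r in rectangles for p in points if st_contains(r, p)]

rectangles = ["0,0,10,10", "20,20,30,30"]
points = ["5,5", "25,21", "50,50"]
pairs = range_join(rectangles, points)
print(pairs)
```

A nested loop is quadratic in the input sizes; a distributed engine spreads the pair evaluation across partitions, but the predicate logic stays the same.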

Spatiotemporal Gi* Hotspots

A distributed spatiotemporal hotspot detection pipeline using Apache Spark. The project applies the Getis-Ord Gi* statistic over space-time grids to identify statistically significant activity clusters at scale.
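To make the statistic concrete, the following single-machine sketch computes a Gi* z-score for every cell of a small space-time grid. It is not the project's pipeline and makes several assumptions: binary weights (each cell's neighborhood is itself plus the adjacent cells in x, y, and t that fall inside the grid), cells without observations count as zero, and the `gi_star` function name and the dictionary input format are hypothetical. With binary weights, the sum of weights and the sum of squared weights both equal the neighbor count `k`, which simplifies the usual formula.

```python
import math
from itertools import product

def gi_star(counts, shape):
    """Getis-Ord Gi* z-scores on a 3-D (x, y, t) grid with binary weights.

    counts: dict mapping (x, y, t) -> observed count (missing cells count as 0)
    shape:  (nx, ny, nt) grid dimensions
    """
    cells = list(product(*(range(s) for s in shape)))
    n = len(cells)
    values = [counts.get(c, 0) for c in cells]
    mean = sum(values) / n
    # Population standard deviation; max() guards against tiny negative rounding.
    std = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))

    scores = {}
    for cell in cells:
        # Neighborhood: the cell itself plus adjacent cells that lie inside the grid.
        neighbors = [
            nb for nb in product(*(range(c - 1, c + 2) for c in cell))
            if all(0 <= nb[d] < shape[d] for d in range(3))
        ]
        k = len(neighbors)  # equals sum of weights and sum of squared weights
        local_sum = sum(counts.get(nb, 0) for nb in neighbors)
        denom = std * math.sqrt((n * k - k * k) / (n - 1))
        scores[cell] = (local_sum - mean * k) / denom if denom > 0 else 0.0
    return scores

# A 5x5x5 grid with a hot 3x3x3 block in the middle.
shape = (5, 5, 5)
counts = {cell: 10 for cell in product(range(1, 4), repeat=3)}
scores = gi_star(counts, shape)
hottest = max(scores, key=scores.get)
print(hottest, round(scores[hottest], 2))
```

Large positive scores mark cells whose neighborhood holds more activity than the global mean would predict; a distributed version would compute the neighbor sums with a join or aggregation instead of the inner loop.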

Embedded NoSQL Storage (RocksDB)

A C++-based embedded key-value storage layer built on RocksDB. Demonstrates core NoSQL storage concepts such as batch ingestion, multi-key retrieval, range scans, and persistent data management using an LSM-tree architecture.
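The access patterns listed above can be illustrated with a toy in-memory model of an LSM-tree. This is a teaching sketch under stated assumptions, not RocksDB's API and not the project's C++ code: the `ToyLSM` class and its method names are hypothetical, there is no persistence, write-ahead log, or compaction, and the memtable is flushed to an immutable sorted run once it reaches a small size limit.

```python
import bisect

class ToyLSM:
    """Toy LSM-tree: a mutable memtable plus immutable sorted runs (newest first)."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []  # each run is a sorted list of (key, value) pairs
        self.memtable_limit = memtable_limit

    def write_batch(self, items):
        """Batch ingestion: apply all writes, then flush if the memtable is full."""
        for key, value in items:
            self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable into a new sorted run."""
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Point lookup: check the memtable first, then runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def multi_get(self, keys):
        """Multi-key retrieval: one result per requested key, None when absent."""
        return [self.get(key) for key in keys]

    def range_scan(self, start, end):
        """Return sorted (key, value) pairs with start <= key < end, newest value wins."""
        merged = {}
        for run in reversed(self.runs):  # oldest first, so newer runs overwrite
            lo = bisect.bisect_left(run, (start,))
            hi = bisect.bisect_left(run, (end,))
            merged.update(run[lo:hi])
        merged.update((k, v) for k, v in self.memtable.items() if start <= k < end)
        return sorted(merged.items())

db = ToyLSM(memtable_limit=2)
db.write_batch([("a", "1"), ("b", "2")])   # fills the memtable and flushes a run
db.write_batch([("b", "20"), ("c", "3")])  # newer value for "b" lands in a newer run
db.write_batch([("d", "4")])               # stays in the memtable
print(db.get("b"), db.multi_get(["a", "z"]), db.range_scan("b", "d"))
```

Reads consult newer data before older data, which is why an overwrite never has to modify an existing run; real LSM engines add compaction to merge runs and discard stale versions.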

Technology Stack

PostgreSQL • Apache Spark • SparkSQL • Scala • C++ • RocksDB • SQL • Distributed Systems

Datasets are intentionally not included due to size and licensing constraints. Each project README provides instructions on expected input formats and execution.

Repository Layout

embedded-nosql-storage-rocksdb/
relational-query-engineering/
spark-spatial-analytics/
spatiotemporal-gi-star-hotspots/
README.md
