This project implements a Key-Value storage engine as a console application. The system supports basic operations and builds upon the Log-Structured Merge-Tree (LSM) architecture to ensure efficient data persistence and retrieval.
- PUT(key, value): Insert a new record. Keys are strings, values are byte arrays (or strings converted to byte arrays).
- GET(key): Retrieve a value by key.
- DELETE(key): Remove a record by key.
These operations rely on a well-defined write path (WAL → Memtable → SSTable) and read path (Memtable → Cache → SSTables with Bloom filters, Summary, Index, Data).
- Write-Ahead Log (WAL): Records all operations for durability. Implemented as segmented logs with CRC integrity checks.
- Memtable: In-memory structure (hash map, skip list, or B-tree) for fast inserts and deletes. Flushed to disk when full. Supports multiple instances (N-1 read-only, 1 read-write).
- SSTable: Immutable disk structures containing Data, Bloom Filter, Index, Summary, and Metadata (Merkle tree for validation).
- Checks Memtable, then Cache, then queries SSTables using Bloom filters and index structures until the record is found.
- Organizes SSTables into levels.
- Supports size tiered compaction algorithm.
- Implemented with LRU policy to accelerate lookups.
- Handles block-level disk I/O with Block Cache (LRU). Supports block sizes of 4KB, 8KB, or 16KB.
- External configuration via JSON for system parameters.
- Rate limiting with Token Bucket algorithm.
- Working with Probabilistic data structures: Bloom Filter, Count-Min Sketch, HyperLogLog, and SimHash (fingerprint + Hamming distance).