
Conversation

@JSCU-CNI

This PR adds a LevelDB storage implementation to dissect.database.

Also adds support for serialization formats that build on top of LevelDB: IndexedDB, and Chromium's LocalStorage and SessionStorage. Please let us know if these formats should be structured differently in this project.

Makes use of two (pure Python and/or Rust) dependencies: cramjam (for LevelDB Snappy decompression) and v8serialize (for IndexedDB V8 JavaScript object deserialization). We do not have the time or resources to port these dependencies to dissect.util or dissect.*; hopefully these dependencies can be accepted.

@JSCU-CNI (Author) commented Nov 5, 2025

Please let us know if there is anything we can do to move this PR forward or to ease the review process.

@Schamper (Member) commented Nov 5, 2025

> Please let us know if there is anything we can do to move this PR forward or to ease the review process.

If you could clone me, that'd be great.

Unfortunately this is a huge PR and I simply have not gotten around to looking at it yet. Between reviewing all the other PRs and working on large PRs myself, I'm simply stretched thin.
What would help is if you could provide me with a prioritization of PRs from your side, so I can look at them in that order.

@Schamper (Member) left a comment:

How you doin'? There are a lot of unnecessary patterns in here: class-level type hints for no apparent reason, and methods that could easily be inlined or replaced by inheritance. You can take my comments on the earlier files as generic comments that apply to the rest as well (it's very slow to review a large PR on GitHub).


```python
for record in self._leveldb.records:
    if record.state == c_leveldb.RecordState.LIVE and (
        record.key[0:5] == b"META:" or record.key[0:11] == b"METAACCESS:"
```
@Schamper (Member):

Suggested change:
```diff
-        record.key[0:5] == b"META:" or record.key[0:11] == b"METAACCESS:"
+        record.key.startswith((b"META:", b"METAACCESS:"))
```
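For reference, `bytes.startswith` accepts a tuple of prefixes, so the two slice comparisons collapse into a single call. A minimal sketch (the keys below are made up for illustration):

```python
# bytes.startswith accepts a tuple of prefixes; any match returns True.
keys = [b"META:example.com", b"METAACCESS:example.com", b"VERSION"]

for key in keys:
    # Equivalent to key[0:5] == b"META:" or key[0:11] == b"METAACCESS:"
    print(key, key.startswith((b"META:", b"METAACCESS:")))
```

This also avoids keeping the prefix lengths (5 and 11) in sync with the literals by hand.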

Comment on lines +52 to +53
```python
meta_keys.setdefault(meta_key.key, [])
meta_keys[meta_key.key].append(meta_key)
```
@Schamper (Member):

Suggested change:
```diff
-meta_keys.setdefault(meta_key.key, [])
-meta_keys[meta_key.key].append(meta_key)
+meta_keys.setdefault(meta_key.key, []).append(meta_key)
```
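The chained form works because `dict.setdefault` returns the value stored under the key, inserting the default first when the key is missing. A small sketch with hypothetical keys:

```python
meta_keys: dict[bytes, list[str]] = {}

# setdefault returns the list stored under the key (creating it if absent),
# so the append can be chained onto the same expression.
meta_keys.setdefault(b"host-a", []).append("first")
meta_keys.setdefault(b"host-a", []).append("second")

print(meta_keys)  # {b'host-a': ['first', 'second']}
```

A `collections.defaultdict(list)` would achieve the same in a loop, at the cost of a different mapping type.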

Comment on lines +55 to +56
```python
for meta in meta_keys.values():
    yield Store(self, meta)
```
@Schamper (Member):

Suggested change:
```diff
-for meta in meta_keys.values():
-    yield Store(self, meta)
+return [Store(self, meta) for meta in meta_keys.values()]
```

Comment on lines +68 to +71
```python
host: str
records: list[Key]
meta: list[MetaKey]
```
@Schamper (Member):

Suggested change (remove these lines):
```diff
-host: str
-records: list[Key]
-meta: list[MetaKey]
```
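For context on the "class-level type hints for no apparent reason" remark above: on a plain class (not a dataclass), bare annotations like these create no attributes at runtime and only populate `__annotations__`. A minimal sketch of the difference, with hypothetical classes:

```python
from dataclasses import dataclass


class Plain:
    # A bare annotation creates no attribute: Plain().host raises AttributeError.
    host: str


@dataclass
class Data:
    # Under @dataclass, the same annotation generates an __init__ parameter.
    host: str


print(Plain.__annotations__)       # {'host': <class 'str'>}
print(Data(host="example").host)   # example
```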

Comment on lines +83 to +98
```python
@property
def records(self) -> Iterator[RecordKey]:
    """Yield all records related to this store."""

    if self._records:
        yield from self._records

    # e.g. with "_https://google.com\x00\x01MyKey", the prefix would be "_https://google.com\x00"
    prefix = RecordKey.prefix + self.host.encode("iso-8859-1") + b"\x00"
    prefix_len = len(prefix)

    for record in self._local_storage._leveldb.records:
        if record.key[:prefix_len] == prefix:
            key = RecordKey(self, record.key, record.value, record.state, record.sequence)
            self._records.append(key)
            yield key
```
@Schamper (Member):

A few things about this:

  • A property that is a generator doesn't feel very safe/stable.
  • The cache is dangerous: as soon as you do a single partial iteration (without exhausting the generator), a later call will only iterate the records that had been read up until then.
  • The cache in its current implementation will yield duplicate records: it doesn't return after yielding from the cache, so it falls through and reads everything again.

It's probably fine not caching this.
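One way to keep memoization without the partial-iteration and duplicate-yield hazards described above is to cache a fully materialized list instead of caching inside a generator; a minimal sketch under simplified names (not the PR's code):

```python
from functools import cached_property


class Store:
    def __init__(self, raw_records: list[bytes]):
        self._raw_records = raw_records

    @cached_property
    def records(self) -> list[bytes]:
        # The list is built in full on first access and stored on the
        # instance, so every later access sees the same complete result.
        return [r for r in self._raw_records if r.startswith(b"_")]


store = Store([b"_https://example.com\x00\x01MyKey", b"META:example.com"])
print(store.records)
print(store.records)  # second access hits the cache, no re-iteration
```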

```python
)


class BlinkHostObjectHandlerDecodeError(v8serialize.DecodeV8SerializeError):
```
@Schamper (Member):

Importing this file will now fail when the dependency is missing.
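One possible pattern (not necessarily how the maintainers would want it solved) is to guard the import and define the subclass conditionally, so the module still imports when the optional dependency is absent; a sketch:

```python
try:
    import v8serialize
except ImportError:
    v8serialize = None

if v8serialize is not None:
    # The subclass can only be defined when the dependency is installed.
    class BlinkHostObjectHandlerDecodeError(v8serialize.DecodeV8SerializeError):
        pass
else:
    class BlinkHostObjectHandlerDecodeError(Exception):
        """Fallback so importing this module does not require v8serialize."""
```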

@Schamper (Member):

Isn't this more a LevelDB util?

```python
- https://github.com/protocolbuffers/protobuf/blob/main/python/google/protobuf/internal/decoder.py
"""

varint_limit: int = 10
```
@Schamper (Member):

Just limit or LIMIT might be a better name.
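For context, the limit of 10 matches the maximum encoded length of a 64-bit protobuf-style varint: 7 payload bits per byte, so ceil(64 / 7) = 10 bytes. A minimal decoder sketch under that assumption (illustrative, not the PR's code):

```python
LIMIT = 10  # ceil(64 / 7): a 64-bit varint never spans more than 10 bytes


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a little-endian base-128 varint; return (value, next position)."""
    result = shift = 0
    for i in range(LIMIT):
        byte = buf[pos + i]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # a clear high bit marks the final byte
            return result, pos + i + 1
        shift += 7
    raise ValueError("varint exceeds 10 bytes; input is likely corrupt")


print(decode_varint(b"\xac\x02"))  # (300, 2)
```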

```toml
]

leveldb = [
    "cramjam>=2.11.0,<3", # required for snappy decompression
```
@Schamper (Member):

Can hopefully soon be replaced with dissect.util once that's merged.

@Schamper (Member):

Shouldn't these files be in `tests/_data/indexeddb` for the IndexedDB ones?
