Merged
49 commits
4518b36
Add initial specification for file column type
claude Dec 20, 2025
ba3c66b
Revise file type spec: unified storage backend with fsspec
claude Dec 20, 2025
965a30f
Update file type spec to use existing datajoint.json settings
claude Dec 20, 2025
667e740
Add filename collision avoidance and transaction handling to spec
claude Dec 20, 2025
9d3e194
Major spec revision: files/folders, transactions, fetch handles
claude Dec 20, 2025
93559a4
Update path structure: field after PK, add partition pattern
claude Dec 20, 2025
dc1c899
Add PK value encoding rules for paths
claude Dec 20, 2025
5f27b75
Clarify orphan cleanup as separate maintenance procedure
claude Dec 20, 2025
4f15c90
Add legacy type deprecation notice
claude Dec 20, 2025
af6cef2
Add store metadata and client verification mechanism
claude Dec 20, 2025
ec2e737
Simplify store metadata - remove schema tracking
claude Dec 20, 2025
b32ef8d
Rename type from 'file' to 'object'
claude Dec 20, 2025
93ce01e
Add Zarr compatibility: staged insert and fsspec access
claude Dec 20, 2025
997d992
Finalize staged_insert1 API for direct object storage writes
claude Dec 20, 2025
36806cc
Simplify object naming: field name as base, extension from source
claude Dec 20, 2025
6c6349b
Restructure store paths: objects/ after table, rename store config
claude Dec 20, 2025
0ea880a
Make content hashing optional, add folder manifests
claude Dec 21, 2025
c340ec7
Clarify folder manifest storage location and rationale
claude Dec 21, 2025
6cd9b9c
Add optional database_host and database_name to store metadata
claude Dec 21, 2025
38844f1
Highlight no hidden tables - key architectural difference
claude Dec 21, 2025
d65ece7
Refactor external storage to use fsspec for unified backend
claude Dec 21, 2025
4b7e7bd
Fix unused imports (ruff lint)
claude Dec 21, 2025
949b8a6
Fix ruff-format: add blank lines after local imports
claude Dec 21, 2025
0019109
Implement object column type for managed file storage
claude Dec 21, 2025
b45df2c
Fix ruff lint: line length and unused imports
claude Dec 21, 2025
adf4305
Fix unused imports (ruff lint)
claude Dec 21, 2025
095753f
Add documentation for object column type
claude Dec 21, 2025
08838f6
Fix ruff-format: code formatting adjustments
claude Dec 21, 2025
3da69fd
Add pytest tests for object column type
claude Dec 21, 2025
944c9be
Fix E402: move schema_object import to top of file
claude Dec 21, 2025
752248c
Fix unused imports (ruff lint)
claude Dec 21, 2025
7ef4e61
Fix ruff-format: add blank lines after local imports
claude Dec 21, 2025
15418c3
Address Zarr reviewer feedback: optional metadata fields
claude Dec 22, 2025
fb8c0cb
Add Augmented Schema vs External References section
claude Dec 22, 2025
a9447e7
Rename file-type-spec.md to object-type-spec.md
claude Dec 22, 2025
5170ab1
Fix ruff-format: single line error message
claude Dec 22, 2025
3e32188
Simplify ExternalTable storage initialization
claude Dec 22, 2025
4e90c1e
Clarify staged insert compatibility: Zarr/TileDB yes, HDF5 no
claude Dec 22, 2025
5a727d2
Add remote URL support for copy insert
claude Dec 22, 2025
4bdc882
Remove redundant self.spec attribute from ExternalTable
claude Dec 22, 2025
cc96f03
Fix ruff-format: single line error message in upload_filepath
claude Dec 22, 2025
b2bc219
Merge branch claude/add-type-aliases-6uN3E
claude Dec 22, 2025
8ee058a
Merge pre/v2.0
claude Dec 22, 2025
36f3bb7
Merge pre/v2.0
claude Dec 22, 2025
d5439cf
Address reviewer feedback on object type spec
claude Dec 23, 2025
54460ed
Fix ruff lint and format issues in preview.py
claude Dec 23, 2025
052a40b
Add access control patterns section to spec
claude Dec 23, 2025
260a43a
Update object type spec for multi-store support
claude Dec 24, 2025
5ed7329
Implement multi-store support for object type
claude Dec 24, 2025
54 changes: 54 additions & 0 deletions docs/src/client/settings.md
@@ -164,3 +164,57 @@ Configure external stores in the `stores` section. See [External Storage](../sys
}
}
```

## Object Storage

Configure object storage for the [`object` type](../design/tables/object.md) in the `object_storage` section. This provides managed file and folder storage with fsspec backend support.

### Local Filesystem

```json
{
  "object_storage": {
    "project_name": "my_project",
    "protocol": "file",
    "location": "/data/my_project"
  }
}
```

Review comments on the `"object_storage": {` line:

Contributor

the use of the term "object storage" to describe something on the local file system might end up confusing some people, since usually "object storage" (as a storage technology like aws s3) is used to describe an alternative to the local file system.

Member Author

What would you suggest instead? We do want to drive folks to object storage, even if local, but the system will support both object and file storage.

Contributor

maybe just use the term "storage" here? I think the only ambiguity would then be between the storage details of the DB itself and the storage used for the stuff referenced by the DB. Since "object" is somewhat overloaded, maybe the word "asset" works better? You'd have "Asset storage" and "Database storage".

Member Author

I think I will just stick with object_storage for consistency. The choice to use a file system as object storage is on them. We are steering them toward object storage. This aligns with the object type in the tables and the "Object-Augmented Schema" chapter in the documentation.

### Amazon S3

```json
{
  "object_storage": {
    "project_name": "my_project",
    "protocol": "s3",
    "bucket": "my-bucket",
    "location": "my_project",
    "endpoint": "s3.amazonaws.com"
  }
}
```
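As a rough illustration of how the two configurations above differ, the sketch below turns an `object_storage` block into a base URL of the kind fsspec-style tooling accepts. The helper `base_url` and the URL layout (`file://<location>`, `<protocol>://<bucket>/<location>`) are assumptions made for this example; the PR does not show how DataJoint composes paths internally.

```python
import json


def base_url(settings: dict) -> str:
    """Compose a base URL from an object_storage block (hypothetical helper)."""
    protocol = settings["protocol"]
    location = settings["location"]
    if protocol == "file":
        # Local filesystem: the location is an absolute directory path.
        return f"file://{location}"
    # Cloud backends: the location is assumed to be a prefix inside the bucket.
    bucket = settings["bucket"]
    return f"{protocol}://{bucket}/{location.strip('/')}"


# The two example configurations from the documentation above.
local = json.loads("""
{"object_storage": {"project_name": "my_project", "protocol": "file",
                    "location": "/data/my_project"}}
""")
s3 = json.loads("""
{"object_storage": {"project_name": "my_project", "protocol": "s3",
                    "bucket": "my-bucket", "location": "my_project",
                    "endpoint": "s3.amazonaws.com"}}
""")

local_url = base_url(local["object_storage"])
s3_url = base_url(s3["object_storage"])
```

Under these assumptions the only structural difference between the backends is whether a bucket sits between the protocol and the location.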

### Object Storage Settings

| Setting | Environment Variable | Required | Description |
|---------|---------------------|----------|-------------|
| `object_storage.project_name` | `DJ_OBJECT_STORAGE_PROJECT_NAME` | Yes | Unique project identifier |
| `object_storage.protocol` | `DJ_OBJECT_STORAGE_PROTOCOL` | Yes | Backend: `file`, `s3`, `gcs`, `azure` |
| `object_storage.location` | `DJ_OBJECT_STORAGE_LOCATION` | Yes | Base path or bucket prefix |
| `object_storage.bucket` | `DJ_OBJECT_STORAGE_BUCKET` | For cloud | Bucket name |
| `object_storage.endpoint` | `DJ_OBJECT_STORAGE_ENDPOINT` | For S3 | S3 endpoint URL |
| `object_storage.partition_pattern` | `DJ_OBJECT_STORAGE_PARTITION_PATTERN` | No | Path pattern with `{attr}` placeholders |
| `object_storage.token_length` | `DJ_OBJECT_STORAGE_TOKEN_LENGTH` | No | Random suffix length (default: 8) |
| `object_storage.access_key` | — | For cloud | Access key (use secrets) |
| `object_storage.secret_key` | — | For cloud | Secret key (use secrets) |
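The table above can be read as a small validation routine. The following is a minimal sketch, not DataJoint's implementation: the helper `resolve_settings` is hypothetical, and the rule that an environment variable overrides the value from the settings file is an assumption. The variable names, the required fields, and the default token length of 8 are taken from the table.

```python
import os

# Settings that have an environment variable, per the table above.
ENV_SETTINGS = [
    "project_name", "protocol", "location", "bucket",
    "endpoint", "partition_pattern", "token_length",
]
REQUIRED = ["project_name", "protocol", "location"]
CLOUD_PROTOCOLS = {"s3", "gcs", "azure"}


def resolve_settings(file_settings, environ=None):
    """Merge file settings with DJ_OBJECT_STORAGE_* variables and validate.

    Hypothetical helper; letting the environment win is an assumption.
    """
    environ = os.environ if environ is None else environ
    merged = dict(file_settings)
    for name in ENV_SETTINGS:
        value = environ.get(f"DJ_OBJECT_STORAGE_{name.upper()}")
        if value is not None:
            merged[name] = value
    # Documented default for the random suffix length is 8.
    merged["token_length"] = int(merged.get("token_length", 8))

    missing = [name for name in REQUIRED if not merged.get(name)]
    if merged.get("protocol") in CLOUD_PROTOCOLS and not merged.get("bucket"):
        missing.append("bucket")
    if merged.get("protocol") == "s3" and not merged.get("endpoint"):
        missing.append("endpoint")
    if missing:
        raise ValueError(f"missing object_storage settings: {', '.join(missing)}")
    return merged


settings = resolve_settings(
    {"project_name": "my_project", "protocol": "file", "location": "/data/my_project"},
    environ={"DJ_OBJECT_STORAGE_TOKEN_LENGTH": "12"},
)
```

Passing `environ` explicitly keeps the example independent of the real process environment; calling `resolve_settings(cfg)` without it would read `os.environ`.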

### Object Storage Secrets

Store cloud credentials in the secrets directory:

```
.secrets/
├── object_storage.access_key
└── object_storage.secret_key
```
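To make the layout above concrete, here is a small sketch of reading one secret per file, where the file name is the setting name. The helper `read_secrets` is hypothetical and the key values are placeholders; how DataJoint itself loads the secrets directory is not shown in this PR.

```python
from pathlib import Path
import tempfile

SECRET_NAMES = ("object_storage.access_key", "object_storage.secret_key")


def read_secrets(secrets_dir, names=SECRET_NAMES):
    """Read one secret per file from secrets_dir (hypothetical helper)."""
    secrets = {}
    for name in names:
        path = Path(secrets_dir) / name
        if path.is_file():
            # Strip the trailing newline that most editors add.
            secrets[name] = path.read_text().strip()
    return secrets


# Demonstration against a throwaway directory with placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    secrets_dir = Path(tmp) / ".secrets"
    secrets_dir.mkdir()
    (secrets_dir / "object_storage.access_key").write_text("EXAMPLE_ACCESS_KEY\n")
    (secrets_dir / "object_storage.secret_key").write_text("EXAMPLE_SECRET_KEY\n")
    loaded = read_secrets(secrets_dir)
```

Keeping credentials in separate files rather than in the JSON settings makes it easier to exclude them from version control.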
3 changes: 3 additions & 0 deletions docs/src/design/tables/attributes.md
@@ -71,6 +71,9 @@ info).
These types abstract certain kinds of non-database data to facilitate use
together with DataJoint.

- `object`: managed [file and folder storage](object.md) with support for direct writes
(Zarr, HDF5) and fsspec integration. Recommended for new pipelines.
Contributor

how would hdf5 support direct writes on cloud storage?

Member Author

Good point, updating. I haven't worked with HDF5 in a while. It's insane that they do not yet support direct writes.

- `attach`: a [file attachment](attach.md) similar to email attachments facilitating
sending/receiving an opaque data file to/from a DataJoint pipeline.
