2 changes: 1 addition & 1 deletion .github/workflows/development.yml
@@ -17,7 +17,7 @@ jobs:
uses: actions/checkout@v2
with:
repository: datajoint/datajoint-python
ref: master
ref: pre/v2.2
path: datajoint-python
- name: Compile docs static artifacts
run: |
9 changes: 7 additions & 2 deletions docker-compose.yaml
@@ -73,6 +73,8 @@ services:
condition: service_healthy
postgres:
condition: service_healthy
minio:
condition: service_healthy
command:
- sh
- -c
@@ -93,13 +95,16 @@ services:
elif echo "$${MODE}" | grep -i execute_pg &>/dev/null; then
# EXECUTE_PG mode: execute notebooks against PostgreSQL
pip install -e "/datajoint-python[postgres]"
pip install scikit-image pooch nbconvert
pip install scikit-image pooch nbconvert matplotlib faker zarr
mkdir -p /tmp/datajoint-tutorials
echo "Executing notebooks against PostgreSQL..."
export DJ_HOST=postgres
python scripts/execute_notebooks.py --backend postgresql
elif echo "$${MODE}" | grep -i execute &>/dev/null; then
# EXECUTE mode: execute notebooks against MySQL (default)
pip install -e /datajoint-python
pip install scikit-image pooch nbconvert
pip install scikit-image pooch nbconvert matplotlib faker zarr
mkdir -p /tmp/datajoint-tutorials
echo "Executing notebooks against MySQL..."
python scripts/execute_notebooks.py --backend mysql
else
7 changes: 6 additions & 1 deletion mkdocs.yaml
@@ -11,6 +11,7 @@ nav:
- Overview:
- Data Pipelines: explanation/data-pipelines.md
- What's New in 2.0: explanation/whats-new-2.md
- What's New in 2.2: explanation/whats-new-22.md
- FAQ: explanation/faq.md
- Data Model:
- Relational Workflow Model: explanation/relational-workflow-model.md
@@ -48,12 +49,14 @@ nav:
- JSON Data Type: tutorials/advanced/json-type.ipynb
- Distributed Computing: tutorials/advanced/distributed.ipynb
- Custom Codecs: tutorials/advanced/custom-codecs.ipynb
- Instances: tutorials/advanced/instances.ipynb
- How-To:
- how-to/index.md
- Setup:
- Installation: how-to/installation.md
- Manage Secrets: how-to/manage-secrets.md
- Configure Database: how-to/configure-database.md
- Use Isolated Instances: how-to/use-instances.md
- Configure Object Storage: how-to/configure-storage.md
- Command-Line Interface: how-to/use-cli.md
- Schema Design:
@@ -114,6 +117,8 @@ nav:
- AutoPopulate: reference/specs/autopopulate.md
- Job Metadata: reference/specs/job-metadata.md
- Object Store Configuration: reference/specs/object-store-configuration.md
- Instance & Thread Safety:
- Thread-Safe Mode: reference/specs/thread-safe-mode.md
- Configuration: reference/configuration.md
- Definition Syntax: reference/definition-syntax.md
- Operators: reference/operators.md
@@ -222,7 +227,7 @@ markdown_extensions:
generic: true
extra:
generator: false # Disable watermark
datajoint_version: "2.1" # DataJoint Python version this documentation covers
datajoint_version: "2.2" # DataJoint Python version this documentation covers
social:
- icon: main/company-logo
link: https://www.datajoint.com
11 changes: 11 additions & 0 deletions src/about/citation.md
@@ -17,6 +17,17 @@ If your work utilizes **DataJoint Elements**, please cite the following manuscri

You should also cite the **DataJoint Core manuscript** detailed below.

## Citing DataJoint 2.0

For work using **DataJoint 2.0** or the **Relational Workflow Model**, cite the following
manuscript:

- **Manuscript**: Yatsenko D, Nguyen TT. DataJoint 2.0: A Computational Substrate for
Agentic Scientific Workflows. arXiv:2602.16585. 2026 Feb 18. doi:
https://doi.org/10.48550/arXiv.2602.16585

- **RRID**: [RRID:SCR_014543](https://scicrunch.org/resolver/SCR_014543)

## Citing the DataJoint Relational Model

For any work relying on the **DataJoint Relational Model**, include the following
2 changes: 2 additions & 0 deletions src/explanation/whats-new-2.md
@@ -6,6 +6,8 @@ DataJoint 2.0 is a major release that establishes DataJoint as a mature framewor
>
> This page summarizes new features and concepts. For step-by-step migration instructions, see the **[Migration Guide](../how-to/migrate-to-v20.md/)**.

> **Citation:** Yatsenko D, Nguyen TT. *DataJoint 2.0: A Computational Substrate for Agentic Scientific Workflows.* arXiv:2602.16585. 2026. [doi:10.48550/arXiv.2602.16585](https://doi.org/10.48550/arXiv.2602.16585)

## Overview

DataJoint 2.0 introduces fundamental improvements to type handling, job coordination, and object storage while maintaining compatibility with your existing pipelines during migration. Key themes:
209 changes: 209 additions & 0 deletions src/explanation/whats-new-22.md
@@ -0,0 +1,209 @@
# What's New in DataJoint 2.2

DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing.

> **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive.

> **Citation:** Yatsenko D, Nguyen TT. *DataJoint 2.0: A Computational Substrate for Agentic Scientific Workflows.* arXiv:2602.16585. 2026. [doi:10.48550/arXiv.2602.16585](https://doi.org/10.48550/arXiv.2602.16585)

## Overview

DataJoint has traditionally used a global singleton pattern: one configuration (`dj.config`), one connection (`dj.conn()`), shared across all tables in a process. This works well for interactive sessions and single-user scripts, but breaks down when:

- A web server handles requests for different databases simultaneously
- A notebook connects to production and staging databases side by side
- Tests need isolated databases that don't interfere with each other
- Parallel pipelines need independent connections

DataJoint 2.2 solves this with `dj.Instance`—an object that bundles its own configuration and connection, independent of global state.

## `dj.Instance` API

An Instance encapsulates a config and connection pair. Create one by providing database credentials directly:

```python
import datajoint as dj

inst = dj.Instance(host="localhost", user="root", password="secret")
```

Then use `inst.Schema()` instead of `dj.Schema()`:

```python
schema = inst.Schema("my_database")

@schema
class Experiment(dj.Manual):
    definition = """
    experiment_id : int32
    ---
    description : varchar(255)
    """
```

Tables defined this way use the Instance's connection—completely independent of `dj.config` and `dj.conn()`.

### Instance Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `host` | str | — | Database hostname (required) |
| `user` | str | — | Database username (required) |
| `password` | str | — | Database password (required) |
| `port` | int | from config | Database port (default: 3306 for MySQL, 5432 for PostgreSQL) |
| `use_tls` | bool or dict | `None` | TLS configuration |
| `**kwargs` | — | — | Config overrides (e.g., `safemode=False`) |

### Instance Methods and Attributes

| Member | Description |
|--------|-------------|
| `inst.Schema(name)` | Create a Schema bound to this Instance's connection |
| `inst.FreeTable(full_name)` | Create a FreeTable bound to this Instance's connection |
| `inst.config` | This Instance's Config object (attribute) |
| `inst.connection` | This Instance's Connection object (attribute) |

### Config Overrides

Pass any config setting as a keyword argument. Use double underscores for nested settings:

```python
inst = dj.Instance(
    host="localhost", user="root", password="secret",
    safemode=False,
    database__reconnect=False,
)
```
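
To make the double-underscore convention concrete, the sketch below shows one way such keyword arguments could be expanded into nested settings. The helper `expand_overrides` is hypothetical and exists only for illustration; it is not DataJoint's implementation, and the real `Config` object may resolve keys differently.

```python
from typing import Any


def expand_overrides(**kwargs: Any) -> dict[str, Any]:
    """Expand keys such as ``database__reconnect`` into nested dictionaries.

    Hypothetical helper for illustration; not part of DataJoint.
    """
    nested: dict[str, Any] = {}
    for key, value in kwargs.items():
        # "database__reconnect" -> parents ["database"], leaf "reconnect"
        *parents, leaf = key.split("__")
        node = nested
        for name in parents:
            # Walk down, creating intermediate levels as needed.
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


overrides = expand_overrides(safemode=False, database__reconnect=False)
print(overrides)  # {'safemode': False, 'database': {'reconnect': False}}
```

Under this reading, `database__reconnect=False` addresses the `reconnect` setting in the `database` group, while `safemode=False` is a top-level setting.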

## Multiple Databases

Instances make it straightforward to work with multiple databases simultaneously:

```python
production = dj.Instance(host="prod.example.com", user="analyst", password="...")
staging = dj.Instance(host="staging.example.com", user="dev", password="...")

prod_schema = production.Schema("experiment_data")
staging_schema = staging.Schema("experiment_data")

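# Query both independently. ProdTable and StagingTable are placeholders for
# table classes declared on prod_schema and staging_schema, respectively.
prod_data = ProdTable.to_dicts()
staging_data = StagingTable.to_dicts()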
```

Each Instance maintains its own connection and configuration, so the two never interfere with each other.
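
As an illustration of a multi-database comparison, the sketch below diffs two result sets by key. It assumes each query returns a list of dictionaries, as the `to_dicts()` calls above suggest. The helper `diff_by_key` and the sample rows are hypothetical and stand in for real query results.

```python
from typing import Any

Row = dict[str, Any]


def diff_by_key(left: list[Row], right: list[Row], key: str) -> dict[str, list[Any]]:
    """Report key values missing on either side or differing in content."""
    left_index = {row[key]: row for row in left}
    right_index = {row[key]: row for row in right}
    shared = left_index.keys() & right_index.keys()
    return {
        "only_left": sorted(left_index.keys() - right_index.keys()),
        "only_right": sorted(right_index.keys() - left_index.keys()),
        "changed": sorted(k for k in shared if left_index[k] != right_index[k]),
    }


# Sample rows standing in for ProdTable.to_dicts() and StagingTable.to_dicts().
prod_data = [
    {"experiment_id": 1, "description": "baseline"},
    {"experiment_id": 2, "description": "drug A"},
]
staging_data = [
    {"experiment_id": 2, "description": "drug A (rerun)"},
    {"experiment_id": 3, "description": "drug B"},
]

report = diff_by_key(prod_data, staging_data, key="experiment_id")
print(report)  # {'only_left': [1], 'only_right': [3], 'changed': [2]}
```

Because the comparison runs on plain Python data, it does not depend on which Instance produced each result set.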

## Thread-Safe Mode

For applications where global state is dangerous (web servers, multi-threaded workers), enable thread-safe mode:

```bash
export DJ_THREAD_SAFE=true
```

When thread-safe mode is enabled:

- `dj.config` raises `ThreadSafetyError` on any access
- `dj.conn()` raises `ThreadSafetyError`
- `dj.Schema()` without an explicit connection raises `ThreadSafetyError`
- Only `dj.Instance()` works, enforcing explicit connection management

This prevents accidental use of shared global state in concurrent environments.

### `ThreadSafetyError`

```python
import os
os.environ["DJ_THREAD_SAFE"] = "true"

import datajoint as dj

dj.config.database.host # raises ThreadSafetyError
dj.conn() # raises ThreadSafetyError

# Instead, use Instance:
inst = dj.Instance(host="localhost", user="root", password="secret")
schema = inst.Schema("my_db") # works
```

### Environment Variable

| Variable | Values | Default | Description |
|----------|--------|---------|-------------|
| `DJ_THREAD_SAFE` | `true`, `1`, `yes` / `false`, `0`, `no` | `false` | Enable thread-safe mode |
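
An application may want to check the flag itself at startup, for example to log which mode it is running in. The sketch below interprets the values listed in the table. It is not DataJoint's own parser: the function name, the case-insensitive matching, and the decision to reject unrecognized values are all assumptions.

```python
import os
from typing import Mapping

_TRUE = {"true", "1", "yes"}
_FALSE = {"false", "0", "no"}


def thread_safe_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Interpret DJ_THREAD_SAFE using the values documented in the table."""
    # Assumption: matching is case-insensitive and ignores surrounding whitespace.
    raw = env.get("DJ_THREAD_SAFE", "false").strip().lower()
    if raw in _TRUE:
        return True
    if raw in _FALSE:
        return False
    # Assumption: fail loudly on unrecognized values rather than guess.
    raise ValueError(f"Unrecognized DJ_THREAD_SAFE value: {raw!r}")


print(thread_safe_enabled({"DJ_THREAD_SAFE": "yes"}))  # True
print(thread_safe_enabled({}))                         # False
```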

## Connection-Scoped Config

Each Instance carries its own `Config` object. Runtime configuration reads go through the Instance's config, not global state:

```python
inst = dj.Instance(host="localhost", user="root", password="secret")

# Instance-scoped config
inst.config.safemode = False
inst.config.display.limit = 25

# Global config is unaffected
print(dj.config.safemode) # still True (default)
```

Tables created through an Instance's Schema read config from that Instance's connection, not from `dj.config`.

## When to Use Instances

| Scenario | Pattern |
|----------|---------|
| Interactive notebook, single database | `dj.config` + `dj.Schema()` (global pattern) |
| Script connecting to one database | Either pattern works |
| Web server (Flask, FastAPI, Django) | `dj.Instance()` per request/tenant |
| Multi-database comparison | One `dj.Instance()` per database |
| Parallel workers | `dj.Instance()` per worker + `DJ_THREAD_SAFE=true` |
| Test suite | `dj.Instance()` per test for isolation |
| Shared notebook server | `dj.Instance()` per user session |
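
For the parallel-workers row, one possible arrangement is to build an Instance lazily in each worker thread. The sketch below uses `threading.local` for that. The class `PerThreadInstance` is hypothetical, and a stand-in factory replaces `dj.Instance` so the example runs without a database.

```python
import threading
from typing import Any, Callable


class PerThreadInstance:
    """Lazily build one instance per thread (hypothetical helper)."""

    def __init__(self, factory: Callable[..., Any], **credentials: Any):
        self._factory = factory          # would be dj.Instance in real use
        self._credentials = credentials
        self._local = threading.local()  # attributes are private to each thread

    def get(self) -> Any:
        inst = getattr(self._local, "inst", None)
        if inst is None:
            # First call in this thread: create and remember the instance.
            inst = self._factory(**self._credentials)
            self._local.inst = inst
        return inst


def fake_instance(**kwargs: Any) -> object:
    """Stand-in for dj.Instance so the sketch needs no database."""
    return object()


holder = PerThreadInstance(fake_instance, host="localhost", user="root", password="secret")
main_inst = holder.get()

seen = []
worker = threading.Thread(target=lambda: seen.append(holder.get()))
worker.start()
worker.join()

print(main_inst is holder.get())  # True: reused within the same thread
print(seen[0] is main_inst)       # False: the worker thread built its own
```

With DataJoint installed, passing `dj.Instance` as the factory would give each thread its own config and connection pair.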

## Comparison: Global vs Instance

### Global Pattern (unchanged)

```python
import datajoint as dj

# Config set via environment, files, or programmatically
dj.config["database.host"] = "localhost"

schema = dj.Schema("my_db")

@schema
class MyTable(dj.Manual):
    definition = """
    id : int32
    ---
    value : float64
    """
```

### Instance Pattern (new in 2.2)

```python
import datajoint as dj

inst = dj.Instance(host="localhost", user="root", password="secret")
schema = inst.Schema("my_db")

@schema
class MyTable(dj.Manual):
    definition = """
    id : int32
    ---
    value : float64
    """
```

Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema.

## See Also

- [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide
- [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial
- [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings
- [Configure Database](../how-to/configure-database.md/) — Connection setup
29 changes: 29 additions & 0 deletions src/how-to/configure-database.md
@@ -242,3 +242,32 @@ conn = dj.conn()
conn.close()
```

## Instance-Based Connections

!!! version-added "New in 2.2"
    `dj.Instance` provides isolated connections independent of global config.

For applications that need multiple connections or thread safety, use `dj.Instance` instead of global config:

```python
import datajoint as dj

inst = dj.Instance(host="db.example.com", user="myuser", password="mypassword")
schema = inst.Schema("my_schema")
```

Each Instance has its own config and connection. This is useful for:

- **Web servers**: One Instance per request or tenant
- **Testing**: Isolated databases per test
- **Multi-database**: Connect to production and staging simultaneously
- **Thread safety**: Set `DJ_THREAD_SAFE=true` to enforce Instance usage

```python
# Multiple simultaneous connections
prod = dj.Instance(host="prod.example.com", user="analyst", password="...")
staging = dj.Instance(host="staging.example.com", user="dev", password="...")
```
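
The `"..."` placeholders above stand for real passwords, which are usually better kept out of source code. The sketch below gathers the three required arguments from environment variables. The `<PREFIX>_DB_HOST` naming scheme and the helper `instance_kwargs` are hypothetical, not a DataJoint convention.

```python
import os
from typing import Mapping

_FIELDS = ("host", "user", "password")


def instance_kwargs(prefix: str, env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect host, user, and password for one database from the environment.

    The variable names (<PREFIX>_DB_HOST, ...) are a hypothetical scheme.
    """
    names = {field: f"{prefix}_DB_{field.upper()}" for field in _FIELDS}
    missing = [name for name in names.values() if name not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {field: env[name] for field, name in names.items()}


# A plain dict stands in for os.environ so the sketch is reproducible.
sample_env = {
    "PROD_DB_HOST": "prod.example.com",
    "PROD_DB_USER": "analyst",
    "PROD_DB_PASSWORD": "s3cret",
}
kwargs = instance_kwargs("PROD", sample_env)
print(kwargs["host"])  # prod.example.com

# With DataJoint installed and the variables exported:
# prod = dj.Instance(**instance_kwargs("PROD"))
```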

See [Use Isolated Instances](use-instances.md/) for a complete guide.
