Commit a03cc79

Add usage instructions README + job state constants + change insert many return (#19)
Flesh out the README to add detailed usage instructions. The README also appears on PyPI, so it's nice to have extra information there, and it'll be closely mirrored on River's docs site.

Also, add job state constants like `JOB_STATE_AVAILABLE`, similar to what's available in Ruby, in case anyone needs them (they're useful in `UniqueOpts`).

Lastly, change the return value of `insert_many`/`insert_many_tx` to the number of rows inserted instead of a list of inserted jobs, to match what's returned in Go and Ruby. The rationale: anyone using the bulk insert functions is probably concerned about performance, so it's not worth the cost of returning the whole set of jobs, which is likely used infrequently anyway. These functions aren't implemented yet, so the change isn't a problem.
1 parent db0ab1f commit a03cc79
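The `insert_many` return-value change can be sketched with small stand-ins (the functions below are hypothetical stubs, not River's implementation; only the return shapes match the description above):

```python
# Hypothetical stubs illustrating the return-value change to insert_many:
# the old version returned the inserted job rows, the new one returns a count.

def insert_many_old(jobs: list) -> list:
    # before: a list of inserted job rows (represented here as dicts)
    return [{"kind": "sort", "args": args} for args in jobs]

def insert_many_new(jobs: list) -> int:
    # after: just the number of rows inserted, matching Go and Ruby
    return len(jobs)

jobs = [["whale"], ["tiger"], ["bear"]]
assert len(insert_many_old(jobs)) == 3
assert insert_many_new(jobs) == 3
```

Returning only a count avoids materializing and transferring every inserted row on a path that exists specifically for bulk performance.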

File tree

6 files changed: +209 −28 lines changed


CHANGELOG.md

Lines changed: 1 addition & 1 deletion

@@ -17,4 +17,4 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0)

### Added

- - Initial release, supporting insertion through [SQLAlchemy](https://www.sqlalchemy.org/) and its underlying Postgres drivers like [psycopg2](https://pypi.org/project/psycopg2/) or [asyncpg](https://github.com/MagicStack/asyncpg) (for async).
+ - Initial release, supporting insertion through [SQLAlchemy](https://www.sqlalchemy.org/) and its underlying Postgres drivers like [`psycopg2`](https://pypi.org/project/psycopg2/) or [`asyncpg`](https://github.com/MagicStack/asyncpg) (for async).

README.md

Lines changed: 170 additions & 0 deletions
@@ -2,6 +2,176 @@

An insert-only Python client for [River](https://github.com/riverqueue/river) packaged in the [`riverqueue` package on PyPI](https://pypi.org/project/riverqueue/). Allows jobs to be inserted in Python and run by a Go worker, but doesn't support working jobs in Python.

## Basic usage

Your project should bundle the [`riverqueue` package](https://pypi.org/project/riverqueue/) in its dependencies. How to go about this depends on your toolchain, but for example in [Rye](https://github.com/astral-sh/rye), it'd look like:
```shell
rye add riverqueue
```

Initialize a client with:

```python
import riverqueue
import sqlalchemy

from riverqueue.driver import riversqlalchemy

engine = sqlalchemy.create_engine("postgresql://...")
client = riverqueue.Client(riversqlalchemy.Driver(engine))
```
Define a job and insert it:

```python
import json
from dataclasses import dataclass

@dataclass
class SortArgs:
    strings: list[str]

    kind: str = "sort"

    def to_json(self) -> str:
        return json.dumps({"strings": self.strings})

insert_res = client.insert(
    SortArgs(strings=["whale", "tiger", "bear"]),
)
insert_res.job  # inserted job row
```
Job args should comply with the following [protocol](https://peps.python.org/pep-0544/):

```python
class Args(Protocol):
    kind: str

    def to_json(self) -> str:
        pass
```

* `kind` is a unique string that identifies the job in the database, and which a Go worker will recognize.
* `to_json()` defines how the job will serialize to JSON, which of course will have to be parseable as an object in Go.

Job args may also respond to `insert_opts()` with an instance of `InsertOpts` to define insertion options that'll be used for all jobs of that kind.

We recommend using [`dataclasses`](https://docs.python.org/3/library/dataclasses.html) for job args since they should ideally be minimal sets of primitive properties with little other embellishment, and `dataclasses` provide a succinct way of accomplishing this.
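As a concrete sketch, a job args dataclass satisfying this protocol could look like the following (`EmailArgs` and its fields are hypothetical, not part of the package):

```python
import json
from dataclasses import dataclass

@dataclass
class EmailArgs:
    # hypothetical job args; `kind` must match what the Go worker registers
    to: str
    subject: str

    kind: str = "email"

    def to_json(self) -> str:
        # serialized payload must be parseable as an object on the Go side
        return json.dumps({"to": self.to, "subject": self.subject})

args = EmailArgs(to="user@example.com", subject="hello")
args.kind       # "email"
args.to_json()  # '{"to": "user@example.com", "subject": "hello"}'
```

An optional `insert_opts()` method returning an `InsertOpts` could be added to the class to apply default insertion options for every job of this kind.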
## Insertion options

Inserts take an `insert_opts` parameter to customize features of the inserted job:

```python
insert_res = client.insert(
    SortArgs(strings=["whale", "tiger", "bear"]),
    insert_opts=riverqueue.InsertOpts(
        max_attempts=17,
        priority=3,
        queue="my_queue",
        tags=["custom"],
    ),
)
```
## Inserting unique jobs

[Unique jobs](https://riverqueue.com/docs/unique-jobs) are supported through `InsertOpts.unique_opts()`, and can be made unique by args, period, queue, and state. If a job matching the unique properties is found on insert, the insert is skipped and the existing job is returned.

```python
insert_res = client.insert(
    SortArgs(strings=["whale", "tiger", "bear"]),
    insert_opts=riverqueue.InsertOpts(
        unique_opts=riverqueue.UniqueOpts(
            by_args=True,
            by_period=15 * 60,
            by_queue=True,
            by_state=[riverqueue.JOB_STATE_AVAILABLE],
        )
    ),
)

# contains either a newly inserted job, or an existing one if insertion was skipped
insert_res.job

# true if insertion was skipped
insert_res.unique_skipped_as_duplicated
```
### Custom advisory lock prefix

Unique job insertion takes a Postgres advisory lock to make sure that its uniqueness check still works even if two conflicting insert operations are occurring in parallel. Postgres advisory locks share a global 64-bit namespace, which is a large enough space that it's unlikely for two advisory locks to ever conflict, but to _guarantee_ that River's advisory locks never interfere with an application's, River can be configured with a 32-bit advisory lock prefix which it will use for all its locks:

```python
client = riverqueue.Client(riversqlalchemy.Driver(engine), advisory_lock_prefix=123456)
```

Doing so has the downside of leaving only 32 bits for River's locks (64 bits total minus the 32-bit prefix), making them somewhat more likely to conflict with each other.
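The namespace split can be illustrated with a bit of arithmetic (this sketch shows only the prefixing idea; River's actual key derivation, which hashes unique job properties, isn't reproduced here):

```python
# Illustrative only: compose a 64-bit advisory lock key from a 32-bit
# prefix and the low 32 bits of a hash of the lock's subject.

def advisory_lock_key(prefix: int, subject_hash: int) -> int:
    assert 0 <= prefix < 2**32 and 0 <= subject_hash < 2**64
    # place the prefix in the high 32 bits and keep the low 32 bits of the hash
    return (prefix << 32) | (subject_hash & 0xFFFFFFFF)

key = advisory_lock_key(123456, 0xDEADBEEF_CAFEBABE)
assert key >> 32 == 123456            # prefix occupies the high 32 bits
assert key & 0xFFFFFFFF == 0xCAFEBABE # hash keeps only the low 32 bits
```

With the prefix pinned, any two applications configured with different prefixes can never collide, at the cost of squeezing River's own keys into the remaining 32 bits.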
## Inserting jobs in bulk

Use `insert_many()` to bulk insert jobs as a single operation for improved efficiency:

```python
num_inserted = client.insert_many([
    SimpleArgs(job_num=1),
    SimpleArgs(job_num=2),
])
```

Or with `InsertManyParams`, which may include insertion options:

```python
num_inserted = client.insert_many([
    InsertManyParams(args=SimpleArgs(job_num=1), insert_opts=riverqueue.InsertOpts(max_attempts=5)),
    InsertManyParams(args=SimpleArgs(job_num=2), insert_opts=riverqueue.InsertOpts(queue="high_priority")),
])
```
## Inserting in a transaction

To insert jobs in a transaction, open one in your driver, and pass it as the first argument to `insert_tx()` or `insert_many_tx()`:

```python
with engine.begin() as session:
    insert_res = client.insert_tx(
        session,
        SortArgs(strings=["whale", "tiger", "bear"]),
    )
```
## Asynchronous I/O (`asyncio`)

The package supports Python's [`asyncio` (asynchronous I/O)](https://docs.python.org/3/library/asyncio.html) through an alternate `AsyncClient` and `riversqlalchemy.AsyncDriver`. You'll need to make sure to use SQLAlchemy's alternative async engine and an asynchronous Postgres driver like [`asyncpg`](https://github.com/MagicStack/asyncpg), but otherwise usage looks very similar to non-async usage:

```python
engine = sqlalchemy.ext.asyncio.create_async_engine("postgresql+asyncpg://...")
client = riverqueue.AsyncClient(riversqlalchemy.AsyncDriver(engine))

insert_res = await client.insert(
    SortArgs(strings=["whale", "tiger", "bear"]),
)
```

With a transaction:

```python
async with engine.begin() as session:
    insert_res = await client.insert_tx(
        session,
        SortArgs(strings=["whale", "tiger", "bear"]),
    )
```
## MyPy and type checking

The package exports a `py.typed` file to indicate that it's typed, so you should be able to use [MyPy](https://mypy-lang.org/) to include it in static analysis.

## Drivers

### SQLAlchemy

Our read is that [SQLAlchemy](https://www.sqlalchemy.org/) is the dominant ORM in the Python ecosystem, so it's the only driver available for River. Under the hood of SQLAlchemy, projects will also need a Postgres driver like [`psycopg2`](https://pypi.org/project/psycopg2/) or [`asyncpg`](https://github.com/MagicStack/asyncpg) (for async).

River's driver system should enable integration with other ORMs, so let us know if there's a good reason you need one, and we'll consider it.

## Development

See [development](./docs/development.md).

src/riverqueue/__init__.py

Lines changed: 8 additions & 0 deletions
@@ -1,8 +1,16 @@
  # Reexport for more ergonomic use in calling code.
  from .client import (
+     JOB_STATE_AVAILABLE as JOB_STATE_AVAILABLE,
+     JOB_STATE_CANCELLED as JOB_STATE_CANCELLED,
+     JOB_STATE_COMPLETED as JOB_STATE_COMPLETED,
+     JOB_STATE_DISCARDED as JOB_STATE_DISCARDED,
+     JOB_STATE_RETRYABLE as JOB_STATE_RETRYABLE,
+     JOB_STATE_RUNNING as JOB_STATE_RUNNING,
+     JOB_STATE_SCHEDULED as JOB_STATE_SCHEDULED,
      AsyncClient as AsyncClient,
      Args as Args,
      Client as Client,
+     InsertManyParams as InsertManyParams,
      InsertOpts as InsertOpts,
      UniqueOpts as UniqueOpts,
  )

src/riverqueue/client.py

Lines changed: 26 additions & 22 deletions
@@ -7,10 +7,24 @@
  from .model import InsertResult
  from .fnv import fnv1_hash

+ JOB_STATE_AVAILABLE = "available"
+ JOB_STATE_CANCELLED = "cancelled"
+ JOB_STATE_COMPLETED = "completed"
+ JOB_STATE_DISCARDED = "discarded"
+ JOB_STATE_RETRYABLE = "retryable"
+ JOB_STATE_RUNNING = "running"
+ JOB_STATE_SCHEDULED = "scheduled"
+
  MAX_ATTEMPTS_DEFAULT = 25
  PRIORITY_DEFAULT = 1
  QUEUE_DEFAULT = "default"
- UNIQUE_STATES_DEFAULT = ["available", "completed", "running", "retryable", "scheduled"]
+ UNIQUE_STATES_DEFAULT = [
+     JOB_STATE_AVAILABLE,
+     JOB_STATE_COMPLETED,
+     JOB_STATE_RUNNING,
+     JOB_STATE_RETRYABLE,
+     JOB_STATE_SCHEDULED,
+ ]


  class Args(Protocol):

@@ -81,19 +95,13 @@ async def insert():
      return await self.__check_unique_job(exec, insert_params, unique_opts, insert)

-     async def insert_many(self, args: List[Args]) -> List[InsertResult]:
+     async def insert_many(self, args: List[Args | InsertManyParams]) -> int:
          async with self.driver.executor() as exec:
-             return [
-                 InsertResult(x)
-                 for x in await exec.job_insert_many(_make_insert_params_many(args))
-             ]
+             return await exec.job_insert_many(_make_insert_params_many(args))

-     async def insert_many_tx(self, tx, args: List[Args]) -> List[InsertResult]:
+     async def insert_many_tx(self, tx, args: List[Args | InsertManyParams]) -> int:
          exec = self.driver.unwrap_executor(tx)
-         return [
-             InsertResult(x)
-             for x in await exec.job_insert_many(_make_insert_params_many(args))
-         ]
+         return await exec.job_insert_many(_make_insert_params_many(args))

      async def __check_unique_job(
          self,

@@ -154,19 +162,13 @@ def insert():
      return self.__check_unique_job(exec, insert_params, unique_opts, insert)

-     def insert_many(self, args: List[Args]) -> List[InsertResult]:
+     def insert_many(self, args: List[Args | InsertManyParams]) -> int:
          with self.driver.executor() as exec:
-             return [
-                 InsertResult(x)
-                 for x in exec.job_insert_many(_make_insert_params_many(args))
-             ]
+             return exec.job_insert_many(_make_insert_params_many(args))

-     def insert_many_tx(self, tx, args: List[Args]) -> List[InsertResult]:
+     def insert_many_tx(self, tx, args: List[Args | InsertManyParams]) -> int:
          exec = self.driver.unwrap_executor(tx)
-         return [
-             InsertResult(x)
-             for x in exec.job_insert_many(_make_insert_params_many(args))
-         ]
+         return exec.job_insert_many(_make_insert_params_many(args))

      def __check_unique_job(
          self,

@@ -298,7 +300,9 @@ def _make_insert_params(
      return insert_params, unique_opts


- def _make_insert_params_many(args: List[Args]) -> List[JobInsertParams]:
+ def _make_insert_params_many(
+     args: List[Args | InsertManyParams],
+ ) -> List[JobInsertParams]:
      return [
          _make_insert_params(
              arg.args, arg.insert_opts or InsertOpts(), is_insert_many=True

src/riverqueue/driver/driver_protocol.py

Lines changed: 2 additions & 2 deletions
@@ -45,7 +45,7 @@ async def advisory_lock(self, lock: int) -> None:
      async def job_insert(self, insert_params: JobInsertParams) -> Job:
          pass

-     async def job_insert_many(self, all_params) -> List[Job]:
+     async def job_insert_many(self, all_params) -> int:
          pass

      async def job_get_by_kind_and_unique_properties(

@@ -103,7 +103,7 @@ def advisory_lock(self, lock: int) -> None:
      def job_insert(self, insert_params: JobInsertParams) -> Job:
          pass

-     def job_insert_many(self, all_params) -> List[Job]:
+     def job_insert_many(self, all_params) -> int:
          pass

      def job_get_by_kind_and_unique_properties(

src/riverqueue/driver/riversqlalchemy/sql_alchemy_driver.py

Lines changed: 2 additions & 3 deletions
@@ -11,7 +11,6 @@
      AsyncIterator,
      Iterator,
      Optional,
-     List,
      cast,
  )

@@ -37,7 +36,7 @@ async def job_insert(self, insert_params: JobInsertParams) -> Job:
          ),
      )

-     async def job_insert_many(self, all_params) -> List[Job]:
+     async def job_insert_many(self, all_params) -> int:
          raise NotImplementedError("sqlc doesn't implement copy in python yet")

      async def job_get_by_kind_and_unique_properties(

@@ -95,7 +94,7 @@ def job_insert(self, insert_params: JobInsertParams) -> Job:
          ),
      )

-     def job_insert_many(self, all_params) -> List[Job]:
+     def job_insert_many(self, all_params) -> int:
          raise NotImplementedError("sqlc doesn't implement copy in python yet")

      def job_get_by_kind_and_unique_properties(

0 commit comments
