Merged
32 commits
37df47d
initial outline
LinoGiger Nov 24, 2025
d4df49b
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Nov 24, 2025
d2a91d3
added some more stuff
LinoGiger Nov 25, 2025
2dd05c6
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Nov 26, 2025
49efbab
fixed merge stuff
LinoGiger Nov 26, 2025
013c913
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Nov 26, 2025
0da3ab4
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Nov 26, 2025
b7fb545
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Nov 27, 2025
6c81d03
added endpoint implementation for the audience
LinoGiger Nov 28, 2025
cbe8cef
added different instruction to recruitment order
LinoGiger Nov 28, 2025
b30209f
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Dec 4, 2025
ef4d175
small adjustments
LinoGiger Dec 5, 2025
9c8d27e
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Dec 5, 2025
79cf308
adjustments from merge
LinoGiger Dec 8, 2025
0d94e75
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Dec 8, 2025
511f15a
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Dec 8, 2025
58bbf00
initial changes with new version
LinoGiger Dec 9, 2025
6419414
adjusted typo
LinoGiger Dec 9, 2025
e539f9e
resetting client credentials of autherror (#411)
LinoGiger Dec 12, 2025
7462f68
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Dec 16, 2025
265d3a9
added job api
LinoGiger Dec 17, 2025
b5f0b49
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Dec 17, 2025
a06b147
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Dec 17, 2025
dc5d17f
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Dec 18, 2025
4011767
adjusted filters
LinoGiger Dec 18, 2025
e0debe2
many updates
LinoGiger Dec 19, 2025
29a1129
Merge branch 'main' into feat/RAPID-6301-add-audience-flow
LinoGiger Jan 5, 2026
e261a5c
job definition only saves necessary things, some renamings
LinoGiger Jan 7, 2026
c489519
added the preview method
LinoGiger Jan 12, 2026
ffa9936
fixed creating job definition
LinoGiger Jan 12, 2026
8d24eb0
adjusted the docs
LinoGiger Jan 13, 2026
bfba18d
slight typo fix
LinoGiger Jan 13, 2026
3 changes: 2 additions & 1 deletion .claude/settings.local.json
@@ -5,7 +5,8 @@
"Bash(tree:*)",
"Bash(find:*)",
"Bash(cat:*)",
"Bash(.venv/Scripts/python.exe:*)"
"Bash(.venv/Scripts/python.exe:*)",
"Bash(git ls-tree:*)"
],
"deny": [],
"ask": []
61 changes: 39 additions & 22 deletions docs/confidence_stopping.md
@@ -18,8 +18,8 @@ Early Stopping addresses this by:
The Early Stopping feature leverages the trustworthiness of labelers, quantified through their `userScores`, to calculate the confidence level of each category for any given datapoint.

### Confidence Calculation
- **UserScores**: Each annotator has a `userScore` between 0 and 1, representing their reliability. [More information](/understanding_the_results/#understanding-the-user-scores)
- **Aggregated Confidence**: By combining the userScores of annotators who selected a particular category, the system computes the probability that this category is the correct one.
- **UserScores**: Each labeler has a `userScore` between 0 and 1, representing their reliability. [More information](understanding_the_results.md#understanding-the-user-scores)
- **Aggregated Confidence**: By combining the userScores of labelers who selected a particular category, the system computes the probability that this category is the correct one.
- **Threshold Comparison**: If the calculated confidence exceeds your specified threshold, the system stops collecting further responses for that datapoint.
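
As a rough illustration of the aggregation above, the sketch below treats each `userScore` as the probability that a labeler answers correctly and combines the votes for a two-category task under a uniform prior. This is a simplified, hypothetical model for intuition only, not the exact formula used by the platform.

```python
import math

def binary_confidence(user_scores: list[float], votes: list[str],
                      category: str, other: str) -> float:
    """Toy Bayesian combination for a two-category task.

    Assumes each userScore is the probability that the labeler answers
    correctly; this is an illustrative assumption, not the production formula.
    """
    def likelihood(true_label: str) -> float:
        # Probability of observing these votes if `true_label` were correct.
        return math.prod(s if v == true_label else 1 - s
                         for s, v in zip(user_scores, votes))

    l_category, l_other = likelihood(category), likelihood(other)
    # Uniform prior over the two categories.
    return l_category / (l_category + l_other)

# Four labelers with userScore 0.8 all vote "Dog":
confidence = binary_confidence([0.8] * 4, ["Dog"] * 4, "Dog", "Cat")
print(round(confidence, 4))  # ~0.9961, which would clear a 0.99 confidence_threshold
```

Under this toy model, four agreeing responses from reasonably reliable labelers are enough to cross a 99% threshold, which matches the intuition behind the example further down.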

## Understanding the Confidence Threshold
@@ -28,51 +28,68 @@ We've created a plot based on empirical data aided by simulations to give you an

There are a few things to keep in mind when interpreting the results:

- **Unambiguous Scenario**: The graph represents an ideal situation such as in the [example below](#using-early-stopping-in-your-order) with no ambiguity which category is the correct one. A counter-example would be subjective tasks like "Which image do you prefer?", where there's no clear correct answer.
- **Unambiguous Scenario**: The graph represents an ideal situation such as in the [example below](#using-early-stopping-in-your-job) with no ambiguity which category is the correct one. A counter-example would be subjective tasks like "Which image do you prefer?", where there's no clear correct answer.
- **Real-World Variability**: Actual required responses may vary based on task complexity.
- **Guidance Tool**: Use the graph as a reference to set realistic expectations for your orders.
- **Guidance Tool**: Use the graph as a reference to set realistic expectations for your jobs.
- **Response Overflow**: The number of responses per datapoint may exceed the specified amount due to multiple users answering simultaneously.


<div style="width: 780px; height: 650px; overflow: hidden;">
<iframe src="/plots/confidence_threshold_plot_with_slider_darkmode.html"
width="100%"
height="100%"
frameborder="0"
width="100%"
height="100%"
frameborder="0"
scrolling="no"
style="overflow: hidden;">
</iframe>
</div>

>**Note:** The Early Stopping feature is supported for the Classification and Comparison workflows. The number of categories is the number of options in the Classification task. For the Comparison task, the number of categories is always 2.

## Using Early Stopping in Your Order
## Using Early Stopping in Your Job

Implementing Early Stopping is straightforward. You simply add the confidence threshold as a parameter when creating the order.
Implementing Early Stopping is straightforward. You simply add the confidence threshold as a parameter when creating the job definition.

### Example: Classification Order with Early Stopping
### Example: Classification Job with Early Stopping

```python
order = rapi.order.create_classification_order(
name="Test Classification Order with Early Stopping",
from rapidata import RapidataClient

client = RapidataClient()

# Create audience with qualification example
audience = client.audience.create_audience(name="Animal Classification Audience")
audience.add_classification_example(
instruction="What do you see in the image?",
answer_options=["Cat", "Dog"],
datapoint="https://assets.rapidata.ai/cat.jpeg",
truth=["Cat"]
)

# Create job definition with early stopping
job_definition = client.job.create_classification_job_definition(
name="Test Classification with Early Stopping",
instruction="What do you see in the image?",
answer_options=["Cat", "Dog"],
datapoints=["https://assets.rapidata.ai/dog.jpeg"],
responses_per_datapoint=50,
confidence_threshold=0.99,
).run()

order.display_progress_bar()
result = order.get_results()
print(result)
)

# Preview and run
job_definition.preview()
job = audience.assign_job_to_audience(job_definition)
job.display_progress_bar()
results = job.get_results()
print(results)
```

In this example:

- responses_per_datapoint=50: Sets the maximum number of responses per datapoint.
- confidence_threshold=0.99: Specifies that data collection for a datapoint should stop once a 99% confidence level is reached.
- `responses_per_datapoint=50`: Sets the maximum number of responses per datapoint.
- `confidence_threshold=0.99`: Specifies that data collection for a datapoint should stop once a 99% confidence level is reached.

We'd expect this to take roughtly 4 responses to reach the 99% confidence level.
We'd expect this to take roughly 4 responses to reach the 99% confidence level.

## When to Use Early Stopping

@@ -83,7 +100,7 @@ We recommend using Early Stopping when:

## Analyzing Early Stopping Results

When using Early Stopping, the [results](/understanding_the_results/) will additionally include a `confidencePerCategory` field for each datapoint. This field shows the confidence level for each of the categories in the task.
When using Early Stopping, the [results](understanding_the_results.md) will additionally include a `confidencePerCategory` field for each datapoint. This field shows the confidence level for each of the categories in the task.

Example:
```json
@@ -117,7 +134,7 @@ Example:
"Cat": 0.0
},
# this only appears when using early stopping
"confidencePerCategory": {
"confidencePerCategory": {
"Dog": 0.9943,
"Cat": 0.0057
},
49 changes: 49 additions & 0 deletions docs/examples/classify_job.md
@@ -0,0 +1,49 @@
# Classification Job Example

To learn about the basics of creating a job, please refer to the [quickstart guide](../quickstart.md).

In this example, we want to rate different images based on a Likert scale to assess how well generated images match their descriptions. The `NoShuffle` setting ensures answer options remain in order since they represent a scale.

```python
from rapidata import RapidataClient, NoShuffle

IMAGE_URLS = [
"https://assets.rapidata.ai/tshirt-4o.png",
"https://assets.rapidata.ai/tshirt-aurora.jpg",
"https://assets.rapidata.ai/teamleader-aurora.jpg",
]

CONTEXTS = ["A t-shirt with the text 'Running on caffeine & dreams'"] * len(IMAGE_URLS)

client = RapidataClient()

# Create audience with qualification example
audience = client.audience.create_audience(name="Likert Scale Audience")
audience.add_classification_example(
instruction="How well does the image match the description?",
answer_options=["1: Not at all", "2: A little", "3: Moderately", "4: Very well", "5: Perfectly"],
datapoint="https://assets.rapidata.ai/tshirt-4o.png",
truth=["5: Perfectly"],
context="A t-shirt with the text 'Running on caffeine & dreams'"
)

# Create job definition
job_definition = client.job.create_classification_job_definition(
name="Likert Scale Example",
instruction="How well does the image match the description?",
answer_options=["1: Not at all", "2: A little", "3: Moderately", "4: Very well", "5: Perfectly"],
contexts=CONTEXTS,
datapoints=IMAGE_URLS,
responses_per_datapoint=25,
settings=[NoShuffle()]
)

# Preview the job definition
job_definition.preview()

# Assign to audience and get results
job = audience.assign_job_to_audience(job_definition)
job.display_progress_bar()
results = job.get_results()
print(results)
```
46 changes: 0 additions & 46 deletions docs/examples/classify_order.md

This file was deleted.

57 changes: 57 additions & 0 deletions docs/examples/compare_job.md
@@ -0,0 +1,57 @@
# Compare Job Example

To learn about the basics of creating a job, please refer to the [quickstart guide](../quickstart.md).

In this example, we compare images from two image generation models (Flux and Midjourney) to determine which more accurately follows the given prompts.

```python
from rapidata import RapidataClient

PROMPTS = [
"A sign that says 'Diffusion'.",
"A yellow flower sticking out of a green pot.",
"hyperrealism render of a surreal alien humanoid.",
"psychedelic duck",
"A small blue book sitting on a large red book."
]

IMAGE_PAIRS = [
["https://assets.rapidata.ai/flux_sign_diffusion.jpg", "https://assets.rapidata.ai/mj_sign_diffusion.jpg"],
["https://assets.rapidata.ai/flux_flower.jpg", "https://assets.rapidata.ai/mj_flower.jpg"],
["https://assets.rapidata.ai/flux_alien.jpg", "https://assets.rapidata.ai/mj_alien.jpg"],
["https://assets.rapidata.ai/flux_duck.jpg", "https://assets.rapidata.ai/mj_duck.jpg"],
["https://assets.rapidata.ai/flux_book.jpg", "https://assets.rapidata.ai/mj_book.jpg"]
]

client = RapidataClient()

# Create audience with qualification example
audience = client.audience.create_audience(name="Prompt Alignment Audience")
audience.add_compare_example(
instruction="Which image follows the prompt more accurately?",
datapoint=[
"https://assets.rapidata.ai/flux_sign_diffusion.jpg",
"https://assets.rapidata.ai/mj_sign_diffusion.jpg"
],
truth="https://assets.rapidata.ai/flux_sign_diffusion.jpg",
context="A sign that says 'Diffusion'."
)

# Create job definition
job_definition = client.job.create_compare_job_definition(
name="Example Image Prompt Alignment Job",
instruction="Which image follows the prompt more accurately?",
datapoints=IMAGE_PAIRS,
responses_per_datapoint=25,
contexts=PROMPTS
)

# Preview the job definition
job_definition.preview()

# Assign to audience and get results
job = audience.assign_job_to_audience(job_definition)
job.display_progress_bar()
results = job.get_results()
print(results)
```
49 changes: 0 additions & 49 deletions docs/examples/compare_order.md

This file was deleted.

21 changes: 0 additions & 21 deletions docs/examples/draw_order.md

This file was deleted.
