

@crypdick
Contributor

Description

Automatically exclude common directories (.git, .venv, venv, __pycache__) when uploading working_dir in runtime environment packages.

At a minimum, we need to exclude .git/ because, unlike the others, nobody lists .git/ in .gitignore. A large .git/ directory causes Ray to throw a ray.exceptions.RuntimeEnvSetupError once it pushes the uploaded package past the 512 MiB limit.
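A minimal sketch of how default exclusions could be applied while walking working_dir, assuming the four directory names listed above are matched by exact name at any depth. DEFAULT_EXCLUDED_DIRS and iter_package_files are hypothetical names for illustration, not Ray's actual implementation:

```python
import os
from typing import Iterator

# Hypothetical default list, mirroring the directories named in this PR.
DEFAULT_EXCLUDED_DIRS = {".git", ".venv", "venv", "__pycache__"}


def iter_package_files(root: str) -> Iterator[str]:
    """Yield paths (relative to root) of the files that would be packaged."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in DEFAULT_EXCLUDED_DIRS]
        for name in filenames:
            yield os.path.relpath(os.path.join(dirpath, name), root)
```

Pruning dirnames in place is the important detail: os.walk never descends into .git/, so skipping a multi-GB history costs nothing.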

I also updated the documentation in handling-dependencies.rst and improved the error message shown when the package exceeds the GCS_STORAGE_MAX_SIZE limit.
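To illustrate the kind of error message that helps here, a hedged sketch: check_package_size and PackageTooLargeError are hypothetical, and the 512 MiB figure is taken from the description above rather than read from Ray's GCS_STORAGE_MAX_SIZE constant. The idea is to name the largest top-level entries so that a bloated .git/ is obvious:

```python
from typing import Dict

# 512 MiB, the limit cited in this PR; an assumption, not read from Ray.
DEFAULT_LIMIT_BYTES = 512 * 1024 * 1024  # 536_870_912 bytes


class PackageTooLargeError(RuntimeError):
    """Hypothetical stand-in for the error raised during runtime env setup."""


def check_package_size(file_sizes: Dict[str, int], limit: int = DEFAULT_LIMIT_BYTES) -> int:
    """Return the total size, or raise with a message naming the biggest top-level entries."""
    total = sum(file_sizes.values())
    if total <= limit:
        return total
    # Aggregate sizes by top-level entry (e.g. ".git", "data") to point at the culprit.
    by_top: Dict[str, int] = {}
    for path, size in file_sizes.items():
        top = path.replace("\\", "/").split("/", 1)[0]
        by_top[top] = by_top.get(top, 0) + size
    largest = sorted(by_top.items(), key=lambda kv: kv[1], reverse=True)[:3]
    details = ", ".join(f"{name} ({size / 2**20:.1f} MiB)" for name, size in largest)
    raise PackageTooLargeError(
        f"working_dir package is {total / 2**20:.1f} MiB, over the "
        f"{limit / 2**20:.0f} MiB limit. Largest entries: {details}. "
        "Consider adding them to runtime_env['excludes'] or a .rayignore file."
    )
```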

Related issues

N/A

Additional information

The PR pytorch/tutorials#3709 was failing to run because the PyTorch tutorials .git/ folder is huge.

@crypdick crypdick requested review from a team as code owners December 19, 2025 03:28
Ricardo Decal added 2 commits December 18, 2025 19:29
Signed-off-by: Ricardo Decal <public@ricardodecal.com>
Signed-off-by: Ricardo Decal <public@ricardodecal.com>
@crypdick crypdick force-pushed the bugfix/default-excludes-working-dir branch from 57c19af to 631fa2e Compare December 19, 2025 03:29
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a helpful feature to automatically exclude common directories like .git and venv from working_dir uploads, preventing common errors with large repositories. The implementation is clean, and it's great to see that it's accompanied by thorough documentation updates and both unit and integration tests. My only suggestion is a minor improvement to the type hinting for better code clarity.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ricardo Decal <crypdick@users.noreply.github.com>
@ray-gardener ray-gardener bot added the docs and core labels Dec 19, 2025
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
@iamjustinhsu
Contributor

Nice nice. Just some questions:

Can you clarify the behavior in the following scenarios:

  • excludes=[], .rayignore=["file.txt"]
  • excludes=["file.txt"], .rayignore=[]
  • excludes=["file.txt"], .rayignore=["file.txt"]
  • excludes=["file.txt"], .rayignore=["file2.txt"]

I'm also wondering:

  1. Why do we have "excludes" when we already had ".gitignore"?
  2. Is this use case necessary if users can already list venv, .git, and __pycache__ in their .rayignore file?

@github-actions

github-actions bot commented Jan 3, 2026

This pull request has been automatically marked as stale because it has not had any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale label Jan 3, 2026
@crypdick crypdick added the unstale label Jan 4, 2026
@crypdick
Contributor Author

crypdick commented Jan 5, 2026

Thanks for the review, @iamjustinhsu. Some thoughts:

  1. The sources are unioned, so a file is excluded if it matches any source. In your first three examples, file.txt and the default exclusions are excluded; in the last example, file.txt, file2.txt, and the default exclusions are excluded.
  2. The excludes param existed before this PR. It's a programmatic way to specify exclusions instead of having to edit static files (.gitignore, .rayignore). This PR just adds some default values.
  3. So yes, users can get this behavior by manually creating a .rayignore file. This PR is about improving the default UX.
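The union semantics described above can be sketched as follows. This is a simplified model: is_excluded and DEFAULT_EXCLUDES are hypothetical names, and fnmatch only approximates gitignore-style matching (no negation, anchoring, or ** handling). The test below walks through the four scenarios from the question:

```python
import fnmatch
from typing import Iterable, List

# Hypothetical defaults, mirroring the directories named in this PR.
DEFAULT_EXCLUDES = [".git", ".venv", "venv", "__pycache__"]


def _matches(path: str, pattern: str) -> bool:
    """Simplified match: pattern vs. the whole path or any single path component."""
    parts = path.split("/")
    return fnmatch.fnmatch(path, pattern) or any(fnmatch.fnmatch(p, pattern) for p in parts)


def is_excluded(path: str, *sources: Iterable[str]) -> bool:
    """A path is excluded if it matches a pattern from ANY source (union)."""
    patterns: List[str] = list(DEFAULT_EXCLUDES)
    for source in sources:  # e.g. the excludes list, .rayignore lines
        patterns.extend(source)
    return any(_matches(path, pat) for pat in patterns)
```

Because the sources are only ever added together, they cannot conflict: listing file.txt in both excludes and .rayignore behaves exactly like listing it once.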

I don't think users should have to learn about .rayignore in order to use Ray for the first time. For example, I'm in the process of submitting a batch of tutorials to the official PyTorch docs, and anyone who tries to run them will immediately hit a RuntimeEnvSetupError because pytorch/tutorials/.git/ is so large. Since .git/ is never included in .gitignore, users would always have to create a .rayignore file in their repos to prevent this behavior.

If I may flip the question: why is it desirable for Ray to upload .git/, .venv, and __pycache__/ to workers by default? I don't see any value in doing this, only downsides: extra overhead and the potential for production workloads to break. Ray should just work.

@iamjustinhsu
Contributor

Got it

"why is it desirable for [Ray] to upload .git/, .venv, __pycache__/"

IMO it's not, although I could be wrong. This looks good to me, then. I'm not a code owner, so feel free to ping someone from the core and docs teams.

@edoakes
Collaborator

edoakes commented Jan 8, 2026

@crypdick makes sense to me, but my only hesitation is that this could be a breaking change that would be hard and confusing to debug.

Could we temporarily add a log message when one of the new default excludes is encountered? The log message can say "Directory 'foobar' is now ignored by default. To disable this behavior, ..."

Signed-off-by: Ricardo Decal <public@ricardodecal.com>
@crypdick
Contributor Author

crypdick commented Jan 8, 2026

Sounds good, @edoakes. I've added a temporary warning and a TODO to remove it in a few releases. I used log_once() to prevent this message from being spammed.
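A rough sketch of the warn-once pattern described here, assuming a simple in-process set of seen keys. The log_once below is a hypothetical stand-in that mimics the idea behind Ray's log_once(), not its implementation, and warn_default_exclude is an illustrative name. The message omits the "To disable this behavior" part because the actual instruction isn't given in this thread:

```python
import logging
from typing import Set

logger = logging.getLogger("working_dir_sketch")

# Keys that have already produced a log line in this process.
_already_logged: Set[str] = set()


def log_once(key: str) -> bool:
    """Return True only the first time a key is seen (hypothetical stand-in)."""
    if key in _already_logged:
        return False
    _already_logged.add(key)
    return True


def warn_default_exclude(dirname: str) -> None:
    # TODO: remove this temporary warning after a few releases.
    if log_once(f"default_exclude_{dirname}"):
        logger.warning(
            "Directory '%s' is now ignored by default when uploading working_dir.", dirname
        )
```

Keying on the directory name means each newly excluded directory is reported once, rather than once per matching path.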

Collaborator

@edoakes edoakes left a comment


Thanks!

@edoakes edoakes enabled auto-merge (squash) January 8, 2026 01:25
@github-actions github-actions bot added the go label Jan 8, 2026
Signed-off-by: Ricardo Decal <public@ricardodecal.com>
@github-actions github-actions bot disabled auto-merge January 8, 2026 02:26
Ricardo Decal and others added 3 commits January 7, 2026 18:33
Signed-off-by: Ricardo Decal <public@ricardodecal.com>
Signed-off-by: Ricardo Decal <public@ricardodecal.com>
@edoakes edoakes enabled auto-merge (squash) January 8, 2026 17:23
@github-actions github-actions bot disabled auto-merge January 8, 2026 17:23
@edoakes edoakes enabled auto-merge (squash) January 8, 2026 17:41
@edoakes edoakes merged commit 5e51be5 into master Jan 8, 2026
8 checks passed
@edoakes edoakes deleted the bugfix/default-excludes-working-dir branch January 8, 2026 18:57

Labels

  • core: Issues that should be addressed in Ray Core
  • docs: An issue or change related to documentation
  • go: add ONLY when ready to merge, run all tests
  • stale: The issue is stale. It will be closed within 7 days unless there is further conversation
  • unstale: A PR that has been marked unstale. It will not get marked stale again if this label is on it.


4 participants