Skip to content

Conversation

@GaneshPatil7517
Copy link
Contributor

This PR adds a Docker-based workflow for building DataFusion documentation, allowing contributors to build docs without installing additional host dependencies.

The container includes all required tools (Rust 1.92.0, cargo-depgraph, Sphinx, etc.) and provides a simple, reproducible build process.

This addresses #19777.

What's included:

  • Dockerfile with all doc build dependencies
  • docker-compose.yml for easy usage
  • Updated docs/README.md with clear instructions
  • Cross-platform compatibility (handles Windows/Unix line endings)

Usage:

docker-compose run --rm docs bash build.sh
# or
docker build -t datafusion-docs -f docs/Dockerfile .
docker run --rm -v $(pwd):/work datafusion-docs bash build.sh

- Installs Rust 1.92.0, cargo-depgraph, and all doc dependencies
- Provides isolated, reproducible build environment
- Fixes line endings for cross-platform compatibility (Windows/Unix)
- No additional host setup required beyond Docker
Copilot AI review requested due to automatic review settings January 17, 2026 06:44
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 17, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds Docker-based infrastructure to enable building DataFusion documentation in a containerized environment without requiring manual installation of dependencies (Rust, cargo-depgraph, Sphinx, etc.).

Changes:

  • Adds Dockerfile with all necessary documentation build dependencies (Rust 1.92.0, Python packages, cargo-depgraph, graphviz)
  • Updates docker-compose.yml to include a docs service for easy documentation building
  • Updates docs/README.md with Docker-based workflow instructions as the recommended approach

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
docs/Dockerfile Defines container image with all doc build dependencies, handles line endings for cross-platform compatibility
docker-compose.yml Adds docs service configuration for simplified Docker workflow
docs/README.md Documents Docker-based build option as recommended approach, maintains existing local installation instructions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

GaneshPatil7517 and others added 6 commits January 17, 2026 12:26
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Creates separate entrypoint script instead of complex bash in CMD
- Improves readability and makes logic easier to test and modify
- Cleaner Dockerfile with ENTRYPOINT pattern
- Script intelligently checks for CRLF before converting
- Handles both build-time and runtime line ending fixes
- Use rust:1.92.0-bookworm base image instead of python with manual Rust install
- Remove docker-compose.yml (unnecessary for this use case)
- Remove entrypoint.sh and CRLF conversion logic
- Use volume mount at runtime instead of copying entire repository

Usage:
  docker build -t datafusion-docs ./docs
  docker run --rm -v C:\Users\HP\Music\Apache_org\data\datafusion:/datafusion datafusion-docs
@GaneshPatil7517
Copy link
Contributor Author

Thanks for the feedback @Jefffrey! I've simplified the approach significantly:

Changes made:

✅ Switched to rust:1.92.0-bookworm base image (install Python on top instead of vice versa)
✅ Removed docker-compose.yml - overkill for this use case
✅ Removed entrypoint.sh and CRLF conversion logic - users should configure git properly
✅ Removed COPY . /work - use volume mount at runtime instead
✅ Removed unnecessary comments
Usage is now simply:
docker build -t datafusion-docs ./docs
docker run --rm -v $(pwd):/datafusion datafusion-docs

Tested locally and the docs build successfully. Please take another look!> docker build -t datafusion-docs ./docs
docker run --rm -v $(pwd):/datafusion datafusion-docs

Tested locally and the docs build successfully. Please take another look!

docs/Dockerfile Outdated
COPY requirements.txt .
RUN python3 -m pip install --break-system-packages -r requirements.txt

CMD ["make", "html"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using the build.sh script because that generates the dependency chart for us (and is what is motivating this issue)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @Jefffrey! All addressed:
✅ Using rust:bookworm (latest) instead of pinned version
✅ Removed python3-venv
✅ Using build.sh instead of make html - dependency graph now generates correctly
✅ Added dos2unix for Windows line ending compatibility
Tested locally - docs build succeeds with the dependency graph generated. Please take another look!

- Use rust:bookworm base image (latest) instead of pinned version
- Remove python3-venv (not used)
- Use build.sh instead of make html (generates dependency graph)
- Add dos2unix for Windows line ending compatibility
@Jefffrey
Copy link
Contributor

Jefffrey commented Jan 20, 2026

When I try the docker run command it seems to download & install some components each time, slowing down the doc build process:

datafusion (docs/dockerized-docs-build)$ docker run --rm -v $(pwd):/datafusion datafusion-docs
info: syncing channel updates for '1.92.0-aarch64-unknown-linux-gnu'
info: latest update on 2025-12-11, rust version 1.92.0 (ded5c06cf 2025-12-08)
info: downloading component 'clippy'
info: downloading component 'rustfmt'
info: installing component 'clippy'
info: installing component 'rustfmt'

Could you look into this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants