OpenDCAI

Define the future of Data-centric AI together

We are dedicated to advancing research and open-source tools in Data-Centric Artificial Intelligence (DCAI).

Our goal is to develop effective and efficient DCAI systems and algorithms that support and enhance the performance of AI models and applications.

Newly Released Works

🔥 2025/6/29: Our DCAI system DataFlow is released!

Pinned

  1. DataFlow Public

    Easy data preparation with the latest LLM-based operators and pipelines.

    Python 2k 141

  2. MyScaleDB Public

    Forked from OriginHubAI/MyScaleDB

    AI Database for unified, scalable SQL + vector data management, search and analytics

    C++ 38 1

  3. DataFlex Public

    DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python 92 8

  4. Paper2Any Public

    Turn paper/text/topic into editable research figures, technical route diagrams, and presentation slides.

    Python 243 21

Repositories

Showing 10 of 24 repositories
  • DataFlow-Agent
    Python 9 Apache-2.0 0 0 0 Updated Jan 3, 2026
  • DataFlow Public

    Easy data preparation with the latest LLM-based operators and pipelines.

    Python 2,016 Apache-2.0 141 8 0 Updated Jan 3, 2026
  • DataFlex Public

    DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python 92 8 0 0 Updated Jan 3, 2026
  • Paper2Any Public

    Turn paper/text/topic into editable research figures, technical route diagrams, and presentation slides.

    Python 243 Apache-2.0 21 0 1 Updated Jan 2, 2026
  • DataFlow-WebUI
    Python 8 8 0 1 Updated Dec 30, 2025
  • DataFlow-Doc Public

    Documentation for DataFlow, a data-centric AI system for LLMs.

    Python 11 26 4 0 Updated Dec 30, 2025
  • DataFlex-Doc Public

    DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python 2 7 0 0 Updated Dec 27, 2025
  • DataFlow-MM Public

    DataFlow-MM provides multi-media operators for DataFlow. We aim to prepare data for Multimodal Large Language Models.

    Python 22 Apache-2.0 14 1 1 Updated Dec 25, 2025
  • MorphoBench Public
    Python 11 MIT 0 0 0 Updated Dec 23, 2025
  • DataFlow-MM-Doc Public

    Documentation for DataFlow-MM

    Python 2 6 0 3 Updated Dec 21, 2025
