From 1d14a22d9c45acd53b421bcf8a95187dff29df20 Mon Sep 17 00:00:00 2001
From: mengwei
Date: Tue, 30 Dec 2025 17:55:35 +0800
Subject: [PATCH 1/2] add training and evaluation docs

---
 .../{train_eval.md => evaluation.md}          |  93 +++++-------
 .../user_guide/internnav/quick_start/index.md |   3 +-
 .../internnav/quick_start/training.md         | 132 ++++++++++++++++++
 3 files changed, 167 insertions(+), 61 deletions(-)
 rename source/en/user_guide/internnav/quick_start/{train_eval.md => evaluation.md} (80%)
 create mode 100644 source/en/user_guide/internnav/quick_start/training.md

diff --git a/source/en/user_guide/internnav/quick_start/train_eval.md b/source/en/user_guide/internnav/quick_start/evaluation.md
index d6dc608..5ac5864 100644
--- a/source/en/user_guide/internnav/quick_start/train_eval.md
+++ b/source/en/user_guide/internnav/quick_start/evaluation.md
@@ -1,16 +1,16 @@
-# Training and Evaluation
+# Evaluation
 
-This document presents how to train and evaluate models for different systems with InternNav.
+This document describes how to evaluate models in **InternNav**.
 
-## Whole-system
+## InternVLA-N1 (Dual System)
 
-### Training
-The training pipeline is currently under preparation and will be open-sourced soon.
+Model weights of InternVLA-N1 (Dual System) can be downloaded from [InternVLA-N1-DualVLN](https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN) and [InternVLA-N1-w-NavDP](https://huggingface.co/InternRobotics/InternVLA-N1-w-NavDP).
 
-### Evaluation
-Before evaluation, we should download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory. Model weights of InternVLA-N1 can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1).
+---
+
+### Evaluation on Isaac Sim
+Before evaluation, we should download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory.
 
-#### Evaluation on Isaac Sim
 [UPDATE] We support using local model and isaac sim in one process now.
 Evaluate on Single-GPU:
 ```bash
@@ -51,7 +51,7 @@ The simulation can be visualized by set `vis_output=True` in eval_cfg.
 My GIF
 
-#### Evaluation on Habitat Sim
+### Evaluation on Habitat Sim
 
 Evaluate on Single-GPU:
 ```bash
@@ -74,18 +74,36 @@ For multi-gpu inference, currently we support inference on SLURM as well as envi
 --config scripts/eval/configs/habitat_dual_system_cfg.py
 ```
 
+## InternVLA-N1 (System 2)
 
-## System1
+Model weights of InternVLA-N1 (System2) can be downloaded from [InternVLA-N1-System2](https://huggingface.co/InternRobotics/InternVLA-N1-System2).
 
-### Training
+Currently we only support evaluating a single System2 on Habitat:
 
-Download the training data from [Hugging Face](https://huggingface.co/datasets/InternRobotics/InternData-N1/), and organize them in the form mentioned in [installation](./installation.md).
+Evaluate on Single-GPU: ```bash -./scripts/train/start_train.sh --name "$NAME" --model-name navdp +python scripts/eval/eval.py --config scripts/eval/configs/habitat_s2_cfg.py + +# set config with the following fields +eval_cfg = EvalCfg( + agent=AgentCfg( + model_name='internvla_n1', + model_settings={ + "mode": "system2", # inference mode: dual_system or system2 + "model_path": "checkpoints/", # path to model checkpoint + } + ) +) +``` + +For multi-gpu inference, currently we only support inference on SLURM. + +```bash +./scripts/eval/bash/eval_system2.sh ``` -### Evaluation +## VN Systems (System 1) We support the evaluation of diverse System-1 baselines separately in [NavDP](https://github.com/InternRobotics/NavDP/tree/navdp_benchmark) to make it easy to use and deploy. To install the environment, we provide a quick start below: @@ -129,52 +147,7 @@ python navdp_server.py --port {PORT} --checkpoint {CHECKPOINT_path} python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR} ``` - -## System2 - -### Training - -Currently, we only support training of small VLN models (CMA, RDP, Seq2Seq) in this repo. For the training of LLM-based VLN (Navid, StreamVLN, etc), please refer to [StreamVLN](https://github.com/OpenRobotLab/StreamVLN) for training details. - -```base -# train cma model -./scripts/train/start_train.sh --name cma_train --model cma - -# train rdp model -./scripts/train/start_train.sh --name rdp_train --model rdp - -# train seq2seq model -./scripts/train/start_train.sh --name seq2seq_train --model seq2seq -``` -### Evaluation - -#### InternVLA-N1-S2 -Currently we only support evaluate single System2 on Habitat: - -Evaluate on Single-GPU: - -```bash -python scripts/eval/eval.py --config scripts/eval/configs/habitat_s2_cfg.py - -# set config with the following fields -eval_cfg = EvalCfg( - agent=AgentCfg( - model_name='internvla_n1', - model_settings={ - "mode": "system2", # inference mode: dual_system or system2 - "model_path": "checkpoints/", # path to model checkpoint - } - ) -) -``` - -For multi-gpu inference, currently we only support inference on SLURM. - -```bash -./scripts/eval/bash/eval_system2.sh -``` - -#### Baseline Models +## Baseline VLN Single-System Models We provide three small VLN baselines (Seq2Seq, CMA, RDP) for evaluation in the InterUtopia (Isaac-Sim) environment. Download the baseline models: diff --git a/source/en/user_guide/internnav/quick_start/index.md b/source/en/user_guide/internnav/quick_start/index.md index c484f2d..1e6db54 100644 --- a/source/en/user_guide/internnav/quick_start/index.md +++ b/source/en/user_guide/internnav/quick_start/index.md @@ -15,5 +15,6 @@ myst: installation simulation interndata -train_eval +training +evaluation ``` diff --git a/source/en/user_guide/internnav/quick_start/training.md b/source/en/user_guide/internnav/quick_start/training.md new file mode 100644 index 0000000..3600454 --- /dev/null +++ b/source/en/user_guide/internnav/quick_start/training.md @@ -0,0 +1,132 @@ +# Training + +This document provides instructions for training models in **InternNav**. + +## Overview + +InternNav supports training models under three system paradigms: + +- **VLN Multi-System**: integrated System2 + System2 architectures +- **VLN Single-System**: end-to-end vision-and-language navigation models +- **VN System (System1)**: low-level visual navigation and control models + + +Each paradigm follows a different training protocol, which is detailed below. 
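+
+As a quick reference, the launch commands described in the following sections are collected below; refer to each section for the data, checkpoints, and configuration that each command assumes.
+
+```bash
+# Dual-System VLN (InternVLA-N1): first train System2, then jointly tune it with a System1 head
+sbatch ./scripts/train/base_train/qwenvl_train/train_system2.sh
+sbatch ./scripts/train/base_train/qwenvl_train/train_dual_system.sh
+
+# Single-System VLN baselines (Seq2Seq, CMA, RDP) via the unified entry script
+./scripts/train/base_train/start_train.sh --name seq2seq_train --model seq2seq
+
+# VN System (System1): NavDP
+./scripts/train/base_train/start_train.sh --name navdp_train --model-name navdp
+```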
+ + +## VLN Multi-System +VLN Multi-System integrates **System2** (high-level reasoning and planning) with +**System1** (low-level action control), supporting both modular integration and joint training. + + +### Supported Systems +- **InternVLA-N1 (System2)** +- **InternVLA-N1 (Dual System) w/ NavDP** + (*NavDP* indicates joint tuning with System2) +- **InternVLA-N1 (Dual System) DualVLN** + + +### 1. Training for InternVLA-N1 (System2) + +**InternVLA-N1 (System2)** is trained independently to predict 2D pixel goals for navigation. + +It can be used with any compatible System1 model capable of executing 2D pixel goals or point goals (given depth and pose). +Alternatively, it can be jointly trained together with a System1 model for end-to-end multi-system optimization. + + +#### Training Command + +```bash +# training system2 separately +sbatch ./scripts/train/base_train/qwenvl_train/train_system2.sh +``` + +--- + +### 2. Joint Training for InternVLA-N1 (Dual System) + +After completing training of **InternVLA-N1 (System2)**, joint training is supported with a pixel-goal navigation System1, using either the **NavDP** or **NextDiT** architecture. + +- **InternVLA-N1 (Dual System) w/ NavDP**: preserves **NavDP**'s model design and uses **RGB-D** input. +- **InternVLA-N1 (Dual System) DualVLN**: uses only **RGB** input, resulting in a smaller model footprint. + +#### Training Command + +```bash +# training system1 based on system2 +sbatch ./scripts/train/base_train/qwenvl_train/train_dual_system.sh +``` + +- For **w/ NavDP** model variant, set `system1=navdp_async`. Optimal performance is typically observed after **30,000 iterations**. +- For **DualVLN** model variant, set `system1=nextdit_async`. Optimal performance is typically observed after **15,000 iterations**. + +## VLN Single-System + +VLN Single-System models directly map **visual observations and language instructions** to navigation actions in an end-to-end manner. + + +### Supported Models + +The following VLN Single-System models are currently supported: + +- Seq2Seq +- CMA +- RDP + +For our VLM-based VLN model **StreamVLN**, please refer to the following repository for training details: +https://github.com/InternRobotics/StreamVLN + +Support for StreamVLN within InternNav is planned for future releases. + + +### Training Command + +Training is performed through a unified training entry script. +Below are example commands for each supported model. + +**Seq2Seq** +``` +./scripts/train/base_train/start_train.sh --name seq2seq_train --model seq2seq +``` + +**CMA** +``` +./scripts/train/base_train/start_train.sh --name cma_train --model cma +``` + +**RDP** +``` +./scripts/train/base_train/start_train.sh --name rdp_train --model rdp +``` + + +## VN System (System1) + +VN System (System1) focuses on **low-level visual navigation and motion control**. + + +### Supported Methods + +The following visual navigation methods are included in the System1 benchmark: + +- DD-PPO +- iPlanner +- ViPlanner +- GNM +- ViNT +- NoMaD +- NavDP (**InternVLA-N1 System1**) + +Among them, **only NavDP is currently supported for training** in InternNav. +All other methods are provided for **evaluation and comparison purposes only**. 
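+
+Evaluation of these System1 baselines is handled separately in the [NavDP benchmark](https://github.com/InternRobotics/NavDP/tree/navdp_benchmark) rather than in this repository; the evaluation guide covers the full setup. As a minimal sketch of that workflow (port, checkpoint, and scene paths are placeholders), a policy server is started first and the evaluation client is then pointed at it:
+
+```bash
+# start the NavDP policy server with the checkpoint to evaluate
+python navdp_server.py --port {PORT} --checkpoint {CHECKPOINT_path}
+
+# then run the point-goal evaluation client against the same port (typically from a second shell)
+python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR}
+```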
+
+
+### Training Command
+
+**NavDP**
+
+
+```bash
+./scripts/train/base_train/start_train.sh --name navdp_train --model-name navdp
+```
+

From a074ec09de999b32c9bee1c73a4b3742ff8dbbbf Mon Sep 17 00:00:00 2001
From: mengwei
Date: Wed, 31 Dec 2025 12:54:33 +0800
Subject: [PATCH 2/2] fix some terms

---
 .../internnav/quick_start/evaluation.md |  4 ++--
 .../internnav/quick_start/training.md   | 22 +++++++++----------
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/source/en/user_guide/internnav/quick_start/evaluation.md b/source/en/user_guide/internnav/quick_start/evaluation.md
index 5ac5864..6506b01 100644
--- a/source/en/user_guide/internnav/quick_start/evaluation.md
+++ b/source/en/user_guide/internnav/quick_start/evaluation.md
@@ -147,8 +147,8 @@ python navdp_server.py --port {PORT} --checkpoint {CHECKPOINT_path}
 python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR}
 ```
 
-## Baseline VLN Single-System Models
-We provide three small VLN baselines (Seq2Seq, CMA, RDP) for evaluation in the InterUtopia (Isaac-Sim) environment.
+## Single-System VLN Baselines
+We provide three small Single-System VLN baselines (Seq2Seq, CMA, RDP) for evaluation in the InternUtopia (Isaac-Sim) environment.
 
 Download the baseline models:
 ```bash
diff --git a/source/en/user_guide/internnav/quick_start/training.md b/source/en/user_guide/internnav/quick_start/training.md
index 3600454..9c43469 100644
--- a/source/en/user_guide/internnav/quick_start/training.md
+++ b/source/en/user_guide/internnav/quick_start/training.md
@@ -6,23 +6,23 @@ This document provides instructions for training models in **InternNav**.
 
 InternNav supports training models under three system paradigms:
 
-- **VLN Multi-System**: integrated System2 + System2 architectures
-- **VLN Single-System**: end-to-end vision-and-language navigation models
-- **VN System (System1)**: low-level visual navigation and control models
+- **Dual-System VLN Models**: integrated System2 + System1 architectures
+- **Single-System VLN Models**: end-to-end vision-and-language navigation models
+- **VN System (System1) Models**: low-level visual navigation and control models
 
 
 Each paradigm follows a different training protocol, which is detailed below.
 
 
-## VLN Multi-System
-VLN Multi-System integrates **System2** (high-level reasoning and planning) with
+## Dual-System VLN Models
+Dual-System VLN Models integrate **System2** (high-level reasoning and planning) with
 **System1** (low-level action control), supporting both modular integration and joint training.
 
 
 ### Supported Systems
 - **InternVLA-N1 (System2)**
-- **InternVLA-N1 (Dual System) w/ NavDP**
-  (*NavDP* indicates joint tuning with System2)
+- **InternVLA-N1 (Dual System) w/ NavDP***
+  (*NavDP indicates joint tuning with System2)
 - **InternVLA-N1 (Dual System) DualVLN**
 
 
@@ -60,14 +60,14 @@ sbatch ./scripts/train/base_train/qwenvl_train/train_dual_system.sh
 - For **w/ NavDP** model variant, set `system1=navdp_async`. Optimal performance is typically observed after **30,000 iterations**.
 - For **DualVLN** model variant, set `system1=nextdit_async`. Optimal performance is typically observed after **15,000 iterations**.
 
-## VLN Single-System
+## Single-System VLN Models
 
-VLN Single-System models directly map **visual observations and language instructions** to navigation actions in an end-to-end manner.
+Single-System VLN Models directly map **visual observations and language instructions** to navigation actions in an end-to-end manner.
### Supported Models -The following VLN Single-System models are currently supported: +The following Single-System VLN Models are currently supported: - Seq2Seq - CMA @@ -100,7 +100,7 @@ Below are example commands for each supported model. ``` -## VN System (System1) +## VN System (System1) Models VN System (System1) focuses on **low-level visual navigation and motion control**.