From 1d14a22d9c45acd53b421bcf8a95187dff29df20 Mon Sep 17 00:00:00 2001
From: mengwei
Date: Tue, 30 Dec 2025 17:55:35 +0800
Subject: [PATCH 1/2] add training and evaluation docs

---
 .../{train_eval.md => evaluation.md}          |  93 +++++-------
 .../user_guide/internnav/quick_start/index.md |   3 +-
 .../internnav/quick_start/training.md         | 132 ++++++++++++++++++
 3 files changed, 167 insertions(+), 61 deletions(-)
 rename source/en/user_guide/internnav/quick_start/{train_eval.md => evaluation.md} (80%)
 create mode 100644 source/en/user_guide/internnav/quick_start/training.md

diff --git a/source/en/user_guide/internnav/quick_start/train_eval.md b/source/en/user_guide/internnav/quick_start/evaluation.md
index d6dc608..5ac5864 100644
--- a/source/en/user_guide/internnav/quick_start/train_eval.md
+++ b/source/en/user_guide/internnav/quick_start/evaluation.md
@@ -1,16 +1,16 @@
-# Training and Evaluation
+# Evaluation
 
-This document presents how to train and evaluate models for different systems with InternNav.
+This document describes how to evaluate models in **InternNav**.
 
-## Whole-system
+## InternVLA-N1 (Dual System)
 
-### Training
-The training pipeline is currently under preparation and will be open-sourced soon.
+Model weights of InternVLA-N1 (Dual System) can be downloaded from [InternVLA-N1-DualVLN](https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN) and [InternVLA-N1-w-NavDP](https://huggingface.co/InternRobotics/InternVLA-N1-w-NavDP).
 
-### Evaluation
-Before evaluation, we should download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory. Model weights of InternVLA-N1 can be downloaded from [InternVLA-N1](https://huggingface.co/InternRobotics/InternVLA-N1).
+---
+
+### Evaluation on Isaac Sim
+Before evaluation, we should download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory.
 
-#### Evaluation on Isaac Sim
 [UPDATE] We support using local model and isaac sim in one process now.
 Evaluate on Single-GPU:
 ```bash
@@ -51,7 +51,7 @@ The simulation can be visualized by set `vis_output=True` in eval_cfg.
 My GIF
 
-#### Evaluation on Habitat Sim
+### Evaluation on Habitat Sim
 
 Evaluate on Single-GPU:
 ```bash
@@ -74,18 +74,36 @@ For multi-gpu inference, currently we support inference on SLURM as well as envi
 --config scripts/eval/configs/habitat_dual_system_cfg.py
 ```
 
+## InternVLA-N1 (System 2)
 
-## System1
+Model weights of InternVLA-N1 (System2) can be downloaded from [InternVLA-N1-System2](https://huggingface.co/InternRobotics/InternVLA-N1-System2).
 
-### Training
+Currently we only support evaluating a single System2 on Habitat:
 
-Download the training data from [Hugging Face](https://huggingface.co/datasets/InternRobotics/InternData-N1/), and organize them in the form mentioned in [installation](./installation.md).
+Evaluate on Single-GPU: ```bash -./scripts/train/start_train.sh --name "$NAME" --model-name navdp +python scripts/eval/eval.py --config scripts/eval/configs/habitat_s2_cfg.py + +# set config with the following fields +eval_cfg = EvalCfg( + agent=AgentCfg( + model_name='internvla_n1', + model_settings={ + "mode": "system2", # inference mode: dual_system or system2 + "model_path": "checkpoints/", # path to model checkpoint + } + ) +) +``` + +For multi-gpu inference, currently we only support inference on SLURM. + +```bash +./scripts/eval/bash/eval_system2.sh ``` -### Evaluation +## VN Systems (System 1) We support the evaluation of diverse System-1 baselines separately in [NavDP](https://github.com/InternRobotics/NavDP/tree/navdp_benchmark) to make it easy to use and deploy. To install the environment, we provide a quick start below: @@ -129,52 +147,7 @@ python navdp_server.py --port {PORT} --checkpoint {CHECKPOINT_path} python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR} ``` - -## System2 - -### Training - -Currently, we only support training of small VLN models (CMA, RDP, Seq2Seq) in this repo. For the training of LLM-based VLN (Navid, StreamVLN, etc), please refer to [StreamVLN](https://github.com/OpenRobotLab/StreamVLN) for training details. - -```base -# train cma model -./scripts/train/start_train.sh --name cma_train --model cma - -# train rdp model -./scripts/train/start_train.sh --name rdp_train --model rdp - -# train seq2seq model -./scripts/train/start_train.sh --name seq2seq_train --model seq2seq -``` -### Evaluation - -#### InternVLA-N1-S2 -Currently we only support evaluate single System2 on Habitat: - -Evaluate on Single-GPU: - -```bash -python scripts/eval/eval.py --config scripts/eval/configs/habitat_s2_cfg.py - -# set config with the following fields -eval_cfg = EvalCfg( - agent=AgentCfg( - model_name='internvla_n1', - model_settings={ - "mode": "system2", # inference mode: dual_system or system2 - "model_path": "checkpoints/", # path to model checkpoint - } - ) -) -``` - -For multi-gpu inference, currently we only support inference on SLURM. - -```bash -./scripts/eval/bash/eval_system2.sh -``` - -#### Baseline Models +## Baseline VLN Single-System Models We provide three small VLN baselines (Seq2Seq, CMA, RDP) for evaluation in the InterUtopia (Isaac-Sim) environment. Download the baseline models: diff --git a/source/en/user_guide/internnav/quick_start/index.md b/source/en/user_guide/internnav/quick_start/index.md index c484f2d..1e6db54 100644 --- a/source/en/user_guide/internnav/quick_start/index.md +++ b/source/en/user_guide/internnav/quick_start/index.md @@ -15,5 +15,6 @@ myst: installation simulation interndata -train_eval +training +evaluation ``` diff --git a/source/en/user_guide/internnav/quick_start/training.md b/source/en/user_guide/internnav/quick_start/training.md new file mode 100644 index 0000000..3600454 --- /dev/null +++ b/source/en/user_guide/internnav/quick_start/training.md @@ -0,0 +1,132 @@ +# Training + +This document provides instructions for training models in **InternNav**. + +## Overview + +InternNav supports training models under three system paradigms: + +- **VLN Multi-System**: integrated System2 + System2 architectures +- **VLN Single-System**: end-to-end vision-and-language navigation models +- **VN System (System1)**: low-level visual navigation and control models + + +Each paradigm follows a different training protocol, which is detailed below. 
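+
+As a quick reference, the launch commands described in the following sections are collected below; refer to each section for the data, checkpoints, and configuration that each command assumes.
+
+```bash
+# Dual-System VLN (InternVLA-N1): first train System2, then jointly tune it with a System1 head
+sbatch ./scripts/train/base_train/qwenvl_train/train_system2.sh
+sbatch ./scripts/train/base_train/qwenvl_train/train_dual_system.sh
+
+# Single-System VLN baselines (Seq2Seq, CMA, RDP) via the unified entry script
+./scripts/train/base_train/start_train.sh --name seq2seq_train --model seq2seq
+
+# VN System (System1): NavDP
+./scripts/train/base_train/start_train.sh --name navdp_train --model-name navdp
+```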
+ + +## VLN Multi-System +VLN Multi-System integrates **System2** (high-level reasoning and planning) with +**System1** (low-level action control), supporting both modular integration and joint training. + + +### Supported Systems +- **InternVLA-N1 (System2)** +- **InternVLA-N1 (Dual System) w/ NavDP** + (*NavDP* indicates joint tuning with System2) +- **InternVLA-N1 (Dual System) DualVLN** + + +### 1. Training for InternVLA-N1 (System2) + +**InternVLA-N1 (System2)** is trained independently to predict 2D pixel goals for navigation. + +It can be used with any compatible System1 model capable of executing 2D pixel goals or point goals (given depth and pose). +Alternatively, it can be jointly trained together with a System1 model for end-to-end multi-system optimization. + + +#### Training Command + +```bash +# training system2 separately +sbatch ./scripts/train/base_train/qwenvl_train/train_system2.sh +``` + +--- + +### 2. Joint Training for InternVLA-N1 (Dual System) + +After completing training of **InternVLA-N1 (System2)**, joint training is supported with a pixel-goal navigation System1, using either the **NavDP** or **NextDiT** architecture. + +- **InternVLA-N1 (Dual System) w/ NavDP**: preserves **NavDP**'s model design and uses **RGB-D** input. +- **InternVLA-N1 (Dual System) DualVLN**: uses only **RGB** input, resulting in a smaller model footprint. + +#### Training Command + +```bash +# training system1 based on system2 +sbatch ./scripts/train/base_train/qwenvl_train/train_dual_system.sh +``` + +- For **w/ NavDP** model variant, set `system1=navdp_async`. Optimal performance is typically observed after **30,000 iterations**. +- For **DualVLN** model variant, set `system1=nextdit_async`. Optimal performance is typically observed after **15,000 iterations**. + +## VLN Single-System + +VLN Single-System models directly map **visual observations and language instructions** to navigation actions in an end-to-end manner. + + +### Supported Models + +The following VLN Single-System models are currently supported: + +- Seq2Seq +- CMA +- RDP + +For our VLM-based VLN model **StreamVLN**, please refer to the following repository for training details: +https://github.com/InternRobotics/StreamVLN + +Support for StreamVLN within InternNav is planned for future releases. + + +### Training Command + +Training is performed through a unified training entry script. +Below are example commands for each supported model. + +**Seq2Seq** +``` +./scripts/train/base_train/start_train.sh --name seq2seq_train --model seq2seq +``` + +**CMA** +``` +./scripts/train/base_train/start_train.sh --name cma_train --model cma +``` + +**RDP** +``` +./scripts/train/base_train/start_train.sh --name rdp_train --model rdp +``` + + +## VN System (System1) + +VN System (System1) focuses on **low-level visual navigation and motion control**. + + +### Supported Methods + +The following visual navigation methods are included in the System1 benchmark: + +- DD-PPO +- iPlanner +- ViPlanner +- GNM +- ViNT +- NoMaD +- NavDP (**InternVLA-N1 System1**) + +Among them, **only NavDP is currently supported for training** in InternNav. +All other methods are provided for **evaluation and comparison purposes only**. 
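+
+Evaluation of these System1 baselines is handled separately in the [NavDP benchmark](https://github.com/InternRobotics/NavDP/tree/navdp_benchmark) rather than in this repository; the evaluation guide covers the full setup. As a minimal sketch of that workflow (port, checkpoint, and scene paths are placeholders), a policy server is started first and the evaluation client is then pointed at it:
+
+```bash
+# start the NavDP policy server with the checkpoint to evaluate
+python navdp_server.py --port {PORT} --checkpoint {CHECKPOINT_path}
+
+# then run the point-goal evaluation client against the same port (typically from a second shell)
+python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR}
+```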
+
+
+### Training Command
+
+**NavDP**
+
+
+```bash
+./scripts/train/base_train/start_train.sh --name navdp_train --model-name navdp
+```
+

From a074ec09de999b32c9bee1c73a4b3742ff8dbbbf Mon Sep 17 00:00:00 2001
From: mengwei
Date: Wed, 31 Dec 2025 12:54:33 +0800
Subject: [PATCH 2/2] fix some terms

---
 .../internnav/quick_start/evaluation.md |  4 ++--
 .../internnav/quick_start/training.md   | 22 +++++++++----------
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/source/en/user_guide/internnav/quick_start/evaluation.md b/source/en/user_guide/internnav/quick_start/evaluation.md
index 5ac5864..6506b01 100644
--- a/source/en/user_guide/internnav/quick_start/evaluation.md
+++ b/source/en/user_guide/internnav/quick_start/evaluation.md
@@ -147,8 +147,8 @@ python navdp_server.py --port {PORT} --checkpoint {CHECKPOINT_path}
 python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR}
 ```
 
-## Baseline VLN Single-System Models
-We provide three small VLN baselines (Seq2Seq, CMA, RDP) for evaluation in the InterUtopia (Isaac-Sim) environment.
+## Single-System VLN Baselines
+We provide three small Single-System VLN baselines (Seq2Seq, CMA, RDP) for evaluation in the InternUtopia (Isaac-Sim) environment.
 
 Download the baseline models:
 ```bash
diff --git a/source/en/user_guide/internnav/quick_start/training.md b/source/en/user_guide/internnav/quick_start/training.md
index 3600454..9c43469 100644
--- a/source/en/user_guide/internnav/quick_start/training.md
+++ b/source/en/user_guide/internnav/quick_start/training.md
@@ -6,23 +6,23 @@ This document provides instructions for training models in **InternNav**.
 
 InternNav supports training models under three system paradigms:
 
-- **VLN Multi-System**: integrated System2 + System2 architectures
-- **VLN Single-System**: end-to-end vision-and-language navigation models
-- **VN System (System1)**: low-level visual navigation and control models
+- **Dual-System VLN Models**: integrated System2 + System1 architectures
+- **Single-System VLN Models**: end-to-end vision-and-language navigation models
+- **VN System (System1) Models**: low-level visual navigation and control models
 
 
 Each paradigm follows a different training protocol, which is detailed below.
 
 
-## VLN Multi-System
-VLN Multi-System integrates **System2** (high-level reasoning and planning) with
+## Dual-System VLN Models
+Dual-System VLN Models integrate **System2** (high-level reasoning and planning) with
 **System1** (low-level action control), supporting both modular integration and joint training.
 
 
 ### Supported Systems
 - **InternVLA-N1 (System2)**
-- **InternVLA-N1 (Dual System) w/ NavDP**
-  (*NavDP* indicates joint tuning with System2)
+- **InternVLA-N1 (Dual System) w/ NavDP***
+  (*NavDP indicates joint tuning with System2)
 - **InternVLA-N1 (Dual System) DualVLN**
 
 
@@ -60,14 +60,14 @@ sbatch ./scripts/train/base_train/qwenvl_train/train_dual_system.sh
 - For **w/ NavDP** model variant, set `system1=navdp_async`. Optimal performance is typically observed after **30,000 iterations**.
 - For **DualVLN** model variant, set `system1=nextdit_async`. Optimal performance is typically observed after **15,000 iterations**.
 
-## VLN Single-System
+## Single-System VLN Models
 
-VLN Single-System models directly map **visual observations and language instructions** to navigation actions in an end-to-end manner.
+Single-System VLN Models directly map **visual observations and language instructions** to navigation actions in an end-to-end manner.
### Supported Models -The following VLN Single-System models are currently supported: +The following Single-System VLN Models are currently supported: - Seq2Seq - CMA @@ -100,7 +100,7 @@ Below are example commands for each supported model. ``` -## VN System (System1) +## VN System (System1) Models VN System (System1) focuses on **low-level visual navigation and motion control**.