# Evaluation

This document describes how to evaluate models in **InternNav**.

## InternVLA-N1 (Dual System)

Model weights of InternVLA-N1 (Dual System) can be downloaded from [InternVLA-N1-DualVLN](https://huggingface.co/InternRobotics/InternVLA-N1-DualVLN) and [InternVLA-N1-w-NavDP](https://huggingface.co/InternRobotics/InternVLA-N1-w-NavDP).
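For example, the weights can be fetched with the Hugging Face CLI (a minimal sketch; the `checkpoints/` target directories below are illustrative, not required by InternNav):

```bash
pip install -U "huggingface_hub[cli]"   # provides the huggingface-cli tool
huggingface-cli download InternRobotics/InternVLA-N1-DualVLN --local-dir checkpoints/InternVLA-N1-DualVLN
huggingface-cli download InternRobotics/InternVLA-N1-w-NavDP --local-dir checkpoints/InternVLA-N1-w-NavDP
```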

---

### Evaluation on Isaac Sim
Before evaluation, we should download the robot assets from [InternUTopiaAssets](https://huggingface.co/datasets/InternRobotics/Embodiments) and move them to the `data/` directory.
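For example, with the Hugging Face CLI (a sketch; the exact layout expected under `data/` is an assumption, so follow the asset repository's README):

```bash
# Assumption: assets live under data/Embodiments; adjust to the documented layout.
huggingface-cli download InternRobotics/Embodiments --repo-type dataset --local-dir data/Embodiments
```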

[UPDATE] We now support running the local model and Isaac Sim in a single process. Evaluate on a single GPU:

```bash
# (single-GPU evaluation command collapsed in the diff view)
```

The simulation can be visualized by setting `vis_output=True` in `eval_cfg`.
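As a sketch, this could be set directly in the config (whether `vis_output` is a top-level `EvalCfg` field or lives on a sub-config is an assumption; check your config file):

```python
# Hedged sketch: exact field placement may differ in the real EvalCfg.
eval_cfg = EvalCfg(
    agent=AgentCfg(model_name='internvla_n1'),
    vis_output=True,  # assumed top-level flag that saves rollout visualizations
)
```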

<img src="../../../_static/video/nav_eval.gif" alt="Navigation evaluation visualization in Isaac Sim">

### Evaluation on Habitat Sim
Evaluate on a single GPU:

```bash
# (single-GPU evaluation command collapsed in the diff view)
```

For multi-GPU inference, we currently support SLURM as well as other multi-GPU environments:

```bash
# (launcher and its leading arguments collapsed in the diff view)
  --config scripts/eval/configs/habitat_dual_system_cfg.py
```

## InternVLA-N1 (System 2)

Model weights of InternVLA-N1 (System2) can be downloaded from [InternVLA-N1-System2](https://huggingface.co/InternRobotics/InternVLA-N1-System2).

Currently we only support evaluating the standalone System 2 on Habitat.

Evaluate on a single GPU:

```bash
python scripts/eval/eval.py --config scripts/eval/configs/habitat_s2_cfg.py
```

Set the config with the following fields:

```python
eval_cfg = EvalCfg(
    agent=AgentCfg(
        model_name='internvla_n1',
        model_settings={
            "mode": "system2",  # inference mode: dual_system or system2
            "model_path": "checkpoints/<s2_checkpoint>",  # path to the model checkpoint
        },
    ),
)
```

For multi-GPU inference, we currently only support SLURM:

```bash
./scripts/eval/bash/eval_system2.sh
```

## VN Systems (System 1)

We support evaluating diverse System-1 baselines separately in [NavDP](https://github.com/InternRobotics/NavDP/tree/navdp_benchmark) to keep them easy to use and deploy.
To set up the environment, follow the quick start below:
```bash
# (environment setup commands collapsed in the diff view)
python navdp_server.py --port {PORT} --checkpoint {CHECKPOINT_path}
python eval_pointgoal_wheeled.py --port {PORT} --scene_dir {SCENE_DIR}
```


## Single-System VLN Baselines
We provide three small Single-System VLN baselines (Seq2Seq, CMA, RDP) for evaluation in the InternUtopia (Isaac Sim) environment.

Download the baseline models:
```bash
# (download commands collapsed in the diff view)
```
---

**source/en/user_guide/internnav/quick_start/index.md** (3 changes: 2 additions & 1 deletion)

```{toctree}
installation
simulation
interndata
training
evaluation
```
---

**source/en/user_guide/internnav/quick_start/training.md** (new file, 132 additions)

# Training

This document provides instructions for training models in **InternNav**.

## Overview

InternNav supports training models under three system paradigms:

- **Dual-System VLN Models**: integrated System2 + System1 architectures
- **Single-System VLN Models**: end-to-end vision-and-language navigation models
- **VN System (System1) Models**: low-level visual navigation and control models


Each paradigm follows a different training protocol, which is detailed below.


## Dual-System VLN Models
Dual-System VLN models integrate **System2** (high-level reasoning and planning) with
**System1** (low-level action control), supporting both modular integration and joint training.


### Supported Systems
- **InternVLA-N1 (System2)**
- **InternVLA-N1 (Dual System) w/ NavDP** (jointly tuned with System2)
- **InternVLA-N1 (Dual System) DualVLN**


### 1. Training for InternVLA-N1 (System2)

**InternVLA-N1 (System2)** is trained independently to predict 2D pixel goals for navigation.

It can be used with any compatible System1 model capable of executing 2D pixel goals or point goals (given depth and pose).
Alternatively, it can be trained jointly with a System1 model for end-to-end multi-system optimization.
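To make the pixel-goal interface concrete, the sketch below lifts a 2D pixel goal to a 3D point goal using depth, intrinsics, and camera pose. This is an illustrative pinhole-camera computation, not InternNav API; the function and argument names are hypothetical.

```python
import numpy as np

def pixel_goal_to_point_goal(u, v, depth, K, T_world_cam):
    """Lift a System2 pixel goal (u, v) to a world-frame 3D point goal.

    depth: HxW metric depth image; K: 3x3 intrinsics; T_world_cam: 4x4 camera-to-world pose.
    """
    z = float(depth[v, u])                 # metric depth at the goal pixel
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    p_cam = np.array([(u - cx) * z / fx,   # back-project through the pinhole model
                      (v - cy) * z / fy,
                      z,
                      1.0])
    return (T_world_cam @ p_cam)[:3]       # goal expressed in the world frame
```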


#### Training Command

```bash
# training system2 separately
sbatch ./scripts/train/base_train/qwenvl_train/train_system2.sh
```

---

### 2. Joint Training for InternVLA-N1 (Dual System)

Once **InternVLA-N1 (System2)** has been trained, joint training with a pixel-goal navigation System1 is supported, using either the **NavDP** or **NextDiT** architecture.

- **InternVLA-N1 (Dual System) w/ NavDP**: preserves **NavDP**'s model design and uses **RGB-D** input.
- **InternVLA-N1 (Dual System) DualVLN**: uses only **RGB** input, resulting in a smaller model footprint.

#### Training Command

```bash
# training system1 based on system2
sbatch ./scripts/train/base_train/qwenvl_train/train_dual_system.sh
```

- For the **w/ NavDP** variant, set `system1=navdp_async`. Optimal performance is typically observed after **30,000 iterations**.
- For the **DualVLN** variant, set `system1=nextdit_async`. Optimal performance is typically observed after **15,000 iterations**. One way this setting might be passed is sketched below.
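A hypothetical invocation (whether `system1` is read from the environment or given as a script argument depends on `train_dual_system.sh`; check the script before use):

```bash
# Hypothetical: sbatch exports the submitting shell's environment by default,
# so the script could read $system1 if it is written that way.
system1=navdp_async sbatch ./scripts/train/base_train/qwenvl_train/train_dual_system.sh
```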

## Single-System VLN Models

Single-System VLN Models directly map **visual observations and language instructions** to navigation actions in an end-to-end manner.
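As a toy illustration of that mapping (not an InternNav model; every name below is hypothetical):

```python
import torch
import torch.nn as nn

class TinyVLNPolicy(nn.Module):
    """Toy end-to-end VLN policy: (RGB, instruction tokens) -> action logits."""

    def __init__(self, vocab_size=1000, d=256, n_actions=4):
        super().__init__()
        self.visual = nn.Sequential(                 # tiny visual encoder
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, d),
        )
        self.text = nn.EmbeddingBag(vocab_size, d)   # bag-of-words instruction encoder
        self.head = nn.Linear(2 * d, n_actions)      # e.g. stop / forward / turn left / turn right

    def forward(self, rgb, instruction_tokens):
        z = torch.cat([self.visual(rgb), self.text(instruction_tokens)], dim=-1)
        return self.head(z)                          # action logits

# logits = TinyVLNPolicy()(torch.rand(1, 3, 224, 224), torch.randint(0, 1000, (1, 12)))
```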


### Supported Models

The following Single-System VLN Models are currently supported:

- Seq2Seq
- CMA
- RDP

For our VLM-based VLN model **StreamVLN**, please refer to the [StreamVLN repository](https://github.com/InternRobotics/StreamVLN) for training details.

Support for StreamVLN within InternNav is planned for future releases.


### Training Command

Training is performed through a unified training entry script.
Below are example commands for each supported model.

**Seq2Seq**
```bash
./scripts/train/base_train/start_train.sh --name seq2seq_train --model seq2seq
```

**CMA**
```bash
./scripts/train/base_train/start_train.sh --name cma_train --model cma
```

**RDP**
```bash
./scripts/train/base_train/start_train.sh --name rdp_train --model rdp
```


## VN System (System1) Models

VN System (System1) focuses on **low-level visual navigation and motion control**.


### Supported Methods

The following visual navigation methods are included in the System1 benchmark:

- DD-PPO
- iPlanner
- ViPlanner
- GNM
- ViNT
- NoMaD
- NavDP (**InternVLA-N1 System1**)

Among them, **only NavDP is currently supported for training** in InternNav.
All other methods are provided for **evaluation and comparison purposes only**.


### Training Command

**NavDP**


```bash
./scripts/train/base_train/start_train.sh --name navdp_train --model-name navdp
```