Implements MobileNetV2 (a lightweight convolutional neural network, CNN) in PyTorch for a 20-class image classification task on an ImageNet subset. Outputs include the training loss curve, convolution kernel visualizations, and intermediate-layer feature map visualizations.
```
MobileNetV2-ImageNet-Classification/
├── data/                             # *Dataset folder
│   ├── train/                        # *Training set
│   │   ├── n04251144/
│   │   ├── n04258138/
│   │   └── ... (20 class folders in total)
│   └── val/                          # *Validation set
│       ├── n04251144/
│       ├── n04258138/
│       └── ... (20 class folders in total)
├── mobileNetV2.py                    # *Main program (model definition + training + visualization)
├── mobilenetv2_scratch_model.pth     # Trained model weights
└── imagenet_class_index.json         # Item ID-to-name mapping file
```
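This layout, with each class in its own subfolder, is exactly what torchvision's ImageFolder expects, so loading can be as simple as the sketch below (the transform here is a placeholder for illustration; the actual script applies its own preprocessing):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# ImageFolder maps each subfolder (n04251144, ...) to a class index
to_tensor = transforms.ToTensor()  # placeholder; see the transforms sketch further down
train_ds = datasets.ImageFolder("data/train", transform=to_tensor)
val_ds = datasets.ImageFolder("data/val", transform=to_tensor)

train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=16, shuffle=False)
print(train_ds.classes[:3])  # the folder names double as class labels
```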
Note: training requires an NVIDIA GPU with CUDA support (I used an RTX 3050 with 4 GB VRAM; training took about 4 hours).
1. Install Python, PyCharm, PyTorch, and Conda (creating a virtual environment with Anaconda is recommended); tutorials are easy to find online.
2. Install the core project dependencies:

```bash
# In the command line
pip install torch torchvision matplotlib numpy pillow
```

3. Download the project to your local machine; choose one of three methods:
   - 3.1. Direct download: the simplest way (click the Code button -> Download ZIP).
   - 3.2. Click the Fork button to fork the repository to your own account, then open your fork and download it locally (as in 3.1).
   - 3.3. Clone the project (paste into a local Git terminal and press Enter):

```bash
git clone https://github.com/DbtSpring/MobileNetV2-Classifier-for-ImageNet.git
cd MobileNetV2-Classifier-for-ImageNet
```

4. Note: the ImageNet subset is already split and included in the data folder, so it can be used directly. The official full ImageNet is available at: https://image-net.org/
5. Run mobileNetV2.py to start training. The script will:
- Automatically detect GPU/CPU devices (GPU is preferred).
- Load and preprocess the dataset (data augmentation for the training set, only normalization for the validation set).
- Initialize the MobileNetV2 model, loss function, and optimizer.
- Start training (80 epochs by default, real-time output of training loss and validation accuracy in the console).
- Automatically save after training:
  - Model weights: mobilenetv2_scratch_model.pth
  - Loss curve image: loss_curve.png
  - Convolution kernel visualization image: kernels_visualization.png
  - Feature map visualization images: feature_maps_visualization_layer1.png, feature_maps_visualization_layer2.png
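Behind these steps, the device selection and the asymmetric train/val preprocessing typically look like this (a sketch using the standard ImageNet normalization statistics; the script's actual augmentation pipeline may differ):

```python
import torch
from torchvision import transforms

# Prefer the GPU whenever CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

imagenet_norm = transforms.Normalize([0.485, 0.456, 0.406],
                                     [0.229, 0.224, 0.225])

# Training set: random crop/flip augmentation + normalization
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    imagenet_norm,
])

# Validation set: deterministic resize + normalization only
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    imagenet_norm,
])

# After training, the weights are saved via:
# torch.save(model.state_dict(), "mobilenetv2_scratch_model.pth")
```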
The inverted residual block is the core component of MobileNetV2. Through a three-stage structure of "expansion - feature filtering - projection", it retains key features while reducing computational complexity. The specific parameters are as follows:
| Step | Convolution Kernel Size (K) | Activation Function | Function | Key Features |
|---|---|---|---|---|
| 1. Expansion | 1 x 1 | ReLU6 | Expand the number of channels C_in by t times | Pointwise convolution, executed only when the expansion factor t≠1 |
| 2. Feature Filtering | 3 x 3 | ReLU6 | Perform spatial filtering on each channel | Depthwise convolution: each input channel is convolved with its own filter, significantly reducing computation |
| 3. Projection (Bottleneck) | 1 x 1 | Linear Activation (no ReLU) | Compress the number of channels back to a smaller value C_out | Linear Bottleneck: Protect low-dimensional feature information from being zeroed out by ReLU |
| Residual Connection | - | - | Optimize gradient propagation | Added only when the input and output sizes are the same (stride=1 and consistent number of channels), Output = Result of the layer processing + Input |
Explanation of Residual Connection:
- Calculation formula: Output = Result of the layer processing + Input
- Core function: Solve the gradient vanishing problem in deep networks, force network layers to learn the difference (residual) between input and output, and improve the stability of model training.
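A minimal PyTorch sketch of this block, assuming the standard MobileNetV2 convention of BatchNorm after every convolution (names here are illustrative, not necessarily those used in mobileNetV2.py):

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expansion (1x1) -> depthwise (3x3) -> linear projection (1x1)."""

    def __init__(self, c_in, c_out, stride, t):
        super().__init__()
        hidden = c_in * t
        # Residual only when spatial size and channel count are preserved
        self.use_residual = stride == 1 and c_in == c_out
        layers = []
        if t != 1:  # 1x1 pointwise expansion, skipped when t == 1
            layers += [nn.Conv2d(c_in, hidden, 1, bias=False),
                       nn.BatchNorm2d(hidden),
                       nn.ReLU6(inplace=True)]
        layers += [
            # 3x3 depthwise: groups=hidden filters each channel separately
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection: no ReLU, i.e. the linear bottleneck
            nn.Conv2d(hidden, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
        ]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```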
The entire network consists of 19 convolutional stages (the initial layer, 17 inverted residual blocks, and the final layer), followed by the classification head that produces the category prediction.
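For reference, the standard per-stage configuration from the MobileNetV2 paper, written as the (t, c, n, s) table used by torchvision-style implementations (this project presumably follows the same layout):

```python
# (t, c, n, s): expansion factor, output channels, block repeats, first stride
inverted_residual_setting = [
    (1,  16, 1, 1),
    (6,  24, 2, 2),
    (6,  32, 3, 2),
    (6,  64, 4, 2),
    (6,  96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),  # 1 + 2 + 3 + 4 + 3 + 3 + 1 = 17 inverted residual blocks
]
# Plus the initial 3x3 conv (3 -> 32, stride 2) and the final 1x1 conv
# (320 -> 1280): the 19 convolutional stages mentioned above.
```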
The following hyperparameters can be modified in mobileNetV2.py:

```python
NUM_CLASSES = 20       # Number of classes (adjust to your dataset)
BATCH_SIZE = 16        # Batch size (change to 8 if GPU VRAM is insufficient)
LEARNING_RATE = 0.001  # Learning rate (0.0005 or 0.002 are worth trying)
NUM_EPOCHS = 80        # Number of training epochs (50-100 is reasonable)
```
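With these values, a typical PyTorch training loop looks like the sketch below (the optimizer choice, Adam, is an assumption; the actual script may use a different one):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

for epoch in range(NUM_EPOCHS):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}: train loss {running_loss / len(train_loader):.4f}")
```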
1.1 Loss Convergence
- Trend: both training and validation loss decrease steadily as epochs progress, indicating that the model keeps learning and optimizing its parameters.
- Final values: after 80 epochs, the validation loss stabilizes around 0.65 and the training loss around 0.75, indicating good performance.
1.2 Generalization Ability and Overfitting
- Early stage: the validation loss (orange line) is lower than the training loss (blue line) because Dropout and data augmentation make the training data harder during training, while validation runs in eval mode with Dropout disabled.
- Late Stage (Epochs 50-80): The two curves tend to converge, and there is no rebound or sharp surge in validation loss.
- Conclusion: The model has excellent generalization ability with no obvious overfitting. The features learned from the training set can be effectively transferred to the unseen validation set.
The kernel visualization (kernels_visualization.png) shows:
- Analysis: these are the 32 3×3 convolution kernels of MobileNetV2's first layer, nn.Conv2d(3, 32, kernel_size=3).
- Patterns: they exhibit a variety of color-contrast and edge-detection patterns (e.g., cyan/purple contrast in the top-left corner, dark/white contrast in the bottom-right corner).
- Conclusion: the initial convolutional layer successfully learned basic visual features such as color, edges, and texture, laying the foundation for subsequent complex feature extraction and confirming that model initialization and training were effective.
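A sketch of how such a kernel grid can be rendered (indexing model.features[0][0] assumes a torchvision-style layout where the first entry of the stem Sequential is the Conv2d; adjust to the actual model definition):

```python
import matplotlib.pyplot as plt

# First conv layer weights: shape (32, 3, 3, 3) = 32 RGB 3x3 kernels
kernels = model.features[0][0].weight.data.cpu().clone()
# Rescale to [0, 1] so the kernels are displayable as RGB patches
kernels = (kernels - kernels.min()) / (kernels.max() - kernels.min())

fig, axes = plt.subplots(4, 8, figsize=(8, 4))
for i, ax in enumerate(axes.flat):
    ax.imshow(kernels[i].permute(1, 2, 0).numpy())  # CHW -> HWC
    ax.axis("off")
plt.savefig("kernels_visualization.png")
```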
The two feature-map visualizations correspond to the following layers:

| Visualization Name | Layer Description | Corresponding Code Index | Feature Nature |
|---|---|---|---|
| Layer 1 | Shallow features (close to the input) | model.features[0] (first inverted residual block) | Extracts low-level features such as edges and contours |
| Layer 2 | Deep features (close to the output) | model.features[9] (tenth inverted residual block) | Extracts high-level features such as abstract semantics and object parts |
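One common way to capture these intermediate feature maps is a forward hook on the layers listed above (a sketch; images stands for any preprocessed input batch):

```python
import torch

feature_maps = {}

def save_output(name):
    # Build a hook that stores the layer's output under the given name
    def hook(module, inputs, output):
        feature_maps[name] = output.detach().cpu()
    return hook

model.features[0].register_forward_hook(save_output("layer1"))
model.features[9].register_forward_hook(save_output("layer2"))

model.eval()
with torch.no_grad():
    model(images.to(device))  # one forward pass fills feature_maps
print(feature_maps["layer1"].shape, feature_maps["layer2"].shape)
```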





