Implements MobileNetV2 (a lightweight convolutional neural network, CNN) in PyTorch for a 20-class image classification task on an ImageNet subset. Outputs include the training loss curve, convolution kernel visualizations, and intermediate-layer feature map visualizations.
```
MobileNetV2-ImageNet-Classification/
├── data/                             # *Dataset folder
│   ├── train/                        # *Training set
│   │   ├── n04251144/
│   │   ├── n04258138/
│   │   └── ... (20 class folders in total)
│   └── val/                          # *Validation set
│       ├── n04251144/
│       ├── n04258138/
│       └── ... (20 class folders in total)
├── mobileNetV2.py                    # *Main program (model definition + training + visualization)
├── mobilenetv2_scratch_model.pth     # Trained model weights
└── imagenet_class_index.json         # Item ID-to-name mapping file
```
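This layout, with each class in its own subfolder, is exactly what torchvision's ImageFolder expects, so loading can be as simple as the sketch below (the transform here is a placeholder for illustration; the actual script applies its own preprocessing):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# ImageFolder maps each subfolder (n04251144, ...) to a class index
to_tensor = transforms.ToTensor()  # placeholder; see the transforms sketch further down
train_ds = datasets.ImageFolder("data/train", transform=to_tensor)
val_ds = datasets.ImageFolder("data/val", transform=to_tensor)

train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=16, shuffle=False)
print(train_ds.classes[:3])  # the folder names double as class labels
```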
Note: training requires an NVIDIA GPU with CUDA support (I used an RTX 3050 with 4 GB VRAM; training took about 4 hours).
1. Install Python, PyCharm, PyTorch, and Conda (creating a virtual environment with Anaconda is recommended); tutorials are easy to find online.
2. Install the core project dependencies:

```bash
# In the command line
pip install torch torchvision matplotlib numpy pillow
```

3. Download the project to your local machine; choose one of three methods:
   - 3.1. Direct download: the simplest way (click the Code button -> Download ZIP).
   - 3.2. Click the Fork button to fork the repository to your own account, then open your fork and download it locally (as in 3.1).
   - 3.3. Clone the project (paste into a local Git terminal and press Enter):

```bash
git clone https://github.com/DbtSpring/MobileNetV2-Classifier-for-ImageNet.git
cd MobileNetV2-Classifier-for-ImageNet
```

4. Note: the ImageNet subset is already split and included in the data folder, so it can be used directly. The official full ImageNet is available at: https://image-net.org/
5. Run mobileNetV2.py to start training. The script will:
- Automatically detect GPU/CPU devices (GPU is preferred).
- Load and preprocess the dataset (data augmentation for the training set, only normalization for the validation set).
- Initialize the MobileNetV2 model, loss function, and optimizer.
- Start training (80 epochs by default, real-time output of training loss and validation accuracy in the console).
- Automatically save after training:
  - Model weights: mobilenetv2_scratch_model.pth
  - Loss curve image: loss_curve.png
  - Convolution kernel visualization image: kernels_visualization.png
  - Feature map visualization images: feature_maps_visualization_layer1.png, feature_maps_visualization_layer2.png
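Behind these steps, the device selection and the asymmetric train/val preprocessing typically look like this (a sketch using the standard ImageNet normalization statistics; the script's actual augmentation pipeline may differ):

```python
import torch
from torchvision import transforms

# Prefer the GPU whenever CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

imagenet_norm = transforms.Normalize([0.485, 0.456, 0.406],
                                     [0.229, 0.224, 0.225])

# Training set: random crop/flip augmentation + normalization
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    imagenet_norm,
])

# Validation set: deterministic resize + normalization only
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    imagenet_norm,
])

# After training, the weights are saved via:
# torch.save(model.state_dict(), "mobilenetv2_scratch_model.pth")
```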
The inverted residual block is the core component of MobileNetV2. Through a three-stage structure of "expansion - feature filtering - projection", it retains key features while reducing computational complexity. The specific parameters are as follows:
| Step | Convolution Kernel Size (K) | Activation Function | Function | Key Features |
|---|---|---|---|---|
| 1. Expansion | 1 x 1 | ReLU6 | Expand the number of channels C_in by t times | Pointwise convolution, executed only when the expansion factor t≠1 |
| 2. Feature Filtering | 3 x 3 | ReLU6 | Perform spatial filtering on each channel | Depthwise convolution: each input channel is convolved with its own filter, significantly reducing computation |
| 3. Projection (Bottleneck) | 1 x 1 | Linear Activation (no ReLU) | Compress the number of channels back to a smaller value C_out | Linear Bottleneck: Protect low-dimensional feature information from being zeroed out by ReLU |
| Residual Connection | - | - | Optimize gradient propagation | Added only when the input and output sizes are the same (stride=1 and consistent number of channels), Output = Result of the layer processing + Input |
Explanation of Residual Connection:
- Calculation formula: Output = Result of the layer processing + Input
- Core function: Solve the gradient vanishing problem in deep networks, force network layers to learn the difference (residual) between input and output, and improve the stability of model training.
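A minimal PyTorch sketch of this block, assuming the standard MobileNetV2 convention of BatchNorm after every convolution (names here are illustrative, not necessarily those used in mobileNetV2.py):

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expansion (1x1) -> depthwise (3x3) -> linear projection (1x1)."""

    def __init__(self, c_in, c_out, stride, t):
        super().__init__()
        hidden = c_in * t
        # Residual only when spatial size and channel count are preserved
        self.use_residual = stride == 1 and c_in == c_out
        layers = []
        if t != 1:  # 1x1 pointwise expansion, skipped when t == 1
            layers += [nn.Conv2d(c_in, hidden, 1, bias=False),
                       nn.BatchNorm2d(hidden),
                       nn.ReLU6(inplace=True)]
        layers += [
            # 3x3 depthwise: groups=hidden filters each channel separately
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection: no ReLU, i.e. the linear bottleneck
            nn.Conv2d(hidden, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
        ]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```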
The entire network consists of 19 convolutional stages (the initial layer, 17 inverted residual blocks, and the final layer), followed by the classification head that produces the category prediction.
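For reference, the standard per-stage configuration from the MobileNetV2 paper, written as the (t, c, n, s) table used by torchvision-style implementations (this project presumably follows the same layout):

```python
# (t, c, n, s): expansion factor, output channels, block repeats, first stride
inverted_residual_setting = [
    (1,  16, 1, 1),
    (6,  24, 2, 2),
    (6,  32, 3, 2),
    (6,  64, 4, 2),
    (6,  96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),  # 1 + 2 + 3 + 4 + 3 + 3 + 1 = 17 inverted residual blocks
]
# Plus the initial 3x3 conv (3 -> 32, stride 2) and the final 1x1 conv
# (320 -> 1280): the 19 convolutional stages mentioned above.
```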
The following hyperparameters can be modified in mobileNetV2.py:

```python
NUM_CLASSES = 20       # Number of classes (adjust to your dataset)
BATCH_SIZE = 16        # Batch size (change to 8 if GPU VRAM is insufficient)
LEARNING_RATE = 0.001  # Learning rate (0.0005 or 0.002 are worth trying)
NUM_EPOCHS = 80        # Number of training epochs (50-100 is reasonable)
```
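With these values, a typical PyTorch training loop looks like the sketch below (the optimizer choice, Adam, is an assumption; the actual script may use a different one):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

for epoch in range(NUM_EPOCHS):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}: train loss {running_loss / len(train_loader):.4f}")
```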
1.1 Loss Convergence
- Trend: both training and validation loss decrease steadily as epochs progress, indicating that the model keeps learning and optimizing its parameters.
- Final values: after 80 epochs, the validation loss stabilizes around 0.65 and the training loss around 0.75, indicating good performance.
1.2 Generalization Ability and Overfitting
- Early stage: the validation loss (orange line) is lower than the training loss (blue line) because Dropout and data augmentation make the training data harder during training, while validation runs in eval mode with Dropout disabled.
- Late Stage (Epochs 50-80): The two curves tend to converge, and there is no rebound or sharp surge in validation loss.
- Conclusion: The model has excellent generalization ability with no obvious overfitting. The features learned from the training set can be effectively transferred to the unseen validation set.
The kernel visualization (kernels_visualization.png) shows:
- Analysis: these are the 32 3×3 convolution kernels of MobileNetV2's first layer, nn.Conv2d(3, 32, kernel_size=3).
- Patterns: they exhibit a variety of color-contrast and edge-detection patterns (e.g., cyan/purple contrast in the top-left corner, dark/white contrast in the bottom-right corner).
- Conclusion: the initial convolutional layer successfully learned basic visual features such as color, edges, and texture, laying the foundation for subsequent complex feature extraction and confirming that model initialization and training were effective.
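A sketch of how such a kernel grid can be rendered (indexing model.features[0][0] assumes a torchvision-style layout where the first entry of the stem Sequential is the Conv2d; adjust to the actual model definition):

```python
import matplotlib.pyplot as plt

# First conv layer weights: shape (32, 3, 3, 3) = 32 RGB 3x3 kernels
kernels = model.features[0][0].weight.data.cpu().clone()
# Rescale to [0, 1] so the kernels are displayable as RGB patches
kernels = (kernels - kernels.min()) / (kernels.max() - kernels.min())

fig, axes = plt.subplots(4, 8, figsize=(8, 4))
for i, ax in enumerate(axes.flat):
    ax.imshow(kernels[i].permute(1, 2, 0).numpy())  # CHW -> HWC
    ax.axis("off")
plt.savefig("kernels_visualization.png")
```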
The two feature-map visualizations correspond to the following layers:

| Visualization Name | Layer Description | Corresponding Code Index | Feature Nature |
|---|---|---|---|
| Layer 1 | Shallow features (close to the input) | model.features[0] (first inverted residual block) | Extracts low-level features such as edges and contours |
| Layer 2 | Deep features (close to the output) | model.features[9] (tenth inverted residual block) | Extracts high-level features such as abstract semantics and object parts |
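One common way to capture these intermediate feature maps is a forward hook on the layers listed above (a sketch; images stands for any preprocessed input batch):

```python
import torch

feature_maps = {}

def save_output(name):
    # Build a hook that stores the layer's output under the given name
    def hook(module, inputs, output):
        feature_maps[name] = output.detach().cpu()
    return hook

model.features[0].register_forward_hook(save_output("layer1"))
model.features[9].register_forward_hook(save_output("layer2"))

model.eval()
with torch.no_grad():
    model(images.to(device))  # one forward pass fills feature_maps
print(feature_maps["layer1"].shape, feature_maps["layer2"].shape)
```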





