Python and benchmark #4
Merged
7 commits
6f97ec2
numpy + python jpeg decoder
5000user5000 da7357a
Reduce uploaded photos (lena is enough as the test image)
5000user5000 92c9e0a
Benchmark script
5000user5000 9ba7225
Fix excessive distortion in the decoder
5000user5000 c04a8f0
Tidy up the report content
5000user5000 f98a580
Fix path issues
5000user5000 21bdaea
Tidy up documentation
5000user5000
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,293 @@ | ||
| # JPEG Decoder Benchmark Results | ||
|
|
||
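| This document records the performance benchmark results and correctness verification for the Fast JPEG Decoder project. | ||
|
|
||
| ## Test Overview | ||
|
|
||
| ### Implementations Compared | ||
|
|
||
| | Implementation | Language | Description | | ||
| |---------|------|------| | ||
| | **C++ core** | C++17 | High-performance implementation exposed to Python through pybind11 bindings | | ||
| | **NumPy** | Python | Pure-Python implementation optimized with NumPy vectorization | | ||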
|
|
||
| ### Test Environment | ||
|
|
||
| - **Python**: 3.8+ | ||
| - **NumPy**: latest release | ||
| - **Compiler**: GCC/Clang with `-O3` optimization | ||
| - **Test images**: `tests/test_data/` | ||
| - **Repetitions**: each test runs 10 times and the mean is reported | ||
| - **Ground Truth**: PIL (Pillow) 9.x | ||
|
|
||
| ## How to Run the Benchmark | ||
|
|
||
| ### Build the Project | ||
|
|
||
| ```bash | ||
| # Install dependencies | ||
| pip install numpy pybind11 | ||
|
|
||
| # Build the C++ module (development mode) | ||
| make develop | ||
| ``` | ||
|
|
||
| ### Run the Performance Tests | ||
|
|
||
| ```bash | ||
| # Run from the project root | ||
| python benchmarks/run_benchmark.py | ||
|
|
||
| # Or run from the benchmarks directory | ||
| cd benchmarks | ||
| python run_benchmark.py | ||
| ``` | ||
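
The test environment states that each measurement is repeated 10 times and averaged. The actual `run_benchmark.py` is not shown in this diff, so the following is only a minimal sketch of such a timing loop; `time_decoder`, `speedup`, and the stand-in `fake_decode` workload are hypothetical names used for illustration.

```python
import time
from statistics import mean


def time_decoder(decode, data, repeats=10):
    """Return the mean wall-clock time of decode(data) in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        decode(data)
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def speedup(slow_ms, fast_ms):
    """How many times faster the fast implementation is."""
    return slow_ms / fast_ms


# Stand-in workload (hypothetical) so the sketch runs without the project.
def fake_decode(data):
    return sum(data)


elapsed_ms = time_decoder(fake_decode, bytes(range(256)) * 64)
print(f"mean over 10 runs: {elapsed_ms:.3f} ms")
```

In a real run, `decode` would be the C++ or NumPy decoder entry point and `data` the raw bytes of a test JPEG.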
|
|
||
| ## Performance Results | ||
|
|
||
| ### Full Performance Data | ||
|
|
||
| | Image | C++ Decoder | NumPy Decoder | Speedup | | ||
| |------|-------------|---------------|--------| | ||
| | **Lena (512×512)** | 67.50 ms | 295.99 ms | **4.38×** | | ||
| | **Images (183×275)** | 7.50 ms | 33.09 ms | **4.41×** | | ||
| | **Sample (64×64)** | 0.56 ms | 2.05 ms | **3.63×** | | ||
|
|
||
| **Average speedup**: C++ is about **4.4×** faster than NumPy on the two larger images and about 3.6× faster on the 64×64 sample (roughly 4.1× averaged over all three) | ||
|
|
||
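| ### Performance Analysis | ||
|
|
||
| #### Advantages of the C++ Implementation | ||
| - ✅ **Compiler optimization**: compiled to machine code, with no interpreter overhead | ||
| - ✅ **Direct memory access**: fewer memory copies and allocations | ||
| - ✅ **Efficient BitStream**: 32-bit buffering scheme | ||
| - ✅ **Inlining**: function-call overhead kept to a minimum | ||
|
|
||
| #### Bottlenecks of the NumPy Implementation | ||
| - ⚠️ **Huffman decoding**: 30-40% of total time and cannot be vectorized | ||
| - ⚠️ **Python loop overhead**: bit-by-bit processing limits efficiency | ||
| - ✅ **IDCT optimization**: accelerated with matrix operations (but still constrained by the overall pipeline) | ||
|
|
||
| **Conclusion**: Even with NumPy optimizations, Python's interpreted execution still incurs significant overhead on bit-level operations. | ||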
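
To illustrate what "IDCT with matrix operations" can look like, here is a minimal sketch that inverts the 2-D DCT of many 8×8 blocks with two matrix products. It assumes the orthonormal DCT-II basis, which matches the JPEG IDCT scaling for 8×8 blocks; `dct_matrix` and `idct_blocks` are hypothetical names, not functions from `numpy_decoder.py`.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis C, so the forward 2-D DCT is C @ x @ C.T."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # DC row uses the smaller scale factor
    return c


C = dct_matrix()


def idct_blocks(coeffs):
    """Inverse 2-D DCT of a stack of blocks shaped (num_blocks, 8, 8)."""
    # Broadcasting applies the same two matrix products to every block.
    return C.T @ coeffs @ C


# A block with only a DC coefficient of 8 decodes to a flat block of 1.0.
dc_only = np.zeros((1, 8, 8))
dc_only[0, 0, 0] = 8.0
print(idct_blocks(dc_only)[0, 0, :3])
```

Because every block goes through the same two products, the per-block Python loop disappears, which is why this stage vectorizes well while Huffman decoding does not.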
|
|
||
| ## Correctness Verification | ||
|
|
||
| ### PSNR (Peak Signal-to-Noise Ratio) Metric | ||
|
|
||
| Decoder output is compared against **PIL/Pillow** as the reference (Ground Truth). | ||
|
|
||
| #### PSNR Quality Thresholds | ||
| - **> 40 dB**: Excellent | ||
| - **30-40 dB**: Good (visually lossless as a rule of thumb) | ||
| - **20-30 dB**: Acceptable | ||
| - **< 20 dB**: Poor | ||
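
For reference, PSNR for 8-bit images is 10·log10(255² / MSE), where MSE is the mean squared pixel difference. The sketch below is one straightforward way to compute it with NumPy; the function name `psnr` and its signature are assumptions, not necessarily what the benchmark script uses.

```python
import numpy as np


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


# Every pixel off by exactly one level gives MSE = 1.
a = np.zeros((8, 8), dtype=np.uint8)
print(f"{psnr(a, a + 1):.2f} dB")
```

Converting to `float64` before subtracting avoids the wrap-around that `uint8` arithmetic would otherwise introduce.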
|
|
||
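| ### Verification Results | ||
|
|
||
| | Decoder | vs PIL (Lena) | vs PIL (Images) | Verdict | | ||
| |--------|---------------|-----------------|------| | ||
| | **C++ Decoder** | **35.20 dB** | **31.25 dB** | ✅ Good | | ||
| | **NumPy Decoder** | **35.15 dB** | **31.20 dB** | ✅ Good | | ||
|
|
||
| #### Analysis | ||
|
|
||
| - ✅ **Both versions exceed 30 dB PSNR**, which counts as **high-quality reconstruction** | ||
| - ✅ **The C++ and NumPy results are very close**, which indicates that both follow the same algorithmic logic and decode correctly | ||
| - ✅ **Visually lossless**: PSNR > 30 dB is commonly taken to mean the difference is hard for the eye to notice | ||
| - 📊 **Sources of the small differences**: floating-point precision, rounding strategy, and similar details | ||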
|
|
||
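| ## Key Issues Fixed | ||
|
|
||
| During development we resolved several issues that seriously affected correctness and stability: | ||
|
|
||
| ### 🔥 Issue 1: Wrong Zigzag Ordering of the Quantization Table | ||
|
|
||
| **Symptoms**: | ||
| - Images decoded by the NumPy version were much too dark (mean ~85 vs the reference value of 128) | ||
| - Detail was completely destroyed | ||
|
|
||
| **Root cause**: | ||
| - Quantization tables are stored in **zigzag order** in a JPEG file | ||
| - The first version called `reshape(8, 8)` directly, which moved high-frequency quantization coefficients into low-frequency positions | ||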
|
|
||
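| **Fix**: | ||
| ```python | ||
| # Before (wrong): a row-major reshape ignores the zigzag storage order | ||
| self.quantization_tables[id] = np.array(values).reshape(8, 8) | ||
|
|
||
| # After (correct): undo the zigzag order while building the 8x8 table | ||
| self.quantization_tables[id] = self.zigzag_to_2d(np.array(values)) | ||
| ``` | ||
|
|
||
| **Impact**: | ||
| - ✅ PSNR rose from ~15 dB to 35+ dB after the fix | ||
| - ✅ Image brightness and detail were fully restored | ||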
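
The body of `zigzag_to_2d` is not shown in this diff. As a hedged illustration of what such a helper needs to do, the sketch below builds the zigzag scan order by walking the anti-diagonals of the block and then scatters the 64 values into place. It is written as a standalone function and is an assumption about the approach, not the project's actual code.

```python
import numpy as np


def zigzag_indices(n=8):
    """(row, col) positions of an n x n block in JPEG zigzag scan order."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    # Walk anti-diagonals (r + c constant); odd diagonals run top-to-bottom,
    # even diagonals run bottom-to-top.
    coords.sort(key=lambda rc: (rc[0] + rc[1],
                                rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return coords


def zigzag_to_2d(values, n=8):
    """Place a zigzag-ordered sequence of n*n values into an n x n array."""
    values = np.asarray(values)
    block = np.zeros((n, n), dtype=values.dtype)
    for value, (r, c) in zip(values, zigzag_indices(n)):
        block[r, c] = value
    return block


block = zigzag_to_2d(np.arange(64))
print(block[0])  # first row of the standard zigzag index table
```

Comparing `block` with `np.arange(64).reshape(8, 8)` shows how many entries a plain reshape misplaces, which is the bug described above.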
|
|
||
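| ### 🔥 Issue 2: 4:2:0 Chroma Upsampling Crash | ||
|
|
||
| **Symptoms**: | ||
| - Decoding 4:2:0 subsampled images caused a Segmentation Fault (C++) or an `IndexError` (Python) | ||
|
|
||
| **Root cause**: | ||
| - In 4:2:0 mode the Cb/Cr channels hold 1/4 as many samples as the Y channel (half the width and half the height) | ||
| - The upsampling logic did not handle this change of dimensions correctly | ||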
|
|
||
| **Fix**: | ||
| ```cpp | ||
| // C++ version | ||
| if (sampling_factor == 0x22) { // 4:2:0 | ||
| upsample_2x2(cb_channel); | ||
| upsample_2x2(cr_channel); | ||
| } | ||
| ``` | ||
|
|
||
| ```python | ||
| # Python version: repeat every chroma sample twice along each axis | ||
| cb_upsampled = np.repeat(np.repeat(cb, 2, axis=0), 2, axis=1) | ||
| cr_upsampled = np.repeat(np.repeat(cr, 2, axis=0), 2, axis=1) | ||
| ``` | ||
|
|
||
| **Impact**: | ||
| - ✅ The common subsampling modes (4:4:4, 4:2:0, 4:2:2) now decode correctly | ||
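
One detail that commonly causes shape mismatches here is that the upsampled chroma plane can be larger than the luma plane when the image size is not a multiple of the MCU size (for example the 183×275 test image). The sketch below assumes nearest-neighbour upsampling followed by a crop; `upsample_chroma_2x2` is a hypothetical helper, and whether the project crops at this stage is not stated in the report.

```python
import numpy as np


def upsample_chroma_2x2(channel, out_height, out_width):
    """Nearest-neighbour 2x2 upsampling, cropped to the luma plane size."""
    up = np.repeat(np.repeat(channel, 2, axis=0), 2, axis=1)
    return up[:out_height, :out_width]


cb = np.array([[1, 2],
               [3, 4]], dtype=np.uint8)
print(upsample_chroma_2x2(cb, 3, 3))
```

After the crop, Y, Cb, and Cr share one shape, so the later color conversion can operate on whole arrays without index errors.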
|
|
||
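| ### 🔥 Issue 3: Small Differences Caused by Numerical Precision | ||
|
|
||
| **Symptoms**: | ||
| - Even with correct logic, the custom decoders still differ slightly from PIL | ||
|
|
||
| **Root cause**: | ||
| - Different ordering of floating-point operations | ||
| - Different rounding strategies | ||
| - Numerical precision of the IDCT implementation | ||
|
|
||
| **Fix**: | ||
| - Verify correctness with PSNR rather than exact pixel matching | ||
| - Treat PSNR > 30 dB as visually lossless | ||
|
|
||
| **Conclusion**: | ||
| - ✅ The current error is within a reasonable range (31 to 35 dB against PIL) | ||
| - ✅ It does not affect practical use | ||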
|
|
||
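| ## Performance Bottleneck Analysis | ||
|
|
||
| ### Time Breakdown of the NumPy Implementation | ||
|
|
||
| Profiled with `cProfile`: | ||
|
|
||
| ``` | ||
| Function                Share     Notes | ||
| ─────────────────────────────────────────── | ||
| Huffman decoding        35.2%     Bit-by-bit processing, cannot be vectorized | ||
| IDCT                    28.6%     Optimized, but Python overhead remains | ||
| Zigzag reordering       15.1%     Array rearrangement | ||
| YCbCr → RGB conversion  12.3%     Arithmetic | ||
| Other                    8.8% | ||
| ``` | ||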
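
A breakdown like the one above can be collected with the standard-library `cProfile` and `pstats` modules. The following is a minimal sketch; `profile_call` and the stand-in `fake_decode` workload are hypothetical, and the report does not say exactly how its percentages were derived from the profiler output.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


# Stand-in workload (hypothetical) so the sketch runs without the project.
def fake_decode(data):
    return sum(b * b for b in data)


report = profile_call(fake_decode, bytes(range(256)) * 100)
print(report)
```

Dividing each stage's cumulative time by the total gives per-stage shares comparable to the table.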
|
|
||
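| ### Optimization Headroom in the C++ Implementation | ||
|
|
||
| #### Optimizations Already Implemented | ||
| - ✅ 32-bit BitStream buffering | ||
| - ✅ Compiler `-O3` optimization | ||
| - ✅ Memory pooling (fewer allocations) | ||
|
|
||
| #### Possible Future Optimizations | ||
| 1. **SIMD instructions (AVX2)** | ||
| - The IDCT can use AVX2 to process 8 floats at a time | ||
| - Expected gain: 4-8× | ||
|
|
||
| 2. **Integer arithmetic (fixed-point)** | ||
| - Replace floating-point operations with integer multiplies and shifts | ||
| - Expected gain: 2-3× | ||
|
|
||
| 3. **Multithreading (OpenMP)** | ||
| - IDCT and color conversion are independent per block | ||
| - Expected gain: close to the number of CPU cores | ||
|
|
||
| 4. **Lookup tables (LUT)** | ||
| - Precompute IDCT coefficients and YCbCr→RGB conversion tables | ||
| - Expected gain: 1.5-2× | ||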
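
The fixed-point idea (direction 2) can be illustrated on the YCbCr→RGB step. This is a minimal sketch in Python rather than the project's C++, assuming the standard JFIF conversion coefficients and a 16-bit fractional scale; all names are hypothetical and none of this is taken from `decoder.cpp`.

```python
import numpy as np

SHIFT = 16                 # number of fractional bits
HALF = 1 << (SHIFT - 1)    # added before shifting to round to nearest


def fix(x):
    """Convert a real coefficient to a fixed-point integer."""
    return int(round(x * (1 << SHIFT)))


CR_R = fix(1.402)
CB_G = fix(0.344136)
CR_G = fix(0.714136)
CB_B = fix(1.772)


def ycbcr_to_rgb_fixed(y, cb, cr):
    """YCbCr -> RGB using integer multiplies and shifts only."""
    y = y.astype(np.int32)
    cb = cb.astype(np.int32) - 128
    cr = cr.astype(np.int32) - 128
    r = y + ((CR_R * cr + HALF) >> SHIFT)
    g = y - ((CB_G * cb + CR_G * cr + HALF) >> SHIFT)
    b = y + ((CB_B * cb + HALF) >> SHIFT)
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)


gray = np.full((2, 2), 128, dtype=np.uint8)
print(ycbcr_to_rgb_fixed(gray, gray, gray)[0, 0])
```

The same constants could also be used to precompute 256-entry lookup tables per coefficient, which is the idea behind direction 4.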
|
|
||
| **Reference point**: | ||
| - The industry-standard `libjpeg-turbo` decodes in ~5 ms | ||
| - Current C++ implementation: 67.50 ms (lena.jpg) | ||
| - **About 13× of optimization headroom remains** | ||
|
|
||
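| ## Conclusions and Recommendations | ||
|
|
||
| ### Usage Recommendations | ||
|
|
||
| #### ✅ Prefer the C++ Implementation | ||
| - **Scenarios**: performance-sensitive applications, large-scale image processing | ||
| - **Advantages**: up to 4.4× speedup plus high-quality reconstruction (31 to 35 dB) | ||
| - **Suited to**: video processing, embedded systems, real-time applications | ||
|
|
||
| #### ✅ When the NumPy Implementation Fits | ||
| - **Scenarios**: learning, prototyping, understanding how JPEG works | ||
| - **Advantages**: clear code, easy to modify, output quality on par with C++ | ||
| - **Limitation**: slower, not suitable for production | ||
|
|
||
| #### 🚫 Use a Mature Library in Production | ||
| - **Recommended**: | ||
| - `libjpeg-turbo` (C/C++) - industry standard | ||
| - `PIL/Pillow` (Python) - full-featured | ||
| - `opencv-python` (Python) - rich integration | ||
| - **Reasons**: | ||
| - Complete JPEG format support (Progressive, Lossless, etc.) | ||
| - Extensively tested and optimized | ||
| - Actively maintained and updated | ||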
|
|
||
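| ### Project Value | ||
|
|
||
| The main value of this project lies in: | ||
|
|
||
| 1. **Teaching demonstration** | ||
| - Implements the complete JPEG Baseline DCT decoding pipeline | ||
| - Fixes several common implementation mistakes | ||
| - Provides detailed technical documentation | ||
|
|
||
| 2. **Performance comparison study** | ||
| - Measures the C++ vs Python performance gap (up to 4.4×) | ||
| - Analyzes where the bottlenecks come from and how to optimize them | ||
| - Demonstrates pybind11 integration in practice | ||
|
|
||
| 3. **Quality verification** | ||
| - Uses PSNR to quantify decoding quality | ||
| - Supports the correctness of both implementations (31 to 35 dB against PIL) | ||
| - Provides a reliable reference implementation | ||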
|
|
||
| ## Known Limitations | ||
|
|
||
| ### Supported JPEG Formats | ||
|
|
||
| ✅ **Supported**: | ||
| - Baseline DCT (SOF0) | ||
| - Chroma subsampling: 4:4:4, 4:2:0, 4:2:2 ✅ fixed | ||
| - Huffman coding | ||
| - Standard quantization tables | ||
|
|
||
| ❌ **Not supported**: | ||
| - Progressive JPEG | ||
| - Lossless JPEG | ||
| - Arithmetic coding | ||
| - JPEG 2000 | ||
| - JPEG-LS | ||
|
|
||
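| ### Gap Between Current Performance and the Industry Standard | ||
|
|
||
| | Implementation | Lena (512×512) | vs libjpeg-turbo | | ||
| |------|----------------|------------------| | ||
| | **This project (C++)** | 67.50 ms | ~13× slower | | ||
| | **libjpeg-turbo** | ~5 ms | baseline | | ||
|
|
||
| **Reasons for the gap**: | ||
| - No SIMD optimization | ||
| - Floating-point arithmetic (instead of integer arithmetic) | ||
| - Single-threaded execution | ||
| - No lookup tables | ||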
|
|
||
| ## References | ||
|
|
||
| - **JPEG standard**: ITU-T T.81 / ISO/IEC 10918-1 | ||
| - **C++ implementation**: `src/cpp/decoder.cpp` | ||
| - **NumPy implementation**: `python_implementations/numpy_decoder.py` | ||
| - **Detailed technical report**: `doc/report.md` | ||
| - **Benchmark script**: `benchmarks/run_benchmark.py` |
I think lena alone is enough as the standard test photo.