Merged
4 changes: 4 additions & 0 deletions funasr/utils/load_utils.py
@@ -200,10 +200,14 @@ def extract_fbank(data, data_len=None, data_type: str = "sound", frontend=None,
        data = torch.from_numpy(data)
        if len(data.shape) < 2:
            data = data[None, :]  # data: [batch, N]
        elif data.shape[0] > 1:
            data = data.mean(dim=0, keepdim=True)  # convert stereo/multi-channel to mono
Comment on lines +203 to +204
Contributor

medium

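The processing logic added here is an exact duplicate of the code in the torch.Tensor branch below. Consider refactoring: after converting the np.ndarray to a torch.Tensor, run the dimension check and the multi-channel-to-mono conversion in a single shared path. This reduces redundancy and improves maintainability.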
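A minimal sketch of the refactor suggested above. It assumes channels-first input ([N] for mono, [C, N] for C channels, the same layout the `data.shape[0] > 1` check relies on) and floating-point samples, since `torch.mean` does not accept integer tensors. The helper name `normalize_waveform` is hypothetical and not part of FunASR; the idea is that both the np.ndarray and torch.Tensor branches would call it instead of repeating the shape logic.

```python
import numpy as np
import torch


def normalize_waveform(data, data_len=None):
    """Hypothetical helper: shared shape handling for the np.ndarray and
    torch.Tensor branches of extract_fbank.

    Assumes channels-first, floating-point input: [N] (mono) or [C, N].
    Returns a [1, N] tensor and a data_len list.
    """
    # Convert once, so the rules below apply to both input types.
    if isinstance(data, np.ndarray):
        data = torch.from_numpy(data)
    if data.dim() < 2:
        data = data[None, :]  # [N] -> [1, N]
    elif data.shape[0] > 1:
        data = data.mean(dim=0, keepdim=True)  # [C, N] -> [1, N], average channels
    # Keep a caller-supplied data_len, otherwise use the time length.
    data_len = [data.shape[1]] if data_len is None else data_len
    return data, data_len


# Example: a 2-channel, 16000-sample float32 waveform
stereo = np.random.randn(2, 16000).astype(np.float32)
wav, wav_len = normalize_waveform(stereo)
print(wav.shape, wav_len)  # torch.Size([1, 16000]) [16000]
```

Keeping the batch-dimension and downmix rules in one place means any later change to them only has to be made once.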

        data_len = [data.shape[1]] if data_len is None else data_len
    elif isinstance(data, torch.Tensor):
        if len(data.shape) < 2:
            data = data[None, :]  # data: [batch, N]
        elif data.shape[0] > 1:
            data = data.mean(dim=0, keepdim=True)  # convert stereo/multi-channel to mono
Comment on lines +209 to +210
Contributor

medium

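This logic duplicates the block above. Also consider checking the isinstance(data, (list, tuple)) branch (starting at line 212). If the input is a list that contains multi-channel audio, the current change does not cover that case, so the data_len computed at line 218 will hold the wrong value (the number of channels instead of the time length). Consider adding the same downmix logic inside the loop that processes the list elements so the fix is complete.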
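A minimal sketch of the list branch with the suggested per-item downmix. Only the first line of this branch (`data_list, data_len = [], []`) appears in the diff, so the rest of the loop here, including padding with `torch.nn.utils.rnn.pad_sequence`, is an assumption; `batch_from_list` is a hypothetical name. Items are assumed to be [N] or channels-first [C, N] floating-point waveforms and are reduced to 1-D [N] before their length is recorded.

```python
import numpy as np
import torch
from torch.nn.utils.rnn import pad_sequence


def batch_from_list(data):
    """Hypothetical sketch of the (list, tuple) branch with per-item downmix.

    Each item is assumed to be [N] (mono) or channels-first [C, N].
    Downmixing to 1-D first makes data_len record time length, not channels.
    """
    data_list, data_len = [], []
    for item in data:
        if isinstance(item, np.ndarray):
            item = torch.from_numpy(item)
        if item.dim() > 1:
            item = item.mean(dim=0)  # [C, N] -> [N], average channels
        data_list.append(item)
        data_len.append(item.shape[0])  # time length of this item
    data = pad_sequence(data_list, batch_first=True)  # [batch, max_N], zero-padded
    return data, data_len


# Example: a stereo item and a shorter mono item
batch, lens = batch_from_list([np.ones((2, 6), dtype=np.float32), torch.zeros(4)])
print(batch.shape, lens)  # torch.Size([2, 6]) [6, 4]
```

With this change, a stereo item of shape [2, N] contributes N to `data_len` rather than 2.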

        data_len = [data.shape[1]] if data_len is None else data_len
    elif isinstance(data, (list, tuple)):
        data_list, data_len = [], []