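I am extracting MFCC features from mp3 voice files, but I want to leave the source files untouched and not add any new files. My processing consists of the following steps:
- Use pydub to load the .mp3 file, eliminate silence, and generate .wav data
- Use scipy.io.wavfile.read() to read the audio data and rate
- Use python_speech_features to extract the features
However, eliminate_silence() returns an AudioSegment object, while scipy.io.wavfile.read() takes a .wav filename, so I have to temporarily save/export the data as a wave file to bridge the two. This step costs memory and time, so my question is: how can I avoid the wave-file export step? Or is there a workaround?
Here is my code.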
import os
from pydub import AudioSegment
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc
from pydub.silence import split_on_silence

def eliminate_silence(input_path):
    """ Eliminate silent chunks from original call recording """
    # Import input mp3 file
    sound = AudioSegment.from_mp3(input_path)
    chunks = split_on_silence(sound,
                              # split on silences longer than 500 ms
                              min_silence_len=500,
                              # anything under -30 dBFS is considered silence
                              silence_thresh=-30,
                              # keep 100 ms of leading/trailing silence
                              keep_silence=100)
    output_chunks = AudioSegment.empty()
    for chunk in chunks:
        output_chunks += chunk
    return output_chunks

silence_clear_data = eliminate_silence("file.mp3")
# Temporary file on disk: this is the step I would like to avoid
silence_clear_data.export("temp.wav", format="wav")
rate, audio = read("temp.wav")
os.remove("temp.wav")
# Extract MFCCs
mfcc_feature = mfcc(audio, rate, winlen=0.025, winstep=0.01, numcep=15,
                    nfilt=35, nfft=512, appendEnergy=True)
mfcc_feature = preprocessing.scale(mfcc_feature)
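Please refer to the following approach:
I am currently working on a project where I cut audio on silences and compute MFCC coefficients, so here is my solution: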
import pydub
import python_speech_features as p
import numpy as np
from pydub.silence import split_on_silence

def generate_mfcc_without_silences(path):
    # get audio and change frame rate to 16 kHz
    audio_file = pydub.AudioSegment.from_wav(path)
    audio_file = audio_file.set_frame_rate(16000)
    # cut audio using silences; the threshold is the file's average loudness
    chunks = split_on_silence(audio_file, silence_thresh=audio_file.dBFS, min_silence_len=200)
    mfccs = []
    for chunk in chunks:
        # compute mfcc directly from the chunk's samples (assumes 16-bit mono audio)
        np_chunk = np.frombuffer(chunk.get_array_of_samples(), dtype=np.int16)
        mfccs.append(p.mfcc(np_chunk, samplerate=audio_file.frame_rate, numcep=26))
    return mfccs
Notes:
- I resample the audio to 16 kHz, but that is optional
- I set min_silence_len to 200 because I wanted to try to isolate single words
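One more caveat about the np.frombuffer(..., dtype=np.int16) step: it assumes 16-bit mono audio. For other sample widths the dtype would be wrong, and for stereo the left and right samples come back interleaved in one flat array. A minimal sketch of a more defensive conversion follows; samples_to_numpy and to_mono are hypothetical helper names, not part of pydub, and the input is assumed to be the array.array that get_array_of_samples() returns.

```python
import array

import numpy as np


def samples_to_numpy(samples, channels=1):
    """Convert an array.array of PCM samples to a NumPy array.

    The dtype is taken from the array's own typecode instead of being
    hard-coded. Multi-channel audio is assumed to be interleaved
    (L, R, L, R, ...) and is reshaped to (n_frames, channels).
    """
    data = np.array(samples)  # dtype follows the array typecode, e.g. 'h' -> int16
    if channels > 1:
        data = data.reshape(-1, channels)
    return data


def to_mono(data):
    """Average the channels of a (n_frames, channels) array; 1-D input is returned as-is."""
    if data.ndim == 1:
        return data
    return data.mean(axis=1)
```

With pydub this would be called roughly as samples_to_numpy(chunk.get_array_of_samples(), chunk.channels). Calling set_channels(1) on the segment before splitting is probably a simpler way to get mono input.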
Combining my function with your requirements, the function you need would probably be:
import pydub
import python_speech_features as p
import numpy as np
from sklearn import preprocessing
from pydub.silence import split_on_silence

def mfcc_from_audio_without_silences(input_path):
    audio_file = pydub.AudioSegment.from_mp3(input_path)
    chunks = split_on_silence(audio_file, silence_thresh=-30, min_silence_len=500, keep_silence=100)
    # join the non-silent chunks back into one segment
    output_chunks = pydub.AudioSegment.empty()
    for chunk in chunks:
        output_chunks += chunk
    # read the samples straight from memory, no temporary .wav file (assumes 16-bit mono audio)
    samples = np.frombuffer(output_chunks.get_array_of_samples(), dtype=np.int16)
    mfcc_feature = p.mfcc(samples, samplerate=audio_file.frame_rate, numcep=15, nfilt=35)
    return preprocessing.scale(mfcc_feature)
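If you would rather keep scipy.io.wavfile.read() in the pipeline, another option that should work is to export to an in-memory buffer instead of a file on disk. This is a minimal sketch, assuming that pydub's export() accepts a file-like object for WAV output and that scipy's read() accepts one as well. segment_to_rate_and_samples is a hypothetical helper name.

```python
import io

from scipy.io.wavfile import read


def segment_to_rate_and_samples(segment):
    """Export a segment to an in-memory WAV buffer and read it back with scipy.

    `segment` is assumed to behave like a pydub AudioSegment, i.e. to offer
    export(out_f, format="wav") that writes WAV bytes to a file-like object.
    """
    buffer = io.BytesIO()
    segment.export(buffer, format="wav")
    buffer.seek(0)  # rewind so scipy reads from the start of the WAV data
    rate, audio = read(buffer)
    return rate, audio
```

With the code from the question this would be rate, audio = segment_to_rate_and_samples(eliminate_silence("file.mp3")). Nothing touches the disk, but a WAV container is still encoded and decoded in memory, so reading the samples directly with np.frombuffer does less work.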