
Python: Converting between AudioSegment objects and WAV files/data

2024-08-15 18mfrbuaa

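I am extracting MFCC features from mp3 speech files, but I want to leave the source files unchanged and not add any new files. My processing consists of the following steps:

  • Load the .mp3 file with pydub, remove the silence, and produce .wav data
  • Read the audio data and sample rate with scipy.io.wavfile.read()
  • Extract the features with python_speech_features

However, eliminate_silence() returns an AudioSegment object, while scipy.io.wavfile.read() takes a .wav file name, so I have to temporarily save/export the data as a WAV file to convert between the two. This step costs memory and time, so my question is: how can I avoid the WAV export step? Or is there a workaround?

Here is my code.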

import os 
from pydub import AudioSegment 
from scipy.io.wavfile import read 
from sklearn import preprocessing 
from python_speech_features import mfcc 
from pydub.silence import split_on_silence 
 
def eliminate_silence(input_path): 
    """ Eliminate silent chunks from original call recording """ 
    # Load the input MP3 file 
    sound  = AudioSegment.from_mp3(input_path) 
    chunks = split_on_silence(sound, 
                              # split on silences of at least 500 ms 
                              min_silence_len=500, 
                              # anything quieter than -30 dBFS is considered silence 
                              silence_thresh=-30, 
                              # keep 100 ms of leading/trailing silence 
                              keep_silence=100) 
 
    output_chunks = AudioSegment.empty() 
    for chunk in chunks: output_chunks += chunk 
    return output_chunks 
 
 
silence_clear_data = eliminate_silence("file.mp3") 
silence_clear_data.export("temp.wav", format="wav") 
rate, audio = read("temp.wav") 
os.remove("temp.wav") 
 
# Extract MFCCs 
mfcc_feature = mfcc(audio, rate, winlen = 0.025, winstep = 0.01, numcep = 15, 
                    nfilt = 35, nfft = 512, appendEnergy = True) 
mfcc_feature = preprocessing.scale(mfcc_feature) 

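Please refer to the following approach:

I am currently working on a project in which I split audio on silence and compute MFCC coefficients, so here is my solution: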

import pydub 
import python_speech_features as p 
import numpy as np 
 
def generate_mfcc_without_silences(path): 
    #get audio and change frame rate to 16KHz 
    audio_file = pydub.AudioSegment.from_wav(path) 
    audio_file = audio_file.set_frame_rate(16000) 
    #cut audio using silences 
    chunks = pydub.silence.split_on_silence(audio_file, silence_thresh=audio_file.dBFS, min_silence_len=200) 
    mfccs = [] 
    for chunk in chunks: 
        #compute mfcc from chunk array (dtype=np.int16 assumes 16-bit samples) 
        np_chunk = np.frombuffer(chunk.get_array_of_samples(), dtype=np.int16) 
        mfccs.append(p.mfcc(np_chunk, samplerate=audio_file.frame_rate, numcep=26)) 
    return mfccs 

Notes:

·I resample the audio to 16 kHz, but this is optional

·I set min_silence_len to 200 because I wanted to try to isolate single words
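One practical consequence of the optional 16 kHz resampling: as far as I know, python_speech_features cuts the signal into windows of winlen * samplerate samples and truncates any window longer than nfft, logging a warning. With the default 25 ms window, 16 kHz audio gives 400 samples, which fits nfft=512, while 44.1 kHz audio does not. The sketch below picks an FFT size large enough for a given rate; `min_nfft` is a hypothetical helper name, not part of the library.

```python
import math

def min_nfft(samplerate, winlen=0.025):
    """Smallest power of two that holds one analysis window of winlen seconds."""
    frame_len = math.ceil(winlen * samplerate)  # samples per window
    return 1 << (frame_len - 1).bit_length()    # round up to a power of two

print(min_nfft(16000))  # 512
print(min_nfft(44100))  # 2048
```

The result can be passed as the nfft argument of mfcc() when the audio is left at its original sample rate.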

Combining my function with your requirements, the function you need might be:

import pydub 
import python_speech_features as p 
import numpy as np 
from sklearn import preprocessing 
 
def mfcc_from_audio_without_silences(path): 
    audio_file  = pydub.AudioSegment.from_mp3(path) 
    chunks = pydub.silence.split_on_silence(audio_file, silence_thresh=-30, min_silence_len=500, keep_silence=100) 
 
    output_chunks = pydub.AudioSegment.empty() 
    for chunk in chunks: 
        output_chunks += chunk 
 
    output_chunks = np.frombuffer(output_chunks.get_array_of_samples(), dtype=np.int16) 
    mfcc_feature = p.mfcc(output_chunks, samplerate=audio_file.frame_rate, numcep=15, nfilt = 35) 
    return preprocessing.scale(mfcc_feature)
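
The functions above hard-code dtype=np.int16 and treat the samples as a single channel. pydub returns stereo samples interleaved (L, R, L, R, ...), so a stereo mp3 would be read as one channel of twice the length. A minimal sketch of a more defensive conversion follows, assuming the object exposes pydub's get_array_of_samples(), channels and frame_rate; `segment_to_numpy` is a hypothetical helper name, and averaging the channels is just one possible way to get mono.

```python
import numpy as np

def segment_to_numpy(segment, mono=True):
    """Return (rate, samples) for a pydub AudioSegment-like object.

    samples has shape (n_frames,) for mono output and
    (n_frames, channels) when mono=False and the audio is multi-channel.
    """
    # np.array takes the dtype from the array's typecode (e.g. 'h' -> int16),
    # so the sample width does not have to be hard-coded.
    samples = np.array(segment.get_array_of_samples())
    if segment.channels > 1:
        # De-interleave: one row per frame, one column per channel.
        samples = samples.reshape(-1, segment.channels)
        if mono:
            # Average the channels, then cast back to the original dtype.
            samples = samples.mean(axis=1).astype(samples.dtype)
    return segment.frame_rate, samples
```

With this helper, the temporary file in the question could be replaced by something like rate, audio = segment_to_numpy(eliminate_silence("file.mp3")), and the result passed to mfcc(audio, rate, ...) as before.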