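A friend needed to convert video files into audio files and then transcribe the audio into text files.
Approach: use ffmpeg to convert the video to audio, then use the voice_t_text interface provided by Baidu AI to extract the speech text. The audio first has to be split into small files of at most one minute each, PCM-encoded with a sample rate of 16000 Hz. The key steps are as follows: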
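As a rough sketch of how the pieces fit together once the audio has been split: the chunk files are read in order, each one is handed to a speech-recognition call, and the returned text is joined. The `recognize` parameter below is a hypothetical placeholder for whatever function wraps the Baidu AI speech interface; its name and signature (raw bytes in, text out) are assumptions and not part of any SDK.

```python
from pathlib import Path
from typing import Callable, Iterable


def transcribe_chunks(chunk_paths: Iterable[str],
                      recognize: Callable[[bytes], str]) -> str:
    """Send each PCM chunk to `recognize` in order and join the text.

    `recognize` is a hypothetical caller-supplied function (raw PCM bytes in,
    recognized text out); it stands in for the real speech API call.
    """
    pieces = []
    for path in chunk_paths:
        pcm_bytes = Path(path).read_bytes()  # one chunk of at most one minute
        pieces.append(recognize(pcm_bytes))
    return "\n".join(pieces)
```

Keeping the recognition call as a parameter means the splitting and file handling can be tried out with a dummy function before any API credentials are involved.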
Configuration
1. Copy ffmpeg, ffplay, and ffprobe into a folder and add that folder to the environment variables (PATH).
2. Install ffmpeg-python
pip install ffmpeg-python
# Import it
import ffmpeg
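3. Under the hood, this library simply runs the three command-line tools above. For example, probe retrieves information about a video or audio file and returns it as JSON. Its source code is: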
def probe(filename, cmd='ffprobe', **kwargs):
    """Run ffprobe on the specified file and return a JSON representation of the output.

    Raises:
        :class:`ffmpeg.Error`: if ffprobe returns a non-zero exit code,
            an :class:`Error` is returned with a generic error message.
            The stderr output can be retrieved by accessing the
            ``stderr`` property of the exception.
    """
    args = [cmd, '-show_format', '-show_streams', '-of', 'json']
    args += convert_kwargs_to_cmd_line_args(kwargs)
    args += [filename]

    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode != 0:
        raise Error('ffprobe', out, err)
    return json.loads(out.decode('utf-8'))
This is equivalent to running the following on the command line:
ffprobe -v quiet -print_format json -show_format -show_streams xdhyxl.mp3
{
  "streams": [
    {
      "index": 0,
      "codec_name": "mp3",
      "codec_long_name": "MP3 (MPEG audio layer 3)",
      "codec_type": "audio",
      "codec_tag_string": "[0][0][0][0]",
      "codec_tag": "0x0000",
      "sample_fmt": "fltp",
      "sample_rate": "44100",
      "channels": 2,
      "channel_layout": "stereo",
      "bits_per_sample": 0,
      "initial_padding": 0,
      "r_frame_rate": "0/0",
      "avg_frame_rate": "0/0",
      "time_base": "1/14112000",
      "start_pts": 353600,
      "start_time": "0.025057",
      "duration_ts": 37154488320,
      "duration": "2632.829388",
      "bit_rate": "128000",
      "disposition": {
        "default": 0, "dub": 0, "original": 0, "comment": 0, "lyrics": 0,
        "karaoke": 0, "forced": 0, "hearing_impaired": 0, "visual_impaired": 0,
        "clean_effects": 0, "attached_pic": 0, "timed_thumbnails": 0,
        "non_diegetic": 0, "captions": 0, "descriptions": 0, "metadata": 0,
        "dependent": 0, "still_image": 0, "multilayer": 0
      },
      "tags": {"encoder": "Lavf"}
    }
  ],
  "format": {
    "filename": "xdhyxl.mp3",
    "nb_streams": 1,
    "nb_programs": 0,
    "nb_stream_groups": 0,
    "format_name": "mp3",
    "format_long_name": "MP2/3 (MPEG audio layer 2/3)",
    "start_time": "0.025057",
    "duration": "2632.829388",
    "size": "42125732",
    "bit_rate": "128001",
    "probe_score": 51,
    "tags": {"encoder": "Lavf57.83.100"}
  }
}
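The fields needed to plan the split can be read straight from this JSON. Below is a minimal sketch using only the standard library; the sample string is a trimmed copy of the output above, and the helper name `summarize_probe` is made up for illustration. Note that ffprobe reports values such as duration and sample_rate as strings, so they need converting.

```python
import json
import math

# Trimmed copy of the ffprobe output shown above.
SAMPLE = '''
{"streams": [{"index": 0, "codec_name": "mp3", "codec_type": "audio",
              "sample_rate": "44100", "channels": 2}],
 "format": {"filename": "xdhyxl.mp3", "duration": "2632.829388",
            "bit_rate": "128001"}}
'''


def summarize_probe(info: dict, chunk_seconds: int = 60) -> dict:
    """Pick out the audio facts needed to plan the split."""
    # Take the first audio stream; raises StopIteration if there is none.
    audio = next(s for s in info["streams"] if s["codec_type"] == "audio")
    duration = float(info["format"]["duration"])
    return {
        "codec": audio["codec_name"],
        "sample_rate": int(audio["sample_rate"]),
        "channels": audio["channels"],
        "duration": duration,
        "chunks": math.ceil(duration / chunk_seconds),
    }


summary = summarize_probe(json.loads(SAMPLE))
print(summary)
```

For this file the summary shows a 44100 Hz stereo source of about 2633 seconds, which needs resampling to 16000 Hz and works out to 44 one-minute chunks. Since probe returns the parsed JSON, the dict from ffmpeg.probe(filename) could be passed in place of json.loads(SAMPLE).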
Parameter reference
1. Audio options
-aframes number (output)
Set the number of audio frames to output. This is an obsolete alias for -frames:a, which you should use instead.
-ar[:stream_specifier] freq (input/output,per-stream)
Set the audio sampling frequency. For output streams it is set by default to the frequency of the corresponding input stream. For input streams this option only makes sense for audio grabbing devices and raw demuxers and is mapped to the corresponding demuxer options.
-aq q (output)
Set the audio quality (codec-specific, VBR). This is an alias for -q:a.
-ac[:stream_specifier] channels (input/output,per-stream)
Set the number of audio channels. For output streams it is set by default to the number of input audio channels. For input streams this option only makes sense for audio grabbing devices and raw demuxers and is mapped to the corresponding demuxer options.
-an (input/output)
As an input option, blocks all audio streams of a file from being filtered or being automatically selected or mapped for any output. See -discard option to disable streams individually.
As an output option, disables audio recording i.e. automatic selection or mapping of any audio stream. For full manual control see the -map option.
-acodec codec (input/output)
Set the audio codec. This is an alias for -codec:a.
-sample_fmt[:stream_specifier] sample_fmt (output,per-stream)
Set the audio sample format. Use -sample_fmts to get a list of supported sample formats.
-af filtergraph (output)
Create the filtergraph specified by filtergraph and use it to filter the stream.
This is an alias for -filter:a, see the -filter option.
2. Video options
-vframes number (output)
Set the number of video frames to output. This is an obsolete alias for -frames:v, which you should use instead.
-r[:stream_specifier] fps (input/output,per-stream)
Set frame rate (Hz value, fraction or abbreviation).
As an input option, ignore any timestamps stored in the file and instead generate timestamps assuming constant frame rate fps. This is not the same as the -framerate option used for some input formats like image2 or v4l2 (it used to be the same in older versions of FFmpeg). If in doubt use -framerate instead of the input option -r.
As an output option:
video encoding: Duplicate or drop frames right before encoding them to achieve constant output frame rate fps.
video streamcopy: Indicate to the muxer that fps is the stream frame rate. No data is dropped or duplicated in this case. This may produce invalid files if fps does not match the actual stream frame rate as determined by packet timestamps. See also the setts bitstream filter.
-fpsmax[:stream_specifier] fps (output,per-stream)
Set maximum frame rate (Hz value, fraction or abbreviation).
Clamps output frame rate when output framerate is auto-set and is higher than this value. Useful in batch processing or when input framerate is wrongly detected as very high. It cannot be set together with -r. It is ignored during streamcopy.
-s[:stream_specifier] size (input/output,per-stream)
Set frame size.
As an input option, this is a shortcut for the video_size private option, recognized by some demuxers for which the frame size is either not stored in the file or is configurable, e.g. raw video or video grabbers.
As an output option, this inserts the scale video filter to the end of the corresponding filtergraph. Please use the scale filter directly to insert it at the beginning or some other place.
The format is ‘wxh’ (default - same as source).
-aspect[:stream_specifier] aspect (output,per-stream)
Set the video display aspect ratio specified by aspect.
aspect can be a floating point number string, or a string of the form num:den, where num and den are the numerator and denominator of the aspect ratio. For example "4:3", "16:9", "1.3333", and "1.7777" are valid argument values.
If used together with -vcodec copy, it will affect the aspect ratio stored at container level, but not the aspect ratio stored in encoded frames, if it exists.
-display_rotation[:stream_specifier] rotation (input,per-stream)
Set video rotation metadata.
rotation is a decimal number specifying the amount in degree by which the video should be rotated counter-clockwise before being displayed.
This option overrides the rotation/display transform metadata stored in the file, if any. When the video is being transcoded (rather than copied) and -autorotate is enabled, the video will be rotated at the filtering stage. Otherwise, the metadata will be written into the output file if the muxer supports it.
If the -display_hflip and/or -display_vflip options are given, they are applied after the rotation specified by this option.
-display_hflip[:stream_specifier] (input,per-stream)
Set whether on display the image should be horizontally flipped.
See the -display_rotation option for more details.
-display_vflip[:stream_specifier] (input,per-stream)
Set whether on display the image should be vertically flipped.
See the -display_rotation option for more details.
-vn (input/output)
As an input option, blocks all video streams of a file from being filtered or being automatically selected or mapped for any output. See -discard option to disable streams individually.
As an output option, disables video recording i.e. automatic selection or mapping of any video stream. For full manual control see the -map option.
-vcodec codec (output)
Set the video codec. This is an alias for -codec:v.
-pass[:stream_specifier] n (output,per-stream)
Select the pass number (1 or 2). It is used to do two-pass video encoding. The statistics of the video are recorded in the first pass into a log file (see also the option -passlogfile), and in the second pass that log file is used to generate the video at the exact requested bitrate. On pass 1, you may just deactivate audio and set output to null, examples for Windows and Unix:
ffmpeg -i foo.mov -c:v libxvid -pass 1 -an -f rawvideo -y NUL
ffmpeg -i foo.mov -c:v libxvid -pass 1 -an -f rawvideo -y /dev/null
-passlogfile[:stream_specifier] prefix (output,per-stream)
Set two-pass log file name prefix to prefix, the default file name prefix is “ffmpeg2pass”. The complete file name will be PREFIX-N.log, where N is a number specific to the output stream.
-vf filtergraph (output)
Create the filtergraph specified by filtergraph and use it to filter the stream.
This is an alias for -filter:v, see the -filter option.
-autorotate
Automatically rotate the video according to file metadata. Enabled by default, use -noautorotate to disable it.
-autoscale
Automatically scale the video according to the resolution of first frame. Enabled by default, use -noautoscale to disable it. When autoscale is disabled, all output frames of filter graph might not be in the same resolution and may be inadequate for some encoder/muxer. Therefore, it is not recommended to disable it unless you really know what you are doing. Disable autoscale at your own risk.
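For the video-to-audio step this article is after, -vn can be combined with the audio options listed earlier. The sketch below only builds the argument list and does not run ffmpeg; the function name and the file names are illustrative assumptions.

```python
from typing import List


def extract_audio_command(video_path: str, audio_path: str,
                          sample_rate: int = 16000,
                          channels: int = 1) -> List[str]:
    """Build an ffmpeg command that drops video and writes 16-bit PCM audio."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vn",                     # disable video in the output
        "-acodec", "pcm_s16le",    # signed 16-bit little-endian PCM
        "-ac", str(channels),      # number of audio channels
        "-ar", str(sample_rate),   # sampling frequency in Hz
        audio_path,
    ]


print(" ".join(extract_audio_command("input.mp4", "output.wav")))
```

Passing the arguments as a list rather than one string avoids shell quoting problems when file names contain spaces.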
3. Commands
Get multimedia metadata:
ffprobe -v quiet -print_format json -show_format -show_streams xd.mp3
A shorter form:
ffprobe -show_format -show_streams -of json xd.mp3
Audio conversion (the first minute, as 16-bit PCM at 16000 Hz):
ffmpeg -i xd.mp3 -ss 00:00:00 -t 00:01:00 -acodec pcm_s16le -ar 16000 output.wav
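Repeating this command with a moving -ss offset covers the whole file. The sketch below generates one argument list per chunk without running anything; the function name, the chunk_NNN.pcm output naming, and the choice of headerless s16le output are assumptions made for illustration.

```python
import math
from typing import List


def chunk_commands(src: str, total_seconds: float,
                   chunk_seconds: int = 60,
                   prefix: str = "chunk") -> List[List[str]]:
    """Return one ffmpeg argument list per chunk of at most chunk_seconds."""
    count = math.ceil(total_seconds / chunk_seconds)
    commands = []
    for i in range(count):
        start = i * chunk_seconds
        # The last chunk is shorter than the others.
        length = min(chunk_seconds, total_seconds - start)
        commands.append([
            "ffmpeg", "-y",
            "-i", src,
            "-ss", str(start),       # chunk start offset in seconds
            "-t", f"{length:.3f}",   # chunk duration in seconds
            "-acodec", "pcm_s16le", "-ac", "1", "-ar", "16000",
            "-f", "s16le",           # raw PCM without a WAV header
            f"{prefix}_{i:03d}.pcm",
        ])
    return commands


cmds = chunk_commands("xd.mp3", 2632.829388)
print(len(cmds))
print(" ".join(cmds[-1]))
```

The duration here is the value reported by ffprobe for the sample file earlier in the article. Each list can be handed to subprocess.run, and chunk_seconds can be lowered if the speech interface needs some margin below one minute.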
Example
ffmpeg -i in_file -codec:a pcm_s16le -ac 1 -ar 16000 out_file -loglevel quiet
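Option | Meaning |
---|---|
-i filename.fmt | Sets the input file name to filename.fmt |
-f fmt | Forces the format; sets the output format to fmt |
-c/-codec codec | Codec name (for WAV use pcm_s16le: signed 16 bits, little endian) |
-ar samplerate | Sets the audio sample rate (Hz) |
-ac channels | Sets the number of audio channels, e.g. -ac 1 for mono |
-acodec copy | Specifies the audio codec; with copy the stream is copied as-is without re-encoding |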
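These settings also determine the size of raw PCM output, which gives a quick sanity check on each chunk: bytes = sample rate × channels × bytes per sample × seconds, where pcm_s16le uses 2 bytes per sample. A small sketch of the arithmetic (the function name is made up for illustration):

```python
def pcm_size_bytes(seconds: float, sample_rate: int = 16000,
                   channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Size of headerless PCM audio; pcm_s16le uses 2 bytes per sample."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


# One minute of 16 kHz mono s16le audio.
print(pcm_size_bytes(60))  # 1920000
```

A one-minute chunk should therefore be about 1.92 MB as a raw .pcm file; a noticeably different size suggests the sample rate or channel count was not applied.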
Problems
ffmpeg-python did not pass some parameters through correctly, whereas ffmpy3 worked without issues.
pip install ffmpy3
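# Works
import ffmpy3

# fls is the input file path, pcmpath is the output .pcm path
ff = ffmpy3.FFmpeg(
    # executable: path to ffmpeg; may be omitted if the environment variable is configured
    inputs={fls: '-y'},
    outputs={pcmpath: '-acodec pcm_s16le -f s16le -ac 1 -ar 16000'}
)
ff.run()

# The same conversion runs on the command line, but the script below raises an error and the conversion fails.
# Adding the PCM parameters here fails; os.system or subprocess is needed instead.
import ffmpeg

stream = ffmpeg.input(fls)
stream = ffmpeg.output(stream, pcmpath, ac=1, ar=16000)
ffmpeg.run(stream)

# This ran successfully on the command line but failed when run from the script.
import subprocess

command = [
    'ffmpeg',
    '-i', fls,               # input audio file
    '-f', 's16le',           # raw PCM; the name uses the letter l ('s161e' with the digit 1 is not a valid format)
    '-acodec', 'pcm_s16le',  # encode audio as 16-bit little-endian PCM
    '-ac', '1',              # mono
    '-ar', '16000',          # 16000 Hz sample rate
    pcmpath                  # output file path
]
# Run the FFmpeg command
subprocess.run(command)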
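When a command behaves differently from a script than from the shell, capturing stderr usually shows why. Below is a sketch of a small wrapper around subprocess.run that raises an exception carrying the tool's own error text; `run_tool` is a hypothetical helper name, and the shutil.which check assumes the tools are expected on PATH as set up in the configuration step.

```python
import shutil
import subprocess
from typing import List


def run_tool(command: List[str]) -> str:
    """Run a command given as an argument list and return its stdout.

    Raises FileNotFoundError if the executable cannot be found, and
    RuntimeError carrying stderr if the exit code is non-zero.
    """
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} not found on PATH")
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with {result.returncode}:\n{result.stderr}"
        )
    return result.stdout
```

Calling run_tool(command) with the argument list above would put ffmpeg's stderr into the exception message, which makes problems such as a misspelled format name or a missing executable visible instead of leaving only a failed conversion.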