FFmpeg 入门(3)：播放音频

时间 2019-11-12

标签 ffmpeg 入门播放音频繁體版

原文原文链接

本文转自：FFmpeg 入门(3)：播放音频 | www.samirchen.comgit

音频

SDL 提供了播放音频的方法。SDL_OpenAudio 函数用来让设备播放音频，它须要咱们传入一个包含了全部咱们输出须要的音频信息的 SDL_AudioSpec 结构体数据。github

在展现接下来的代码以前，咱们先说说 PC 上是如何处理音频的。数字音频包含了一长串「音频采样(sample)」，每个采样表明着一个音频波形的值。声音是在必定的「音频采样率(sample rate)」下被录制下来的，音频采样率即每秒音频采样的数量，表示的是播放音频速度。常见的音频采样率是 22500 和 44100，分别用于广播和 CD。此外，大部分音频还能够用更多的通道来实现立体声和环绕声等效果，好比立体声会一次来 2 个音频采样。这样当咱们从媒体文件中获取数据时，咱们不知道咱们会得到多少音频采样，而 FFmpeg 也不会只给咱们部分采样，也就是说，它不会对立体声的多通道采样进行分割。缓存

SDL 的音频播放的实现大体是这样的：建立 SDL_AudioSpec 结构体，设置你的音频播放数据，包括：采样率(freq)、音频格式(format)、通道数(channels)、采样大小(samples)、回调函数(callback)和用户数据(userdata)等。当开始播放音频时，SDL 会持续调用这个回调方法来填充固定数量的字节到音频缓冲区。而后咱们调用 SDL_OpenAudio() 函数，传入这个 SDL_AudioSpec 结构体数据，这时它会打开音频设备并给咱们返回一个另外的 SDL_AudioSpec，这个才是咱们真正使用的 SDL_AudioSpec，这个跟咱们传入的可能有不一样。安全

建立音频

简单介绍了音频相关的知识后，咱们接下来开始看代码：像处理视频流同样，咱们从媒体文件中获取音频流。数据结构

// Find the first video stream.
videoStream = -1;
audioStream = -1;
for (i = 0; i < pFormatCtx->nb_streams; i++) {
    if (pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO && videoStream < 0) {
        videoStream = i;
    }
    if (pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO && audioStream < 0) {
        audioStream = i;
    }
}
if (videoStream == -1) {
    return -1; // Didn't find a video stream.
}
if (audioStream == -1) {
    return -1; // Didn't find a audio stream.
}

接着咱们能够从 AVStream 的中 AVCodecContext 结构体中获取咱们想要的全部信息。拿到 AVCodecContext 后，咱们就能够用这些信息来建立音频了：ide

AVCodecContext *aCodecCtx = NULL;
AVCodec *aCodec = NULL;
SDL_AudioSpec wanted_spec, spec;

aCodecCtx = pFormatCtx->streams[audioStream]->codec;
aCodec = avcodec_find_decoder(aCodecCtx->codec_id);
if (!aCodec) {
    fprintf(stderr, "Unsupported codec!\n");
    return -1;
}

// Set audio settings from codec info.
wanted_spec.freq = aCodecCtx->sample_rate;
wanted_spec.format = AUDIO_S16SYS;
wanted_spec.channels = aCodecCtx->channels;
wanted_spec.silence = 0;
wanted_spec.samples = SDL_AUDIO_BUFFER_SIZE;
wanted_spec.callback = audio_callback;
wanted_spec.userdata = aCodecCtx;

if (SDL_OpenAudio(&wanted_spec, &spec) < 0) {
    fprintf(stderr, "SDL_OpenAudio: %s\n", SDL_GetError());
    return -1;
}

avcodec_open2(aCodecCtx, aCodec, &audioOptionsDict);

对于 SDL_AudioSpec 的成员，咱们这里说明一下：函数

freq: 采样率。
format: 这个参数用来告诉 SDL 音频的格式。S16SYS 中的 S 表示 signed，16 表示每一个采样占 16 bits。SYS 表示大小端和系统保持一致。这个格式是 avcodec_decode_audio4() 会给咱们的。
channels: 音频通道数。
silence: 是否静音。
samples: 音频缓存的大小。推荐值为 512~8192，ffplay 使用的是 1024。
callback: 回调函数。
userdata: 回调函数带的用户数据。

最后咱们经过 SDL_OpenAudio() 来打开音频。ui

队列

如今咱们能够从码流里拉取音频数据了，可是咱们接下来改如何处理这些数据呢？咱们持续从媒体文件中获取数据包(packet)，与此同时，SDL 也会持续调用回调方法。一种解决方案时，建立一块全局的存储区，让咱们能不断把数据放进去，让 SDL 能不断经过 audio_callback 从里面把数据取出来做进一步处理。因此接下来咱们将建立一个数据包的队列(packet queue)。事实上，FFmpeg 已经提供了相应的数据结构 AVPacketList，这是一个 packet 链表。基于此，咱们定义了咱们的 PacketQueue：this

typedef struct PacketQueue {
    AVPacketList *first_pkt, *last_pkt;
    int nb_packets;
    int size;
    SDL_mutex *mutex;
    SDL_cond *cond;
} PacketQueue;

须要指出的是 nb_packets 跟 size 不是一回事，size 是从 packet->size 得到的字节数。你能够看到咱们这里还有 SDL_mutex 和 SDL_cond 成员，这是由于 SDL 是在一个独立的线程里面来处理音频，因此咱们须要有互斥机制来保证对队列数据的正确操做。线程

下面是队列初始化的代码：

void packet_queue_init(PacketQueue *q) {
    memset(q, 0, sizeof(PacketQueue));
    q->mutex = SDL_CreateMutex();
    q->cond = SDL_CreateCond();
}

下面是向队列添加数据的代码：

int packet_queue_put(PacketQueue *q, AVPacket *pkt) {
    AVPacketList *pkt1;
    if (av_packet_ref(pkt, pkt) < 0) {
        return -1;
    }
    pkt1 = av_malloc(sizeof(AVPacketList));
    if (!pkt1) {
        return -1;
    }
    pkt1->pkt = *pkt;
    pkt1->next = NULL;
    
    
    SDL_LockMutex(q->mutex);
    
    if (!q->last_pkt) {
        q->first_pkt = pkt1;
    }
    else {
        q->last_pkt->next = pkt1;
    }
    q->last_pkt = pkt1;
    q->nb_packets++;
    q->size += pkt1->pkt.size;
    SDL_CondSignal(q->cond);
    
    SDL_UnlockMutex(q->mutex);
    return 0;
}

SDL_LockMutex() 经过锁住 mutex 来让咱们能安全的想队列里写入数据。SDL_CondSignal() 则在添加完数据后发出信号告诉数据消费方准备获取数据进行下一步处理，同时 SDL_UnlockMutex() 解锁 mutex 让消费方能正常获取数据。

下面是对应的从队列取数据的代码：

int quit = 0;
static int packet_queue_get(PacketQueue *q, AVPacket *pkt, int block) {
    AVPacketList *pkt1;
    int ret;
    
    SDL_LockMutex(q->mutex);
    
    for (;;) {
        if (quit) {
            ret = -1;
            break;
        }
        
        pkt1 = q->first_pkt;
        if (pkt1) {
            q->first_pkt = pkt1->next;
            if (!q->first_pkt) {
                q->last_pkt = NULL;
            }
            q->nb_packets--;
            q->size -= pkt1->pkt.size;
            *pkt = pkt1->pkt;
            av_free(pkt1);
            ret = 1;
            break;
        } else if (!block) {
            ret = 0;
            break;
        } else {
            SDL_CondWait(q->cond, q->mutex);
        }
    }
    SDL_UnlockMutex(q->mutex);
    return ret;
}

在上面代码中，咱们实现了一个 for 循环，当这个 for 循环遇到阻塞了那就说明确定是获得一组数据了。咱们经过 SDL 的 SDL_CondWait() 函数来避免永远循环。基本上全部的 SDL_CondWait() 都会等待 SDL_CondSignal() 或者 SDL_CondBroadcast() 的信号，而后继续。看起好像咱们已经在 mutex 这死锁了，由于若是咱们不开锁 packet_queue_put() 函数就没法向队列里写数据，但事实上 SDL_CondWait() 函数是会在合适的时候解开咱们传给它的锁的，并在收到信号时再尝试去锁上。

程序退出

代码中咱们有一个全局变量 quit，这个变量是为了当咱们在界面上点了退出后，能告诉线程退出。

SDL_PollEvent(&event);
switch (event.type) {
    case SDL_QUIT:
        quit = 1;
        SDL_Quit();
        exit(0);
        break;
    default:
        break;
}

填充队列数据

接着要作的就是建立咱们的队列：

PacketQueue audioq;
int main(int argc, char *argv[]) {
    // ... code ...

    avcodec_open2(aCodecCtx, aCodec, &audioOptionsDict);

    // audio_st = pFormatCtx->streams[index].
    packet_queue_init(&audioq);
    SDL_PauseAudio(0);
    
    // ... code ...
}

SDL_PauseAudio() 最终开启了音频设备，若是这时候没有得到数据，那么它就静音。

一旦当咱们的队列创建起来，咱们就能够开始往里面填充数据包了。下面就是咱们用于填数据的循环：

while (av_read_frame(pFormatCtx, &packet) >= 0) {
    // Is this a packet from the video stream?
    if (packet.stream_index == videoStream) {
        // Decode video frame.
        
        // ... code ...

    } else if (packet.stream_index == audioStream) {
        packet_queue_put(&audioq, &packet);
    } else {
        // Free the packet that was allocated by av_read_frame.
        av_packet_unref(&packet);
    }

    // ... code ...

}

注意，咱们没有在把音频数据包 packet 放入队列后就当即释放它，咱们会在解码它以后才释放。

获取队列数据

这里咱们开始实现咱们获取和处理队列数据的回调函数，这个回调函数必须遵循这样的形式：void callback(void *userdata, Uint8 *stream, int len)，其中 userdata 是咱们给 SDL 的指针，stream 是咱们写入音频数据的缓冲区，len 是缓冲区的大小。

void audio_callback(void *userdata, Uint8 *stream, int len) {
    AVCodecContext *aCodecCtx = (AVCodecContext *)userdata;
    int len1, audio_size;
    
    static uint8_t audio_buf[(MAX_AUDIO_FRAME_SIZE * 3) / 2];
    static unsigned int audio_buf_size = 0;
    static unsigned int audio_buf_index = 0;
    
    while (len > 0) {
        if (audio_buf_index >= audio_buf_size) {
            // We have already sent all our data; get more.
            audio_size = audio_decode_frame(aCodecCtx, audio_buf, audio_buf_size);
            if (audio_size < 0) {
                // If error, output silence.
                audio_buf_size = 1024; // arbitrary?
                memset(audio_buf, 0, audio_buf_size);
            } else {
                audio_buf_size = audio_size;
            }
            audio_buf_index = 0;
        }
        len1 = audio_buf_size - audio_buf_index;
        if (len1 > len) {
            len1 = len;
        }
        memcpy(stream, (uint8_t *) audio_buf + audio_buf_index, len1);
        len -= len1;
        stream += len1;
        audio_buf_index += len1;
    }
}

这里实现了一个简单的循环来拉取数据，audio_decode_frame() 会存储解码结果在一个临时缓冲区，这个缓冲区的数据会流向 stream。audio_buf 这个缓冲区的大小是 FFmpeg 给咱们的最大音频帧大小的 1.5 倍，从而起到一个很好的弹性的做用。

音频解码

音频解码的代码都在 audio_decode_frame() 函数中实现：

int audio_decode_frame(AVCodecContext *aCodecCtx, uint8_t *audio_buf, int buf_size) {
    static AVPacket pkt;
    static uint8_t *audio_pkt_data = NULL;
    static int audio_pkt_size = 0;
    static AVFrame frame;
    
    int len1, data_size = 0;
    
    for (;;) {
        while(audio_pkt_size > 0) {
            int got_frame = 0;
            len1 = avcodec_decode_audio4(aCodecCtx, &frame, &got_frame, &pkt);
            if (len1 < 0) {
                // if error, skip frame.
                audio_pkt_size = 0;
                break;
            }
            audio_pkt_data += len1;
            audio_pkt_size -= len1;
            if (got_frame) {
                data_size = av_samples_get_buffer_size(NULL, aCodecCtx->channels, frame.nb_samples, aCodecCtx->sample_fmt, 1);
                memcpy(audio_buf, frame.data[0], data_size);
            }
            if (data_size <= 0) {
                // No data yet, get more frames.
                continue;
            }
            // We have data, return it and come back for more later.
            return data_size;
        }
        if (pkt.data) {
            av_packet_unref(&pkt);
        }
        
        if (quit) {
            return -1;
        }
        
        if (packet_queue_get(&audioq, &pkt, 1) < 0) {
            return -1;
        }
        audio_pkt_data = pkt.data;
        audio_pkt_size = pkt.size;
    }
}

咱们从最后面开始看这整段代码的处理逻辑，咱们调用 packet_queue_get() 函数从队列中取数据包并存下来，接着一旦咱们有了数据包就调用 avcodec_decode_audio4() 函数来进行解码。在一些状况下，一个数据包 packet 可能有超过 1 个帧，那样就须要屡次调用这段处理逻辑来从数据包里取得全部数据。一旦咱们取得了一个帧，咱们就把它拷贝的音频缓冲区。这里须要注意的是数据类型转换，由于 SDL 给咱们的是 8 bit 的整型缓冲区，可是 FFmpeg 给咱们的数据是 16 bit 的整型缓冲区。另外，还须要搞清楚 len1 和 data_size 的区别，len1 是一个 packet 中已被咱们使用的字节数，data_size 是返回的原生数据的大小。

当咱们获取到一些数据后就当即返回来看看是否须要从队列中得到更多的数据或者已经能够足够。若是一个 packet 中还有更多的数据须要继续处理，咱们就将这些数据保存一会；若是已经处理完一个 packet 中的全部数据，那咱们就释放这个 packet 的内存。

略做总结，主线程的数据读取的循环负责从媒体文件中读取数据并写入到队列，咱们从队列中取出数据给 audio_callback 回调函数处理，回调函数会将数据交给 SDL，SDL 将数据交给声卡来播放出声音。

以上即是咱们这节教程的所有内容，其中的完整代码你能够从这里得到：https://github.com/samirchen/TestFFmpeg

编译执行

你可使用下面的命令编译它：

$ gcc -o tutorial03 tutorial03.c -lavutil -lavformat -lavcodec -lswscale -lz -lm `sdl-config --cflags --libs`

找一个视频文件，你能够这样执行一下试试：

$ tutorial03 myvideofile.mp4