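Introduction

For some time now, my main work has been developing a feature for remotely controlling phones. The audio and video transport is built on WebRTC, and our phones are all managed through Agent servers that they are directly connected to. The Agent service is written in Java, and there is currently no suitable Java WebRTC library on the market, so I wrote a library based on Google's open-source code that calls WebRTC Native through JNI. In my previous article I mainly talked about how I compiled WebRTC. In this one I will share how I use WebRTC from Java, as well as the changes I made to WebRTC for our business needs.

To be honest, when I first started on this work it was really tough going. I had not written C code in a long time, I was unfamiliar with the WebRTC Native APIs, and since not many people use WebRTC this way, documentation is scarce. So while developing this part, I first looked at how WebRTC is used in JavaScript to get a basic feel for the Native APIs, and I also referred to the NodeJS implementation. When I ran into problems I went to Google's WebRTC-Discuss forum, and if none of that turned up a solution, I read through all the code related to the feature I wanted to implement =。=.

After the whole feature was finished, I looked back over all the code I had written and felt that this stuff really is not that hard, and marveled at how much of a rookie I was back then ^.^.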

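Introduction to the Native APIs

If you are going to do work similar to mine, I think the most important thing is to first get familiar with the overall workflow of the Native APIs. Once you lay it out, you will find that the whole process is actually very simple, just eight major steps. I will first briefly introduce these eight steps, and then describe in detail what I did for each of them.

Native APIs workflow:

1. Use the Native APIs to create the three threads WebRTC works with: Worker Thread, Network Thread, and Signaling Thread.
   * If, like me, you need a custom audio capture module and custom codec implementations, they also need to be initialized in this step.
2. Create the PeerConnectionFactory. This factory is the source of all subsequent work: connections as well as audio/video capture are all created through it.
3. Create the PeerConnection. Here you can set some connection parameters, such as which ICE Server to use and what the TCP/UDP network policy is.
   * If, like me, you need to restrict which ports are used, you need to specify a custom PortAllocator.
4. Create the Audio/VideoSource. Capture parameters can be specified when creating the AudioSource; the VideoSource takes a VideoCapturer object as a parameter.
   * If, like me, you need to supply the video frames yourself, you have to implement a custom VideoCapturer.
5. Using the Audio/VideoSource from the previous step as a parameter, create the Audio/VideoTrack (AudioTrackInterface/VideoTrackInterface). This object represents the audio/video capture process.
6. Create the MediaStreamInterface and add the Audio/VideoTrack from the previous step to it. This object represents the transport channel.
7. Add the MediaStream from the previous step to the PeerConnection created in step 3.
8. The PeerConnection notifies the user of the current connection state and other events through Observer callbacks. We use these callbacks together with the PeerConnection API to exchange SDP and ICE Candidates with the other peer.

Of these eight steps, the first two are specific to the Native APIs; the remaining steps are basically similar to how WebRTC is used on the Web. It was in the Native-specific parts that I hit a lot of pitfalls back then. Next, let me describe in detail how I established a connection between a Java service and other clients through the Native APIs.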

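JNI vs. JNA

As you probably know, calling C++ code from Java requires JNI or JNA. How do the two differ, and which one should we use in our scenario?

The figure above shows how JNI is used. As you can see, there are many steps and the process is tedious. We first define the interfaces in Java code, then use a tool to generate the corresponding C header files, then implement those interfaces in C and compile them into a shared library, and finally load that library in the JVM so that the C code can be called.

JNA is much simpler by comparison: we do not need to write a separate glue library, because there is an API for calling the native functions directly, which greatly reduces the amount of work. At first glance JNA seems to beat JNI hands down and looks like the obvious choice for this job. In my scenario, however, JNA has a few fatal problems that left JNI as the only option.

Why not JNA:

1. JNA is geared toward Java calling plain C functions, whereas many of the PeerConnection APIs report back through C++ Observer objects, which requires the C++ code to call back into a Java ObserverWrapper.
2. Calling a dynamic library through JNA costs slightly more than calling it through JNI. I am not sure how large the difference is, but since we need to pass every video frame from Java to C, we want this path to be as fast as possible.

Now that we have settled on JNI, let me describe what I actually did.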

Code Structure

Java Code Structure

 

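  1. script/build-header-files.sh: a script that generates the corresponding C header files from the Java interfaces I wrote. (javah was removed in JDK 10; on newer JDKs, javac -h generates the headers instead.)

        #!/usr/bin/env bash
        ls -l ../path/to/rtc4j/core | grep ^- | awk '{print $9}' | sed 's/\.class$//' | sed 's/^/package.name.of.core./' | xargs javah -classpath ../target/classes -d ../../cpp/src/jni/

  2. src/XXX/core/: this package is the core of the library. It mainly contains the audio capturer, the video capturer, the callback interfaces needed while connecting, and the wrappers around the core WebRTC classes:
     * RTC -> webrtc::PeerConnectionFactoryInterface
     * PeerConnection -> webrtc::PeerConnectionInterface
     * DataChannel -> webrtc::DataChannelInterface
  3. src/XXX/model/: the POJOs used by the core classes
  4. src/XXX/utils/: loading the shared library on the Java side for the different platforms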

C++ Code Structure

The code structure on the C++ side is also fairly simple and corresponds essentially one-to-one to the Java interfaces.

 

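  1. src/jni/: the C header files generated from the Java interfaces, plus utilities for Java-related types
  2. src/media/: audio/video capture classes and custom codec classes
     • The audio part implements a custom AudioDeviceModule, which is injected when the PeerConnectionFactory is created
     • The video part implements a custom VideoCapturer, which is injected when the VideoSource is created
     • H264 encoding and decoding use libx264 and h264_nvenc (NVIDIA acceleration) as provided through FFMPEG; this code is injected when the PeerConnectionFactory is created
  3. src/rtc/: the implementation classes behind the Java Wrapper interfaces
  4. src/rtc/network: my own SocketFactory, which is how the port restriction is achieved; it is injected when the PeerConnection is created

The Java code is all fairly simple, just a shell around the Native APIs, and a good deal of the C++ code is likewise a thin wrapper over the underlying WebRTC lib. I will skim over those parts and focus on the tougher bones to chew.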

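Pulling the Required Libraries into C++

I built the whole C++ project on CMake. It uses libwebrtc, FFMPEG (for video encoding), and libjpeg-turbo (for converting the images obtained from the Java VideoCapturer to YUV). The CMake file is as follows: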

cmake_minimum_required(VERSION 3.8)
project(rtc)
set(CMAKE_CXX_STANDARD 11)

if (APPLE)
    set(CMAKE_CXX_FLAGS "-fno-rtti -pthread") # flags required by the WebRTC library
elseif (UNIX)
    # Apart from the first two (-fno-rtti -pthread), all of these flags are needed by FFMPEG
    set(CMAKE_CXX_FLAGS "-fno-rtti -pthread -lva -lva-drm -lva-x11 -llzma -lX11 -lz -ldl -ltheoraenc -ltheoradec")
    set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,-Bsymbolic")
endif()

include(./CMakeModules/FindFFMPEG.cmake) # pull in FFMPEG
include(./CMakeModules/FindLibJpegTurbo.cmake) # pull in libjpeg-turbo

if (CMAKE_SYSTEM_NAME MATCHES "Linux") # definitions the C++ code uses to tell the platforms apart
    set_property(DIRECTORY APPEND PROPERTY COMPILE_DEFINITIONS WEBRTC_LINUX)
elseif(CMAKE_SYSTEM_NAME MATCHES "Darwin")
    set_property(DIRECTORY APPEND PROPERTY COMPILE_DEFINITIONS WEBRTC_MAC)
endif()

find_package(LibWebRTC REQUIRED) # pull in WebRTC
find_package(JNI REQUIRED) # pull in JNI
include_directories(${Java_INCLUDE_PATH}) # JNI headers
include_directories(${Java_INCLUDE_PATH2}) # JNI headers
include(${LIBWEBRTC_USE_FILE}) # WebRTC headers
include_directories("src")
include_directories(${CMAKE_CURRENT_BINARY_DIR})
include_directories(${TURBO_INCLUDE_DIRS}) # libjpeg-turbo headers

file(GLOB_RECURSE SOURCES *.cpp) # sources to compile
file(GLOB_RECURSE HEADERS *.h) # headers of the sources to compile

add_library(rtc SHARED ${SOURCES} ${HEADERS}) # build the shared library
target_include_directories(rtc PRIVATE ${TURBO_INCLUDE_DIRS} ${FFMPEG_INCLUDE_DIRS})
target_link_libraries(rtc PRIVATE ${TURBO_LIBRARIES} ${FFMPEG_LIBRARIES} ${LIBWEBRTC_LIBRARIES}) # link the libraries

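I hit quite a few pitfalls while pulling in these libraries, especially FFMPEG, so here is a brief summary.

Compiling FFMPEG

  1. To compile FFMPEG on Linux I mainly followed the official guide, but a few changes are needed here:
     a. Wherever there is an enable-shared switch, turn it on; the official guide disables it everywhere.
     b. Always compile with "-fPIC", otherwise linking on Linux fails with the error shown below. A shared object may be loaded at different addresses in different processes. If its instructions use absolute addresses or addresses of external modules, those addresses must be adjusted at load time according to where the relevant modules were loaded, that is, patched so that they work in that process. The segments that get patched can then no longer share one copy of physical memory across processes; each process needs its own copy. The -fPIC option exists so that processes using the same shared object can share as much physical memory as possible: it pulls all accesses to absolute and external-module addresses out of the code, so that the code segment is identical in every process and can be shared.

        /usr/bin/ld: test.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
        test.o: could not read symbols: Bad value
        collect2: ld returned 1 exit status

     c. If you also need Nvidia support, refer to the official guide.
     d. Finally, here is the command I ended up using to compile FFMPEG:

        PATH="$HOME/bin:$PATH" PKG_CONFIG_PATH="$HOME/ffmpeg_build/lib/pkgconfig" ./configure \
          --prefix="$HOME/ffmpeg_build" \
          --pkg-config-flags="--static" \
          --extra-cflags="-I$HOME/ffmpeg_build/include" \
          --extra-ldflags="-L$HOME/ffmpeg_build/lib" \
          --extra-libs=-lpthread \
          --extra-libs=-lm \
          --bindir="$HOME/bin" \
          --enable-gpl \
          --enable-libfdk_aac \
          --enable-libfreetype \
          --enable-libmp3lame \
          --enable-libopus \
          --enable-libvorbis \
          --enable-libvpx \
          --enable-libx264 \
          --enable-libx265 \
          --enable-nonfree \
          --extra-cflags=-I/usr/local/cuda/include/ \
          --extra-ldflags=-L/usr/local/cuda/lib64 \
          --enable-shared \
          --cc="gcc -m64 -fPIC" \
          --enable-nvenc \
          --enable-cuda \
          --enable-cuvid \
          --enable-libnpp

  2. Installing FFMPEG on a Mac is simple and blunt: one command installs the version with every option enabled.

        brew install ffmpeg $(brew options ffmpeg | grep -vE '\s' | grep -- '--with-' | tr '\n' ' ')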

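Installing libjpeg-turbo

Because this library is fairly simple, I just downloaded a prebuilt version.

Pulling in Turbo and FFMPEG

The two libraries are pulled in very similarly, so I will use the simpler FindLibJpegTurbo.cmake as the example. The FFMPEG one differs only in that it looks for more underlying dependencies.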

# Try to find the libjpeg-turbo libraries and headers
   #
   # TURBO_INCLUDE_DIRS
   # TURBO_LIBRARIES
   # TURBO_FOUND

   # Find header files
   FIND_PATH(
       TURBO_INCLUDE_DIRS turbojpeg.h
       /opt/libjpeg-turbo/include/
   )

   FIND_LIBRARY(
       TURBO_LIBRARY
       NAMES libturbojpeg.a
       PATHS /opt/libjpeg-turbo/lib64
   )

   FIND_LIBRARY(
       JPEG_LIBRARY
       NAMES libjpeg.a
       PATHS /opt/libjpeg-turbo/lib64
   )


   IF (TURBO_LIBRARY)
       SET(TURBO_FOUND TRUE)
   ENDIF ()

   IF (TURBO_FOUND AND TURBO_INCLUDE_DIRS)
       SET(TURBO_LIBRARIES ${TURBO_LIBRARY} ${JPEG_LIBRARY})
       MESSAGE(STATUS "Found Turbo library: ${TURBO_LIBRARIES}, ${TURBO_INCLUDE_DIRS}")
   ELSE ()
       SET(TURBO_FOUND FALSE)
       MESSAGE(STATUS "Not found Turbo library")
   ENDIF ()

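With that, all the preparation is finally done. Let's see how the Native APIs are actually called.

Using the Native APIs

Creating the PeerConnectionFactory

As mentioned when introducing the Native APIs, WebRTC has three main threads that handle its work. Here we first create them through the API. As an aside, the threading library WebRTC provides is really powerful; you could even use it as a cross-platform threading library on its own. If I get the chance, I will write a dedicated article about its implementation. Back to the topic: one key point when creating the threads is that the Network Thread must be created with the CreateWithSocketServer method.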

void RTC::InitThreads() {
       signaling_thread = rtc::Thread::Create();
       signaling_thread->SetName("signaling", nullptr);
       RTC_CHECK(signaling_thread->Start()) << "Failed to start thread";
       WEBRTC_LOG("Original socket server used.", INFO);
       worker_thread = rtc::Thread::Create();
       worker_thread->SetName("worker", nullptr);
       RTC_CHECK(worker_thread->Start()) << "Failed to start thread";
       network_thread = rtc::Thread::CreateWithSocketServer();
       network_thread->SetName("network", nullptr);
       RTC_CHECK(network_thread->Start()) << "Failed to start thread";
   }
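The code in the following sections repeatedly uses Invoke to run a function on one specific thread and wait for its result. The idea can be sketched with the standard library alone. This is only an illustration of the pattern, not how rtc::Thread is implemented; TaskThread and its members are hypothetical names.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a thread that owns a task queue.
class TaskThread {
 public:
  TaskThread() : worker_([this] { Run(); }) {}

  ~TaskThread() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  TaskThread(const TaskThread&) = delete;
  TaskThread& operator=(const TaskThread&) = delete;

  // Runs fn on the owned thread and blocks the caller until it returns.
  // Must not be called from the owned thread itself (it would deadlock).
  template <typename Fn>
  auto Invoke(Fn fn) -> decltype(fn()) {
    using Result = decltype(fn());
    auto task = std::make_shared<std::packaged_task<Result()>>(std::move(fn));
    std::future<Result> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result.get();
  }

  std::thread::id id() const { return worker_.get_id(); }

 private:
  void Run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // asked to stop and the queue is drained
        job = std::move(tasks_.front());
        tasks_.pop();
      }
      job();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so the members above exist before it starts
};
```

With this sketch, `worker.Invoke([] { return 6 * 7; })` evaluates the lambda on the worker's thread and hands the value back to the caller, which is the behaviour the later snippets rely on when they create and release objects on the worker or signaling thread.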

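In addition, if like me you have special audio capture requirements, you need to implement your own AudioDeviceModule. One thing to note is that the AudioDeviceModule must be created on the worker thread, and it must also be released on the worker thread.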

void RTC::Init(jobject audio_capturer, jobject video_capturer) { //Initialize the PeerConnectionFactory
       this->video_capturer = video_capturer;
       InitThreads(); //Initialize the threads
       audio_device_module = worker_thread->Invoke<rtc::scoped_refptr<webrtc::AudioDeviceModule>>(
               RTC_FROM_HERE,
               rtc::Bind(
                       &RTC::InitJavaAudioDeviceModule,
                       this,
                       audio_capturer)); //Initialize the AudioDeviceModule on the worker thread
       WEBRTC_LOG("After fake audio device module.", INFO);
       InitFactory();
   }

   //AudioDeviceModule that gets its audio data from Java; its implementation is covered in detail later
   rtc::scoped_refptr<webrtc::AudioDeviceModule> RTC::InitJavaAudioDeviceModule(jobject audio_capturer) {
       RTC_DCHECK(worker_thread.get() == rtc::Thread::Current());
       WEBRTC_LOG("Create fake audio device module.", INFO);
       auto result = new rtc::RefCountedObject<FakeAudioDeviceModule>(
               FakeAudioDeviceModule::CreateJavaCapturerWrapper(audio_capturer),
               FakeAudioDeviceModule::CreateDiscardRenderer(44100));
       WEBRTC_LOG("Create fake audio device module finished.", INFO);
       is_connect_to_audio_card = true;
       return result;
   }

   ...
   //Releasing the AudioDeviceModule
   worker_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::ReleaseAudioDeviceModule, this));
   ...

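   //audio_device_module is held in an rtc::scoped_refptr, a reference-counting smart pointer. When the reference count drops to 0, the destructor of the instance runs automatically, so assigning nullptr is all we need to do here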
   void RTC::ReleaseAudioDeviceModule() {
       RTC_DCHECK(worker_thread.get() == rtc::Thread::Current());
       audio_device_module = nullptr;
   }
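The effect of assigning nullptr to a reference-counted pointer can be sketched as follows. std::shared_ptr is used here purely as a stand-in for rtc::scoped_refptr (the two differ in where the count lives, but release behaves the same way for this purpose); CountedModule and Holder are hypothetical names.

```cpp
#include <memory>

// Hypothetical module that records how many instances have been destroyed.
struct CountedModule {
  static int& destroyed() {
    static int count = 0;
    return count;
  }
  ~CountedModule() { ++destroyed(); }
};

// Holds a module the way RTC holds audio_device_module.
struct Holder {
  std::shared_ptr<CountedModule> module = std::make_shared<CountedModule>();

  // Drops only this holder's reference; the object is destroyed
  // once no other reference remains.
  void Release() { module = nullptr; }
};
```

The practical consequence is that assigning nullptr destroys the object only if nobody else still holds a reference. If some other component keeps one, the destructor runs later, on whichever thread drops the last reference.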

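With the three key threads and the AudioDeviceModule in place, we can create the PeerConnectionFactory. Because of business requirements I have some port restrictions, and the SocketFactory that enforces them is also initialized here; it is used later when creating the PortAllocator. At this point you may wonder why video capture and audio capture are injected in different places. You are not alone; I wonder too =。=. I even think the SocketFactory should be handed to the PeerConnectionFactory to manage, so that one would not have to create a PortAllocator every time a PeerConnection is created.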

void RTC::InitFactory() {
       //Create a SocketFactory with port and IP restrictions
       socket_factory.reset(
               new rtc::SocketFactoryWrapper(network_thread.get(), this->white_private_ip_prefix, this->min_port,
                                             this->max_port));
       network_manager.reset(new rtc::BasicNetworkManager());
       //My own video encoder is used here; it is also covered in detail later
       peer_connection_factory = webrtc::CreatePeerConnectionFactory(
               network_thread.get(), worker_thread.get(), signaling_thread.get(), audio_device_module,
               webrtc::CreateBuiltinAudioEncoderFactory(), webrtc::CreateBuiltinAudioDecoderFactory(),
               CreateVideoEncoderFactory(hardware_accelerate), CreateVideoDecoderFactory(),
               nullptr, nullptr);
   }

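Admittedly, creating the PeerConnectionFactory involves many interface designs that differ from what I would have expected. I think this is probably because my use case is not a typical one, which makes the WebRTC interfaces feel less convenient. In any case, the PeerConnectionFactory is now done. To summarize the process: create the threads -> create the audio capture module -> create the EncoderFactory -> instantiate the PeerConnectionFactory.

Creating the PeerConnection

With the PeerConnectionFactory in hand, we can use it to create connections. In this step we need to provide the ICE Server information, and I also use the SocketFactory created in the previous step to create a PortAllocator, which is how the port restriction is achieved. In addition, I call a PeerConnection API in this step to cap the maximum bitrate.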

//Create the PeerConnection
   PeerConnection *
   RTC::CreatePeerConnection(PeerConnectionObserver *peerConnectionObserver, std::string uri,
                             std::string username, std::string password, int max_bit_rate) {
       //Pass in the ICE Server information
       webrtc::PeerConnectionInterface::RTCConfiguration configuration;
       webrtc::PeerConnectionInterface::IceServer ice_server;
       ice_server.uri = std::move(uri);
       ice_server.username = std::move(username);
       ice_server.password = std::move(password);
       configuration.servers.push_back(ice_server);
       //Disable TCP candidates
       configuration.tcp_candidate_policy = webrtc::PeerConnectionInterface::TcpCandidatePolicy::kTcpCandidatePolicyDisabled;
       //Reduce audio latency
       configuration.audio_jitter_buffer_fast_accelerate = true;
       //Build the PortAllocator from the SocketFactory created earlier to restrict the ports
       std::unique_ptr<cricket::PortAllocator> port_allocator(
               new cricket::BasicPortAllocator(network_manager.get(), socket_factory.get()));
       port_allocator->SetPortRange(this->min_port, this->max_port);
       //Create the PeerConnection and limit the bitrate
       return new PeerConnection(peer_connection_factory->CreatePeerConnection(
               configuration, std::move(port_allocator), nullptr, peerConnectionObserver), peerConnectionObserver,
                                 is_connect_to_audio_card, max_bit_rate);
   }

   //Call the API to limit the bitrate
   void PeerConnection::ChangeBitrate(int bitrate) {
       auto bit_rate_setting = webrtc::BitrateSettings();
       bit_rate_setting.min_bitrate_bps = 30000;
       bit_rate_setting.max_bitrate_bps = bitrate;
       bit_rate_setting.start_bitrate_bps = bitrate;
       this->peer_connection->SetBitrate(bit_rate_setting);
   }
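One detail worth noting in ChangeBitrate is that the minimum is fixed at 30000 bps while the maximum comes from the caller, so a very small requested value would leave the minimum above the maximum. A minimal sketch of keeping the three values consistent is shown below; BitrateLimits and MakeBitrateLimits are hypothetical helpers, not part of WebRTC, and raising the maximum to the floor is just one possible policy.

```cpp
#include <algorithm>

// Hypothetical plain struct mirroring the three fields set in ChangeBitrate.
struct BitrateLimits {
  int min_bps;
  int start_bps;
  int max_bps;
};

// Builds limits that satisfy min <= start <= max, even when the requested
// maximum is below the floor (the maximum is then raised to the floor).
inline BitrateLimits MakeBitrateLimits(int requested_max_bps, int floor_bps = 30000) {
  BitrateLimits limits;
  limits.min_bps = floor_bps;
  limits.max_bps = std::max(requested_max_bps, floor_bps);
  limits.start_bps = limits.max_bps;
  return limits;
}
```

The resulting values could then be copied into webrtc::BitrateSettings in the same way the snippet above does.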

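Creating the Audio/VideoSource

In this step we use the PeerConnectionFactory API to create the Audio/VideoSource. When creating the AudioSource we can specify some audio parameters, and when creating the VideoSource we need to supply a VideoCapturer. It is worth noting that the VideoCapturer must be created on the Signaling Thread.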

...
   //Create the Audio/VideoSource
   audio_source = rtc->CreateAudioSource(GetAudioOptions());
   video_source = rtc->CreateVideoSource(rtc->CreateFakeVideoCapturerInSignalingThread());
   ...

   //Get the default audio options
   cricket::AudioOptions PeerConnection::GetAudioOptions() {
       cricket::AudioOptions options;
       options.audio_jitter_buffer_fast_accelerate = absl::optional<bool>(true);
       options.audio_jitter_buffer_max_packets = absl::optional<int>(10);
       options.echo_cancellation = absl::optional<bool>(false);
       options.auto_gain_control = absl::optional<bool>(false);
       options.noise_suppression = absl::optional<bool>(false);
       options.highpass_filter = absl::optional<bool>(false);
       options.stereo_swapping = absl::optional<bool>(false);
       options.typing_detection = absl::optional<bool>(false);
       options.experimental_agc = absl::optional<bool>(false);
       options.extended_filter_aec = absl::optional<bool>(false);
       options.delay_agnostic_aec = absl::optional<bool>(false);
       options.experimental_ns = absl::optional<bool>(false);
       options.residual_echo_detector = absl::optional<bool>(false);
       options.audio_network_adaptor = absl::optional<bool>(true);
       return options;
   }

   //Create the AudioSource
   rtc::scoped_refptr<webrtc::AudioSourceInterface> RTC::CreateAudioSource(const cricket::AudioOptions &options) {
       return peer_connection_factory->CreateAudioSource(options);
   }

   //Create the VideoCapturer on the Signaling Thread
   FakeVideoCapturer *RTC::CreateFakeVideoCapturerInSignalingThread() {
       if (video_capturer) {
           return signaling_thread->Invoke<FakeVideoCapturer *>(RTC_FROM_HERE,
                                                                rtc::Bind(&RTC::CreateFakeVideoCapturer, this,
                                                                          video_capturer));
       } else {
           return nullptr;
       }
   }

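Creating the Audio/VideoTrack

This step is comparatively simple: take the Source created in the previous step as a parameter, add a name, and you get an Audio/VideoTrack. This interface also belongs to the PeerConnectionFactory.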

...
   //Create the Audio/VideoTrack
   video_track = rtc->CreateVideoTrack("video_track", video_source.get());
   audio_track = rtc->CreateAudioTrack("audio_track", audio_source);
   ...

   //Create the VideoSource
   rtc::scoped_refptr<webrtc::VideoTrackSourceInterface> RTC::CreateVideoSource(cricket::VideoCapturer *capturer) {
       return peer_connection_factory->CreateVideoSource(capturer);
   }

   //Create the VideoTrack
   rtc::scoped_refptr<webrtc::VideoTrackInterface> RTC::CreateVideoTrack(const std::string &label,
                                                                         webrtc::VideoTrackSourceInterface *source) {
       return peer_connection_factory->CreateVideoTrack(label, source);
   }

Creating the LocalMediaStream

Call the PeerConnectionFactory API to create the LocalMediaStream, add the Audio/VideoTrack from before to the stream, and finally add the stream to the PeerConnection.

...
   //Create the LocalMediaStream
   transport_stream = rtc->CreateLocalMediaStream("stream");
   //Add the Audio/VideoTrack
   transport_stream->AddTrack(video_track);
   transport_stream->AddTrack(audio_track);
   //Add the stream to the PeerConnection
   peer_connection->AddStream(transport_stream);
   ...

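Creating the Data Channel

Compared with setting up audio/video transport, creating a Data Channel is much simpler: a single PeerConnection API call creates it. Some configuration options can be specified at creation time, mainly to constrain the reliability of the Data Channel. Note that on the client side a Data Channel is represented by two objects, one for the local end and one for the remote end. The local DataChannel object is obtained from CreateDataChannel, and the remote one is obtained through the PeerConnection's OnDataChannel callback. To send data, call the DataChannel's Send interface; when the remote end sends data, the OnMessage callback is triggered.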

//Create the Data Channel
   DataChannel *
   PeerConnection::CreateDataChannel(std::string label, webrtc::DataChannelInit config, DataChannelObserver *observer) {
       rtc::scoped_refptr<webrtc::DataChannelInterface> data_channel = peer_connection->CreateDataChannel(label, &config);
       data_channel->RegisterObserver(observer);
       return new DataChannel(data_channel, observer);
   }

   //Configurable options
   struct DataChannelInit {
     // Deprecated. Reliability is assumed, and channel will be unreliable if
     // maxRetransmitTime or MaxRetransmits is set.
     bool reliable = false;

     // True if ordered delivery is required.
     bool ordered = true;

     // The max period of time in milliseconds in which retransmissions will be
     // sent. After this time, no more retransmissions will be sent. -1 if unset.
     //
     // Cannot be set along with |maxRetransmits|.
     int maxRetransmitTime = -1;

     // The max number of retransmissions. -1 if unset.
     //
     // Cannot be set along with |maxRetransmitTime|.
     int maxRetransmits = -1;

     // This is set by the application and opaque to the WebRTC implementation.
     std::string protocol;

     // True if the channel has been externally negotiated and we do not send an
     // in-band signalling in the form of an "open" message. If this is true, |id|
     // below must be set; otherwise it should be unset and will be negotiated
     // in-band.
     bool negotiated = false;

     // The stream id, or SID, for SCTP data channels. -1 if unset (see above).
     int id = -1;
   };
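The comments in DataChannelInit imply two rules: maxRetransmitTime and maxRetransmits must not both be set, and the channel is reliable only when neither is set. A minimal sketch of checking these rules before calling CreateDataChannel could look like this; RetransmitOptions, IsValid and IsReliable are hypothetical names that mirror only the two relevant fields.

```cpp
// Hypothetical mirror of the two retransmission fields of DataChannelInit.
struct RetransmitOptions {
  int max_retransmit_time_ms = -1;  // -1 means unset
  int max_retransmits = -1;         // -1 means unset
};

// The two limits are mutually exclusive: at most one of them may be set.
inline bool IsValid(const RetransmitOptions& options) {
  return options.max_retransmit_time_ms == -1 || options.max_retransmits == -1;
}

// With neither limit set, the channel keeps retransmitting until delivery.
inline bool IsReliable(const RetransmitOptions& options) {
  return options.max_retransmit_time_ms == -1 && options.max_retransmits == -1;
}
```

For data such as video frames, where a late packet is useless, setting one of the two limits trades reliability for lower latency.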

   //Send data
   void DataChannel::Send(webrtc::DataBuffer &data_buffer) {
       data_channel->Send(data_buffer);
   }

   // Message received.
   void OnMessage(const webrtc::DataBuffer &buffer) override {
       //To call back into Java from C++, the current thread must first be attached to the JVM
       JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
       jbyteArray jbyte_array = CHAR_POINTER_2_J_BYTE_ARRAY(env, buffer.data.cdata(),
                                                            static_cast<int>(buffer.data.size()));
       jclass data_buffer = GET_DATA_BUFFER_CLASS();
       jmethodID init_method = env->GetMethodID(data_buffer, "<init>", "([BZ)V");
       jobject data_buffer_object = env->NewObject(data_buffer, init_method,
                                                   jbyte_array,
                                                   buffer.binary);
       jclass observer_class = env->GetObjectClass(java_observer);
       jmethodID java_event_method = env->GetMethodID(observer_class, "onMessage",
                                                      "(Lpackage/name/of/rtc4j/model/DataBuffer;)V");
       //Look up the corresponding callback method and invoke it
       env->CallVoidMethod(java_observer, java_event_method, data_buffer_object);
       //Release the local references
       env->DeleteLocalRef(jbyte_array);
       env->DeleteLocalRef(data_buffer_object);
       env->DeleteLocalRef(observer_class);
   }

   //Attach the C++ thread to the JVM
   JNIEnv *ATTACH_CURRENT_THREAD_IF_NEEDED() {
       JNIEnv *jni = GetEnv();
       if (jni)
           return jni;
       JavaVMAttachArgs args;
       args.version = JNI_VERSION_1_8;
       args.group = nullptr;
       args.name = const_cast<char *>("JNI-RTC");
   // Deal with difference in signatures between Oracle's jni.h and Android's.
   #ifdef _JAVASOFT_JNI_H_  // Oracle's jni.h violates the JNI spec!
       void *env = nullptr;
   #else
       JNIEnv* env = nullptr;
   #endif
       RTC_CHECK(!g_java_vm->AttachCurrentThread(&env, &args)) << "Failed to attach thread";
       RTC_CHECK(env) << "AttachCurrentThread handed back NULL!";
       jni = reinterpret_cast<JNIEnv *>(env);
       return jni;
   }

   JNIEnv *GetEnv() {
       void *env = nullptr;
       jint status = g_java_vm->GetEnv(&env, JNI_VERSION_1_8);
       RTC_CHECK(((env != nullptr) && (status == JNI_OK)) ||
                 ((env == nullptr) && (status == JNI_EDETACHED)))
           << "Unexpected GetEnv return: " << status << ":" << env;
       return reinterpret_cast<JNIEnv *>(env);
   }

   //Detach the current C++ thread from the JVM
   void DETACH_CURRENT_THREAD_IF_NEEDED() {
       // This function only runs on threads where |g_jni_ptr| is non-NULL, meaning
       // we were responsible for originally attaching the thread, so are responsible
       // for detaching it now.  However, because some JVM implementations (notably
       // Oracle's http://goo.gl/eHApYT) also use the pthread_key_create mechanism,
       // the JVMs accounting info for this thread may already be wiped out by the
       // time this is called. Thus it may appear we are already detached even though
       // it was our responsibility to detach!  Oh well.
       if (!GetEnv())
           return;
       jint status = g_java_vm->DetachCurrentThread();
       RTC_CHECK(status == JNI_OK) << "Failed to detach thread: " << status;
       RTC_CHECK(!GetEnv()) << "Detaching was a successful no-op???";
   }

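In this step I introduced some code for attaching and detaching threads, which I think deserves a brief explanation. As mentioned earlier, WebRTC has three main threads: Worker Thread, Network Thread, and Signaling Thread. WebRTC runs its callbacks on these internal threads (which one depends on the callback and on the WebRTC version). They are standalone threads created by our C++ code, and unlike the case of Java calling into C++, such threads cannot easily get hold of a JNIEnv. Take the following code as an example: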

public class Widget {
   private native void nativeMethod();
   }

The function declaration generated for it in the native header file looks like this:

JNIEXPORT void JNICALL
   Java_xxxxx_nativeMethod(JNIEnv *env, jobject instance);

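As we can see, the first parameter in this declaration is a JNIEnv, through which we can call Java code in a reflection-like manner. Threads created independently in C++ have no JNIEnv associated with them. If you want to call Java code from such a thread, you must first call JavaVM::AttachCurrentThread to attach it to the JVM, after which you obtain a JNIEnv. Note that calling AttachCurrentThread on a thread that is already attached to the JavaVM has no effect. If your thread is already attached, you can also obtain the JNIEnv by calling JavaVM::GetEnv; if it is not attached, this function returns JNI_EDETACHED. Finally, when the thread no longer needs to call Java code, call DetachCurrentThread to release it.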
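The rule "detach only if this code did the attaching" can be captured with a small RAII guard. The sketch below illustrates just that bookkeeping and deliberately avoids jni.h: FakeVm is a hypothetical stand-in for JavaVM that tracks attachment per thread, and ScopedAttach is a hypothetical helper, not something the article's library or WebRTC provides.

```cpp
// Hypothetical stand-in for JavaVM: it only tracks whether the calling
// thread is currently attached.
class FakeVm {
 public:
  bool IsAttached() const { return Flag(); }
  void Attach() { Flag() = true; }
  void Detach() { Flag() = false; }

 private:
  static bool& Flag() {
    thread_local bool attached = false;  // one flag per thread
    return attached;
  }
};

// Attaches the current thread on construction if needed, and detaches on
// destruction only when this guard performed the attach.
class ScopedAttach {
 public:
  explicit ScopedAttach(FakeVm& vm)
      : vm_(vm), attached_here_(!vm.IsAttached()) {
    if (attached_here_) vm_.Attach();
  }

  ~ScopedAttach() {
    if (attached_here_) vm_.Detach();
  }

  ScopedAttach(const ScopedAttach&) = delete;
  ScopedAttach& operator=(const ScopedAttach&) = delete;

 private:
  FakeVm& vm_;
  bool attached_here_;
};
```

A guard like this suits short-lived attachments. For long-running threads that call back into Java frequently, such as the WebRTC threads here, attaching once and detaching when the thread shuts down, as the article does, avoids paying the attach cost on every callback.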

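Establishing the Connection with PeerConnection

Once the stream has been added to the PeerConnection in the previous step, what remains is to use the PeerConnection API and callbacks to establish a connection with the other client. The main APIs involved are CreateOffer, CreateAnswer, SetLocalDescription, and SetRemoteDescription. When calling CreateOffer or CreateAnswer we need to specify whether the current client accepts audio/video from the other client. In my use case the Java server only ever pushes audio and video to other clients, so I set both ReceiveAudio and ReceiveVideo to false.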

void PeerConnection::CreateAnswer(jobject java_observer) {
       create_session_observer->SetGlobalJavaObserver(java_observer, "answer");
       auto options = webrtc::PeerConnectionInterface::RTCOfferAnswerOptions();
       options.offer_to_receive_audio = false;
       options.offer_to_receive_video = false;
       peer_connection->CreateAnswer(create_session_observer, options);
   }

   void PeerConnection::CreateOffer(jobject java_observer) {
       create_session_observer->SetGlobalJavaObserver(java_observer, "offer");
       auto options = webrtc::PeerConnectionInterface::RTCOfferAnswerOptions();
       options.offer_to_receive_audio = false;
       options.offer_to_receive_video = false;
       peer_connection->CreateOffer(create_session_observer, options);
   }

   webrtc::SdpParseError PeerConnection::SetLocalDescription(JNIEnv *env, jobject sdp) {
       webrtc::SdpParseError error;
       webrtc::SessionDescriptionInterface *session_description(
               webrtc::CreateSessionDescription(GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("type")),
                                                GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("sdp")), &error));
       peer_connection->SetLocalDescription(set_session_description_observer, session_description);
       return error;
   }

   webrtc::SdpParseError PeerConnection::SetRemoteDescription(JNIEnv *env, jobject sdp) {
       webrtc::SdpParseError error;
       webrtc::SessionDescriptionInterface *session_description(
               webrtc::CreateSessionDescription(GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("type")),
                                                GET_STRING_FROM_OBJECT(env, sdp, const_cast<char *>("sdp")), &error));
       peer_connection->SetRemoteDescription(set_session_description_observer, session_description);
       return error;
   }

On the Java side I generally exchange SDP like this:

//After the stream has been added to the PeerConnection
   sessionRTCMap.get(headerAccessor.getSessionId()).getPeerConnection().createOffer(sdp -> executor.submit(() -> {
       try {
           sessionRTCMap.get(headerAccessor.getSessionId()).getPeerConnection().setLocalDescription(sdp);
           sendMessage(headerAccessor.getSessionId(), SDP_DESTINATION, sdp);
       } catch (Exception e) {
           log.error("{}", e);
       }
   }));

   //After receiving the answer SDP from the remote end
   SessionDescription sessionDescription = JSON.parseObject((String) requestResponse.getData(), SessionDescription.class);
   sessionRTCMap.get(headerAccessor.getSessionId()).getPeerConnection().setRemoteDescription(sessionDescription);
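The order of these calls follows the offer/answer state machine defined for WebRTC: the offering side goes from stable to have-local-offer and back to stable once the remote answer is applied, and the answering side mirrors this through have-remote-offer. A simplified sketch of those transitions is given below. It leaves out provisional answers and rollback, and the type and function names are hypothetical.

```cpp
enum class SignalingState { kStable, kHaveLocalOffer, kHaveRemoteOffer };
enum class SdpType { kOffer, kAnswer };

// Applies a local description; returns false if the transition is not allowed.
inline bool ApplyLocalDescription(SignalingState& state, SdpType type) {
  if (state == SignalingState::kStable && type == SdpType::kOffer) {
    state = SignalingState::kHaveLocalOffer;
    return true;
  }
  if (state == SignalingState::kHaveRemoteOffer && type == SdpType::kAnswer) {
    state = SignalingState::kStable;
    return true;
  }
  return false;
}

// Applies a remote description; returns false if the transition is not allowed.
inline bool ApplyRemoteDescription(SignalingState& state, SdpType type) {
  if (state == SignalingState::kStable && type == SdpType::kOffer) {
    state = SignalingState::kHaveRemoteOffer;
    return true;
  }
  if (state == SignalingState::kHaveLocalOffer && type == SdpType::kAnswer) {
    state = SignalingState::kStable;
    return true;
  }
  return false;
}
```

In the Java snippet above the server is the offerer, so it calls setLocalDescription with its own offer first and setRemoteDescription with the client's answer afterwards; applying an answer before any offer exists would be rejected.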

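At this point, under normal circumstances, the connection is established. Next I will describe how I release all the related resources, to round off the normal use case. There are quite a few pitfalls here too: because I was unfamiliar with WebRTC's pointer management at the time, I kept running into leaks and invalid pointer accesses. It brings tears to my eyes just talking about it T.T.

Releasing All Resources

Let's take the release process in Java as the starting point and walk through how the resources are released.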

public void releaseResource() {
       lock.lock();
       try {
           if (videoDataChannel != null) { //If a DataChannel is in use, release the remote DataChannel object first
               videoDataChannel.close();
               videoDataChannel = null;
           }
           log.info("Release remote video data channel");
           if (localVideoDataChannel != null) { //If a DataChannel is in use, release the local DataChannel object next
               localVideoDataChannel.close();
               localVideoDataChannel = null;
           }
           log.info("Release local video data channel");
           if (peerConnection != null) { //Release the PeerConnection object
               peerConnection.close();
               peerConnection = null;
           }
           log.info("Release peer connection");
           if (rtc != null) { //Release the PeerConnectionFactory-related objects
               rtc.close();
           }
           log.info("Release rtc");
       } catch (Exception ignored) {
       }finally {
           destroyed = true;
           lock.unlock();
       }
   }

Then comes the corresponding release code in C++:

DataChannel::~DataChannel() {
    data_channel->UnregisterObserver(); //Unregister the observer first
    delete data_channel_observer; //Destroy the observer object
    data_channel->Close(); //Close the Data Channel
    //rtc::scoped_refptr<webrtc::DataChannelInterface> data_channel; (Created by webrtc::PeerConnectionInterface::CreateDataChannel)
    data_channel = nullptr; //Drop our reference to the Data Channel (reference-counted pointer)
}


PeerConnection::~PeerConnection() {
    peer_connection->Close(); //Close the PeerConnection
    //rtc::scoped_refptr<webrtc::PeerConnectionInterface> peer_connection; (Created by webrtc::PeerConnectionFactoryInterface::CreatePeerConnection)
    peer_connection = nullptr; //Drop our reference to the PeerConnection (reference-counted pointer)
    delete peer_connection_observer; //Destroy the observer that was used
    delete set_session_description_observer; //Destroy the observer that was used
    delete create_session_observer; //Destroy the observer that was used
}

RTC::~RTC() {
    //rtc::scoped_refptr<webrtc::PeerConnectionFactoryInterface> peer_connection_factory; (Created by webrtc::CreatePeerConnectionFactory)
    peer_connection_factory = nullptr; //Release the PeerConnectionFactory
    WEBRTC_LOG("Destroy peer connection factory", INFO);
    worker_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::ReleaseAudioDeviceModule, this)); //Release the AudioDeviceModule on the Worker Thread, because that is where it was created
    signaling_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::DetachCurrentThread, this)); //Detach the signaling thread
    worker_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::DetachCurrentThread, this)); //Detach the worker thread
    network_thread->Invoke<void>(RTC_FROM_HERE, rtc::Bind(&RTC::DetachCurrentThread, this)); //Detach the network thread
    worker_thread->Stop(); //Stop the thread
    signaling_thread->Stop(); //Stop the thread
    network_thread->Stop(); //Stop the thread
    worker_thread.reset(); //Destroy the thread (smart pointer)
    signaling_thread.reset(); //Destroy the thread (smart pointer)
    network_thread.reset(); //Destroy the thread (smart pointer)
    network_manager = nullptr; //Destroy the Network Manager (smart pointer)
    socket_factory = nullptr; //Destroy the Socket Factory (smart pointer)
    WEBRTC_LOG("Stop threads", INFO);
    if (video_capturer) {
        JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
        env->DeleteGlobalRef(video_capturer); //Delete the reference to the Java VideoCapturer object; it is a global reference I stored in the RTC class via env->NewGlobalRef(video_capturer)
        //The Java reference to the AudioCapturer is not deleted here because I stored it inside the AudioDeviceModule
    }
}

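At this point, if you only deal with the normal WebRTC use cases, I think you now know how to call the WebRTC Native APIs from Java. The rest of this article covers the API changes I made for our business scenario. If you are interested in that too, read on.

Additional Content

Capturing Audio Data from Java

Interface Overview

When describing how to create the PeerConnectionFactory, we mentioned the AudioDeviceModule interface; WebRTC captures audio data through it. By implementing this interface we inject our custom audio capture module into WebRTC. Let's first take a brief look at what the interface contains.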

// Only the key parts are kept here
   class AudioDeviceModule : public rtc::RefCountInterface {
    public:

     // This callback is the key to audio capture: whenever we have new audio data, we wrap it in the proper form and pass it along through this callback
     // Full-duplex transportation of PCM audio
     virtual int32_t RegisterAudioCallback(AudioTransport* audioCallback) = 0;

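     // Lists all available audio input/output devices. Since we are proxying the whole audio capture (and output) module, these functions only need to report a single device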
     // Device enumeration
     virtual int16_t PlayoutDevices() = 0;
     virtual int16_t RecordingDevices() = 0;
     virtual int32_t PlayoutDeviceName(uint16_t index,
                                       char name[kAdmMaxDeviceNameSize],
                                       char guid[kAdmMaxGuidSize]) = 0;
     virtual int32_t RecordingDeviceName(uint16_t index,
                                         char name[kAdmMaxDeviceNameSize],
                                         char guid[kAdmMaxGuidSize]) = 0;

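     // When audio capture or playout is needed, the upper layer uses the functions below to select the device. Since the functions above report only one device, the upper layer will only ever use that one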
     // Device selection
     virtual int32_t SetPlayoutDevice(uint16_t index) = 0;
     virtual int32_t SetPlayoutDevice(WindowsDeviceType device) = 0;
     virtual int32_t SetRecordingDevice(uint16_t index) = 0;
     virtual int32_t SetRecordingDevice(WindowsDeviceType device) = 0;

     // Initialization
     // Audio transport initialization
     virtual int32_t PlayoutIsAvailable(bool* available) = 0;
     virtual int32_t InitPlayout() = 0;
     virtual bool PlayoutIsInitialized() const = 0;
     virtual int32_t RecordingIsAvailable(bool* available) = 0;
     virtual int32_t InitRecording() = 0;
     virtual bool RecordingIsInitialized() const = 0;

     // Interfaces for starting and stopping recording/playout
     // Audio transport control
     virtual int32_t StartPlayout() = 0;
     virtual int32_t StopPlayout() = 0;
     virtual bool Playing() const = 0;
     virtual int32_t StartRecording() = 0;
     virtual int32_t StopRecording() = 0;
     virtual bool Recording() const = 0;

     // The rest is related to audio playout, which I did not use
     // Audio mixer initialization
     virtual int32_t InitSpeaker() = 0;
     virtual bool SpeakerIsInitialized() const = 0;
     virtual int32_t InitMicrophone() = 0;
     virtual bool MicrophoneIsInitialized() const = 0;

     // Speaker volume controls
     virtual int32_t SpeakerVolumeIsAvailable(bool* available) = 0;
     virtual int32_t SetSpeakerVolume(uint32_t volume) = 0;
     virtual int32_t SpeakerVolume(uint32_t* volume) const = 0;
     virtual int32_t MaxSpeakerVolume(uint32_t* maxVolume) const = 0;
     virtual int32_t MinSpeakerVolume(uint32_t* minVolume) const = 0;

     // Microphone volume controls
     virtual int32_t MicrophoneVolumeIsAvailable(bool* available) = 0;
     virtual int32_t SetMicrophoneVolume(uint32_t volume) = 0;
     virtual int32_t MicrophoneVolume(uint32_t* volume) const = 0;
     virtual int32_t MaxMicrophoneVolume(uint32_t* maxVolume) const = 0;
     virtual int32_t MinMicrophoneVolume(uint32_t* minVolume) const = 0;

     // Speaker mute control
     virtual int32_t SpeakerMuteIsAvailable(bool* available) = 0;
     virtual int32_t SetSpeakerMute(bool enable) = 0;
     virtual int32_t SpeakerMute(bool* enabled) const = 0;

     // Microphone mute control
     virtual int32_t MicrophoneMuteIsAvailable(bool* available) = 0;
     virtual int32_t SetMicrophoneMute(bool enable) = 0;
     virtual int32_t MicrophoneMute(bool* enabled) const = 0;

     // Multi-channel (stereo) support
     // Stereo support
     virtual int32_t StereoPlayoutIsAvailable(bool* available) const = 0;
     virtual int32_t SetStereoPlayout(bool enable) = 0;
     virtual int32_t StereoPlayout(bool* enabled) const = 0;
     virtual int32_t StereoRecordingIsAvailable(bool* available) const = 0;
     virtual int32_t SetStereoRecording(bool enable) = 0;
     virtual int32_t StereoRecording(bool* enabled) const = 0;

     // Playout delay
     virtual int32_t PlayoutDelay(uint16_t* delayMS) const = 0;

   };

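Implementation

After skimming AudioDeviceModule you probably already have an idea of how to proceed. Since I only need audio capture, I implemented just a few of its interfaces. In short, my approach is to create a thread inside the AudioDeviceModule; once StartRecording has been called, that thread calls into Java at a fixed interval to fetch audio PCM data and then hands the data up through a callback. The core of my implementation is shown below.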

// First, I defined two lower-level interfaces that correspond to the interfaces on the Java side
   class Capturer {
       public:
           virtual bool isJavaWrapper() {
               return false;
           }

           virtual ~Capturer() {}

           // Returns the sampling frequency in Hz of the audio data that this
           // capturer produces.
           virtual int SamplingFrequency() = 0;

           // Replaces the contents of |buffer| with 10ms of captured audio data
           // (see FakeAudioDevice::SamplesPerFrame). Returns true if the capturer can
           // keep producing data, or false when the capture finishes.
           virtual bool Capture(rtc::BufferT<int16_t> *buffer) = 0;
   };

   class Renderer {
       public:
           virtual ~Renderer() {}

           // Returns the sampling frequency in Hz of the audio data that this
           // renderer receives.
           virtual int SamplingFrequency() const = 0;

           // Renders the passed audio data and returns true if the renderer wants
           // to keep receiving data, or false otherwise.
           virtual bool Render(rtc::ArrayView<const int16_t> data) = 0;
   };

   // These two lower-level interfaces are implemented as follows
   class JavaAudioCapturerWrapper final : public FakeAudioDeviceModule::Capturer {
       public:

           // The constructor stores the global reference to the Java audio capturer object and looks up the methods it needs
           JavaAudioCapturerWrapper(jobject audio_capturer)
                   : java_audio_capturer(audio_capturer) {
               WEBRTC_LOG("Instance java audio capturer wrapper.", INFO);
               JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
               audio_capture_class = env->GetObjectClass(java_audio_capturer);
               sampling_frequency_method = env->GetMethodID(audio_capture_class, "samplingFrequency", "()I");
               capture_method = env->GetMethodID(audio_capture_class, "capture", "(I)Ljava/nio/ByteBuffer;");
               WEBRTC_LOG("Instance java audio capturer wrapper end.", INFO);
           }

           // The destructor releases the Java references
           ~JavaAudioCapturerWrapper() {
               JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
               if (audio_capture_class != nullptr) {
                   env->DeleteLocalRef(audio_capture_class);
                   audio_capture_class = nullptr;
               }
               if (java_audio_capturer) {
                   env->DeleteGlobalRef(java_audio_capturer);
                   java_audio_capturer = nullptr;
               }
           }

           bool isJavaWrapper() override {
               return true;
           }

           // Calls the Java method to get the sampling rate; the value is cached after the first call
           int SamplingFrequency() override {
               if (sampling_frequency_in_hz == 0) {
                   JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
                   this->sampling_frequency_in_hz = env->CallIntMethod(java_audio_capturer, sampling_frequency_method);
               }
               return sampling_frequency_in_hz;
           }

           // Calls the Java method to get PCM data; note that it must return 16-bit little-endian PCM data
           bool Capture(rtc::BufferT<int16_t> *buffer) override {
               buffer->SetData(
                       FakeAudioDeviceModule::SamplesPerFrame(SamplingFrequency()), // computes the size of the data buffer
                       [&](rtc::ArrayView<int16_t> data) { // receives a block of the size given by the previous argument
                           JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
                           size_t length = 0;
                           // Java works in bytes, hence size * 2; the Java signature takes an int
                           jobject audio_data_buffer = env->CallObjectMethod(java_audio_capturer, capture_method,
                                                                             static_cast<jint>(data.size() * 2));
                           if (audio_data_buffer == nullptr) {
                               return length; // nothing was captured
                           }
                           void *audio_data_address = env->GetDirectBufferAddress(audio_data_buffer);
                           jlong audio_data_size = env->GetDirectBufferCapacity(audio_data_buffer);
                           if (audio_data_address != nullptr && audio_data_size > 0) {
                               length = (size_t) audio_data_size / 2; // one int16 sample is 2 bytes
                               if (length > data.size()) {
                                   length = data.size(); // never copy more than the destination can hold
                               }
                               memcpy(data.data(), audio_data_address, length * 2);
                           }
                           env->DeleteLocalRef(audio_data_buffer);
                           return length;
                       });
               return buffer->size() == buffer->capacity();
           }

       private:
           jobject java_audio_capturer;
           jclass audio_capture_class;
           jmethodID sampling_frequency_method;
           jmethodID capture_method;
           int sampling_frequency_in_hz = 0;
   };

   size_t FakeAudioDeviceModule::SamplesPerFrame(int sampling_frequency_in_hz) {
       return rtc::CheckedDivExact(sampling_frequency_in_hz, kFramesPerSecond);
   }

   constexpr int kFrameLengthMs = 10; // capture data every 10 ms
   constexpr int kFramesPerSecond = 1000 / kFrameLengthMs; // number of frames captured per second

   // The renderer actually does nothing ^.^
   class DiscardRenderer final : public FakeAudioDeviceModule::Renderer {
   public:
       explicit DiscardRenderer(int sampling_frequency_in_hz)
               : sampling_frequency_in_hz_(sampling_frequency_in_hz) {}

       int SamplingFrequency() const override {
           return sampling_frequency_in_hz_;
       }

       bool Render(rtc::ArrayView<const int16_t>) override {
           return true;
       }

   private:
       int sampling_frequency_in_hz_;
   };

   // Next comes the core of the AudioDeviceModule implementation. I use WebRTC's EventTimerWrapper and its cross-platform thread class to call the Java capture function periodically
   std::unique_ptr<webrtc::EventTimerWrapper> tick_;
   rtc::PlatformThread thread_;

   // Constructor
   FakeAudioDeviceModule::FakeAudioDeviceModule(std::unique_ptr<Capturer> capturer,
                                                std::unique_ptr<Renderer> renderer,
                                                float speed)
           : capturer_(std::move(capturer)),
             renderer_(std::move(renderer)),
             speed_(speed),
             audio_callback_(nullptr),
             rendering_(false),
             capturing_(false),
             done_rendering_(true, true),
             done_capturing_(true, true),
             tick_(webrtc::EventTimerWrapper::Create()),
             thread_(FakeAudioDeviceModule::Run, this, "FakeAudioDeviceModule") {
   }

   // Mainly sets rendering_ to true
   int32_t FakeAudioDeviceModule::StartPlayout() {
       rtc::CritScope cs(&lock_);
       RTC_CHECK(renderer_);
       rendering_ = true;
       done_rendering_.Reset();
       return 0;
   }

   // Mainly sets rendering_ to false
   int32_t FakeAudioDeviceModule::StopPlayout() {
       rtc::CritScope cs(&lock_);
       rendering_ = false;
       done_rendering_.Set();
       return 0;
   }

   // Mainly sets capturing_ to true
   int32_t FakeAudioDeviceModule::StartRecording() {
       rtc::CritScope cs(&lock_);
       WEBRTC_LOG("Start audio recording", INFO);
       RTC_CHECK(capturer_);
       capturing_ = true;
       done_capturing_.Reset();
       return 0;
   }

   // Mainly sets capturing_ to false
   int32_t FakeAudioDeviceModule::StopRecording() {
       rtc::CritScope cs(&lock_);
       WEBRTC_LOG("Stop audio recording", INFO);
       capturing_ = false;
       done_capturing_.Set();
       return 0;
   }

   // Sets the EventTimer period and starts the thread
   int32_t FakeAudioDeviceModule::Init() {
       RTC_CHECK(tick_->StartTimer(true, kFrameLengthMs / speed_));
       thread_.Start();
       thread_.SetPriority(rtc::kHighPriority);
       return 0;
   }

   // Stores the upper layer's audio callback; we use it later to hand over audio data
   int32_t FakeAudioDeviceModule::RegisterAudioCallback(webrtc::AudioTransport *callback) {
       rtc::CritScope cs(&lock_);
       RTC_DCHECK(callback || audio_callback_);
       audio_callback_ = callback;
       return 0;
   }

   bool FakeAudioDeviceModule::Run(void *obj) {
       static_cast<FakeAudioDeviceModule *>(obj)->ProcessAudio();
       return true;
   }

   void FakeAudioDeviceModule::ProcessAudio() {
       {
           rtc::CritScope cs(&lock_);
           if (needDetachJvm) {
               WEBRTC_LOG("In audio device module process audio", INFO);
           }
           auto start = std::chrono::steady_clock::now();
           if (capturing_) {
               // Capture 10ms of audio. 2 bytes per sample.
               // Fetch the audio data
               const bool keep_capturing = capturer_->Capture(&recording_buffer_);
               uint32_t new_mic_level;
               if (keep_capturing) {
                   // Hand the audio data up through the callback: data, data size (in samples), bytes per sample, number of channels, sampling rate, delay, and so on
                   audio_callback_->RecordedDataIsAvailable(
                           recording_buffer_.data(), recording_buffer_.size(), 2, 1,
                           static_cast<const uint32_t>(capturer_->SamplingFrequency()), 0, 0, 0, false, new_mic_level);
               }
               // If there is no more audio data, stop capturing
               if (!keep_capturing) {
                   capturing_ = false;
                   done_capturing_.Set();
               }
           }
           if (rendering_) {
               size_t samples_out;
               int64_t elapsed_time_ms;
               int64_t ntp_time_ms;
               const int sampling_frequency = renderer_->SamplingFrequency();
               // Pull audio data from the upper layer
               audio_callback_->NeedMorePlayData(
                       SamplesPerFrame(sampling_frequency), 2, 1, static_cast<const uint32_t>(sampling_frequency),
                       playout_buffer_.data(), samples_out, &elapsed_time_ms, &ntp_time_ms);
               // Play the audio data
               const bool keep_rendering = renderer_->Render(
                       rtc::ArrayView<const int16_t>(playout_buffer_.data(), samples_out));
               if (!keep_rendering) {
                   rendering_ = false;
                   done_rendering_.Set();
               }
           }
           auto end = std::chrono::steady_clock::now();
           auto diff = std::chrono::duration<double, std::milli>(end - start).count();
           if (diff > kFrameLengthMs) {
               WEBRTC_LOG("JNI capture audio data timeout, real capture time is " + std::to_string(diff) + " ms", DEBUG);
           }
           // If the AudioDeviceModule is about to be destroyed, detach the thread
           if (capturer_->isJavaWrapper() && needDetachJvm && !detached2Jvm) {
               DETACH_CURRENT_THREAD_IF_NEEDED();
               detached2Jvm = true;
           } else if (needDetachJvm) {
               detached2Jvm = true;
           }
       }
       // Keep waiting until the interval has elapsed; once 10 ms have passed, the next round of audio processing is triggered
       tick_->Wait(WEBRTC_EVENT_INFINITE);
   }

   // Destructor
   FakeAudioDeviceModule::~FakeAudioDeviceModule() {
       WEBRTC_LOG("In audio device module FakeAudioDeviceModule", INFO);
       StopPlayout(); // stop playout
       StopRecording(); // stop capturing
       needDetachJvm = true; // ask the worker thread to detach
       while (!detached2Jvm) { // wait until the worker thread has detached
       }
       WEBRTC_LOG("In audio device module after detached2Jvm", INFO);
       thread_.Stop();// stop the thread
       WEBRTC_LOG("In audio device module ~FakeAudioDeviceModule finished", INFO);
   }

By the way, on the Java side I pass the audio data through direct memory (a direct ByteBuffer), mainly because this reduces memory copies.
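The frame-size arithmetic and byte layout behind the capturer can be sketched in isolation. The following is a minimal, standalone illustration rather than the code used in the project: `BytesPerFrame` and `DecodePcm16Le` are hypothetical helper names, and the only facts taken from the listing above are the 10 ms frame length and the 16-bit little-endian sample format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Constants mirroring the 10 ms frame length used in the article.
constexpr int kFrameLengthMs = 10;
constexpr int kFramesPerSecond = 1000 / kFrameLengthMs;  // 100 frames per second

// Number of int16 samples in one 10 ms mono frame.
// The sampling rate is expected to be divisible by kFramesPerSecond.
inline std::size_t SamplesPerFrame(int sampling_frequency_in_hz) {
    assert(sampling_frequency_in_hz > 0);
    assert(sampling_frequency_in_hz % kFramesPerSecond == 0);
    return static_cast<std::size_t>(sampling_frequency_in_hz / kFramesPerSecond);
}

// Hypothetical helper: bytes the Java side has to supply per frame
// (2 bytes per 16-bit sample).
inline std::size_t BytesPerFrame(int sampling_frequency_in_hz) {
    return SamplesPerFrame(sampling_frequency_in_hz) * sizeof(int16_t);
}

// Hypothetical helper: decodes little-endian 16-bit PCM bytes into samples,
// independent of the host byte order. A trailing odd byte is ignored.
inline std::vector<int16_t> DecodePcm16Le(const uint8_t *bytes, std::size_t length) {
    std::vector<int16_t> samples(length / 2);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const unsigned lo = bytes[2 * i];
        const unsigned hi = bytes[2 * i + 1];
        samples[i] = static_cast<int16_t>(static_cast<uint16_t>(lo | (hi << 8)));
    }
    return samples;
}
```

At 48 kHz this gives 480 samples, so the Java capturer is asked for 960 bytes per call. On a little-endian host a plain memcpy produces the same samples as `DecodePcm16Le`, which is presumably why the listing above can copy the direct buffer as is.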

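Capturing Video Data from Java

Capturing video data from Java is very similar to capturing audio data. The difference is that the video capture module is injected when the VideoSource is created. One more thing to watch out for: the VideoCapturer has to be created on the signaling thread.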

...
   video_source = rtc->CreateVideoSource(rtc->CreateFakeVideoCapturerInSignalingThread());
   ...

   FakeVideoCapturer *RTC::CreateFakeVideoCapturerInSignalingThread() {
       if (video_capturer) {
           return signaling_thread->Invoke<FakeVideoCapturer *>(RTC_FROM_HERE,
                                                                rtc::Bind(&RTC::CreateFakeVideoCapturer, this,
                                                                          video_capturer));
       } else {
           return nullptr;
       }
   }

The VideoCapturer interface does not require us to implement much either; the key parts are the main loop, start, and stop. Let's look at my implementation.

// Constructor
   FakeVideoCapturer::FakeVideoCapturer(jobject video_capturer)
           : running_(false),
             video_capturer(video_capturer),
             is_screen_cast(false),
             ticker(webrtc::EventTimerWrapper::Create()),
             thread(FakeVideoCapturer::Run, this, "FakeVideoCapturer") {
       // Cache the Java methods that will be used
       JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
       video_capture_class = env->GetObjectClass(video_capturer);
       get_width_method = env->GetMethodID(video_capture_class, "getWidth", "()I");
       get_height_method = env->GetMethodID(video_capture_class, "getHeight", "()I");
       get_fps_method = env->GetMethodID(video_capture_class, "getFps", "()I");
       capture_method = env->GetMethodID(video_capture_class, "capture", "()Lpackage/name/of/rtc4j/model/VideoFrame;");
       width = env->CallIntMethod(video_capturer, get_width_method);
       previous_width = width;
       height = env->CallIntMethod(video_capturer, get_height_method);
       previous_height = height;
       fps = env->CallIntMethod(video_capturer, get_fps_method);
       // Set the format of the delivered data to YUV420 (I420)
       static const cricket::VideoFormat formats[] = {
               {width, height, cricket::VideoFormat::FpsToInterval(fps), cricket::FOURCC_I420}
       };
       SetSupportedFormats({&formats[0], &formats[arraysize(formats)]});
       // Set the main loop interval according to the FPS reported by Java
       RTC_CHECK(ticker->StartTimer(true, rtc::kNumMillisecsPerSec / fps));
       thread.Start();
       thread.SetPriority(rtc::kHighPriority);
       // The Java side sends JPEG images, so I use libjpeg-turbo to decompress them into YUV420
       decompress_handle = tjInitDecompress();
       WEBRTC_LOG("Create fake video capturer, " + std::to_string(width) + ", " + std::to_string(height), INFO);
   }

   // Destructor
   FakeVideoCapturer::~FakeVideoCapturer() {
       thread.Stop();
       SignalDestroyed(this);
       // Release the Java resources
       JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
       if (video_capture_class != nullptr) {
           env->DeleteLocalRef(video_capture_class);
           video_capture_class = nullptr;
       }
       // Release the decompressor
       if (decompress_handle) {
           if (tjDestroy(decompress_handle) != 0) {
               WEBRTC_LOG("Release decompress handle failed, reason is: " + std::string(tjGetErrorStr2(decompress_handle)),
                          ERROR);
           }
       }
       WEBRTC_LOG("Free fake video capturer", INFO);
   }

   bool FakeVideoCapturer::Run(void *obj) {
       static_cast<FakeVideoCapturer *>(obj)->CaptureFrame();
       return true;
   }

   void FakeVideoCapturer::CaptureFrame() {
       {
           rtc::CritScope cs(&lock_);
           if (running_) {
               int64_t t0 = rtc::TimeMicros();
               JNIEnv *env = ATTACH_CURRENT_THREAD_IF_NEEDED();
               // Fetch one frame from the Java side
               jobject java_video_frame = env->CallObjectMethod(video_capturer, capture_method);
               if (java_video_frame == nullptr) { // if the returned frame is null, deliver an all-black frame
                   rtc::scoped_refptr<webrtc::I420Buffer> buffer = webrtc::I420Buffer::Create(previous_width,
                                                                                              previous_height);
                   webrtc::I420Buffer::SetBlack(buffer);
                   OnFrame(webrtc::VideoFrame(buffer, (webrtc::VideoRotation) previous_rotation, t0), previous_width,
                           previous_height);
                   return;
               }
               // Java passes the image through direct memory
               jobject java_data_buffer = env->CallObjectMethod(java_video_frame, GET_VIDEO_FRAME_BUFFER_GETTER_METHOD());
               auto data_buffer = (unsigned char *) env->GetDirectBufferAddress(java_data_buffer);
               auto length = (unsigned long) env->CallIntMethod(java_video_frame, GET_VIDEO_FRAME_LENGTH_GETTER_METHOD());
               int rotation = env->CallIntMethod(java_video_frame, GET_VIDEO_FRAME_ROTATION_GETTER_METHOD());
               int width;
               int height;
               // Parse the JPEG header to get the width and height
               tjDecompressHeader(decompress_handle, data_buffer, length, &width, &height);
               previous_width = width;
               previous_height = height;
               previous_rotation = rotation;
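               // Decompress into YUV420 with 32-byte-aligned strides and deliver it. I use 32-byte alignment because encoding is more efficient this way, and the VideoToolbox encoder on macOS requires it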
               rtc::scoped_refptr<webrtc::I420Buffer> buffer =
                       webrtc::I420Buffer::Create(width, height,
                                                  width % 32 == 0 ? width : width / 32 * 32 + 32,
                                                  (width / 2) % 32 == 0 ? (width / 2) : (width / 2) / 32 * 32 + 32,
                                                  (width / 2) % 32 == 0 ? (width / 2) : (width / 2) / 32 * 32 + 32);
               uint8_t *planes[] = {buffer->MutableDataY(), buffer->MutableDataU(), buffer->MutableDataV()};
               int strides[] = {buffer->StrideY(), buffer->StrideU(), buffer->StrideV()};
               tjDecompressToYUVPlanes(decompress_handle, data_buffer, length, planes, width, strides, height,
                                       TJFLAG_FASTDCT | TJFLAG_NOREALLOC);
               env->DeleteLocalRef(java_data_buffer);
               env->DeleteLocalRef(java_video_frame);
               // OnFrame is the interface that hands the data over to WebRTC
               OnFrame(webrtc::VideoFrame(buffer, (webrtc::VideoRotation) rotation, t0), width, height);
           }
       }
       ticker->Wait(WEBRTC_EVENT_INFINITE);
   }

   // Start
   cricket::CaptureState FakeVideoCapturer::Start(
           const cricket::VideoFormat &format) {
       //SetCaptureFormat(&format); This will cause crash in CentOS
       running_ = true;
       SetCaptureState(cricket::CS_RUNNING);
       WEBRTC_LOG("Start fake video capturing", INFO);
       return cricket::CS_RUNNING;
   }

   // Stop
   void FakeVideoCapturer::Stop() {
       running_ = false;
       //SetCaptureFormat(nullptr); This will cause crash in CentOS
       SetCaptureState(cricket::CS_STOPPED);
       WEBRTC_LOG("Stop fake video capturing", INFO);
   }

   // YUV420
   bool FakeVideoCapturer::GetPreferredFourccs(std::vector<uint32_t> *fourccs) {
       fourccs->push_back(cricket::FOURCC_I420);
       return true;
   }

   // Call the default implementation
   void FakeVideoCapturer::AddOrUpdateSink(rtc::VideoSinkInterface<webrtc::VideoFrame> *sink,
                                           const rtc::VideoSinkWants &wants) {
       cricket::VideoCapturer::AddOrUpdateSink(sink, wants);
   }

   void FakeVideoCapturer::RemoveSink(rtc::VideoSinkInterface<webrtc::VideoFrame> *sink) {
       cricket::VideoCapturer::RemoveSink(sink);
   }

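That covers how to obtain audio and video data from the Java side. As you can see, it is not really that hard. Consider this a starting point; I hope my implementation helps you understand this part of the flow more quickly.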
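The 32-byte stride rounding used in CaptureFrame can be expressed as a small helper. This is a sketch under stated assumptions rather than the project's code: `AlignUp`, `I420Strides` and `AlignedI420Strides` are hypothetical names, and the chroma width is rounded up for odd frame widths, which differs slightly from the `width / 2` expressions in the listing.

```cpp
#include <cassert>

// Rounds |value| up to the next multiple of |alignment| (alignment > 0).
constexpr int AlignUp(int value, int alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical holder for the three plane strides of an I420 frame.
struct I420Strides {
    int y;
    int u;
    int v;
};

// Strides for an I420 frame in which every plane row is padded to a multiple
// of |alignment| bytes. The chroma planes are half the luma width, rounded up
// so that odd widths still get a full chroma column (an assumption of this
// sketch).
inline I420Strides AlignedI420Strides(int width, int alignment = 32) {
    assert(width > 0 && alignment > 0);
    const int chroma_width = (width + 1) / 2;
    return {AlignUp(width, alignment),
            AlignUp(chroma_width, alignment),
            AlignUp(chroma_width, alignment)};
}
```

For a 720-pixel-wide frame this yields a luma stride of 736 and chroma strides of 384, while a 1280-pixel-wide frame needs no padding at all. The resulting values are what would be passed as the stride arguments when allocating the I420 buffer and in the strides array given to the JPEG decompressor.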

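Restricting Connection Ports

Let's recap the complete flow of the port restriction described earlier. When creating the PeerConnectionFactory, we instantiated a SocketFactory and a default NetworkManager. Then, when creating the PeerConnection, we used these two instances to create a PortAllocator and injected it into the PeerConnection. In this flow, the code that actually enforces the port restriction lives in the SocketFactory, although the PortAllocator API is used as well. You may wonder: PortAllocator already has an interface for limiting the port range, so why is the SocketFactory needed at all?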

std::unique_ptr<cricket::PortAllocator> port_allocator(
   new cricket::BasicPortAllocator(network_manager.get(), socket_factory.get()));
   port_allocator->SetPortRange(this->min_port, this->max_port); // the PortAllocator API for restricting ports

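At first I also set the ports only through this API, but I found that ports outside the range were still being requested for other purposes. In the end I overrode the SocketFactory and rejected every request for a port outside the allowed range. In addition, some subnet IPs on our servers cannot be used, so I handle those in the SocketFactory as well. My implementation is as follows: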

rtc::AsyncPacketSocket *
   rtc::SocketFactoryWrapper::CreateUdpSocket(const rtc::SocketAddress &local_address, uint16_t min_port,
                                              uint16_t max_port) {
       // Reject ports outside the allowed range
       if (min_port < this->min_port || max_port > this->max_port) {
           WEBRTC_LOG("Create udp socket cancelled, port out of range, expect port range is:" +
                      std::to_string(this->min_port) + "->" + std::to_string(this->max_port)
                      + ", parameter port range is: " + std::to_string(min_port) + "->" + std::to_string(max_port),
                      LogLevel::INFO);
           return nullptr;
       }
       // Reject disallowed IPs: a private address must start with the whitelisted prefix
       if (!local_address.IsPrivateIP() || local_address.HostAsURIString().find(this->white_private_ip_prefix) == 0) {
           rtc::AsyncPacketSocket *result = BasicPacketSocketFactory::CreateUdpSocket(local_address, min_port, max_port);
           if (result == nullptr) { // the base factory failed to create or bind the socket
               WEBRTC_LOG("Create udp socket failed, min port is:" + std::to_string(min_port) + ", max port is: " +
                          std::to_string(max_port), LogLevel::INFO);
               return nullptr;
           }
           const auto *address = static_cast<const void *>(result);
           std::stringstream ss;
           ss << address;
           WEBRTC_LOG("Create udp socket, min port is:" + std::to_string(min_port) + ", max port is: " +
                      std::to_string(max_port) + ", result is: " + result->GetLocalAddress().ToString() + "->" +
                      result->GetRemoteAddress().ToString() + ", new socket address is: " + ss.str(), LogLevel::INFO);

           return result;
       } else {
           WEBRTC_LOG("Create udp socket cancelled, this ip is not in white list:" + local_address.HostAsURIString(),
                      LogLevel::INFO);
           return nullptr;
       }
   }
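The two checks in CreateUdpSocket can be separated from the WebRTC types to make the decision logic easier to test. The sketch below is only an illustration: `SocketPolicy`, `PortRangeAllowed`, `AddressAllowed` and `ShouldCreateSocket` are hypothetical names, and the caller is assumed to supply the IP as a string together with a flag saying whether it is private.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical policy object; these names are illustrative, not WebRTC APIs.
struct SocketPolicy {
    uint16_t min_port;
    uint16_t max_port;
    std::string allowed_private_prefix;  // only consulted for private IPs
};

// True when the requested range lies entirely inside the configured range.
// A request of 0..0 ("any port") is therefore rejected as well.
inline bool PortRangeAllowed(const SocketPolicy &policy, uint16_t req_min, uint16_t req_max) {
    return req_min >= policy.min_port && req_max <= policy.max_port;
}

// True when the address may be bound: public addresses always, private
// addresses only if they start with the allowed prefix.
inline bool AddressAllowed(const SocketPolicy &policy, const std::string &ip, bool is_private) {
    if (!is_private) {
        return true;
    }
    const std::string &prefix = policy.allowed_private_prefix;
    return ip.compare(0, prefix.size(), prefix) == 0;
}

// Combines both checks in the same order as the listing above.
inline bool ShouldCreateSocket(const SocketPolicy &policy, const std::string &ip, bool is_private,
                               uint16_t req_min, uint16_t req_max) {
    return PortRangeAllowed(policy, req_min, req_max) && AddressAllowed(policy, ip, is_private);
}
```

With a policy of ports 50000 to 50100 and the prefix "10.0.", a request for 10.0.3.4 inside that range passes, while 192.168.1.2 or an unrestricted 0..0 port request is refused. Keeping the decision in pure functions like these would let the overridden factory method stay a thin wrapper around the base class call.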

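Custom Video Encoding

As you may already know, WebRTC uses VP8 for video encoding by default, and the common view is that VP8 is not as good as H264. In addition, Safari does not support VP8, so when talking to Safari WebRTC encodes video with OpenH264, which in turn is less efficient than libx264. My improvements to the encoding part therefore focus on three things:
1. Replacing the default codec with H264
2. Encoding video with libx264 through FFmpeg, using the GPU for acceleration (h264_nvenc) when the host has a suitable one
3. Supporting bitrate changes at runtime

Replacing the Default Codec

Making H264 the default codec is fairly simple: we only need to override GetSupportedFormats of VideoEncoderFactory.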

// Returns a list of supported video formats in order of preference, to use
   // for signaling etc.
   std::vector<webrtc::SdpVideoFormat> GetSupportedFormats() const override {
       return GetAllSupportedFormats();
   }

   // Here I declare support for H264 only, with packetization mode NonInterleaved
   std::vector<webrtc::SdpVideoFormat> GetAllSupportedFormats() {
       std::vector<webrtc::SdpVideoFormat> supported_codecs;
       supported_codecs.emplace_back(CreateH264Format(webrtc::H264::kProfileBaseline, webrtc::H264::kLevel3_1, "1"));
       return supported_codecs;
   }

   webrtc::SdpVideoFormat CreateH264Format(webrtc::H264::Profile profile,
                                           webrtc::H264::Level level,
                                           const std::string &packetization_mode) {
       const absl::optional<std::string> profile_string =
               webrtc::H264::ProfileLevelIdToString(webrtc::H264::ProfileLevelId(profile, level));
       RTC_CHECK(profile_string);
       return webrtc::SdpVideoFormat(cricket::kH264CodecName,
                                     {{cricket::kH264FmtpProfileLevelId,        *profile_string},
                                      {cricket::kH264FmtpLevelAsymmetryAllowed, "1"},
                                      {cricket::kH264FmtpPacketizationMode,     packetization_mode}});
   }
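To see what the resulting SdpVideoFormat advertises, it helps to build the fmtp values by hand. The sketch below is a standalone illustration: `ProfileLevelIdHex` and `H264Fmtp` are hypothetical helpers and do not exist in WebRTC. It relies on the profile-level-id being three hex-encoded bytes (profile_idc, profile-iop, level_idc), with Baseline using profile_idc 0x42 and level 3.1 using level_idc 31.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the 6-hex-digit profile-level-id value from its
// three bytes: profile_idc, profile-iop (constraint flags) and level_idc.
inline std::string ProfileLevelIdHex(uint8_t profile_idc, uint8_t profile_iop, uint8_t level_idc) {
    char buf[7];
    std::snprintf(buf, sizeof(buf), "%02x%02x%02x",
                  static_cast<unsigned>(profile_idc),
                  static_cast<unsigned>(profile_iop),
                  static_cast<unsigned>(level_idc));
    return std::string(buf);
}

// Hypothetical helper: assembles the fmtp parameters that correspond to the
// three entries of the SdpVideoFormat parameter map in the listing above.
inline std::string H264Fmtp(const std::string &profile_level_id, int packetization_mode) {
    return "level-asymmetry-allowed=1;packetization-mode=" +
           std::to_string(packetization_mode) +
           ";profile-level-id=" + profile_level_id;
}
```

For Baseline at level 3.1 the helper yields 42001f, and Constrained Baseline (profile-iop 0xe0) yields 42e01f, which is the value browsers commonly offer. Comparing this string with the remote SDP is a quick way to check why an H264 offer was or was not matched.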

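Implementing the Encoder

Next is the FFmpeg-based implementation of the VideoEncoder interface. For the FFmpeg usage I mainly followed the official examples. First, a quick look at which VideoEncoder interfaces we need to implement: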

FFmpegH264EncoderImpl(const cricket::VideoCodec &codec, bool hardware_accelerate);

   ~FFmpegH264EncoderImpl() override;

   // |max_payload_size| is ignored.
   // The following members of |codec_settings| are used. The rest are ignored.
   // - codecType (must be kVideoCodecH264)
   // - targetBitrate
   // - maxFramerate
   // - width
   // - height
   // Initializes the encoder
   int32_t InitEncode(const webrtc::VideoCodec *codec_settings,
                      int32_t number_of_cores,
                      size_t max_payload_size) override;

   // Releases resources
   int32_t Release() override;

   // Once a frame has been encoded, it is delivered through this callback
   int32_t RegisterEncodeCompleteCallback(
           webrtc::EncodedImageCallback *callback) override;

   // Called by WebRTC's own rate controller, which adjusts the bitrate according to current network conditions
   int32_t SetRateAllocation(const webrtc::VideoBitrateAllocation &bitrate_allocation,
                             uint32_t framerate) override;

   // The result of encoding - an EncodedImage and RTPFragmentationHeader - are
   // passed to the encode complete callback.
   int32_t Encode(const webrtc::VideoFrame &frame,
                  const webrtc::CodecSpecificInfo *codec_specific_info,
                  const std::vector<webrtc::FrameType> *frame_types) override;

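When implementing this interface I used WebRTC's official OpenH264 encoder as a reference. Note that WebRTC supports simulcast, so there may be several encoder instances here, one per stream. Because this part is fairly complex, I will walk through my implementation step by step. First, the struct and member variables I defined:

// This struct holds all resources belonging to one encoder instance
   typedef struct {
       AVCodec *codec = nullptr;        // points to the codec instance
       AVFrame *frame = nullptr;        // holds pixel data after decoding / before encoding
       AVCodecContext *context = nullptr;    // codec context; stores the codec's parameter settings
       AVPacket *pkt = nullptr;        // packet structure containing the encoded bitstream data
   } CodecCtx;

   // Encoder instances
   std::vector<CodecCtx *> encoders_;
   // Encoder parameters
   std::vector<LayerConfig> configurations_;
   // Encoded images
   std::vector<webrtc::EncodedImage> encoded_images_;
   // Buffers backing the encoded images
   std::vector<std::unique_ptr<uint8_t[]>> encoded_image_buffers_;
   // Encoding-related configuration
   webrtc::VideoCodec codec_;
   webrtc::H264PacketizationMode packetization_mode_;
   size_t max_payload_size_;
   int32_t number_of_cores_;
   // Callback invoked when encoding completes
   webrtc::EncodedImageCallback *encoded_image_callback_;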

The constructor is fairly simple: it stores the packetization mode and reserves space:

FFmpegH264EncoderImpl::FFmpegH264EncoderImpl(const cricket::VideoCodec &codec, bool hardware)
           : packetization_mode_(webrtc::H264PacketizationMode::SingleNalUnit),
             max_payload_size_(0),
             hardware_accelerate(hardware),
             number_of_cores_(0),
             encoded_image_callback_(nullptr),
             has_reported_init_(false),
             has_reported_error_(false) {
       RTC_CHECK(cricket::CodecNamesEq(codec.name, cricket::kH264CodecName));
       std::string packetization_mode_string;
       if (codec.GetParam(cricket::kH264FmtpPacketizationMode,
                          &packetization_mode_string) &&
           packetization_mode_string == "1") {
           packetization_mode_ = webrtc::H264PacketizationMode::NonInterleaved;
       }
       encoded_images_.reserve(webrtc::kMaxSimulcastStreams);
       encoded_image_buffers_.reserve(webrtc::kMaxSimulcastStreams);
       encoders_.reserve(webrtc::kMaxSimulcastStreams);
       configurations_.reserve(webrtc::kMaxSimulcastStreams);
   }

Next comes the crucial step of initializing the encoder. Here I first validate the settings and then create an encoder instance for each stream:

int32_t FFmpegH264EncoderImpl::InitEncode(const webrtc::VideoCodec *inst,
                                             int32_t number_of_cores,
                                             size_t max_payload_size) {
       ReportInit();
       if (!inst || inst->codecType != webrtc::kVideoCodecH264) {
           ReportError();
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
       }
       if (inst->maxFramerate == 0) {
           ReportError();
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
       }
       if (inst->width < 1 || inst->height < 1) {
           ReportError();
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;
       }

       int32_t release_ret = Release();
       if (release_ret != WEBRTC_VIDEO_CODEC_OK) {
           ReportError();
           return release_ret;
       }

       int number_of_streams = webrtc::SimulcastUtility::NumberOfSimulcastStreams(*inst);
       bool doing_simulcast = (number_of_streams > 1);

       if (doing_simulcast && (!webrtc::SimulcastUtility::ValidSimulcastResolutions(
               *inst, number_of_streams) ||
                               !webrtc::SimulcastUtility::ValidSimulcastTemporalLayers(
                                       *inst, number_of_streams))) {
           return WEBRTC_VIDEO_CODEC_ERR_SIMULCAST_PARAMETERS_NOT_SUPPORTED;
       }
       encoded_images_.resize(static_cast<unsigned long>(number_of_streams));
       encoded_image_buffers_.resize(static_cast<unsigned long>(number_of_streams));
       encoders_.resize(static_cast<unsigned long>(number_of_streams));
       configurations_.resize(static_cast<unsigned long>(number_of_streams));
       for (int i = 0; i < number_of_streams; i++) {
           encoders_[i] = new CodecCtx();
       }
       number_of_cores_ = number_of_cores;
       max_payload_size_ = max_payload_size;
       codec_ = *inst;

       // Code expects simulcastStream resolutions to be correct, make sure they are
       // filled even when there are no simulcast layers.
       if (codec_.numberOfSimulcastStreams == 0) {
           codec_.simulcastStream[0].width = codec_.width;
           codec_.simulcastStream[0].height = codec_.height;
       }

       for (int i = 0, idx = number_of_streams - 1; i < number_of_streams;
            ++i, --idx) {
           // Temporal layers still not supported.
           if (inst->simulcastStream[i].numberOfTemporalLayers > 1) {
               Release();
               return WEBRTC_VIDEO_CODEC_ERR_SIMULCAST_PARAMETERS_NOT_SUPPORTED;
           }


           // Set internal settings from codec_settings
           configurations_[i].simulcast_idx = idx;
           configurations_[i].sending = false;
           configurations_[i].width = codec_.simulcastStream[idx].width;
           configurations_[i].height = codec_.simulcastStream[idx].height;
           configurations_[i].max_frame_rate = static_cast<float>(codec_.maxFramerate);
           configurations_[i].frame_dropping_on = codec_.H264()->frameDroppingOn;
           configurations_[i].key_frame_interval = codec_.H264()->keyFrameInterval;

           // Codec_settings uses kbits/second; encoder uses bits/second.
           configurations_[i].max_bps = codec_.maxBitrate * 1000;
           configurations_[i].target_bps = codec_.startBitrate * 1000;
           if (!OpenEncoder(encoders_[i], configurations_[i])) {
               Release();
               ReportError();
               return WEBRTC_VIDEO_CODEC_ERROR;
           }
           // Initialize encoded image. Default buffer size: size of unencoded data.
           encoded_images_[i]._size =
                   CalcBufferSize(webrtc::VideoType::kI420, codec_.simulcastStream[idx].width,
                                  codec_.simulcastStream[idx].height);
           encoded_images_[i]._buffer = new uint8_t[encoded_images_[i]._size];
           encoded_image_buffers_[i].reset(encoded_images_[i]._buffer);
           encoded_images_[i]._completeFrame = true;
           encoded_images_[i]._encodedWidth = codec_.simulcastStream[idx].width;
           encoded_images_[i]._encodedHeight = codec_.simulcastStream[idx].height;
           encoded_images_[i]._length = 0;
       }

       webrtc::SimulcastRateAllocator init_allocator(codec_);
       webrtc::BitrateAllocation allocation = init_allocator.GetAllocation(
               codec_.startBitrate * 1000, codec_.maxFramerate);
       return SetRateAllocation(allocation, codec_.maxFramerate);
   }

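   // OpenEncoder creates the encoder. One subtle point in this function: when allocating the image buffer of the AVFrame, remember to use 32-byte memory alignment, as mentioned earlier when capturing image data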
   bool FFmpegH264EncoderImpl::OpenEncoder(FFmpegH264EncoderImpl::CodecCtx *ctx, H264Encoder::LayerConfig &config) {
       int ret;
       /* find the H.264 encoder: h264_nvenc when hardware acceleration is requested, otherwise libx264 */
   #ifdef WEBRTC_LINUX
       if (hardware_accelerate) {
           ctx->codec = avcodec_find_encoder_by_name("h264_nvenc");
       }
   #endif
       if (!ctx->codec) {
           ctx->codec = avcodec_find_encoder_by_name("libx264");
       }
       if (!ctx->codec) {
           WEBRTC_LOG("Codec not found", ERROR);
           return false;
       }
       WEBRTC_LOG("Open encoder: " + std::string(ctx->codec->name) + ", and generate frame, packet", INFO);

       ctx->context = avcodec_alloc_context3(ctx->codec);
       if (!ctx->context) {
           WEBRTC_LOG("Could not allocate video codec context", ERROR);
           return false;
       }
       config.target_bps = config.max_bps;
       SetContext(ctx, config, true);
       /* open it */
       ret = avcodec_open2(ctx->context, ctx->codec, nullptr);
       if (ret < 0) {
           WEBRTC_LOG("Could not open codec, error code:" + std::to_string(ret), ERROR);
           avcodec_free_context(&(ctx->context));
           return false;
       }

       ctx->frame = av_frame_alloc();
       if (!ctx->frame) {
           WEBRTC_LOG("Could not allocate video frame", ERROR);
           return false;
       }
       ctx->frame->format = ctx->context->pix_fmt;
       ctx->frame->width = ctx->context->width;
       ctx->frame->height = ctx->context->height;
       ctx->frame->color_range = ctx->context->color_range;
       /* the image can be allocated by any means and av_image_alloc() is
        * just the most convenient way if av_malloc() is to be used */
       ret = av_image_alloc(ctx->frame->data, ctx->frame->linesize, ctx->context->width, ctx->context->height,
                            ctx->context->pix_fmt, 32);
       if (ret < 0) {
           WEBRTC_LOG("Could not allocate raw picture buffer", ERROR);
           return false;
       }
       ctx->frame->pts = 1;
       ctx->pkt = av_packet_alloc();
       if (!ctx->pkt) {
           WEBRTC_LOG("Could not allocate packet", ERROR);
           return false;
       }
       return true;
   }

   // Set the parameters of the FFmpeg encoder
   void FFmpegH264EncoderImpl::SetContext(CodecCtx *ctx, H264Encoder::LayerConfig &config, bool init) {
       if (init) {
           AVRational rational = {1, 25};
           ctx->context->time_base = rational;
           ctx->context->max_b_frames = 0;
           ctx->context->pix_fmt = AV_PIX_FMT_YUV420P;
           ctx->context->codec_type = AVMEDIA_TYPE_VIDEO;
           ctx->context->codec_id = AV_CODEC_ID_H264;
           ctx->context->gop_size = config.key_frame_interval;
           ctx->context->color_range = AVCOL_RANGE_JPEG;
           // Two options that make encoding faster
           if (std::string(ctx->codec->name) == "libx264") {
               av_opt_set(ctx->context->priv_data, "preset", "ultrafast", 0);
               av_opt_set(ctx->context->priv_data, "tune", "zerolatency", 0);
           }
           av_log_set_level(AV_LOG_ERROR);
           WEBRTC_LOG("Init bitrate: " + std::to_string(config.target_bps), INFO);
       } else {
           WEBRTC_LOG("Change bitrate: " + std::to_string(config.target_bps), INFO);
       }
       config.key_frame_request = true;
       ctx->context->width = config.width;
       ctx->context->height = config.height;

       ctx->context->bit_rate = config.target_bps * 0.7;
       ctx->context->rc_max_rate = config.target_bps * 0.85;
       ctx->context->rc_min_rate = config.target_bps * 0.1;
       // Changing rc_buffer_size triggers libx264's rate-control update; without it,
       // the three settings above do not take effect.
       ctx->context->rc_buffer_size = config.target_bps * 2;
   #ifdef WEBRTC_LINUX
       // In the spirit of Java reflection, reach into the codec's private data to set h264_nvenc's bitrate
       if (std::string(ctx->codec->name) == "h264_nvenc") {
           NvencContext* nvenc_ctx = (NvencContext*)ctx->context->priv_data;
           nvenc_ctx->encode_config.rcParams.averageBitRate = ctx->context->bit_rate;
           nvenc_ctx->encode_config.rcParams.maxBitRate = ctx->context->rc_max_rate;
           return;
       }
   #endif
   }
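
The 32-byte alignment passed to `av_image_alloc` in OpenEncoder has a practical consequence: as far as I understand, each plane's `linesize` is rounded up to a multiple of the alignment, so a row in the AVFrame can be wider than the image itself and planes have to be copied row by row. Here is a minimal sketch of that idea without any FFmpeg dependency. `AlignUp` and `CopyPlane` are hypothetical helper names used only for illustration; they are not FFmpeg or WebRTC APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: round `value` up to the next multiple of `alignment`.
// `alignment` must be a power of two (32 in the article's code).
inline int AlignUp(int value, int alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: copy one image plane row by row, so that source and
// destination may use different strides (bytes per row including padding).
inline void CopyPlane(const uint8_t* src, int src_stride,
                      uint8_t* dst, int dst_stride,
                      int row_bytes, int rows) {
    for (int y = 0; y < rows; ++y) {
        std::memcpy(dst + static_cast<std::size_t>(y) * dst_stride,
                    src + static_cast<std::size_t>(y) * src_stride,
                    static_cast<std::size_t>(row_bytes));
    }
}
```

For a 720-pixel-wide Y plane, `AlignUp(720, 32)` gives a stride of 736 bytes, so a single `memcpy` of `width * height` bytes would shear the picture.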

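To make the numbers in SetContext easier to follow, here is a small standalone sketch of how the four rate-control values relate to the target bitrate. The factors (0.7, 0.85, 0.1 and 2) are the ones used in the code above. The struct and function names are hypothetical, and the sketch uses integer arithmetic instead of floating-point factors so that the results are exact.

```cpp
#include <cstdint>

// Hypothetical container for the values that SetContext writes into
// AVCodecContext (bit_rate, rc_max_rate, rc_min_rate, rc_buffer_size).
struct RateControl {
    int64_t bit_rate;
    int64_t rc_max_rate;
    int64_t rc_min_rate;
    int64_t rc_buffer_size;
};

// Derive the encoder rate-control values from the target bitrate (bits/s),
// using the same proportions as the article's SetContext.
inline RateControl DeriveRateControl(uint32_t target_bps) {
    const int64_t target = static_cast<int64_t>(target_bps);
    RateControl rc;
    rc.bit_rate = target * 70 / 100;     // average: 70% of the target
    rc.rc_max_rate = target * 85 / 100;  // peak: 85% of the target
    rc.rc_min_rate = target * 10 / 100;  // floor: 10% of the target
    rc.rc_buffer_size = target * 2;      // rate-control buffer: 2x the target
    return rc;
}
```

With a target of 1,000,000 bps this yields an average of 700,000 bps, a peak of 850,000 bps, a floor of 100,000 bps and a buffer of 2,000,000 bits, which keeps the encoder below what WebRTC's bandwidth estimate allows.
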
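The last few lines of SetContext deal with changing the encoder bitrate dynamically. This is probably the most hardcore part of the whole encoder setup, and it is how I implemented runtime bitrate control for both libx264 and h264_nvenc.

With the big chunk on encoder initialization out of the way, let's relax a little and look at two simple interfaces first. One registers the encode-complete callback; the other is where WebRTC's rate-control module injects the bitrate. As mentioned earlier, WebRTC sets the encoding bitrate according to network conditions.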

   int32_t FFmpegH264EncoderImpl::RegisterEncodeCompleteCallback(
           webrtc::EncodedImageCallback *callback) {
       encoded_image_callback_ = callback;
       return WEBRTC_VIDEO_CODEC_OK;
   }

   int32_t FFmpegH264EncoderImpl::SetRateAllocation(
           const webrtc::BitrateAllocation &bitrate,
           uint32_t new_framerate) {
       if (encoders_.empty())
           return WEBRTC_VIDEO_CODEC_UNINITIALIZED;

       if (new_framerate < 1)
           return WEBRTC_VIDEO_CODEC_ERR_PARAMETER;

       if (bitrate.get_sum_bps() == 0) {
           // Encoder paused, turn off all encoding.
           for (auto &configuration : configurations_)
               configuration.SetStreamState(false);
           return WEBRTC_VIDEO_CODEC_OK;
       }

       // At this point, bitrate allocation should already match codec settings.
       if (codec_.maxBitrate > 0)
           RTC_DCHECK_LE(bitrate.get_sum_kbps(), codec_.maxBitrate);
       RTC_DCHECK_GE(bitrate.get_sum_kbps(), codec_.minBitrate);
       if (codec_.numberOfSimulcastStreams > 0)
           RTC_DCHECK_GE(bitrate.get_sum_kbps(), codec_.simulcastStream[0].minBitrate);

       codec_.maxFramerate = new_framerate;

       size_t stream_idx = encoders_.size() - 1;
       for (size_t i = 0; i < encoders_.size(); ++i, --stream_idx) {
           // Update layer config.
           configurations_[i].target_bps = bitrate.GetSpatialLayerSum(stream_idx);
           configurations_[i].max_frame_rate = static_cast<float>(new_framerate);

           if (configurations_[i].target_bps) {
               configurations_[i].SetStreamState(true);
               SetContext(encoders_[i], configurations_[i], false);
           } else {
               configurations_[i].SetStreamState(false);
           }
       }

       return WEBRTC_VIDEO_CODEC_OK;
   }
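
The loop in SetRateAllocation walks the encoders forward while walking the stream index backward, and pauses any layer whose allocation is zero. A minimal sketch of just that mapping, with plain vectors standing in for `webrtc::BitrateAllocation` and the layer configurations (`LayerState` and `ApplyAllocation` are hypothetical names introduced for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the per-layer configuration.
struct LayerState {
    uint32_t target_bps = 0;
    bool sending = false;
};

// stream_bps[k] is the bitrate allocated to simulcast stream k.
// Encoder i serves stream (N - 1 - i), mirroring the reversed index in the
// article's loop; a layer with zero bitrate is marked as not sending.
inline std::vector<LayerState> ApplyAllocation(
        const std::vector<uint32_t>& stream_bps) {
    std::vector<LayerState> layers(stream_bps.size());
    for (std::size_t i = 0; i < layers.size(); ++i) {
        const std::size_t stream_idx = layers.size() - 1 - i;
        layers[i].target_bps = stream_bps[stream_idx];
        layers[i].sending = layers[i].target_bps > 0;
    }
    return layers;
}
```

In the real code, each layer that keeps sending is additionally pushed through SetContext so the new bitrate reaches the encoder.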

Break over. Now for the last tough nut to crack: the encoding process itself. It looks simple, but it hides a big pitfall.

   int32_t FFmpegH264EncoderImpl::Encode(const webrtc::VideoFrame &input_frame,
                                         const webrtc::CodecSpecificInfo *codec_specific_info,
                                         const std::vector<webrtc::FrameType> *frame_types) {
       // Routine sanity checks first
       if (encoders_.empty()) {
           ReportError();
           return WEBRTC_VIDEO_CODEC_UNINITIALIZED;
       }
       if (!encoded_image_callback_) {
           RTC_LOG(LS_WARNING)
               << "InitEncode() has been called, but a callback function "
               << "has not been set with RegisterEncodeCompleteCallback()";
           ReportError();
           return WEBRTC_VIDEO_CODEC_UNINITIALIZED;
       }

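       // Get the video frame buffer (the cast assumes the buffer is already in I420 format)
       webrtc::I420BufferInterface *frame_buffer = (webrtc::I420BufferInterface *) input_frame.video_frame_buffer().get();
       // Check whether this frame must be a key frame; a bitrate change usually requests a key frame for the next frame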
       bool send_key_frame = false;
       for (auto &configuration : configurations_) {
           if (configuration.key_frame_request && configuration.sending) {
               send_key_frame = true;
               break;
           }
       }
       if (!send_key_frame && frame_types) {
           for (size_t i = 0; i < frame_types->size() && i < configurations_.size();
                ++i) {
               if ((*frame_types)[i] == webrtc::kVideoFrameKey && configurations_[i].sending) {
                   send_key_frame = true;
                   break;
               }
           }
       }

       RTC_DCHECK_EQ(configurations_[0].width, frame_buffer->width());
       RTC_DCHECK_EQ(configurations_[0].height, frame_buffer->height());

       // Encode image for each layer.
       for (size_t i = 0; i < encoders_.size(); ++i) {
           // EncodeFrame input.
           copyFrame(encoders_[i]->frame, frame_buffer);
           if (!configurations_[i].sending) {
               continue;
           }
           if (frame_types != nullptr) {
               // Skip frame?
               if ((*frame_types)[i] == webrtc::kEmptyFrame) {
                   continue;
               }
           }
           // Tell the encoder when to emit a key frame
           if (send_key_frame || encoders_[i]->frame->pts % configurations_[i].key_frame_interval == 0) {
               // Force an I picture only on request or at the key frame interval.
               // (If every frame is a key frame we get lag/delays.)
               encoders_[i]->frame->key_frame = 1;
               encoders_[i]->frame->pict_type = AV_PICTURE_TYPE_I;
               configurations_[i].key_frame_request = false;
           } else {
               encoders_[i]->frame->key_frame = 0;
               encoders_[i]->frame->pict_type = AV_PICTURE_TYPE_P;
           }

           // Encode!
           int enc_ret;
           // Feed the raw picture to the encoder
           enc_ret = avcodec_send_frame(encoders_[i]->context, encoders_[i]->frame);
           if (enc_ret != 0) {
               WEBRTC_LOG("FFMPEG send frame failed, returned " + std::to_string(enc_ret), ERROR);
               ReportError();
               return WEBRTC_VIDEO_CODEC_ERROR;
           }
           encoders_[i]->frame->pts++;
           while (enc_ret >= 0) {
               // Receive an encoded packet from the encoder
               enc_ret = avcodec_receive_packet(encoders_[i]->context, encoders_[i]->pkt);
               if (enc_ret == AVERROR(EAGAIN) || enc_ret == AVERROR_EOF) {
                   break;
               } else if (enc_ret < 0) {
                   WEBRTC_LOG("FFMPEG receive frame failed, returned " + std::to_string(enc_ret), ERROR);
                   ReportError();
                   return WEBRTC_VIDEO_CODEC_ERROR;
               }

               // Convert the encoder output into the frame type WebRTC expects
               encoded_images_[i]._encodedWidth = static_cast<uint32_t>(configurations_[i].width);
               encoded_images_[i]._encodedHeight = static_cast<uint32_t>(configurations_[i].height);
               encoded_images_[i].SetTimestamp(input_frame.timestamp());
               encoded_images_[i].ntp_time_ms_ = input_frame.ntp_time_ms();
               encoded_images_[i].capture_time_ms_ = input_frame.render_time_ms();
               encoded_images_[i].rotation_ = input_frame.rotation();
               encoded_images_[i].content_type_ =
                       (codec_.mode == webrtc::VideoCodecMode::kScreensharing)
                       ? webrtc::VideoContentType::SCREENSHARE
                       : webrtc::VideoContentType::UNSPECIFIED;
               encoded_images_[i].timing_.flags = webrtc::VideoSendTiming::kInvalid;
               encoded_images_[i]._frameType = ConvertToVideoFrameType(encoders_[i]->frame);

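               // Split encoded image up into fragments. This also updates
               // |encoded_image_|.
               // This is the big pitfall mentioned earlier: in FFmpeg's output a NALU may begin with
               // the start code 0001 (00 00 00 01), but it may also begin with 001 (00 00 01),
               // whereas WebRTC only recognizes NALUs that begin with 0001.
               // So the encoder output is post-processed here, and an RTP fragmentation header
               // is generated to describe the data of this frame.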
               webrtc::RTPFragmentationHeader frag_header;
               RtpFragmentize(&encoded_images_[i], &encoded_image_buffers_[i], *frame_buffer, encoders_[i]->pkt,
                              &frag_header);
               av_packet_unref(encoders_[i]->pkt);
               // Encoder can skip frames to save bandwidth in which case
               // |encoded_images_[i]._length| == 0.
               if (encoded_images_[i]._length > 0) {
                   // Parse QP.
                   h264_bitstream_parser_.ParseBitstream(encoded_images_[i]._buffer,
                                                         encoded_images_[i]._length);
                   h264_bitstream_parser_.GetLastSliceQp(&encoded_images_[i].qp_);

                   // Deliver encoded image.
                   webrtc::CodecSpecificInfo codec_specific;
                   codec_specific.codecType = webrtc::kVideoCodecH264;
                   codec_specific.codecSpecific.H264.packetization_mode =
                           packetization_mode_;
                   codec_specific.codecSpecific.H264.simulcast_idx = static_cast<uint8_t>(configurations_[i].simulcast_idx);
                   encoded_image_callback_->OnEncodedImage(encoded_images_[i],
                                                           &codec_specific, &frag_header);
               }
           }
       }

       return WEBRTC_VIDEO_CODEC_OK;
   }
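
The key-frame decision inside Encode boils down to one predicate: force an I picture when a key frame was requested (for example after a bitrate change), or when the frame counter reaches a multiple of the key frame interval. A minimal sketch of that predicate follows. The function name is hypothetical, and the guard against a non-positive interval is my own addition, since a modulo by zero is undefined behaviour.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the condition used in Encode:
//   send_key_frame || pts % key_frame_interval == 0
// with an extra guard so that an interval <= 0 never reaches the modulo.
inline bool ShouldEncodeKeyFrame(bool key_frame_requested,
                                 int64_t pts,
                                 int key_frame_interval) {
    if (key_frame_requested) {
        return true;
    }
    if (key_frame_interval <= 0) {
        return false;  // no periodic key frames configured
    }
    return pts % key_frame_interval == 0;
}
```

Everything else is encoded as a P picture, which is what keeps latency and bandwidth down in steady state.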

Below is the process of converting the NALUs and extracting the RTP fragmentation header information:

   // Helper method used by FFmpegH264EncoderImpl::Encode.
   // Copies the encoded bytes from |packet| to |encoded_image| and updates the
   // fragmentation information of |frag_header|. The |encoded_image->_buffer| may
   // be deleted and reallocated if a bigger buffer is required.
   //
   // After FFmpeg encoding, the encoded bytes are stored in |packet| as a sequence
   // of "NAL units". Each NAL unit is a fragment starting with either the
   // four-byte start code {0,0,0,1} or the three-byte start code {0,0,1}. The NAL
   // unit payloads are copied to the |encoded_image->_buffer|, each preceded by a
   // four-byte start code, and the |frag_header| is updated to point to each
   // fragment, with offsets and lengths set as to exclude the start codes.
   void FFmpegH264EncoderImpl::RtpFragmentize(webrtc::EncodedImage *encoded_image,
                                              std::unique_ptr<uint8_t[]> *encoded_image_buffer,
                                              const webrtc::VideoFrameBuffer &frame_buffer, AVPacket *packet,
                                              webrtc::RTPFragmentationHeader *frag_header) {
       std::list<int> data_start_index;
       std::list<int> data_length;
       int payload_length = 0;
       // For both 001 and 0001 start codes, walk through all NALUs and record where each NALU's data starts and how long it is
       for (int i = 2; i < packet->size; i++) {
           if (i > 2
               && packet->data[i - 3] == start_code[0]
               && packet->data[i - 2] == start_code[1]
               && packet->data[i - 1] == start_code[2]
               && packet->data[i] == start_code[3]) {
               if (!data_start_index.empty()) {
                   data_length.push_back((i - 3 - data_start_index.back()));
               }
               data_start_index.push_back(i + 1);
           } else if (packet->data[i - 2] == start_code[1] &&
                      packet->data[i - 1] == start_code[2] &&
                      packet->data[i] == start_code[3]) {
               if (!data_start_index.empty()) {
                   data_length.push_back((i - 2 - data_start_index.back()));
               }
               data_start_index.push_back(i + 1);
           }
       }
       if (!data_start_index.empty()) {
           data_length.push_back((packet->size - data_start_index.back()));
       }

       for (auto &it : data_length) {
           payload_length += it;
       }
       // Calculate minimum buffer size required to hold encoded data.
       auto required_size = payload_length + data_start_index.size() * 4;
       if (encoded_image->_size < required_size) {
           // Increase buffer size. Allocate enough to hold an unencoded image, this
           // should be more than enough to hold any encoded data of future frames of
           // the same size (avoiding possible future reallocation due to variations in
           // required size).
           encoded_image->_size = CalcBufferSize(
                   webrtc::VideoType::kI420, frame_buffer.width(), frame_buffer.height());
           if (encoded_image->_size < required_size) {
               // Encoded data > unencoded data. Allocate required bytes.
               WEBRTC_LOG("Encoding produced more bytes than the original image data! Original bytes: " +
                          std::to_string(encoded_image->_size) + ", encoded bytes: " + std::to_string(required_size) + ".",
                          WARNING);
               encoded_image->_size = required_size;
           }
           encoded_image->_buffer = new uint8_t[encoded_image->_size];
           encoded_image_buffer->reset(encoded_image->_buffer);
       }
       // Iterate layers and NAL units, note each NAL unit as a fragment and copy
       // the data to |encoded_image->_buffer|.
       int index = 0;
       encoded_image->_length = 0;
       frag_header->VerifyAndAllocateFragmentationHeader(data_start_index.size());
       for (auto it_start = data_start_index.begin(), it_length = data_length.begin();
            it_start != data_start_index.end(); ++it_start, ++it_length, ++index) {
           memcpy(encoded_image->_buffer + encoded_image->_length, start_code, sizeof(start_code));
           encoded_image->_length += sizeof(start_code);
           frag_header->fragmentationOffset[index] = encoded_image->_length;
           memcpy(encoded_image->_buffer + encoded_image->_length, packet->data + *it_start,
                  static_cast<size_t>(*it_length));
           encoded_image->_length += *it_length;
           frag_header->fragmentationLength[index] = static_cast<size_t>(*it_length);
       }
   }
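
To isolate the start-code handling from the WebRTC and FFmpeg types, here is a minimal standalone sketch of the same idea: scan an Annex B byte stream, accept both the three-byte (00 00 01) and four-byte (00 00 00 01) start codes, and rebuild the stream with four-byte start codes only. `NalUnit`, `FindNalUnits` and `NormalizeStartCodes` are hypothetical names for illustration; the sketch is not the code from my library, and it does not build the RTP fragmentation header.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Position and size of one NAL unit payload, excluding its start code.
struct NalUnit {
    std::size_t offset;
    std::size_t length;
};

// Locate all NAL units delimited by 00 00 01 or 00 00 00 01.
inline std::vector<NalUnit> FindNalUnits(const std::vector<uint8_t>& data) {
    std::vector<NalUnit> units;
    std::size_t i = 0;
    while (i + 3 <= data.size()) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            // A zero right before 00 00 01 makes it a four-byte start code,
            // so that zero does not belong to the previous NAL unit.
            const std::size_t code_begin = (i > 0 && data[i - 1] == 0) ? i - 1 : i;
            if (!units.empty()) {
                units.back().length = code_begin - units.back().offset;
            }
            units.push_back({i + 3, 0});
            i += 3;
        } else {
            ++i;
        }
    }
    if (!units.empty()) {
        units.back().length = data.size() - units.back().offset;
    }
    return units;
}

// Rewrite the stream so that every NAL unit is preceded by 00 00 00 01.
inline std::vector<uint8_t> NormalizeStartCodes(const std::vector<uint8_t>& data) {
    static const uint8_t kStartCode[4] = {0, 0, 0, 1};
    std::vector<uint8_t> out;
    for (const NalUnit& nal : FindNalUnits(data)) {
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(nal.offset);
        out.insert(out.end(), kStartCode, kStartCode + 4);
        out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(nal.length));
    }
    return out;
}
```

The offsets and lengths returned by `FindNalUnits` correspond to what the real code stores in `fragmentationOffset` and `fragmentationLength`, except that there they are relative to the rewritten buffer.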

Finally, the very simple process of releasing the encoder:

   int32_t FFmpegH264EncoderImpl::Release() {
       while (!encoders_.empty()) {
           CodecCtx *encoder = encoders_.back();
           CloseEncoder(encoder);
           encoders_.pop_back();
       }
       configurations_.clear();
       encoded_images_.clear();
       encoded_image_buffers_.clear();
       return WEBRTC_VIDEO_CODEC_OK;
   }

   void FFmpegH264EncoderImpl::CloseEncoder(FFmpegH264EncoderImpl::CodecCtx *ctx) {
       if (ctx) {
           if (ctx->context) {
               avcodec_close(ctx->context);
               avcodec_free_context(&(ctx->context));
           }
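           if (ctx->frame) {
               // The image buffer came from av_image_alloc() and is not owned by the
               // AVFrame, so free it explicitly before freeing the frame itself.
               av_freep(&(ctx->frame->data[0]));
               av_frame_free(&(ctx->frame));
           }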
           if (ctx->pkt) {
               av_packet_free(&(ctx->pkt));
           }
           WEBRTC_LOG("Close encoder context and release context, frame, packet", INFO);
           delete ctx;
       }
   }
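
CloseEncoder releases the context, frame and packet by hand. An alternative worth considering is to let `std::unique_ptr` with a custom deleter do the bookkeeping, so that early returns cannot leak. The sketch below shows the pattern with a hypothetical `FakeHandle` type standing in for the FFmpeg structs, so it compiles without FFmpeg; with the real library the deleters would call the matching `*_free` functions instead.

```cpp
#include <memory>

// Hypothetical stand-in for a C-style resource such as AVPacket.
struct FakeHandle {
    int id;
};

// Counts live handles so the cleanup can be observed.
inline int& LiveHandles() {
    static int count = 0;
    return count;
}

inline FakeHandle* fake_alloc() {
    ++LiveHandles();
    return new FakeHandle{0};
}

// Mirrors the FFmpeg convention of freeing through a pointer-to-pointer.
inline void fake_free(FakeHandle** handle) {
    if (handle && *handle) {
        delete *handle;
        *handle = nullptr;
        --LiveHandles();
    }
}

struct FakeDeleter {
    void operator()(FakeHandle* handle) const { fake_free(&handle); }
};
using HandlePtr = std::unique_ptr<FakeHandle, FakeDeleter>;

// Members are destroyed in reverse declaration order: packet, frame, context.
struct EncoderResources {
    HandlePtr context{fake_alloc()};
    HandlePtr frame{fake_alloc()};
    HandlePtr packet{fake_alloc()};
};
```

When an `EncoderResources` object goes out of scope, all three handles are released automatically, with no explicit close function to forget.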

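That wraps up my experience with WebRTC; I hope it helps. If you made it all the way to the end, I'm genuinely impressed. At times I felt this article was too long-winded and covered too much. But the parts are tightly interlinked, and I was afraid the train of thought would break if I split them up. So I followed one regular usage flow as the main line, introduced some of my changes along the way, and finally described my changes to the WebRTC Native APIs in detail as add-ons.

I have also only recently started writing articles to share my experience, so my descriptions may be clumsy in places; please bear with me. If you notice anything I got wrong, please leave a comment and I will do my best to fix it promptly.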

Github

I have put everything described in this article on Github, including a simple demo.

https://github.com/BeiKeJieDeLiuLangMao/WebRTC

References

[1]http://www.cnblogs.com/lanxuezaipiao/p/3635556.html

[2]https://www.cnblogs.com/cswuyg/p/3830703.html

[3]http://blog.guorongfei.com/2017/01/24/android-jni-tips-md/

[4]https://github.com/FFmpeg/FFmpeg

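    • Copyright notice: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting!
    • Authorship notice: This article was written on the basis of all the references above, which may involve copying, modification, or adaptation. The images all come from the internet; if anything infringes your rights, please contact me and I will remove it right away.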