Multi-rate HEVC with x265 for adaptive HTTP streaming

Version 2.0 of x265 has recently been tagged, more than two years after version 1.0. Since then, HEVC has become more visible, with wide adoption in devices and software. From a streaming perspective, adaptive HTTP streaming, for example based on the DASH standard, is now the most common technology for watching live or on-demand content on the web.

Remember, adaptive HTTP streaming requires a video to be encoded at different representations, that is, different qualities, which is generally achieved by encoding the same video at different (spatial) resolutions and different signal qualities. Depending on the encoder and on the encoding mode, the signal quality can be tuned by varying the quantization parameter (QP), or varying the target bitrate when using rate-control. In the case of x265, the so-called constant rate factor (CRF) can also be used to tune the quality of the encoded video.

Let’s consider the task of encoding a video into four representations with the same resolution. For example, the common test conditions of the JCT-VC (the team that developed HEVC) use four QPs: 22, 27, 32, and 37. If these four representations are encoded independently, there is considerable redundancy across the four encoding processes, since each of them analyzes the same content from scratch.
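For orientation, here is a standard HEVC relation (not something specific to this post): the quantization step size roughly doubles every 6 QP, so each 5-QP spacing of the values above multiplies the step size by about 1.78.

```latex
\begin{aligned}
Q_{\mathrm{step}}(QP) &\approx 2^{(QP-4)/6}, \qquad
\frac{Q_{\mathrm{step}}(QP+5)}{Q_{\mathrm{step}}(QP)} = 2^{5/6} \approx 1.78,\\
Q_{\mathrm{step}}(22) &\approx 8,\quad Q_{\mathrm{step}}(27) \approx 14.3,\quad
Q_{\mathrm{step}}(32) \approx 25.4,\quad Q_{\mathrm{step}}(37) \approx 45.3.
\end{aligned}
```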

This is where multi-rate encoding comes in: the encoder processes are treated as parts of one multi-rate system, in which encoding information from a reference encoding is passed on to the dependent encodings.

[Figure: multi-rate system]

The goal is to reduce the overall computational complexity while maintaining good rate-distortion performance.
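Because every representation is produced by a separate ./x265 run, the reference encoding has to hand its information over to the dependent encodings somehow, for example through a file. The following C++ sketch shows one hypothetical way to do this. The file layout, the type CtuDepths, and the functions saveDepths() and loadDepths() are illustrative assumptions, not the format used by the x265 implementation. The sketch also assumes that the only information passed on is the CU depth (0 for a 64x64 CU, 3 for an 8x8 CU) of every 8x8 block of each 64x64 CTU.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical exchange format (not x265's): one byte per 8x8 block holding
// the CU depth chosen by the reference encoding, 64 bytes per 64x64 CTU,
// CTUs stored in raster order after a 32-bit CTU count (host byte order).
using CtuDepths = std::array<uint8_t, 64>;

// Called by the reference encoding after it has finished its RDO.
bool saveDepths(const std::string& path, const std::vector<CtuDepths>& ctus) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    const uint32_t count = static_cast<uint32_t>(ctus.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    for (const CtuDepths& ctu : ctus) {
        for (uint8_t d : ctu) assert(d <= 3);  // valid depths: 64x64 (0) .. 8x8 (3)
        out.write(reinterpret_cast<const char*>(ctu.data()),
                  static_cast<std::streamsize>(ctu.size()));
    }
    out.close();  // close() sets failbit if flushing fails
    return !out.fail();
}

// Called by each dependent encoding before it starts its own RDO.
bool loadDepths(const std::string& path, std::vector<CtuDepths>& ctus) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    uint32_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof(count))) return false;
    ctus.assign(count, CtuDepths{});
    for (CtuDepths& ctu : ctus)
        if (!in.read(reinterpret_cast<char*>(ctu.data()),
                     static_cast<std::streamsize>(ctu.size())))
            return false;
    return true;
}
```

Since the count is written in host byte order, this sketch assumes that the reference and dependent encodings run on the same machine.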

The most computationally complex part of HEVC encoding is rate-distortion optimization (RDO). During RDO, different encoding options are tested, and the option with the best rate-distortion performance is chosen. One of the novelties of HEVC compared to previous video coding standards is a block structure based on a quadtree. In the most general case, the RDO process has to traverse each quadtree (in HEVC terms, each coding tree unit (CTU)) to find the block structure (in HEVC terms, the coding unit (CU) structure) that leads to the best rate-distortion performance.
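To illustrate why this is expensive, here is a minimal C++ sketch of an exhaustive quadtree search over one 64x64 CTU down to 8x8 CUs. The cost callback is a hypothetical stand-in for the real per-CU mode decision (which itself tests many prediction modes), and the cost of signaling the split flags is ignored. Even so, each CTU needs 1 + 4 + 16 + 64 = 85 CU evaluations.

```cpp
#include <cassert>
#include <functional>

// Hypothetical callback: RD cost (distortion + lambda * rate) of coding the
// square block at (x, y) with the given size as a single CU.
using CuCostFn = std::function<double(int x, int y, int size)>;

struct SearchResult {
    double cost;       // best RD cost found for this block
    int evaluatedCus;  // number of CU candidates whose cost was computed
};

// Exhaustive search: compare "code the block as one CU" against
// "split it into four quadrants and search each of them recursively".
SearchResult searchCu(int x, int y, int size, int minSize, const CuCostFn& cost) {
    assert(size >= minSize);
    SearchResult best{cost(x, y, size), 1};
    if (size == minSize)
        return best;  // smallest CU size reached, no further split possible
    const int half = size / 2;
    double splitCost = 0.0;
    for (int i = 0; i < 4; ++i) {
        const SearchResult sub =
            searchCu(x + (i % 2) * half, y + (i / 2) * half, half, minSize, cost);
        splitCost += sub.cost;
        best.evaluatedCus += sub.evaluatedCus;
    }
    if (splitCost < best.cost)
        best.cost = splitCost;  // splitting wins
    return best;
}
```

Calling searchCu(0, 0, 64, 8, cost) evaluates all 85 candidate CUs regardless of which structure finally wins.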

Using the CU structure of a high-quality reference encoding to constrain the RDO of lower-quality encodings is an idea that was presented at the IEEE International Conference on Image Processing in September 2015 [4]. The experimental results, based on the HEVC reference software HM, show an average decrease in overall encoding time of 27% (for five representations), while the average bitrate for a given signal quality increases by 0.5%. However, HM is not optimized for speed; it is extremely slow and therefore not suitable for practical use.
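The constraint can be sketched as follows, assuming that a dependent, lower-quality encoding never splits a CU deeper than the co-located CU of the reference encoding (coarser quantization tends to favor larger blocks). The callbacks and the function name are hypothetical, and the exact rules of the published method and of the x265 implementation may differ.

```cpp
#include <cassert>
#include <functional>

// Hypothetical callbacks: RD cost of coding the block at (x, y) of the given
// size as one CU, and the CU depth chosen by the reference encoding at (x, y).
using CuCostFn = std::function<double(int x, int y, int size)>;
using RefDepthFn = std::function<int(int x, int y)>;

struct SearchResult {
    double cost;       // best RD cost found for this block
    int evaluatedCus;  // number of CU candidates whose cost was computed
};

// Depth-constrained quadtree search for a dependent (lower-quality) encoding.
SearchResult searchCuConstrained(int x, int y, int size, int depth,
                                 const CuCostFn& cost, const RefDepthFn& refDepth) {
    assert(size >= 8 && depth <= 3);
    SearchResult best{cost(x, y, size), 1};
    // CUs form a quadtree, so if the reference is not deeper than `depth` at the
    // block's top-left position, it did not split this block: stop here.
    if (refDepth(x, y) <= depth)
        return best;
    const int half = size / 2;
    double splitCost = 0.0;
    for (int i = 0; i < 4; ++i) {
        const SearchResult sub = searchCuConstrained(
            x + (i % 2) * half, y + (i / 2) * half, half, depth + 1, cost, refDepth);
        splitCost += sub.cost;
        best.evaluatedCus += sub.evaluatedCus;
    }
    if (splitCost < best.cost)
        best.cost = splitCost;
    return best;
}
```

For example, if the reference split a CTU into four 32x32 CUs and only the top-left quadrant further down to 8x8, the dependent encoding evaluates 25 CUs instead of 85.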

The same idea has now been implemented in x265. Some results for a set of 13 videos at 1080p are presented below.

Video set

BasketballDrive, BQTerrace, Cactus, Kimono and ParkScene are test sequences from the JCT-VC.

BlueSky, CrowdRun, DucksTakeOff, ParkJoy, PedestrianArea, Riverbed, RushHour and Sunflower can be found on Xiph.org.

For each representation, the first 50 frames of each sequence are encoded.

Measurements are done on Ubuntu Server 14.04.3 running on an Intel Core 2 Quad Q9550 @ 2.83 GHz with 8 GB RAM; x265 is compiled with gcc 4.8.4.

Fixed QP

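The four representations are encoded at QP 22, 27, 32, and 37. Apart from the options shown below, the x265 defaults are used. An I-frame is inserted at least every 32 frames (--keyint 32) so that the stream can be split into temporal segments for adaptive HTTP streaming. The frame rate (--fps, written as ? in the commands) depends on the source video. The QP 22 encoding serves as the reference (--mr-mode 1), and the other three are dependent encodings (--mr-mode 2).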

./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 22 --mr-mode 1
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 27 --mr-mode 2
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 32 --mr-mode 2
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 37 --mr-mode 2

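Results (BD-rate and BD-PSNR of the multi-rate encodings relative to four independent encodings; Δt is the change in the total encoding time of the four representations):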

Sequence     BD-rate (%)   BD-PSNR (dB)     Δt (%)
basketdrive       0.3751        -0.0053    -4.4107
blue_sky          0.8274        -0.0334    -4.3530
bqterrace         0.1737        -0.0023    -3.0566
cactus            1.1764        -0.0216    -6.6547
crowd_run         0.8600        -0.0328    -1.4298
ducks            -0.1204         0.0029    -9.3309
kimono            0.1031        -0.0035   -12.9815
park_joy          0.6391        -0.0251    -5.5624
parkscene         0.9783        -0.0278    -5.1624
pedestrian        0.9272        -0.0274    -8.6382
riverbed         -0.2589         0.0101   -19.8336
rush              0.5663        -0.0143    -8.0541
sunflower         0.1036        -0.0052    -7.0116
average           0.4885        -0.0143    -7.4215
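The BD-rate and BD-PSNR columns are Bjøntegaard delta metrics: BD-rate is the average bitrate difference at equal PSNR (negative means savings), and BD-PSNR is the average PSNR difference at equal bitrate. The post does not say which script was used. Below is a minimal C++ sketch of the classic BD-rate computation (VCEG-M33), which fits a cubic through the (PSNR, log10 rate) points of each curve and averages the gap over the overlapping PSNR range; with exactly four points per curve, the cubic fit is an interpolation. The function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<Vec4, 4>;

// Solves A c = b (4x4) by Gaussian elimination with partial pivoting.
Vec4 solve4(Mat4 A, Vec4 b) {
    for (int col = 0; col < 4; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 4; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
        std::swap(A[col], A[pivot]);
        std::swap(b[col], b[pivot]);
        for (int r = col + 1; r < 4; ++r) {
            const double f = A[r][col] / A[col][col];
            for (int k = col; k < 4; ++k) A[r][k] -= f * A[col][k];
            b[r] -= f * b[col];
        }
    }
    Vec4 c{};
    for (int r = 3; r >= 0; --r) {
        double s = b[r];
        for (int k = r + 1; k < 4; ++k) s -= A[r][k] * c[k];
        c[r] = s / A[r][r];
    }
    return c;
}

// Coefficients of log10(rate) = c0 + c1 p + c2 p^2 + c3 p^3 through the
// four (PSNR, log10 rate) points of one rate-distortion curve.
Vec4 fitLogRate(const Vec4& psnr, const Vec4& rate) {
    Mat4 A{};
    Vec4 b{};
    for (int i = 0; i < 4; ++i) {
        for (int k = 0; k < 4; ++k) A[i][k] = std::pow(psnr[i], k);
        b[i] = std::log10(rate[i]);
    }
    return solve4(A, b);
}

// Definite integral of the cubic over [lo, hi].
double integrateCubic(const Vec4& c, double lo, double hi) {
    auto F = [&c](double p) {
        return c[0] * p + c[1] * p * p / 2 + c[2] * p * p * p / 3 + c[3] * p * p * p * p / 4;
    };
    return F(hi) - F(lo);
}

// BD-rate (%) of a test curve against an anchor curve.
double bdRate(const Vec4& rateAnchor, const Vec4& psnrAnchor,
              const Vec4& rateTest, const Vec4& psnrTest) {
    const Vec4 cA = fitLogRate(psnrAnchor, rateAnchor);
    const Vec4 cT = fitLogRate(psnrTest, rateTest);
    const double lo = std::max(*std::min_element(psnrAnchor.begin(), psnrAnchor.end()),
                               *std::min_element(psnrTest.begin(), psnrTest.end()));
    const double hi = std::min(*std::max_element(psnrAnchor.begin(), psnrAnchor.end()),
                               *std::max_element(psnrTest.begin(), psnrTest.end()));
    assert(hi > lo);  // the two curves must overlap in PSNR
    const double avgLogDiff =
        (integrateCubic(cT, lo, hi) - integrateCubic(cA, lo, hi)) / (hi - lo);
    return (std::pow(10.0, avgLogDiff) - 1.0) * 100.0;
}
```

BD-PSNR is obtained analogously by fitting PSNR as a cubic in log10(rate) and averaging the PSNR gap over the overlapping rate range.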

The encoding time savings are smaller than those achieved with HM, because HM is not optimized for encoding time and therefore leaves more redundant work for the multi-rate method to skip. The results for the Cactus sequence are visualized as an example:

By default, x265 uses RDO level 3 (--rd 3). For RDO levels 0 to 4, x265 does not code CUs larger than the largest block in the co-located CTUs of the reference pictures in lists L0 and L1 (see analysis.cpp):

uint32_t minDepth = topSkipMinDepth(parentCTU, cuGeom);

This means that the quadtree search is already accelerated at these RDO levels, so the proposed multi-rate method only brings a minor additional speed improvement.
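The rule described above can be sketched as follows. simpleTopSkipMinDepth() is a hypothetical, simplified stand-in for topSkipMinDepth(): it only returns the smallest CU depth found in the co-located areas of the first L0 and L1 reference pictures, while the real function in analysis.cpp contains further conditions that are not reproduced here. The encoder can then start its quadtree search at that depth and skip all larger CU sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical, simplified stand-in for x265's topSkipMinDepth().
// Inputs: CU depths stored for the co-located area in the L0 and L1
// reference pictures (an empty vector means the list is not used).
// Depth 0 is a 64x64 CU, depth 3 an 8x8 CU.
uint32_t simpleTopSkipMinDepth(const std::vector<uint8_t>& colocatedL0,
                               const std::vector<uint8_t>& colocatedL1) {
    uint32_t minDepth = 4;  // sentinel: larger than any valid depth
    for (const std::vector<uint8_t>* list : {&colocatedL0, &colocatedL1})
        for (uint8_t d : *list) {
            assert(d <= 3);
            minDepth = std::min<uint32_t>(minDepth, d);
        }
    // Without reference data, no CU size can be skipped.
    return minDepth == 4 ? 0 : minDepth;
}
```

A result of 1, for instance, means that the 64x64 CU is not evaluated and the search starts with 32x32 CUs.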

If we now use RDO level 6 (--rd 6):

./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 22 --rd 6 --mr-mode 1
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 27 --rd 6 --mr-mode 2
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 32 --rd 6 --mr-mode 2
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 37 --rd 6 --mr-mode 2

we get:

Sequence     BD-rate (%)   BD-PSNR (dB)     Δt (%)
basketdrive      -0.1167         0.0013    -9.5058
blue_sky          0.6558        -0.0260    -6.2989
bqterrace        -0.1370         0.0037    -2.1377
cactus            0.5147        -0.0103    -4.2888
crowd_run         0.1634        -0.0063    -1.6749
ducks            -0.6057         0.0144    -8.2330
kimono           -0.1030         0.0046   -19.1494
park_joy          0.1151        -0.0044    -5.0353
parkscene         0.5046        -0.0145    -4.0862
pedestrian        0.2213        -0.0066   -19.2613
riverbed         -1.1385         0.0447   -29.3163
rush             -0.2624         0.0058   -13.6358
sunflower         0.1633        -0.0040    -9.1275
average          -0.0019         0.0002   -10.1347

Here, the overall encoding time for four representations is reduced by 10% on average. Interestingly, the rate-distortion performance is not degraded on average (the average BD-rate is even slightly negative). Let’s have a look at the Riverbed sequence as an example:

Constant rate factor

Now, let’s try the constant rate factor mode of x265.

./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --crf 22 --rd 6 --mr-mode 1
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --crf 27 --rd 6 --mr-mode 2
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --crf 32 --rd 6 --mr-mode 2
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --crf 37 --rd 6 --mr-mode 2

We get:

Sequence     BD-rate (%)   BD-PSNR (dB)     Δt (%)
basketdrive       1.5922        -0.0457   -19.8388
blue_sky          0.6163        -0.0264   -14.9098
bqterrace        -0.3327         0.0072   -13.1692
cactus            0.9295        -0.0269   -15.6393
crowd_run         0.6918        -0.0196    -7.7275
ducks             0.0153        -0.0007   -23.4250
kimono            1.2400        -0.0381   -24.5241
park_joy          0.9364        -0.0244   -13.7816
parkscene         0.9252        -0.0251    -9.2660
pedestrian        2.1326        -0.0723   -25.8937
riverbed         -0.0755         0.0031   -39.4619
rush              1.7297        -0.0465   -20.1121
sunflower         1.8804        -0.0613   -19.8710
average           0.9447        -0.0290   -19.0477

The average encoding time is reduced by almost 20% for four representations, but the average bitrate increases by 0.94%. As an example, let’s have a look at the Cactus sequence:

Analysis Mode

x265 provides an option called --analysis-mode that is comparable to the proposed multi-rate method: it saves analysis information from one encoding to an external file, which a later encoding can load to speed up its RDO. However, this option is not designed for encodings at different qualities. In fact, if we use --analysis-mode with different QPs,

./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 22 --analysis-mode 1
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 27 --analysis-mode 2
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 32 --analysis-mode 2
./x265 --input-res 1920x1080 --fps ? video.yuv -f 50 -o bitstream.bin --keyint 32 --psnr --qp 37 --analysis-mode 2

we get

Sequence     BD-rate (%)   BD-PSNR (dB)     Δt (%)
basketdrive      11.9971        -0.2265   -23.7769
blue_sky          8.3242        -0.2892   -28.3889
bqterrace         8.7864        -0.1721   -22.5094
cactus            8.2874        -0.1612   -23.2014
crowd_run         3.6859        -0.1402   -18.4856
ducks             2.2238        -0.0556   -18.1571
kimono            2.6269        -0.0753   -22.0783
park_joy          2.6517        -0.1036   -21.4131
parkscene         8.2029        -0.2211   -24.5764
pedestrian        5.6666        -0.1539   -23.8077
riverbed          0.4814        -0.0188   -18.5079
rush              3.7981        -0.0811   -24.3288
sunflower         9.4021        -0.2496   -31.2379
average           5.8565        -0.1499   -23.1130

The encoding time is reduced by 23% on average, but the rate-distortion performance is clearly degraded: the average bitrate increases by 5.9%.

Conclusion

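The CU structure reuse for multi-rate encoding has been implemented in x265. The encoding results show that the overall encoding time for four representations can be reduced (by about 10% with fixed QP and by about 19% with CRF, both at RDO level 6), while the rate-distortion performance is only slightly degraded (an average BD-rate increase of at most 0.94% in these tests, compared to 5.9% with --analysis-mode). The multi-rate method is more effective at high x265 RDO levels (5 and 6).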

Literature

[1] D. Schroeder, A. Ilangovan, M. Reisslein, E. Steinbach, “Efficient multi-rate video encoding for HEVC-based adaptive HTTP streaming”, IEEE Transactions on Circuits and Systems for Video Technology (accepted for publication), Aug. 2016

[2] D. Schroeder, A. Ilangovan, E. Steinbach, “Multi-rate encoding for HEVC-based adaptive HTTP streaming with multiple resolutions”, IEEE International Workshop on Multimedia Signal Processing, Xiamen, China, Oct. 2015

[3] J. De Praeter, A. J. Diaz-Honrubia, N. Van Kets, G. Van Wallendael, J. De Cock, P. Lambert, R. Van de Walle, “Fast simultaneous video encoder for adaptive streaming”, IEEE International Workshop on Multimedia Signal Processing, Xiamen, China, Oct. 2015

[4] D. Schroeder, P. Rehm, E. Steinbach, “Block structure reuse for multi-rate High Efficiency Video Coding”, IEEE International Conference on Image Processing, Québec City, Canada, Sep. 2015

[5] D. H. Finstad, H. K. Stensland, H. Espeland, P. Halvorsen, “Improved multi-rate video encoding”, IEEE International Symposium on Multimedia, Dana Point, CA, USA, Dec. 2011
