H.265 interlace encoding quality improvement in low delay scenarios

October 28, 2021

Interlaced video (sometimes also called interlaced scan) is a technique for increasing the perceived frame rate without increasing bandwidth. An interlaced signal usually consists of two fields: one contains all odd rows of a picture, the other all even rows. In the video coding industry, they are commonly referred to as the Top and Bottom fields. The first commercial implementations were made in 1934 and standardized in 1936.
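The split into fields can be sketched in a few lines of Python. This is only an illustration: the function names `split_fields` and `weave` are our own, and a frame is modeled as a plain list of rows rather than real pixel data.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its top and bottom fields."""
    top = frame[0::2]     # rows 1, 3, 5, ... when counted from 1
    bottom = frame[1::2]  # rows 2, 4, 6, ...
    return top, bottom


def weave(top, bottom):
    """Interleave two fields back into one full-height frame."""
    frame = []
    for top_row, bottom_row in zip(top, bottom):
        frame.append(top_row)
        frame.append(bottom_row)
    if len(top) > len(bottom):  # odd frame height: the top field has one extra row
        frame.append(top[-1])
    return frame


frame = ["row1", "row2", "row3", "row4", "row5"]
top, bottom = split_fields(frame)
restored = weave(top, bottom)
```

Each field carries about half of the rows, which is why two fields can be sent in the bandwidth of one full frame.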

Benefits of interlaced video

Signal bandwidth is the most important factor in analogue television. The greater the bandwidth (measured in megahertz), the better the quality of the video signal that can be passed through it, but the production and broadcasting chains also become more expensive. A higher refresh rate (or frame rate, in video encoding terms) improves the image quality of objects in motion because their position is updated more frequently. Human vision combines the fields to produce the same perceived resolution as progressive video. Note that this technique is useful only for source content with a high refresh rate. Interlaced video has been used successfully for analogue television and works well there and for uncompressed digital signals, but it is less efficient for digital video compression.

Interlaced video problems

When an object in the captured video moves fast enough to be in different positions at the moments the two fields are captured, it produces motion artifacts known as interlacing effects or combing. These artifacts are more visible during slow playback. Interlaced images are also much harder to edit (to zoom, for example).
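A toy example makes the combing effect visible. The sketch below is an illustration under simple assumptions: the "video" is a one-pixel-wide vertical bar drawn with text characters, and the helper names `render` and `weave_two_instants` are hypothetical.

```python
def render(bar_x, width=8, height=4):
    """Draw a vertical bar at column bar_x as a list of text rows."""
    return ["".join("#" if x == bar_x else "." for x in range(width))
            for _ in range(height)]


def weave_two_instants(frame_t0, frame_t1):
    """Take top-field rows from the first instant, bottom-field rows from the second."""
    return [frame_t0[y] if y % 2 == 0 else frame_t1[y]
            for y in range(len(frame_t0))]


# The bar moves from column 1 to column 4 between the two field captures.
woven = weave_two_instants(render(1), render(4))
for row in woven:
    print(row)
```

In the woven picture the bar appears at alternating positions on neighboring rows, which is the comb-like pattern seen when both fields are displayed together.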

Low delay video encoding

Low delay video encoding aims for minimal latency between capturing an image and displaying it after the encoding-broadcasting-decoding pipeline. It may be achieved by avoiding intra and bidirectional frames in the GOP structure. Low delay encoding also usually assumes only one reference.

H.265 low delay interlaced content encoding

The first version of the H.265 standard (also known as HEVC) was approved in 2013. Interlace encoding support was not part of the initial H.265 design and was added later only as SEI messages. As a result, some efficient encoding tools from the previous H.264 standard, such as special algorithms for building reference lists, were not carried over. In H.265 terms, interlace encoding is ordinary progressive encoding with a special SEI message indicating that the encoded access unit is one field of a field pair rather than a progressive frame.

Our improvements for H.265 interlace encoding

In this article, we suggest two improvements to standard HEVC encoding. All our experiments are run on a Tiger Lake system using the Intel hardware HEVC encoder from the open source MediaSDK.

As the first step, we will encode several streams at several bitrates to build RD (Rate-Distortion) curves. The quality metric values from this step will serve as "gold" reference values against which we compare the results of the suggested improvements. For this step, we will use some additional options (restrictions) to create an interlaced low delay stream:

tff - to create an interlaced stream with top-field-first field order
x 1 - to create a stream with only one reference frame
r 1 - to create a stream without bidirectional fields (B-fields)
g 0 - to create an “infinite” GOP (a stream with only one I-frame)

So the field order of the encoded stream should look like this:

The TF and BF labels above the bars mark top and bottom fields. Arrows indicate referencing. The “I” and “P” letters under the bars indicate the type of the encoded access unit (intra or predicted). The digits under the bars indicate the display order of the encoded fields. For later quality measurement, we need to encode several streams with different content at different bitrates. To keep the experiment clean, we will use interlaced content with a high display rate.

Let's look at one of the produced streams in VQ Analyzer:

As you can see in the hierarchy window (on top), the biggest problem with this stream is that each field references the previous one, so the top field uses the bottom field as a reference and, vice versa, the bottom field uses the top.
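The problem can be modeled in a few lines. The sketch below is not encoder code: it only numbers fields in display order for a top-field-first stream and records which earlier field each one references. The helper names are our own.

```python
def parity(index):
    """Parity of a field in a top-field-first stream, by display index."""
    return "TF" if index % 2 == 0 else "BF"


def baseline_refs(num_fields):
    """Field 0 is the only I-field; every other field references the previous one."""
    return [None if i == 0 else i - 1 for i in range(num_fields)]


refs = baseline_refs(6)
cross_parity = [
    (i, ref) for i, ref in enumerate(refs)
    if ref is not None and parity(i) != parity(ref)
]
print(refs)          # reference index per field
print(cross_parity)  # pairs where a field predicts from the opposite parity
```

With a single reference and no reordering, every predicted field ends up predicting from a field of the opposite parity, whose rows are vertically offset from its own.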

Our first improvement is to make each field reference the previous field of the same parity. This technique is standard for H.264 and shows good results for interlaced content, so let's try it for H.265. For this experiment, we will set some other encoder options:

tff - to create an interlaced stream with top-field-first field order
x 2 - to create a stream with two reference frames in the DPB (decoded picture buffer), one for the top field and one for the bottom
r 1 - to create a stream without bidirectional fields (B-fields)
g 0 - to create an “infinite” GOP (a stream with only one I-frame)
num_active_P 1 - to create a stream with only one reference for every field
use_interlace_refs - to add external reference list modifications
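The intended referencing, and the reason the DPB now needs two entries, can be sketched as follows. This is a simplified model rather than MediaSDK code. It assumes that coding order equals display order and that the first bottom field, which has no earlier bottom field, references the initial I-field. The function names are hypothetical.

```python
def parity(index):
    """Parity of a field in a top-field-first stream, by display index."""
    return "TF" if index % 2 == 0 else "BF"


def same_parity_refs(num_fields):
    """Each field references the previous field of the same parity."""
    refs = []
    for i in range(num_fields):
        if i == 0:
            refs.append(None)   # the only I-field
        elif i == 1:
            refs.append(0)      # assumption: no earlier bottom field exists yet
        else:
            refs.append(i - 2)  # previous field with the same parity
    return refs


def required_dpb_size(refs):
    """Largest number of earlier fields that must be kept at any coding step."""
    worst = 0
    for i in range(len(refs)):
        still_needed = {r for r in refs[i:] if r is not None and r < i}
        worst = max(worst, len(still_needed))
    return worst


refs = same_parity_refs(8)
dpb = required_dpb_size(refs)
```

Every field still uses a single active reference, but the field coded just before it must also stay in the buffer because the next field will need it, which is consistent with the x 2 and num_active_P 1 settings above.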

We will also need to update the sample code to pass an external structure with reference list modifications to the encoder; the Intel MediaSDK encoder has API functions for this. The field order with references in the encoded stream should then look like this:

Let's check one of the produced streams in the analyzer:

As you can see, the reference lists now look as suggested, so let's collect the quality metrics and compare the streams from the first and second steps:

bdRate, %	gold	improved
gold	0	3.9
improved	-3.75	0

bdRate, %	gold	improved
gold	0	-33.97
improved	51.45	0

bdRate, %	gold	improved
gold	0	-13.85
improved	16.08	0

As you can see, some content does not gain quality, but most does. Performance tests show the same results for the standard and the improved encoding, so we will not show a detailed FPS histogram in this article.
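The two off-diagonal cells in each table appear to describe the same comparison from both sides. If one stream needs a given percentage more or less bitrate than the other at equal quality, swapping the roles inverts the bitrate ratio. The small helper below (the name `swap_anchor` is ours) is a sketch of that arithmetic, not part of any BD-rate tool.

```python
def swap_anchor(bd_rate_percent):
    """Convert a BD-rate of A relative to B into the BD-rate of B relative to A."""
    ratio = 1.0 + bd_rate_percent / 100.0  # average bitrate ratio at equal quality
    return (1.0 / ratio - 1.0) * 100.0


for value in (3.9, -33.97, -13.85):
    print(value, "->", round(swap_anchor(value), 2))
```

Applied to 3.9, -33.97 and -13.85, this gives values that match -3.75, 51.45 and 16.08 from the tables to within rounding. This also explains why a saving of about 34% on one side appears as an overhead of about 51% on the other.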

The most efficient tool for encoding with temporal prediction is reordering with bidirectional frames. However, B-fields are not part of the low delay concept. At present, though, most interlaced streams are deinterlaced before display (deinterlacing is a technique that produces a progressive picture from a pair of interlaced fields), so the only latency that matters for low delay encoding today is the latency between field pairs. This allows us to use reordering inside one field pair and encode one of the fields as a B-field to get its benefits.

Intel MediaSDK does not allow reordering inside one field pair, so we had to change the reordering algorithm in the MediaSDK core. For this experiment, we will set some other encoder options:

tff - to create an interlaced stream with top-field-first field order
x 3 - to create a stream with three reference frames in the DPB (decoded picture buffer): two for the top field (past and future references) and one for the bottom
r 2 - to create a stream with one bidirectional field (B-field)
g 0 - to create an “infinite” GOP (a stream with only one I-frame)
num_active_P 1 - to create a stream with only one reference for every field
num_active_Bl0 1 - to create a stream with only one past reference for every B-field
num_active_Bl1 1 - to create a stream with only one future reference for every B-field

So we suggest encoding every top field as a B-field, and the field order of the encoded stream should look like this:

As you can see, to make this structure possible, we should swap the fields in every pair except the first one.
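The resulting coding order can be sketched as below. This models only the order and the field types described in the text. The reference arrows are left out, and the function name `coding_order` is hypothetical.

```python
def coding_order(num_pairs):
    """Return (display_index, parity, type) tuples in coding order."""
    order = [(0, "TF", "I"), (1, "BF", "P")]  # the first pair is not swapped
    for pair in range(1, num_pairs):
        top, bottom = 2 * pair, 2 * pair + 1
        order.append((bottom, "BF", "P"))  # the bottom field is coded first
        order.append((top, "TF", "B"))     # the top field can then use it as a future reference
    return order


order = coding_order(3)
display_indices = [field[0] for field in order]
# How far each field moves from its display position.
max_shift = max(abs(pos - idx) for pos, idx in enumerate(display_indices))
```

For three pairs the display indices come out in the coding order 0, 1, 3, 2, 5, 4. No field moves by more than one position, so the reordering stays inside a single field pair.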

Let's check how it looks in the analyzer:

and how it looks in hierarchy mode:

Now we can check the quality and compare it with the results from the previous step:

bdRate, %	improved	improved + B-field
improved	0	-13.26
improved + B-field	15.29	0

bdRate, %	improved	improved + B-field
improved	0	-27.96
improved + B-field	38.8	0

bdRate, %	improved	improved + B-field
improved	0	-6.13
improved + B-field	6.53	0

As you can see, in some cases, B-fields provide even more benefits than the first suggested improvement.

Conclusions

At present, interlace encoding is not as common as it once was, but there are still cases where interlaced streams must be produced with new codecs that were not designed for it. This article showed some improvements carried over from previous standards that can be useful and efficient in modern codecs.