I was invited to talk on “Video Coding for HTTP Adaptive Streaming” at Research@Lunch, a research webinar series by Humanitarian Technology (HuT) Labs, Amrita Vishwa Vidyapeetham University, India, exclusively for Ph.D. scholars and UG and PG researchers in India, conducted on August 14, 2021. The talk introduced the basics of video codecs and highlighted the scope of HAS-related research on video encoding.
The method of arithmetic coding was suggested by Elias and presented by Abramson in his text on Information Theory. It is a form of entropy encoding used in lossless data compression. Normally, a string of characters such as the words “hello there” is represented using a fixed number of bits per character, as in the ASCII code. When a string is converted to arithmetic encoding, frequently used characters are stored with fewer bits and rarely occurring characters with more bits, resulting in fewer bits used in total. Arithmetic coding differs from other forms of entropy encoding, such as Huffman coding, in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number: a fraction n in the interval [0.0, 1.0).
Here a source ensemble is represented by an interval between 0 and 1 on the number line. Each symbol of the ensemble narrows the interval, and as the interval becomes smaller, the number of bits needed to specify it grows. Arithmetic coding assumes an explicit probabilistic model of the source: a high-probability symbol narrows the interval less than a low-probability symbol, so high-probability symbols contribute fewer bits to the coded ensemble. The method begins with an unordered list of the source symbols and their probabilities. The number line is then partitioned into sub-intervals based on cumulative probabilities.
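The interval-narrowing step can be sketched in a few lines of Python. This is a minimal illustration of the idea only, not a full encoder (practical implementations work with integer ranges and renormalization to avoid precision loss), and the function name and probabilities are made up for the example:

```python
def arithmetic_interval(message, probs):
    """Narrow [0, 1) symbol by symbol using cumulative probabilities."""
    cum, acc = {}, 0.0
    for sym, p in probs.items():   # cumulative lower bound of each symbol
        cum[sym] = acc
        acc += p
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        high = low + span * (cum[sym] + probs[sym])
        low = low + span * cum[sym]
    return low, high   # any number in [low, high) encodes the message
```

For probabilities {a: 0.5, b: 0.25, c: 0.25}, the message “ab” narrows the interval to [0.25, 0.375): first “a” selects [0, 0.5), then “b” selects the corresponding sub-interval inside it.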
Huffman coding is an entropy coding algorithm used for lossless data compression, developed by David Huffman in 1952. The idea is to assign variable-length codes to input characters: the most frequent character gets the shortest code and the least frequent character gets the longest code. The Huffman algorithm is a so-called “greedy” approach to solving this problem, in the sense that at each step the algorithm chooses the best available option. It turns out that this is sufficient for finding the best encoding.
Prefix Property: The variable-length codes assigned are prefix codes, meaning that the code assigned to one character is never the prefix of the code assigned to any other character. This avoids ambiguity when decoding the generated bitstream. Let us understand prefix codes with a counterexample. Let there be four characters a, b, c and d, with corresponding variable-length codes 00, 01, 0 and 1. This coding leads to ambiguity because the code assigned to c is a prefix of the codes assigned to a and b. If the compressed bitstream is 0001, the decoded output may be “cccd”, “ccb”, “cad”, “acd” or “ab”.
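The ambiguity can be checked mechanically. This brute-force parser (an illustrative sketch, not part of any codec) enumerates every way a bitstream can be split into codewords from a given code table:

```python
def decodings(bits, codes):
    """Enumerate every way `bits` can be split into codewords."""
    if not bits:
        return [""]
    parses = []
    for sym, code in codes.items():
        if bits.startswith(code):
            # consume this codeword and parse the remainder recursively
            parses += [sym + rest for rest in decodings(bits[len(code):], codes)]
    return parses
```

With the non-prefix codes above, decodings("0001", {"a": "00", "b": "01", "c": "0", "d": "1"}) returns five distinct parses (ab, acd, cad, ccb, cccd); a prefix-free code table would return exactly one.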
There are two major parts in Huffman coding:
- Building the Huffman tree
- Traversing the Huffman tree and assigning codes to the characters
Procedure to be followed:
- The input characters are sorted in descending or ascending order of probability.
- The two characters of lowest probability are assigned a one and a zero.
- The sum of these two probabilities gives a new probability.
- This probability is placed in the sorted list, and ones and zeros are assigned accordingly.
- This procedure is repeated until all the probabilities have been combined.
- The codeword of an input character is found by tracing back the ones and zeros assigned to it.
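The procedure above maps naturally onto a min-heap: repeatedly merge the two lowest-probability entries, prepending a 0 to one subtree's codes and a 1 to the other's. A minimal sketch (the function name and the example probabilities are illustrative):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code table from {symbol: probability}."""
    tie = count()  # tiebreaker so equal probabilities never compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)  # two lowest probabilities
        p2, _, codes2 = heapq.heappop(heap)
        # prepend a 0 to one subtree's codes and a 1 to the other's
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]
```

For the probabilities {a: 0.5, b: 0.25, c: 0.15, d: 0.1}, this yields code lengths 1, 2, 3 and 3 respectively, and the resulting code is prefix-free by construction.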
The conventional way of acquiring a signal is to sample it at the Nyquist rate. This results in a huge amount of data that consumes a large amount of storage, so compression is inevitable for storage and transmission. The fundamental question is: instead of acquiring all that data and then compressing it, can we directly acquire compressed measurements? This is the key idea of Compressive Sensing. It is based on the fact that the acquired signal ‘x’ has a sparse representation in some basis, and it enables us to sample below the Nyquist rate. As stated above, sampling and compression are now performed in one step. The whole theory can be described using the following matrix notation:
y = Φx
Here x is a column vector (N×1) with only K non-zero entries, and Φ is a random M×N matrix, so y is a column vector (M×1). The goal is to recover x from y. The idea is that the Φ matrix preserves the information in the sparse signal.
The input sparse vector x is in R^N and the output vector y is in R^M. The sensing matrix, which maps R^N to R^M, has rank less than or equal to M (where M << N).
Properties to be satisfied by Φ:
- Φ is a random Gaussian matrix (M × N) such that:
Φ x1 ≠ Φ x2 for all K-sparse x1 ≠ x2.
This property ensures that the mapping is invertible, so that no two distinct sparse inputs are mapped to the same measurement vector. If this condition fails, Φ will not yield a unique signal ‘x’ during recovery.
- There are K unknown values and K unknown positions in x, so Φ must have at least 2K rows. Otherwise, there exist distinct K-sparse vectors x1 and x2 such that Φ(x1 – x2) = 0, i.e., Φx1 = Φx2.
- The Restricted Isometry Property (RIP): there exists a constant δ ∈ (0, 1) such that (1 − δ)‖x‖₂² ≤ ‖Φx‖₂² ≤ (1 + δ)‖x‖₂² for every K-sparse vector x.
In order to recover the signal from a small number of measurements, L1 minimization is used. Here ‘y’ is an M × 1 measurement vector and ‘x’ is an N × 1 vector, where M << N. Since it is very difficult to reconstruct the original data from so few observations, we need some kind of minimization technique. With the prior knowledge that our signal is sparse in some domain, we can use Compressed Sensing along with an appropriate minimization technique to recover the original signal; L1 minimization gives us the required solution.
For any discrete signal x[n] of length N, the Lp norm is defined as ‖x‖p = (Σₙ |x[n]|^p)^(1/p), where the sum runs over n = 1, …, N.
The procedure we followed:
- Consider an image which is sparse in the spatial domain. Sparsity is the fundamental hypothesis in Compressive Sensing.
- Find the number of non-zero elements in the image; call it count.
- Generate a Gaussian random matrix with number of rows = 4 × count and number of columns equal to the number of pixels in the image. Call this random matrix A.
- Compute y = A * x, where y is the measurement vector, A is the Gaussian random matrix, and x is the input image reshaped as a 1-D vector.
- Reconstruct x from y. Here we used L1 minimization, done with the help of CVX (a convex optimization package). The reconstructed signal is denoted x̂.
- Find the PSNR and MSE between x and x̂.
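Since CVX is a MATLAB package, here is an equivalent sketch in Python that casts basis pursuit (minimize ‖x‖₁ subject to Ax = y) as a linear program and solves it with SciPy. The signal length, sparsity and 1-D toy signal are made-up values for illustration; only the 4×-the-sparsity measurement count follows the recipe above:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, K = 50, 3                       # signal length and sparsity (toy values)
M = 4 * K                          # measurements: 4x the sparsity, as above

# K-sparse test signal and Gaussian sensing matrix
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.normal(size=K)
A = rng.normal(size=(M, N))
y = A @ x                          # compressed measurements

# Basis pursuit as an LP over variables [x; t]:
# minimize sum(t) subject to -t <= x <= t and Ax = y.
c = np.concatenate([np.zeros(N), np.ones(N)])
I = np.eye(N)
A_ub = np.block([[I, -I], [-I, -I]])      # x - t <= 0 and -x - t <= 0
b_ub = np.zeros(2 * N)
A_eq = np.hstack([A, np.zeros((M, N))])   # Ax = y (t is not constrained here)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * N + [(0, None)] * N)
x_hat = res.x[:N]                  # reconstructed signal
```

With enough Gaussian measurements the LP typically recovers x exactly; the PSNR and MSE between x and x̂ can then be computed as in the last step above.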
An under-determined system of linear equations has more unknowns than equations and generally has an infinite number of solutions. In order to choose a solution to such a system, one must impose extra constraints or conditions (such as smoothness) as appropriate. In compressed sensing, one adds the constraint of sparsity, allowing only solutions which have a small number of nonzero coefficients. Not all under-determined systems of linear equations have a sparse solution. However, if there is a unique sparse solution to the under-determined system, then the compressed sensing framework allows the recovery of that solution.
The field of compressive sensing is related to several topics in signal processing and computational mathematics, such as under-determined linear-systems, group testing, heavy hitters, sparse coding, multiplexing, sparse sampling, and finite rate of innovation. Its broad scope and generality have enabled several innovative CS-enhanced approaches in signal processing and compression, the solution of inverse problems, the design of radiating systems, radar and through-the-wall imaging, and antenna characterization. Imaging techniques having a strong affinity with compressive sensing include the coded aperture and computational photography. Implementations of compressive sensing in hardware at different technology readiness levels are available.
When we seek the definitions of Brightness and Contrast, we get the following answers from Wikipedia:
“Brightness is an attribute of visual perception in which a source appears to be radiating or reflecting light. In other words, brightness is the perception elicited by the luminance of a visual target. This is a subjective attribute/property of an object being observed.”
“Contrast is the difference in luminance or color that makes an object (or its representation in an image or display) distinguishable to the human eye. In terms of visual perception, contrast is determined by the difference in the color and brightness of the object and other objects within the same field of view.”
So, the phrase “brightness of an image” is not precisely defined on its own. The brightness of each pixel in the image is its intensity value. We can define the brightness of the image as its mean intensity value, and it can always be changed by adding or subtracting a fixed number to/from each pixel intensity.
Contrast is a construct of visual perception. We can always define the contrast of an image in one of several ways; in fact, different books provide different definitions. But the problem is that each of these definitions makes sense only for very simple images. We usually can’t numerically define the contrast of a complex image. Sure, you can use, say, the RMS contrast as the value, but I am not sure whether it provides any useful information for complex images. However, we can easily change the contrast of an image by multiplying/dividing the intensity values of the image by a fixed number. (E.g., dividing by the standard deviation of the pixel values would be interesting.)
Anyway, if you have to report the brightness and contrast values of an image, take the mean intensity value as the brightness and the RMS contrast as the contrast value. Brightness refers to the overall lightness or darkness of the image; increasing the brightness makes every pixel in the frame lighter. Contrast is the difference in brightness between objects in the image; increasing the contrast makes light areas lighter and dark areas darker. Hence, brightness gives an understanding of the mean of the pixel intensities, while contrast gives that of their standard deviation.
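In code terms (a NumPy sketch; the helper names and the tiny 2×2 image are made up for illustration), adding a bias shifts the mean while multiplying by a gain scales the spread:

```python
import numpy as np

def brightness(img):
    return img.mean()          # brightness: mean intensity

def rms_contrast(img):
    return img.std()           # RMS contrast: intensity standard deviation

def adjust(img, gain=1.0, bias=0.0):
    """Scale intensities (contrast), then shift them (brightness),
    clipping to the 8-bit range."""
    return np.clip(gain * img.astype(float) + bias, 0, 255)

img = np.array([[50.0, 100.0], [120.0, 160.0]])
brighter = adjust(img, bias=30)    # mean rises by 30, spread unchanged
punchier = adjust(img, gain=1.5)   # spread grows 1.5x (mean shifts too)
```

Note that once values clip at 0 or 255, the simple mean/standard-deviation relationships no longer hold exactly.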
Thanks to my dear friend Aruna for asking this probing question.
The successive elimination algorithm (SEA) is an excellent lossless fast algorithm that accelerates the search with no loss of quality compared to the brute-force full search. SEA reduces the number of SAD (sum of absolute differences) calculations by using a concise inequality to avoid further calculation at points that cannot become the best match point. A point (i, j) that could be the best match point must satisfy the following prerequisite:
Sum₀ – MinSAD ≤ Sum(i, j) ≤ Sum₀ + MinSAD
where Sum₀ is the sum of the absolute grayscale values of all points in the coding block, MinSAD is the minimum SAD among the points searched so far, and Sum(i, j) is the corresponding sum over the candidate block at (i, j). Whether the matching calculation for a point (i, j) proceeds depends on whether this formula is met: when considering a new match point, we first check the formula. If it holds, SAD(i, j) is calculated; otherwise the point is skipped. If SAD(i, j) is smaller than the current MinSAD, then MinSAD = SAD(i, j) and the corresponding motion vector is updated. The final search result is MinSAD and its corresponding motion vector.
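A minimal sketch of the test in Python (block size, search range and function name are illustrative; real encoders also precompute the candidate block sums with running sums instead of recomputing them per candidate):

```python
import numpy as np

def sea_search(cur, ref, top, left, search_range=4):
    """Find the motion vector for block `cur` (located at (top, left))
    within `ref`, skipping candidates eliminated by the SEA test."""
    h, w = cur.shape
    sum0 = int(cur.sum())          # Sum0 of the (non-negative) coding block
    best_mv, min_sad = (0, 0), np.inf
    for di in range(-search_range, search_range + 1):
        for dj in range(-search_range, search_range + 1):
            i, j = top + di, left + dj
            if i < 0 or j < 0 or i + h > ref.shape[0] or j + w > ref.shape[1]:
                continue           # candidate falls outside the frame
            cand = ref[i:i + h, j:j + w]
            # |Sum0 - Sum(i, j)| <= SAD(i, j) by the triangle inequality,
            # so a sum difference above MinSAD cannot beat the current best.
            if abs(sum0 - int(cand.sum())) > min_sad:
                continue           # eliminated without computing the SAD
            sad = int(np.abs(cur.astype(int) - cand.astype(int)).sum())
            if sad < min_sad:
                min_sad, best_mv = sad, (di, dj)
    return best_mv, min_sad
```

Because the elimination test only discards points whose SAD provably exceeds the current minimum, the result is identical to that of the full search.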