The computational power at our fingertips is transforming our world faster than ever before. Each year around one hundred students begin a ten-month endeavor at the Institute for Advanced Analytics to sharpen their skills and learn about the technologies at the core of this transformation. One such technology is deep learning, which has carried artificial intelligence to new heights in natural language processing and speech recognition, and is now showing potential as a tool for image compression [1].
Each image shared online can be represented as three grids of integer values ranging from 0 to 255, where each grid represents the amount of red, green, or blue in the image. In an uncompressed image, each pixel takes three bytes, so a single 4K image (3840 × 2160 pixels) comes to 24,883,200 bytes, or roughly 24.88 MB. Fortunately, clever compression algorithms such as JPEG and PNG exist, allowing images to be distributed much more efficiently. However, with the growing amount of photo and video media shared on the internet each year, any improvement to an image compression algorithm would save tremendous amounts of time, energy, and storage.
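For a quick sanity check on that number, here is the back-of-the-envelope arithmetic in Python, assuming the standard 3840 × 2160 pixel 4K UHD frame:

```python
# Back-of-the-envelope size of one uncompressed 4K frame.
width, height = 3840, 2160            # 4K UHD resolution (assumed)
bytes_per_pixel = 3                   # one byte each for red, green, and blue

size_bytes = width * height * bytes_per_pixel
print(size_bytes)                     # 24883200
print(size_bytes / 1_000_000)         # 24.8832 -> roughly 24.88 MB
```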
Let’s talk a little about what a digital image is. Each grid represents one of the red, green, or blue (RGB) color components of the image, and the magnitude of each number corresponds to the intensity of that color in that pixel. When the three grids are stacked on top of each other, the picture appears to us in its natural form. When capturing an image with a camera, we are discretizing a continuous signal: the image on our screen is a finite set of pixels approximating the original signal the camera recorded. JPEG uses a Discrete Cosine Transform (DCT) to take advantage of this underlying signal structure, compressing the image by discarding its most redundant and least informative components. What’s left is a much smaller file, at the cost of some visual distortion compared to the original. The smaller the compressed file, the more distortion will be present.
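To make the DCT idea concrete, here is a small Python sketch that transforms an 8 × 8 block of pixel values and throws away the smallest coefficients. The smooth test block and the keep-the-top-25% rule are purely illustrative assumptions; real JPEG divides the coefficients by a quantization table tied to a quality setting rather than zeroing them outright.

```python
import numpy as np
from scipy.fft import dctn, idctn

# An illustrative, smoothly varying 8x8 block of grayscale pixel values (0-255),
# standing in for a patch of a natural image.
x = np.arange(8)
block = 128 + 60 * np.sin(np.add.outer(x, x) / 4.0)

# Transform to the frequency domain, then keep only the largest 25% of coefficients.
coeffs = dctn(block - 128, norm="ortho")
threshold = np.quantile(np.abs(coeffs), 0.75)
compressed = np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)

# Inverting the transform recovers a close approximation of the block, because a
# smooth block concentrates its energy in a few low-frequency coefficients.
reconstructed = idctn(compressed, norm="ortho") + 128
print(f"max pixel error: {np.abs(reconstructed - block).max():.2f}")
```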
Now that we have covered a common compression tool that we are all familiar with, it is time to discuss how machine learning, specifically deep learning, is making progress in image compression. Deep learning is a branch of machine learning built on artificial neural networks with three or more layers of nodes (excluding the input layer) that are trained to minimize an objective function. These networks are designed to mimic structures found in our brains, hence the term neurons. In the case of image compression, the neural network’s task is to map the pixel coordinates of an image to RGB values that are as close as possible to those of the original image. This is a form of lossy compression (like JPEG), which means that some of the original data will be lost when the image is reconstructed from its Implicit Neural Representation (INR). The idea behind an INR is that the weights of the neural network hold all the information necessary to recreate the image: evaluating the network at every pixel coordinate produces an approximation of the image it was originally trained on [1].
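Here is a minimal sketch of that idea (PyTorch is assumed, and the layer widths and training loop are my own illustrative choices rather than the exact setup from [1]): a small fully connected network takes a normalized (x, y) coordinate and returns an (R, G, B) value, and "compressing" an image simply means overfitting the network to that one image so its weights become the stored file.

```python
import torch
import torch.nn as nn

class ImplicitImage(nn.Module):
    """A small MLP that maps a 2-D pixel coordinate to an RGB color."""
    def __init__(self, hidden=128, depth=4):
        super().__init__()
        layers, width = [], 2                      # input: an (x, y) pair
        for _ in range(depth):
            layers += [nn.Linear(width, hidden), nn.ReLU()]
            width = hidden
        layers.append(nn.Linear(width, 3))         # output: an (R, G, B) triple
        self.net = nn.Sequential(*layers)

    def forward(self, coords):
        return self.net(coords)

def fit(model, image, steps=2000, lr=1e-3):
    """Overfit the network to one image; `image` is an (H, W, 3) float tensor in [0, 1]."""
    h, w, _ = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)   # one (x, y) row per pixel
    colors = image.reshape(-1, 3)                           # the matching RGB targets
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(coords), colors)
        loss.backward()
        opt.step()
    return model
```

Decoding is then just evaluating the trained network at every pixel coordinate and reshaping the output back into an H × W × 3 grid.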
This technique might sound surprising because we are intentionally creating a situation of overfitting, one of the biggest problems in data science. In this case, overfitting is our friend, because it lets the model capture the high-frequency detail present in an image. Without a reasonably large number of parameters, the model would not have enough capacity to handle the large pixel-to-pixel variation found in images. On the other hand, increasing the width of the layers or the total number of layers increases the space needed to store the network, and thus lowers the compression ratio relative to the original image. Additionally, when compressing images with INRs, one activation function has risen above the rest [2]. I’ve described an image as a discrete approximation of a continuous signal, so perhaps it is unsurprising that a sinusoidal activation function is the most effective at capturing the high-frequency detail in these images [3].
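In code, a SIREN-style layer looks roughly like the sketch below, following the periodic activation and the initialization scheme described in [3] (the frequency scale w0 = 30 is the default suggested there). In the network sketched earlier, these layers would take the place of the Linear + ReLU pairs.

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """A linear layer followed by a sine activation, in the spirit of SIREN [3]."""
    def __init__(self, in_features, out_features, w0=30.0, is_first=False):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.w0 = w0
        # Initialization from [3]: the first layer spans a wide range of input
        # frequencies; later layers are scaled by 1/w0 to keep activations stable.
        bound = 1.0 / in_features if is_first else math.sqrt(6.0 / in_features) / w0
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))
```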
The practice of using INRs for image compression is in its infancy, but improvements to the original model have already boosted its performance. When initializing a network, it is common to set the weights randomly, since the optimal weights are unknown. However, by using a technique called meta-learning to find a better starting point (sketched below), the model’s training time has been significantly reduced [1]. Even with these advances, the approach still lags behind other major image compression methods, such as JPEG, and other machine-learning-based approaches, such as autoencoders [2]. Looking forward, it shows promise as a tool for learning and for pushing our understanding of the tasks deep learning can succeed at.
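The sketch below gives a rough picture of how such a meta-learned starting point can be found. It uses a simplified Reptile-style update for illustration rather than the exact algorithm from [1]: briefly fit a copy of the shared initialization to one training image, then nudge the shared weights toward the fitted ones, and repeat over many images.

```python
import copy
import torch

def meta_step(meta_model, coords, colors, inner_steps=10, inner_lr=1e-4, meta_lr=0.1):
    """One Reptile-style meta-update from a single (coords, colors) training image."""
    model = copy.deepcopy(meta_model)              # start from the shared initialization
    opt = torch.optim.Adam(model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                   # brief inner-loop fit to this image
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(coords), colors)
        loss.backward()
        opt.step()
    with torch.no_grad():                          # nudge the shared initialization
        for p_meta, p_fit in zip(meta_model.parameters(), model.parameters()):
            p_meta += meta_lr * (p_fit - p_meta)
```

After many such updates, the shared initialization already reflects the broad structure of natural images, so compressing a new image requires far fewer training steps.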
Researchers are hopeful that this strategy can generalize to data structures other than just images. Discovering these tools and algorithms capable of extracting structure across many different media forms is further inspiring humanity’s pursuit of creating an intelligence that compares to our own.
Students at the Institute for Advanced Analytics are among those inspired individuals. Over the course of the Master of Science in Analytics (MSA) program, we have completed intensive coursework in statistics, data mining, and machine learning, along with an eight-month practicum project. The program revolves around a team-based environment that comes together to forge the highest-quality data professionals.
References
- Yannick Strümpler, Janis Postels, Ren Yang, Luc van Gool, and Federico Tombari. Implicit neural representations for image compression, 2021.
- Emilien Dupont, Adam Goliński, Milad Alizadeh, Yee Whye Teh, and Arnaud Doucet. COIN: Compression with implicit neural representations, 2021.
- Vincent Sitzmann, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. CoRR, abs/2006.09661, 2020.
Columnist: Brendan Bammer