Go: Video To ASCII

13 Jul, 2024

So, this is a write-up about a simple project that I built in Go to understand a little more about Go channels.

Project Idea

The main goal of the project was to transform (in real time) the image from my webcam into an ASCII equivalent. The purpose was to learn more about Go channels, so I didn't put much effort into the graphical aspect of the project (skill issue). The code is available here.

Here I will briefly explain the steps that I took to achieve my goal. If you want more depth, please take a look at the code.

Steps

TLDR:

Load Webcam Image
Transform Frames to Grayscale
Character Substitution
Display Frame
Repeat Process

Load Webcam Image

This step was about finding a way to access the Linux video API (V4L2) through Go. After some research, I found Vladimir Vivien's implementation, called Go4VL, and just used it.

The general idea is that Go4VL creates a process to connect with your webcam device and streams each frame in real time for the program to access. However, this processing is done in a blocking manner. So, it's the perfect scenario for using a goroutine to listen to the webcam and send all the frames into a channel, allowing other goroutines to access the captured frames.

Note for myself: a channel is a safe communication tunnel between multiple goroutines.

With that in mind, we would have a frames channel as our highway to pass frames between goroutines. So, every frame coming from our webcam will be available in our frames channel to be used.
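As a rough illustration of this idea, here is a minimal sketch of a producer goroutine feeding a frames channel. It does not touch the webcam: `Frame` and `captureFrames` are hypothetical names, and the capture loop is simulated with fake one-byte frames, so treat it as a picture of the channel pattern rather than the project's actual code.

```go
package main

import "fmt"

// Frame is a hypothetical stand-in for the raw bytes of one webcam frame.
type Frame []byte

// captureFrames simulates a blocking capture loop. It sends n fake frames
// into the returned channel from its own goroutine, then closes the channel.
func captureFrames(n int) <-chan Frame {
	frames := make(chan Frame)
	go func() {
		defer close(frames) // tells consumers that no more frames will arrive
		for i := 0; i < n; i++ {
			frames <- Frame{byte(i)} // blocks until a consumer is ready
		}
	}()
	return frames
}

func main() {
	// Ranging over the channel ends when the producer closes it.
	for frame := range captureFrames(3) {
		fmt.Println("received frame", frame[0])
	}
}
```

Because the channel is unbuffered, the producer waits for the consumer on every frame. A buffered channel would let the capture side run a little ahead of the processing side.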

Now each frame is available in real time through a channel, but let's focus on the processing steps for just one frame.

Transform to Grayscale

The second step was to understand the format in which the frames were represented. Specifically, this meant figuring out how each frame was structured inside the program.

In my case, each frame was being loaded as an array (slice) of bytes, and the pixels were in the format V4L2_PIX_FMT_RGB24. In the end, I just used a JPEG decoder to get an Image object.
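To make the decoding step concrete, here is a small sketch using the standard library's `image/jpeg` package. It assumes the frame bytes are JPEG-encoded, as the decoder choice above suggests. `decodeFrame` and `sampleJPEG` are hypothetical helpers, and the sample frame is built in memory instead of coming from the webcam.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
)

// decodeFrame turns JPEG-encoded bytes into an image.Image.
func decodeFrame(data []byte) (image.Image, error) {
	return jpeg.Decode(bytes.NewReader(data))
}

// sampleJPEG builds a tiny solid-color JPEG in memory to stand in for a
// webcam frame (an assumption for this sketch only).
func sampleJPEG(w, h int) ([]byte, error) {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{R: 200, G: 100, B: 50, A: 255})
		}
	}
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, nil); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := sampleJPEG(4, 2)
	if err != nil {
		panic(err)
	}
	img, err := decodeFrame(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("decoded size:", img.Bounds().Dx(), "x", img.Bounds().Dy())
}
```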

At this point, we need to have an idea of what an Image is, so we can work with it. Here is a simple and general explanation:

An image can be understood as an array of pixels. Each pixel is represented as a tuple of numbers, with each number defining the value for a specific color channel, such as (R, G, B).

We know that each pixel in an image is represented by three numbers: R (Red), G (Green), and B (Blue). To map these to characters or simplify them, we first need to transform the three numbers into a single value. The simplest way to achieve this is by using a common grayscale equation:

Y = 0.299 R + 0.587 G + 0.114 B

Here is a classic example of the result of this transformation:

LenaGrayScaleTransform

After creating a function that performs this transformation on a pixel and applying it to all pixels in the frame, we now have, instead of 3 values representing a pixel, only one value. The next step is to map each pixel of the frame from its grayscale value to its equivalent character representation.
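A per-pixel function along these lines could look like the sketch below. The names `luminance` and `grayOf` are mine, not taken from the project, and rounding to the nearest integer is one possible choice among several.

```go
package main

import (
	"fmt"
	"image/color"
)

// luminance applies Y = 0.299 R + 0.587 G + 0.114 B to 8-bit channels
// and rounds the result to the nearest integer.
func luminance(r, g, b uint8) uint8 {
	y := 0.299*float64(r) + 0.587*float64(g) + 0.114*float64(b)
	return uint8(y + 0.5)
}

// grayOf converts any color.Color to a single grayscale value.
// Color.RGBA returns 16-bit channels, so they are shifted down to 8 bits.
func grayOf(c color.Color) uint8 {
	r, g, b, _ := c.RGBA()
	return luminance(uint8(r>>8), uint8(g>>8), uint8(b>>8))
}

func main() {
	fmt.Println(grayOf(color.RGBA{R: 255, A: 255})) // pure red
	fmt.Println(grayOf(color.RGBA{G: 255, A: 255})) // pure green
	fmt.Println(grayOf(color.RGBA{B: 255, A: 255})) // pure blue
}
```

Green contributes the most to the result and blue the least, which is why a pure green pixel ends up much lighter in grayscale than a pure blue one.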

Character Substitution

In this step, the idea is to map a list of characters with different "black density" to our grayscale values. Let's suppose our grayscale values are in the interval [0,255]. I used the following list of characters to map those values:

'@', '%', '#', '*', '+', '=', '-', ':', '.', ' '

Each character represents a different level of black density, where @ is the darkest and (space) is the lightest. The mapping equation can be defined as something like:

scalingFactor = \frac{256.0}{numChars}

index = \left\lfloor \frac{grayValue}{scalingFactor} \right\rfloor

With this approach, we map a grayscale value of 0 to the character @, a grayscale value of 255 to a (space), and calculate the corresponding characters for values in between.
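Here is a small sketch of that mapping. The names `ramp` and `charFor` are assumptions. The sketch uses integer arithmetic, `gray * numChars / 256`, which gives the same result as dividing by the scaling factor and taking the floor, without any floating-point rounding at the boundaries.

```go
package main

import "fmt"

// ramp lists the characters from darkest ('@') to lightest (' ').
var ramp = []rune{'@', '%', '#', '*', '+', '=', '-', ':', '.', ' '}

// charFor maps a grayscale value in [0,255] to a character in ramp.
// gray*len(ramp)/256 equals floor(gray / (256/len(ramp))).
func charFor(gray uint8) rune {
	index := int(gray) * len(ramp) / 256
	return ramp[index]
}

func main() {
	for _, g := range []uint8{0, 64, 128, 192, 255} {
		fmt.Printf("%3d -> %q\n", g, charFor(g))
	}
}
```

Dividing by 256 instead of 255 keeps the index strictly below the number of characters, so a grayscale value of 255 lands on the last character and never goes out of bounds.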

With our mapping function defined, we simply need to process each pixel of the frame and transform each grayscale value into a character to create the new ASCII image.
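The whole-frame pass can be sketched as a double loop over the image bounds. This is an illustration under assumptions, not the project's code: `toASCII` and `demoImage` are hypothetical names, and the grayscale conversion is swapped for the standard library's `color.GrayModel`, which uses an integer approximation of the same weights.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"strings"
)

// ramp lists the characters from darkest to lightest.
const ramp = "@%#*+=-:. "

// toASCII converts every pixel to a character and returns one string per row.
func toASCII(img image.Image) []string {
	b := img.Bounds()
	rows := make([]string, 0, b.Dy())
	for y := b.Min.Y; y < b.Max.Y; y++ {
		var sb strings.Builder
		for x := b.Min.X; x < b.Max.X; x++ {
			gray := color.GrayModel.Convert(img.At(x, y)).(color.Gray).Y
			sb.WriteByte(ramp[int(gray)*len(ramp)/256])
		}
		rows = append(rows, sb.String())
	}
	return rows
}

// demoImage builds a 10x2 grayscale image: a dark-to-light gradient on the
// first row and the reverse on the second.
func demoImage() *image.Gray {
	img := image.NewGray(image.Rect(0, 0, 10, 2))
	for x := 0; x < 10; x++ {
		img.SetGray(x, 0, color.Gray{Y: uint8(x * 28)})
		img.SetGray(x, 1, color.Gray{Y: uint8(252 - x*28)})
	}
	return img
}

func main() {
	fmt.Println(strings.Join(toASCII(demoImage()), "\n"))
}
```

A real webcam frame has far more pixels than fit on a text page, so in practice the image would probably be downsampled first. That step is left out here.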

At the end, we have a big array of characters that, when rendered with a monospaced font and a good proportion between font size and line height, looks like this:

LenaAscii

Display Frame

This step was pretty tedious. I chose to send each array of characters, representing an ASCII frame, via WebSocket to a simple HTML page, where it is rendered as text. As new frames arrive through the WebSocket, I replace the old frame with the new one, creating a video effect.

Repeat Process

Repeat, fast, for all the frames.
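Since the project was about channels, one way to picture the repetition is a pipeline stage: a goroutine that reads raw frames from one channel and writes converted frames to another. The sketch below is an assumption about how this could be wired, with hypothetical names (`convert`, `asciiStage`) and each frame simplified to a slice of grayscale bytes.

```go
package main

import (
	"fmt"
	"strings"
)

// convert is a simplified placeholder for the grayscale and character
// substitution steps. It treats each byte as one grayscale pixel.
func convert(frame []byte) string {
	const ramp = "@%#*+=-:. "
	var sb strings.Builder
	for _, gray := range frame {
		sb.WriteByte(ramp[int(gray)*len(ramp)/256])
	}
	return sb.String()
}

// asciiStage reads frames until the input channel is closed and emits the
// converted text of each one on the returned channel.
func asciiStage(frames <-chan []byte) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for frame := range frames {
			out <- convert(frame)
		}
	}()
	return out
}

func main() {
	frames := make(chan []byte)
	go func() {
		defer close(frames)
		for i := 0; i < 3; i++ {
			frames <- []byte{0, 128, 255} // fake frame standing in for the webcam
		}
	}()
	for text := range asciiStage(frames) {
		fmt.Printf("%q\n", text)
	}
}
```

Each stage only knows about its input and output channels, so a display stage could be added at the end in the same way.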