Armed with the knowledge gathered from making the python-pinecamera implementation, I started making a GTK3 version, this time in C instead of Python: partially for performance, partially because the documentation is hard enough to find for C and the Python abstraction layers make it even harder.

The app is far from perfect and still has a bunch of low-hanging fruit for performance improvements. At the moment it still decodes the images in the UI thread, which makes the whole app feel unresponsive. It also decodes at a different resolution than the one it's displayed at, which could probably be sped up by decoding at exactly the right resolution and rotating while decoding.

One of the ways to increase image quality on these sensors is to use a raw pixel format instead of a subsampled one. Resolution on cameras is basically a lie, because while displays have subpixels, sensors don't. When you display an image on a 1080p monitor, you're actually sending 5760x1080 subpixels to the display, and those subpixels are neatly ordered in red/green/blue triplets.

A 1080p camera sensor, on the other hand, only has 1920x1080 subpixels in total. These subpixels are laid out in a Bayer matrix, where (in the ov5640's case) the first line is a row of blue,green,blue,green... pixels and the next row is green,red,green,red... The camera software then takes every subpixel, looks up the nearest subpixels of the other two colors, and combines them into one "real" pixel. This makes for a slightly blurrier image than a 4k sensor being downscaled to 1080p.
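To make that concrete, here's a minimal sketch of nearest-neighbor demosaicing in C. It assumes an 8-bit BGGR frame matching the layout described above, and it's only meant to show the principle, not how any particular camera stack actually implements it:

```c
#include <stdint.h>

/* Naive nearest-neighbor demosaic for an 8-bit BGGR Bayer frame.
 * For every raw subpixel, the two missing colors are taken from the
 * closest neighbor of that color. The last row and column are skipped
 * to keep the sketch free of bounds checks. */
void debayer_bggr_nearest(const uint8_t *raw, uint8_t *rgb, int w, int h)
{
    for (int y = 0; y < h - 1; y++) {
        for (int x = 0; x < w - 1; x++) {
            uint8_t r, g, b;
            if (y % 2 == 0) {
                if (x % 2 == 0) {        /* blue subpixel */
                    b = raw[y * w + x];
                    g = raw[y * w + x + 1];
                    r = raw[(y + 1) * w + x + 1];
                } else {                 /* green subpixel on a blue row */
                    b = raw[y * w + x - 1];
                    g = raw[y * w + x];
                    r = raw[(y + 1) * w + x];
                }
            } else {
                if (x % 2 == 0) {        /* green subpixel on a red row */
                    b = raw[(y - 1) * w + x];
                    g = raw[y * w + x];
                    r = raw[y * w + x + 1];
                } else {                 /* red subpixel */
                    b = raw[(y - 1) * w + x - 1];
                    g = raw[y * w + x - 1];
                    r = raw[y * w + x];
                }
            }
            uint8_t *out = &rgb[(y * w + x) * 3];
            out[0] = r; out[1] = g; out[2] = b;
        }
    }
}
```

Proper debayering interpolates between multiple neighbors instead of copying a single one, which is exactly where the blur comes from: every output pixel is partially made up of its neighbors.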

To make the quality issue worse, the image is then subsampled into a 1080p grayscale image with the color channels at a quarter of that resolution. And to make it worse still, the camera applies quite aggressive noise reduction, causing the oil-paint look in bad pictures.

The fun part of debayering the image in the sensor and then subsampling it is that the bandwidth required for the final picture is about twice that of the original raw sensor data. At higher resolutions this makes the sensor hit a bandwidth limit when sending the video stream at a reasonable framerate, which is why it wasn't possible before to get a 5 megapixel picture out of the 5 megapixel sensor. By using a raw mode you can get the full sensor resolution at 15fps, and that's what Megapixels does.
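Some back-of-the-envelope arithmetic shows why. The numbers below assume the ov5640's full 2592x1944 (5 megapixel) mode, 8-bit raw Bayer samples and a 16-bit-per-pixel YUV output format; the exact bus limits are my assumptions, not figures from the Megapixels source:

```c
#include <stdio.h>

int main(void)
{
    long w = 2592, h = 1944;   /* full 5 megapixel ov5640 mode */
    long raw = w * h;          /* 8-bit raw Bayer: ~5.0 MB per frame  */
    long yuv = w * h * 2;      /* 16 bpp YUV:      ~10.1 MB per frame */

    printf("raw @ 15 fps: %ld MB/s\n", raw * 15 / 1000000); /* ~75  */
    printf("yuv @ 15 fps: %ld MB/s\n", yuv * 15 / 1000000); /* ~151 */
    return 0;
}
```

The debayered-and-converted stream needs roughly double the bus bandwidth of the raw data, so dropping to raw mode buys the full resolution back at 15fps.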

Speed

One of the major improvements over the previous camera solutions is that the preview runs at a much higher framerate. The ffmpeg-based solutions ran in the 0.5-2 FPS range, depending on the rotation and the resolution the sensor was running at. They were also running in YUV mode instead of raw. The issue with YUV for the preview is that the sensor takes the RGB subpixels and returns the image as YUV pixels, but the display in the phone is RGB again, so everything has to be converted back, which takes quite some CPU power.
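To give an idea of that per-pixel cost, here's a sketch of a YUYV-to-RGB conversion using the common BT.601 fixed-point coefficients. The constants are standard, but this is an illustration of the work involved rather than the code any of those solutions actually shipped:

```c
#include <stdint.h>

static uint8_t clamp8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Convert packed YUYV (4 bytes per 2 pixels, shared chroma) to packed
 * RGB using BT.601 coefficients scaled by 65536. npixels must be even.
 * Six multiplies, shifts and clamps per pixel pair, for every frame. */
void yuyv_to_rgb(const uint8_t *yuyv, uint8_t *rgb, int npixels)
{
    for (int i = 0; i < npixels; i += 2) {
        int y0 = yuyv[0], u = yuyv[1] - 128;
        int y1 = yuyv[2], v = yuyv[3] - 128;
        for (int j = 0; j < 2; j++) {
            int y = j ? y1 : y0;
            *rgb++ = clamp8(y + ((91881 * v) >> 16));             /* R */
            *rgb++ = clamp8(y - ((22554 * u + 46802 * v) >> 16)); /* G */
            *rgb++ = clamp8(y + ((116130 * u) >> 16));            /* B */
        }
        yuyv += 4;
    }
}
```

All of that work happens for every pixel of every preview frame on the CPU, on top of the scaling and rotation the preview needs anyway.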

The easiest way to speed up data processing in computers is to just process less data, so that's what Megapixels does. Instead of properly debayering a full 5 megapixel frame, it does a quick'n'bad debayer by taking a single red, green and blue pixel from every 12x12 block of raw pixels and discarding the rest: no interpolation at all.
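A sketch of that idea, again assuming an 8-bit BGGR layout; the block walk below is my reconstruction of the technique, not the actual Megapixels code:

```c
#include <stdint.h>

#define BLK 12  /* one output pixel per 12x12 block of raw pixels */

/* Quick'n'bad preview debayer: pick one blue, one green and one red
 * sample from the top-left corner of each block and skip everything
 * else. Output is a (w/BLK) x (h/BLK) packed RGB image. */
void debayer_preview(const uint8_t *raw, uint8_t *rgb, int w, int h)
{
    int ow = w / BLK;
    for (int by = 0; by + BLK <= h; by += BLK) {
        for (int bx = 0; bx + BLK <= w; bx += BLK) {
            const uint8_t *p = raw + by * w + bx;
            uint8_t *out = rgb + ((by / BLK) * ow + bx / BLK) * 3;
            out[0] = p[w + 1]; /* red: odd row, odd column in BGGR */
            out[1] = p[1];     /* green: even row, odd column      */
            out[2] = p[0];     /* blue: even row, even column      */
        }
    }
}
```

Since only one output pixel is produced per 12x12 block, this touches a factor of 144 less data than a full debayer, which is what makes the preview framerate possible.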

This is technically not a correct solution, and it produces way worse image quality than proper debayering. That's why, when you take a picture, the whole raw frame is run through a proper debayering implementation.