BrixIT Blog: Random blog posts from Martijn Braam (04 Jun 2023 12:12:37 -0000)

Developers are lazy, thus Flatpak - Martijn Braam, Sat, 03 Jun 2023 15:58:47 -0000<p>In the last decade I have seen a very slow but steady shift to solutions for packaging software that try to isolate the software from host systems to supposedly make things easier. My first experience with this was Docker; now Flatpak is the thing for desktop applications.</p> <h2>The promise of Flatpak</h2> <p>So the thing Flatpak is supposed to fix for me as a developer is that I don't need to care about distributions anymore. I can bolt on whatever dependencies I want to my app and it's dealt with. I also don't need to worry about having software in distributions: if it's in Flatpak it's everywhere. Flatpak gives me that unified base to work on and everything will be perfect. World hunger will be solved. Finally peace on earth.</p> <p>Sadly there's reality. The reality is that to get away from the evil distributions the Flatpak creators have made... another distribution. It is not a particularly good distribution: it doesn't have a decent package manager. It doesn't have a system that makes it easy to do packaging. The developer interface is painfully shoehorned into GitHub workflows and it adds all the downsides of containerisation.</p> <h3>Flatpak is a distribution</h3> <p>While the developers like to pretend real hard that Flatpak is not a distribution, it's still suspiciously close to one. It lacks a kernel and a few services, and it lacks the standard Linux base directory specification, but it's still a distribution you need to target. Instead of providing separate packages with a package manager it provides a runtime that comes with a bunch of dependencies. Conveniently it also provides multiple runtimes, to make sure there's not actually a single base to work on. Because sometimes you need Gnome libraries, sometimes you need KDE libraries.
Since there's no package manager those will be in separate runtimes.</p> <p>While Flatpak breaks most expectations of a distribution it's still a collection of software and libraries built together to make a system to run software in, thus it's a distribution. A really weird one.</p> <h3>No built-in package manager</h3> <p>If you need a dependency that's not in the runtime there's no package manager to pull in that dependency. The solution is to also package the dependencies you need yourself and let the flatpak tooling build them into the flatpak of your application. So now instead of being the developer for your application you're also the maintainer of all the dependencies in this semi-distribution you're shipping under the guise of an application. And one thing is for sure, I don't trust application developers to maintain dependencies.</p> <p>This gets really nuts when you look at some software that deals with multimedia. Let's look at the Audacity flatpak. It builds these dependencies:</p> <ul><li>wxwidgets</li> <li>ffmpeg</li> <li>sqlite</li> <li>chrpath</li> <li>portaudio</li> <li>portmidi</li> </ul> <p>So let's look at how well dependencies are managed here. Since we're now almost exactly half a year into 2023 I'll look at the updates for the last 6 months and compare them to the same dependencies in Alpine Linux.</p> <ul><li>audacity has been updated 4 times in the flatpak.
It has been updated 5 times on Alpine.</li> <li>ffmpeg has been updated to 6.0 in both the flatpak and Alpine, but the Alpine ffmpeg package has had 9 updates because of codecs that have been updated.</li> <li>sqlite hasn't been updated in the flatpak and has been updated 4 times in Alpine.</li> <li>wxwidgets hasn't been updated in the flatpak and has been updated 2 times in Alpine.</li> <li>chrpath hasn't had updates.</li> <li>portaudio hasn't had updates in flatpak and Alpine.</li> <li>portmidi hasn't had updates.</li> </ul> <p>This is just a random package I picked and it already had a lot more maintenance of the dependencies in the distribution than the flatpak has. It most likely doesn't scale to have all developers keep track of all the dependencies of all their software.</p> <h3>The idea of isolation</h3> <p>One of the big pros that's always mentioned with Flatpak is that the applications run in a sandbox. The idea is that this sandbox will shield you from all the evil applications can do, so it's totally safe to trust random developers to push random Flatpaks. First of all, this sandbox has the same issue as any other permission system: it needs to tell the user about the specific holes that have been poked in the sandbox to make the application work, in a way that end users <i>understand</i> what the security implications of those permissions are.</p> <p>For example here's Gnome Software ready to install the flatpak for Edge:</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>I find the permission handling implemented here very interesting. There's absolutely no warning whatsoever about the bypassed security in this Flatpak until you scroll down. The install button will immediately install it without warning about all the bypassed sandboxing features.</p> <p>So if you <i>do scroll down there's more details, right?
Sure there is!</i></p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>There's a nice red triangle with the word Unsafe! Phew, everyone is fine now. So this uses a legacy windowing system, which probably means it uses X11, which is not secure and breaks the sandbox. Well if that's the only security issue then it <i>might</i> be acceptable? Let's click that button.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>Well yeah... let's hide that from users. Of course the browser needs to write to /etc. This is all unimportant to end users.</p> <p>The even worse news is that since this is proprietary software it's not really possible to audit what this would do, and even if it's audited it's ridiculously easy to push a new, more evil version to Flathub since practically only the first version of the app you push is thoroughly looked at by the Flathub maintainers.</p> <p>Even if there weren't so many holes in the sandbox, this does not stop applications from doing more evil things that are not directly related to filesystem and daemon access. You want analytics on your users? Just request the internet permission and send off all the tracking data you want.</p> <h2>So what about traditional distributions</h2> <p>I've heard many arguments for Flatpaks from users and developers but in the end I can't really say the pros outweigh the cons.</p> <p>I think it's very important that developers do not have the permissions to push whatever code they want to everyone under the guise of a secure system. And that's <i>my opinion as a software developer</i>.</p> <p>Software packaged by distributions has at least some degree of scrutiny and it often results in at least making sure build flags are set to disable user tracking and such features.</p> <p>I also believe software in general is better if it's made with the expectation that it will run outside of Flatpak.
It's not that hard to make sure you don't depend on bleeding edge versions of libraries when that's not needed. It's not that hard to have optional dependencies in software. It's not that hard to actually follow XDG specifications instead of hardcoding paths.</p> <h2>But packaging for distributions is hard</h2> <p>That's the best thing! Developers are not supposed to be the ones packaging software, so it's not hard at all. It's not your task to get your software into all the distributions; if your software is useful to people it tends to get pulled in. I have software that's packaged in Alpine Linux, ALT Linux, Archlinux AUR, Debian, Devuan, Fedora, Gentoo, Kali, LiGurOS, Nix, OpenMandriva, postmarketOS, Raspbian, Rosa, Trisquel, Ubuntu and Void. I did not have to package most of this.</p> <p>The most I notice from other distributions packaging my software is patches from maintainers that improve the software, usually dealing with some edge case I forgot with a hardcoded path somewhere.</p> <p>The most time I've ever spent on distribution packaging was actually for the few pieces of software I've managed to push to Flathub. Dealing with differences between distributions is easy; dealing with differences between running inside and outside Flatpak is hard.</p> <h2>But Flatpaks are easier for end users</h2> <p>I've run into enough issues as an end user of flatpaks. A package being on Flathub does not mean that it will be installable for an end user. I ran into this when installing packages on the PineBook Pro, which generated some rather confusing error messages about the repositories missing. It turns out that the AArch64 architecture was missing for those flatpaks, so the software was just not available. Linux distributions generally try to enable as many architectures as possible when packaging, not just x86_64.</p> <p>A second issue I've had on my PineBook Pro is that it has a 64GB rootfs. Using too many flatpaks is just very wasteful of space.
In theory you have a runtime that has your major dependencies and then a few megabytes of stuff in your application flatpak. In practice I nearly have a unique platform per flatpak installed, because the flatpaks depend on different versions of that platform or just on different platforms.</p> <p>Another issue is with end users of some of my Flatpaks. Flatpak does not deal well with software that communicates with actual hardware. A bunch of my software uses libusb to communicate with specific devices as a replacement for some Windows applications and Android apps I would otherwise need. The issue end users will run into is that they first need to install the udev rules in their distribution to make sure Flatpak can access those USB devices. For the distribution packaged version of my software it Just Works(tm).</p> <h2>Flatpak does have its uses</h2> <p>I wouldn't say Flatpak is completely useless. For certain use cases it is great to have available. I think Flatpak makes the most sense for distributing closed source software.</p> <p>I would like to see this be more strict though. I wouldn't want to have flatpaks with holes in the sandbox and a proprietary license for example, which is exactly what the Edge flatpak is.</p> <p>It's quite sad that the Flatpak madness has gone so deep into the Gnome ecosystem that it's now impossible to run the nice Gnome Builder IDE without having your application in a flatpak. (EDIT: Turns out that using Builder without Flatpak is possible again)</p> <p>I don't think having every app on a Linux machine be a Flatpak is anything I'd want. If I wanted to give developers that much power to push updates to anywhere in my system without accountability I'd just go run Windows.</p>

Can't get blinky on the BL602 - Martijn Braam, Mon, 01 May 2023 16:46:41 -0000<p>One of the boards I have in my parts box is the PINE64 BL602 evaluation board. The "PineCone".
It's a competitor to WiFi enabled microcontrollers like the Espressif products.</p> <p>I put it aside because when I received it there was not really any software to go along with it, and I don't have anywhere near the experience to bring up a board like this.</p> <h2>Getting blinky running 3 years later</h2> <p>This product has been out for 3 years now, so let's see if it has formed a community.</p> <p>If you are not familiar with blinky: it's the Hello World! of the microcontroller world. The goal is to have an LED connected to one pin of the board and toggle that pin on and off every second to make a blinking LED.</p> <p>Let's check the website first. The official page for this from PINE64 itself is <a href=""></a>. Since the only official thing is the webshop, which doesn't list any documentation or links to progress further on how to use this thing, I consider it a dead end. Let's move on.</p> <p>The community side is the wiki page at <a href=""></a>. This has the specs again and a block diagram. It also has the datasheets for the components on the board and the schematics. Not anything I actually need to get started using this product.</p> <p>Let's compare this to the Raspberry Pi Pico, a competitor that's doing quite well. When you search for it you land on the official product page <a href=""></a> that has the shop link and the specifications, but more importantly it's a hub linking to the "Getting started" guide and the documentation for users.</p> <p>So let's scroll down on the PineCone wiki page since it's my only hope for docs. Three quarters down this long wiki page is a list of flashing utilities. I don't have anything to flash yet but I look forward to figuring out which of the 6 listed flashers fits my personality best /s.</p> <p>The rest of this page is links to random github repositories and blog articles. The top link is the link to the SDK for this board.
It links to <a href=""></a> and is labeled as "compilers, linkers, and all the code to build on Windows, Linux (x86_64), and MacOS". The last activity here was 2 years ago. This seems mostly focused on the reverse engineering effort, but let's have a look.</p> <h2>bl_iot_sdk</h2> <p>This repository has a README with again more links to a lot of different places. One of these links to <a href=""></a>, which is a Sphinx generated documentation site. This is several pages too deep from the official pine64 website but I forgive that since it has the magical words I've been looking for:</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>So I'm going down this path; it seems similar to the way the Pi Pico SDK is used. Clone the repository and create an environment variable pointing to the checkout folder.</p> <p>Next it tells me how to compile the example apps, but I don't want to compile the example apps: those are in-tree so they probably Just Work(tm) without setting up the scaffolding you need for your own projects. I just want to build a main.c that does blinky. This part of the documentation sadly stops after compiling and flashing the built-in apps.</p> <p>I have continued browsing this documentation for a bit but it does not show at all how to make a minimal blinky example and build it.
The rest of the documentation mainly covers the advanced features: various flashing systems, debugging systems and image formats.</p> <p>So to compare this with the Pi Pico documentation: the first thing I find when I read the documentation of the SDK is:</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>Blinky in the Pi Pico docs</figcaption></figure> <p>Chapter 1.2 is the first chapter with actual documentation and it starts with all the things I want for getting started with yet another microcontroller board:</p> <ul><li>A very simple .c file that implements blinky without using any advanced features of the hardware.</li> <li>The output of <code>tree</code> showing exactly which folder structure I should have and where the files go.</li> <li>The CMakeLists.txt that builds this application.</li> </ul> <p>The BL602 documentation has shown me none of the above yet. At this point I give up and just open 20 tabs in my browser with all the linked pages from the wiki.</p> <p>All these links are about random flashing utilities, ports of new programming languages and ports of new frameworks. A lot of them also show how to flash the example apps to the thing.</p> <p>PEOPLE DON'T BUY MICROCONTROLLERS TO FLASH EXISTING EXAMPLES</p> <h2>Getting ANYTHING to run</h2> <p>So there is no documentation on how to make blinky for this platform whatsoever. I give up and figure it out myself from the SDK example apps.</p> <p>There's no blinky in the examples; the simplest application I could find was the helloworld one that outputs some text over uart. This uses FreeRTOS to get that working, so there's a complete lack of basic single .c file demo applications for this.</p> <h2>More issues</h2> <p>This SDK has multiple things I don't like. The most important one is that it ships prebuilt compilers in the source tree.
This is not a proper SDK, this is some Android-level build mess.</p> <p>The whole point of using nice RISC-V boards is that this is a standard architecture; all Linux distributions already ship a maintained RISC-V toolchain, so why would I ever use some random precompiled binaries shipped by a Chinese vendor?</p> <p>Why is this using Makefiles to build the thing instead of CMake, which the Raspberry Pi Foundation has already proven you can integrate way more neatly? This is just throwing hardware over the wall and linking to the chip vendor documentation, hoping the community will produce a properly made tutorial, proper documentation and a new SDK that's not a mess.</p> <p>I guess I'll just dump this thing back in the parts bin and wait some years. Quite sad, since this seems to be some really nice hardware and a great alternative to the existing microcontroller platforms.</p>

Digital Aerochrome - Martijn Braam, Sun, 30 Apr 2023 11:11:55 -0000<p>A long long time ago in the 70s there was a lot of interesting film being made. At that time film was used for everything: professional and consumer cameras, movies, aerial photography. A lot of time has since gone into recreating film looks for digital cameras, to reproduce the color response of the old film stocks.</p> <p>One of the more popular examples of this is the Fujifilm X-series digital cameras that have film simulations built in. I've heard they are quite good, but those cameras are quite expensive and I find it a bit useless to get a separate camera for just that trick.</p> <p>One legendary film stock you won't see simulated on anything like this is Kodak Aerochrome III Color Infrared. This film was originally produced for aerial photography. Most infrared film is black and white film that's also sensitive to infrared.
What makes Aerochrome so special is that it's a color film that is infrared sensitive and uses false color to visualize the infrared spectrum.</p> <p>While it's not possible to get real Aerochrome anymore, it is possible to get the datasheet for it. This old datasheet has a great explanation of why this film exists and how it works, and it has color spectrum sensitivity data.</p> <p>Color film is made from multiple layers of photosensitive material. Each layer has a different sensitivity spectrum and will produce a specific color when developed. For example one layer will be sensitive to red light and when developed will produce a cyan dye. This will create the negative image on the color negative. When this negative is inverted the cyan will become the red part of the image again.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>Sensitivity spectrum of Kodak Ektar 100 color negative film</figcaption></figure> <p>But the color produced on the negative doesn't have to correlate with the original color that was recorded. When the spectrum of the sensitive layer doesn't correspond to the produced color dye it's called false color film.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>Sensitivity spectrum for Kodak Aerochrome III Infrared color negative film</figcaption></figure> <p>The scale for this datasheet is quite different because it has to include the infrared spectrum. One important thing to note is that the Aerochrome film is normally used in combination with a yellow filter on the lens; this will remove most of the blue light from the photograph. The false color is then mapped so that infrared light will create cyan dye, which inverts back to red. The red sensitive layer will produce magenta and develop to green. The green sensitive layer will produce yellow dye which produces blue on the final image.
All three layers are also sensitive to blue light, which is why the filter is used together with this film.</p> <h2>Reproducing on a digital camera</h2> <p>The difficulty with reproducing this film is that digital cameras are designed not to be sensitive to infrared light. Even if the infrared light filter is removed from a camera, the infrared light will just be detected as regular red light. This is not very useful, because for the proper false color the visible red light should be shifted to green while the infrared should stay red. My solution to this is taking two pictures: one regular picture to capture the normal color spectrum and one with an R72 filter. This filter in front of the lens will only let light above 720nm pass, which is exactly the part of the infrared spectrum I need.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>To the naked eye this filter just looks black, since human eyes aren't sensitive to infrared light. On my camera it produces a way more interesting result though.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>This is the infrared light stored in the red channel of the picture. Now the trick to make this look like Aerochrome is combining the R and G channels from the normal picture with the R channel from the picture taken with the IR filter. This is annoyingly hard to do correctly with most software, especially if you need to align the images.</p> <h2>Custom software</h2> <p>Aligning two images and doing some very easy color manipulation is something I had already written before in postprocessd. I made a copy of this software and changed the pipeline to align the IR image on top of the RGB image. Since this is using libraw you can just feed it any supported camera raw file.</p> <p>After stacking, the images are combined into a single 4-channel RGBI image.
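The channel recombination described here is easy to sketch with numpy. This is only an illustration of the idea: the function name is mine, and the identity-style matrix weights are placeholders (postprocessd additionally handles raw decoding and image alignment, which this skips).

```python
import numpy as np

def fake_aerochrome(rgb, ir):
    # rgb: HxWx3 float array from the normal exposure
    # ir:  HxW float array, the red channel of the R72-filtered exposure
    # Stack into a 4-channel RGBI image
    rgbi = np.dstack([rgb, ir])
    # 4x3 matrix mapping RGBI to the false-color output: infrared becomes
    # red, visible red becomes green, visible green becomes blue
    # (placeholder weights; a real matrix would be tuned to taste)
    m = np.array([
        [0.0, 0.0, 0.0, 1.0],   # out R <- IR
        [1.0, 0.0, 0.0, 0.0],   # out G <- visible R
        [0.0, 1.0, 0.0, 0.0],   # out B <- visible G
    ])
    return rgbi @ m.T
```

Expressing the channel shuffle as a matrix is what makes tricks like leaking some blue into the other channels a one-line change later.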
Then a 4x3 matrix is run over this image to get the final 3-channel RGB output.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>Yep, doing it this way is probably overkill compared to scripting a series of imagemagick commands. But it does allow the flexibility of doing more complicated transforms than just moving the color channels around. The datasheet says to use a yellow filter because all the layers are sensitive to blue light, so what happens if I modify this matrix to leak blue light into it?</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>Yellow filter on the left, no yellow filter on the right</figcaption></figure> <p>Mixing the blue channel into the three other channels lowers the contrast of the image a bit. It also makes the sky a lot lighter again, since... that was quite blue.</p> <p>Since I have this option I added the <code>-y</code> argument to the tool to disable the yellow filter.</p> <p>My software for this is available at <a href=""></a></p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <figure class="kg-card kg-gallery-card"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="" class="kg-image" width="600" height="451"></div><div class="kg-gallery-image"><img src="" class="kg-image" width="600" height="451"></div><div class="kg-gallery-image"><img src="" class="kg-image" width="600" height="451"></div></div><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="" class="kg-image" width="600" height="451"></div><div class="kg-gallery-image"><img src="" class="kg-image" width="600" height="451"></div><div class="kg-gallery-image"><img src="" class="kg-image" width="600" height="451"></div></div></div></figure> <h2>Can this be done realtime?</h2> <p>One neat feature of the Fujifilm X-series cameras is that they give a live preview of the film look while taking the picture.
Since this process needs two pictures that would be impossible, unless Fujifilm develops a sensor where IR cells are added to the Bayer matrix.</p> <p>Another option is hacking up something yourself. This is probably doable with a Raspberry Pi 4 compute module and two sensors: one regular IMX219 sensor and one NoIR option with a visible light filter added. To make the pictures always align they can be shot through a beamsplitter prism, which can be very small with these sensor sizes.</p>

NitroKey disappoints me - Martijn Braam, Tue, 25 Apr 2023 18:39:47 -0000<p>There's an article making the rounds from NitroKey named "<a href="">Smartphones With Popular Qualcomm Chip Secretly Share Private Information With US Chip-Maker</a>".</p> <p>This article is a marketing piece for selling their rebadged Pixel phones by picking a random phone and pointing at network traffic. It takes a look at a Sony Xperia XA2 and for some reason calls out Fairphone in particular.</p> <p>The brand of the device should not really matter if this is a chipset issue as the article claims, but it goes even further than just calling out other brands: it also uses a custom ROM to check these things instead of the software supplied by those brands.</p> <p>The second thing the article does is point out that WiFi geolocation exists and is done by Google and Apple, by showing a screenshot from the WiGLE service that has nothing to do with that. Phones use cell network, WiFi and network geolocation to get a rough location of a device, not for evil but for saving power. This prevents the need to run the GPS receiver 24/7, since most features don't need an exact location. There are no claims being made by NitroKey that their phone doesn't do any of this.</p> <p>After this we get to the main claim in the title of the article: the Qualcomm 630 chipset supposedly sharing private information with the manufacturer.
The author of the article has found that the device connects to and instead of doing the logical thing and opening <a href=""></a> in a browser, they do a whois request and then figure out it's from Qualcomm. They also proceed to contact Qualcomm's lawyers instead of following the link on this page. The webpage hosted on this domain conveniently explains who owns the domain, what its purpose is and its associated privacy policy. But that doesn't sound nearly as spooky.</p> <p>The next section makes the claim that this traffic is HTTP traffic and is not encrypted. It proceeds to not show the contents of this HTTP request, because that would show that it's not at all interesting. It does not contain any private data. It's just downloading a GPS almanac from Qualcomm for A-GPS.</p> <p>The A-GPS data is only there to make getting a GPS fix quicker and more reliable. GPS signals are pretty weak and getting a lock indoors from a cold start (the device has been off for some time) is hard. Inside the GPS signal sent by the satellites there's occasional almanac data that compensates for things like atmospheric distortions; without the almanac your GPS position wouldn't even get within a few hundred meters of your actual position. Since this signal is only occasionally broadcast and you need to listen to a GPS satellite for an extended amount of time (the broadcast takes around 10 minutes), it's easier for these modern devices to just fetch this information from the internet. Qualcomm provides this as a static file for their modems.</p> <p>This feature isn't even only in the Qualcomm 630 chipset, it's in practically all Qualcomm devices. Some third party Android ROMs go as far as to obscure the IP address of your phone by proxying this HTTP request through another server.
The ROM they tested obviously didn't.</p> <p>This feature is not even limited to Qualcomm devices; this practice happens in practically all devices that have both GPS and internet, because people don't like waiting very long for their position when launching their navigation software. The NitroPhone has its GPS provided by Broadcom chips instead of Qualcomm ones, so obviously it won't make the same HTTP requests. That doesn't make it any more or less secure though.</p> <p>Now the main issue: is this personal data? The thing that gets leaked is your IP address, which is required because that's how you connect to things on the internet. This system does not actually send any of your private information like the title of the article claims.</p> <h2>I'm disappointed</h2> <p>The reason for articles like this is pretty obvious. They want to sell more of their phones for a massive profit margin. The sad thing about making these "Oh no all your data is leaking!!!" articles is that when there are actual leaks they won't stand out between all the marketing bullshit. The painful part is that it's actually working. See the outrage about companies not having ethics and not following laws.</p> <p>This feature is not breaking laws, it's not unethical, it's not even made for eeeevill.</p>

Reverse engineering the BMD camera HDMI control, part 2 - Martijn Braam, Fri, 14 Apr 2023 11:32:24 -0000<p>In the <a href="">previous part</a> of this series I cut open an HDMI cable between a Blackmagic Design ATEM Mini and a Pocket 4K camera to attempt to figure out how the ATEM Mini can control the camera over the HDMI cable only.</p> <p>While reverse-engineering that I looked into the CEC lines and was unable to get the DDC data at the same time, because cutting up an HDMI cable is not a reliable way to probe data.
The first thing I did was make a neat PCB for debugging this further.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>HDMI Passthrough PCB</figcaption></figure> <p>This is a very simple board that has two HDMI connectors with all the high speed pairs connected straight across and all the slow speed signals broken out to two pin headers. One 7-pin header gives probing access to all the pairs, voltages and ground, and a 2x6-pin header allows breaking the connection between the two HDMI ports for any of the data signals by removing the jumpers.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"><figcaption>PulseView result of plugging in the camera</figcaption></figure> <p>With this setup I'm able to dump all the traffic I'm interested in. Plugging in the camera shows up as a glitch on the HPD line, then the SDA/SCL lines are busy for a bit communicating the EDID for the "display" implemented in the ATEM Mini HDMI input. Then the CEC line starts showing activity with the port number the camera is plugged into and some vendor specific commands.</p> <p>Looking back at my previous post, after figuring out more of this and reading parts of the HDMI and EDID specs, it turned out that I already had all the data, I just didn't recognize it yet.</p> <p>On my initial look at the CEC data I did not know which bytes were transmitted by which device. By just removing the CEC jumper from my board it became quite visible which data was sent by the camera, even without the CEC line being connected to the ATEM Mini at all. I also noticed that the camera still knew what camera number it had. I initially assumed the first bytes containing the camera number were coming from the ATEM side. Since that connection is severed it must be getting this data from the EDID blob.</p> <h2>The EDID specifications</h2> <p>PulseView only shows annotations for half the EDID blob it has received.
So I turned to the surprisingly great EDID Wikipedia page which documents all the bytes. My first try to figure things out was the <code>parse-edid</code> command on Linux from the <code>read-edid</code> package. This does parse all the monitor and resolution data I don't want, but does not seem to decode all of it. I pasted the EDID blob into my <a href="">hex dump slicer</a> and started annotating the bytes according to the Wikipedia page.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"></figure> <p>The annotated part covers the initial 128 bytes of the EDID blob with the basic monitor information. The num_exts byte here is set to <code>0x01</code> so immediately following it is more data, in this case a CEA-861 extension block. This block can contain more detailed resolution info (and more importantly, resolutions for more modern monitors). It also has space for custom data blocks. The first blocks are the well documented Video block, Audio block and Speaker block. The fourth block that exists is the Vendor block.</p> <p>I made a wrong assumption here. I thought that since this is a vendor block it would be a block with undefined data from Blackmagic Design. This block also contains the only byte that changes between the 4 ports of the ATEM Mini. The fourth byte of this block was <code>0x10</code>, <code>0x20</code>, <code>0x30</code> and <code>0x40</code> for the four ports, which confused me even further: why is this using the high four bits?</p> <p>After having another coffee and reading the Wikipedia page on EDID a bit further I found out that the first three bytes of the vendor block are the vendor identification, which makes sense if you can have multiple of these vendor blocks. To my surprise the example value of a vendor id is <code>0x000c03</code>, which is in my packet dump.</p> <p>Turns out I was reverse engineering the HDMI 1.4 specification here. It's even possible to just download this specification!
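The data block layout described above can be walked with a few lines of Python. This is a sketch under the CEA-861 rules just mentioned (block header byte: tag in the top 3 bits, payload length in the bottom 5; vendor-specific blocks use tag 3 and start with the 3-byte vendor id, little-endian); the function name and the synthetic blob in the usage are my own:

```python
def hdmi_physical_address(cea: bytes):
    """Scan a 128-byte CEA-861 extension block for the HDMI vendor block
    (vendor id 0x000c03) and return the CEC physical address nibbles."""
    assert cea[0] == 0x02          # CEA-861 extension tag
    dtd_offset = cea[2]            # data blocks run from byte 4 up to here
    i = 4
    while i < dtd_offset:
        tag = cea[i] >> 5          # top 3 bits: data block type
        length = cea[i] & 0x1f     # bottom 5 bits: payload length
        # tag 3 = vendor-specific block, 03 0c 00 = 0x000c03 little-endian
        if tag == 3 and cea[i + 1:i + 4] == b"\x03\x0c\x00":
            hi, lo = cea[i + 4], cea[i + 5]
            return (hi >> 4, hi & 0xf, lo >> 4, lo & 0xf)
        i += 1 + length
    return None
```

With this, a blob whose vendor block carries <code>0x40 0x00</code> in those two bytes decodes to the nibbles (4, 0, 0, 0), matching the per-port byte seen on the ATEM.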
The most useful part of that PDF is this:</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>The changing 4 bits is the field marked <code>A</code> in this table. And... the A,B,C,D fields are for setting the CEC physical address for the device connected to that port.</p> <p>So surprisingly everything so far is just in-spec and documented, and I now learned more about how CEC works. The camera reads the EDID block to know the output timings and also reads the CEC physical address from the ATEM. In my case the port number wasn't <code>0x40</code>, it was <code>0x40 0x00</code> which translates to CEC address <code></code></p> <p>So if I want to remotely control my camera I need to do EDID emulation and insert this data block if the upstream device did not already set it.</p> <h2>The CEC part</h2> <p>So let's have a look at the CEC communication after the EDID exchange has happened. First, quickly after connecting, there are two very short bursts of CEC data. PulseView decodes these as CEC pings so I will ignore those for now. This is an export of the rest of the CEC data in this recording:</p> <pre><code>8185-700390 CEC: Frames: 1f:84:20:00
712370-745164 CEC: Frames: 1f:a0:7c:2e:0d:01:01
748596-757793 CEC: Frames: 01:01
761361-775144 CEC: Frames: 10:02:01
778761-802480 CEC: Frames: 01:50:3c:00:00
806749-864354 CEC: Frames: 01:70:25:70:20:43:41:4d:20:25:69:00
868623-906865 CEC: Frames: 0f:a0:7c:2e:0d:02:02:10
911133-939694 CEC: Frames: 01:68:70:74:73:00
943963-962842 CEC: Frames: 01:65:32:00</code></pre> <p>The CEC protocol uses 4-bit addressing for everything. In my packet dump for this trace the camera was connected to port <code></code> of the ATEM. The first byte of the CEC frame is also address info, but this is the logical address instead of the physical address. It is split into 4 bits for the sender and 4 bits for the receiver. Address 0 is always the CEC root, which is the ATEM, and address F is the broadcast address.
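</p><p>As a sketch of this addressing scheme (the helper names are mine, not from any real tool), the frames in the dump above can be split up like this:</p>

```python
# Sketch: split a captured CEC frame into initiator, destination and opcode.
# Frame bytes as exported from PulseView; helper names are hypothetical.

def parse_frame(frame_hex):
    data = [int(b, 16) for b in frame_hex.split(":")]
    header = data[0]
    return {
        "initiator": header >> 4,      # logical address of the sender
        "destination": header & 0x0F,  # 0xF is the broadcast address
        "opcode": data[1] if len(data) > 1 else None,
        "params": data[2:],
    }

# First frame of the dump: the camera (logical address 1) broadcasting
# opcode 0x84 with its physical address as payload.
f = parse_frame("1f:84:20:00")
print(f)  # {'initiator': 1, 'destination': 15, 'opcode': 132, 'params': [32, 0]}
```

<p>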
The camera uses address 1 for its communication. The second byte of the packet is the CEC opcode.</p> <p>The first packet the camera sends is opcode <code>0x84</code>. This is a mandatory packet that is broadcast by the device that's connected to tell the CEC network about the mapping between the physical address and logical address. In this case logical device <code>0x1</code> is broadcasting that its physical address is <code>0x2000</code> which is <code></code>. </p> <p>The second packet is opcode <code>0xa0</code> which is "Vendor Command With ID". Now I've entered reverse engineering territory again. The next 3 bytes are <code>0x7c 0x2e 0x0d</code>, which corresponds to the IEEE OUI for Blackmagic Design, and the vendor data is <code>0x01 0x01</code>. After this packet has been sent the communication starts breaking the CEC specification and is now just sending BMD specific data. All the data PulseView is trying to decipher from those bytes is just red herrings.</p> <h2>Emulating the ATEM</h2> <p>Now that the basics of the communication are known, the next part is emulating an EDID and seeing what happens on the CEC line to get more information. For this I'm using a Raspberry Pi Pico hooked up to my HDMI passthrough board.</p> <p>I removed all the jumpers from the passthrough board to isolate the slow signals in the HDMI connection and hooked them all up to the Pico. In the initial tests I could not get any signals from the camera this way; I was expecting that just pulling the hot-plug-detect pin high would be enough to start the HDMI connection. It turns out that I need to have a monitor connected to the high speed pairs to make the camera happy. </p> <p>The first thing the camera does is read the EDID, so I started with implementing EDID emulation. For this the Pico should act as an I2C slave, which is surprisingly undocumented in the SDK manual.
The only thing the manual says about it is to use the <code>i2c_set_slave_mode(i2c0, true, 0x50)</code> command to put the hardware in the right mode. The rest is undocumented and requires poking registers outside of the SDK for now; hopefully that will get fixed in the future. With this I implemented an I2C device that responds on address <code>0x50</code> and has a block of 256 bytes of data. This just contains a copy of the EDID of one of the ATEM ports for now.</p> <p>The harder part is doing CEC on the Pico. So far it seems like nobody has made a library for this, and due to the existence of CEC on the Raspberry Pi SBCs searching for one is pretty hard. In theory it should be possible to implement this using a PIO block to handle the bitbanging part. For now I'm just using one of the cores to bitbang the protocol.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"></figure> <p>This implementation just supports receiving CEC data and sending the ACK back to make the camera continue sending data. The debug pin in this trace is used to debug the sample point and packet detection in my code. The bit is sampled on the rising edge and the start of a bit is on the falling edge. During the ACK bit the GPIO on the Pico is switched to output to drive the CEC line low.</p> <p>The packet shown above is the first packet the camera sends after sending the BMD vendor command. When connected to the ATEM the next thing that happens is that the ATEM sends a few unknown bytes back. If I don't reply to this packet the camera will restart the whole CEC session after 1.5 seconds by broadcasting its physical address again.</p> <p>So the first thing I tried sending back to the camera is CEC operation <code>0x9F</code>, which requests the CEC version of the device. It turns out the CEC implementation in Blackmagic Design cameras is quite out of spec. It acks my data and then proceeds to never respond to CEC data again.
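</p><p>For completeness, the receive logic described above, classifying each bit by the length of its low pulse, can be sketched like this (the timing values are the nominal ones from the CEC specification; the tolerance windows are my own guesses):</p>

```python
# Sketch of receive-side CEC bit classification. Each bit starts on a
# falling edge; the length of the low pulse decides what kind of bit it is.
# Nominal CEC timings: start bit is low for 3.7 ms, a logical 0 is low for
# 1.5 ms, a logical 1 is low for 0.6 ms. Tolerance windows are assumptions.

def classify_low_pulse(low_ms):
    if 0.4 <= low_ms <= 0.8:
        return 1        # nominal 0.6 ms low
    if 1.3 <= low_ms <= 1.7:
        return 0        # nominal 1.5 ms low
    if 3.5 <= low_ms <= 3.9:
        return "start"  # nominal 3.7 ms low
    return "error"

print(classify_low_pulse(3.7))  # start
print(classify_low_pulse(0.6))  # 1
print(classify_low_pulse(1.5))  # 0
```

<p>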
Technically, following the CEC 1.4 specification, it was already out of spec because it sent a vendor command without first requesting the vendor id of the device it's connected to.</p> <p>Since it's no longer really speaking CEC at this point I started looking into replaying some of the messages I had captured to see how the camera behaves. There are a few things that are sent from the ATEM to the camera directly after startup that don't seem to be correlated to any camera config.</p> <p>The first thing is operation <code>0x01</code> directly after sending the vendor command. Then operations <code>0x50</code> and <code>0x70</code>, and lastly another full CEC vendor command but to the broadcast address instead. After some testing it looks like operation <code>0x01</code> is required to make the camera "accept" the connection. It stops the restart of the session after 1.5 seconds. I can't figure out what operations 50 and 70 do, but leaving those out does not seem to change anything.</p> <p>The broadcasted vendor command is the tally data, which I can also ignore for now. The next command I sent is <code>0x01 0x52 0x00</code> which sets the gain of the camera. By sending this directly after receiving the <code>0x02</code> the camera sends on startup, the gain on the camera display changes!</p> <h2>Figuring out more opcodes</h2> <p>Now that I have a somewhat working setup I tried once again changing settings in ATEM Software Control and observing the CEC data. With this process I figured out 35 opcodes.</p> <p>The reference document for this is "<a href="">Blackmagic Camera Control Developer Guide</a>". This document does not have any information on the HDMI protocol but it does document how to control the camera over SDI. The most important thing is the list of parameters that can be sent to the device.
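</p><p>The frames replayed above can be assembled with a tiny helper; this is a sketch with a hypothetical function name, just mirroring the header layout described earlier:</p>

```python
# Sketch: building a CEC frame. The header byte packs the initiator in the
# high nibble and the destination in the low nibble; the rest is payload.

def build_frame(initiator, destination, payload):
    return bytes([(initiator << 4) | destination, *payload])

# The gain command from above: ATEM (logical address 0) to the camera
# (logical address 1), operation 0x52 with one data byte.
frame = build_frame(0, 1, [0x52, 0x00])
print(frame.hex(":"))  # 01:52:00
```

<p>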
I was hoping the bytes in the HDMI protocol would relate to information in that document in some way, but it seems they don't.</p> <p>It looks like the developers at Blackmagic Design created a completely new protocol for the CEC communication to deal with the update speed. The CEC bus runs at roughly 300 baud and the commands that are sent over the very fast SDI interface are just too long to have reasonable response times. The gain command in the CEC protocol has only a single data byte and that already takes nearly a tenth of a second to transmit; the same command over SDI is 8 bytes long.</p> <p>While dumping the CEC traffic I also noticed some commands are significantly longer, an int16 value being 10 bytes long in this case. All these long commands for various features are also all on opcode <code>0x04</code>. After looking at the similarities and differences between these long packets I noticed that this is just a passthrough command for the older SDI commands.</p> <pre><code># Parameter 0.4 on HDMI (Aperture ordinal)
01:04:81:03:ff:00:04:02:00:10:00
# Parameter 0.4 on SDI
03:05:00:00:00:04:02:10:00</code></pre> <p>The bytes are in a different order and there are some extra static bytes for some reason, but it does seem to map to the documentation.
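</p><p>These transmission times can be sanity-checked against the nominal CEC bit timings (assumed values: a 4.5 ms start bit, 2.4 ms per bit, and 10 bits per transmitted byte including the EOM and ACK bits):</p>

```python
# Rough CEC frame transmission-time estimate from the nominal bit timings.
# 4.5 ms start bit, then 10 bits (8 data + EOM + ACK) at 2.4 ms per bit
# for every byte in the frame.

def frame_time_ms(num_bytes):
    return 4.5 + num_bytes * 10 * 2.4

print(round(frame_time_ms(3), 1))   # 76.5  -> the 3-byte gain command, ~0.08 s
print(round(frame_time_ms(11), 1))  # 268.5 -> an 11-byte passthrough command
```

<p>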
Sending one of these packets takes roughly 300ms, which is why this is not used for parameters you control using sliders or wheels in the control interface.</p> <p>The whole list of parameters I found is:</p> <pre><code>05 00 00 00 00 00 00 00 00 # Reset lift
06 00 00 00 00 00 00 00 00 # Reset gamma
07 00 00 00 00 00 00 00 00 # Reset gain
0D xx xx # Lift R
0E xx xx # Gamma R
0F xx xx # Gain R
11 xx xx # Lift G
12 xx xx # Gamma G
13 xx xx # Gain G
15 xx xx # Lift B
16 xx xx # Gamma B
17 xx xx # Gain B
19 xx # Lift W
1A xx # Gamma W
1B xx # Gain W
1D xx xx # Hue
1E xx xx # Saturation
1F xx xx # Pivot
20 xx xx # Contrast
21 xx xx # Luma mix
33 1E # Show bars
33 00 # Hide bars
40 xx xx # Absolute focus (ID 0.0)
41 xx xx # Focus distance
42 XX XX # Iris (ID 0.2)
43 # Trigger instant auto-iris (ID 0.5)
44 XX XX # Set absolute zoom in mm (ID 0.7)
46 xx xx # Zoom
47 # Trigger autofocus
52 xx # Gain
54 xx # Temperature
55 xx # Tint
56 # Trigger AWB (ID 1.3)
57 xx xx xx xx # Shutter (ID 1.5)
58 xx # Detail 0, 1, 2, 3</code></pre> <p>The exact encoding for the bytes still needs to be figured out, but it seems to mostly follow the encoding described in the SDI command manual.</p> <p>All that's left is implementing a nice API for this on the Pi Pico to automate camera control :)</p> Mobile Linux camera pt6 Braam Wed, 08 Mar 2023 15:57:25 -0000<p>The processing with postprocessd has been working pretty well for me on the PinePhone. After I released it I had someone test it with the DNG files from a Librem 5 to see how it deals with completely different input.</p> <p>To my surprise the answer was: not well. With the same postprocessing for both the PinePhone and the Librem 5, the Librem 5 pictures turn out way too dark and contrasty. The postprocessd code is supposed to be generic and has no PinePhone specific code in it.</p> <p>Fast forward to some time later: I now have a Librem 5 so I can do more camera development.
The first thing to do is the sensor calibration process I did with the PinePhone in <a href="">part 4</a> of this blog series. This involves taking some pictures of a proper calibration target, which in my case is an X-Rite ColorChecker Passport, and feeding those into some calibration software.</p> <p>Because aligning color charts and dealing with all the file format conversions in the DCamProf calibration suite from RawTherapee is quite annoying, I got the paid graphical utility from the developers. By analyzing the pictures the software will generate a lot of calibration data. From that, currently only a small part is used by Megapixels: the ColorMatrix and ForwardMatrix.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>Calibration output snippet</figcaption></figure> <p>These are 3x3 matrices that do the colorspace conversion for the sensor. I originally added just these two to Megapixels because they have the least amount of values, so they fit in the camera config file, and they have a reasonable impact on image quality.</p> <p>The file contains two more important things though: the ToneCurve, which converts the brightness data from the sensor to linear space, and the HueSatMap, which contains three correction curves in a 3-dimensional space of hue, saturation and brightness, which obviously is the most data.</p> <h2>What is a raw photo?</h2> <p>The whole purpose of Megapixels and postprocessd is to take the raw sensor data and postprocess it with a lot of CPU power after taking the picture to produce the best picture possible. The processing is built on top of existing open source photo processing libraries like libraw.</p> <p>The expectation this software has of "raw" image data is that it's high bit depth linear-light sensor data that has not been debayered yet.
The data from the Librem 5 is exactly this; the PinePhone sensor data is weirder.</p> <p>Unlike most phones, which have the camera connected over MIPI-CSI, a nice high speed serial connection to push image data over, the PinePhone has its camera connected over a parallel bus. </p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>Rear camera connection from the PinePhone 1.2 schematic</figcaption></figure> <p>This parallel bus provides hsync/vsync/clock and 8 data lines for the image data. The ov5640 sensor itself has a 10-bit interface though:</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>D[9:0] are the 10 image data lines from the sensor</figcaption></figure> <p>Since only 8 of the 10 lines are available in the flatflex from the sensor module that contains the ov5640, the camera has to be configured to output 8-bit data. I made the assumption that the sensor just truncates two bits from the image data, but from the big difference in the brightness response I have the suspicion that the image data is no longer linear in this case. It might actually be outputting an image that's not debayered but <i>does</i> have an sRGB gamma curve.</p> <p>This is not really a case that raw image libraries deal with and it would not traditionally be labelled "raw sensor data". But it's what we have. So instead of making assumptions again, let's just look at the data.</p> <p>I have pictures of the colorchecker for both cameras and the colorchecker contains a strip of grayscale patches. With this it's possible to make a very rough estimation of the gamma curve of the picture. I cropped out that strip of patches from both calibration pictures and put them in the same image but with different colors.
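</p><p>From such a strip of neutral patches a gamma exponent can be roughly estimated. A sketch of the idea, with made-up patch values standing in for samples taken from the calibration photos:</p>

```python
# Sketch: estimating a gamma exponent from a strip of neutral patches.
# The values below are illustrative; real ones would be sampled from the
# calibration pictures and rescaled to the 0..1 range.
import math

# Known linear reflectances of the patches (illustrative) and the values
# measured from the image; here we pretend the camera applied gamma 2.2.
linear = [0.05, 0.10, 0.20, 0.40, 0.70, 1.00]
measured = [v ** (1 / 2.2) for v in linear]

# For measured = linear ** (1/gamma): gamma = log(linear) / log(measured).
# Patches at exactly 0 or 1 carry no gamma information, so skip them.
estimates = [math.log(l) / math.log(m)
             for l, m in zip(linear, measured) if m not in (0, 1)]
gamma = sum(estimates) / len(estimates)
print(round(gamma, 2))  # 2.2
```

<p>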
I also made sure to rescale the data to hit 0% and 100% with the darkest and brightest patch.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"><figcaption>Waveform for the neutral patches, green is the PinePhone and pink is the Librem 5</figcaption></figure> <p>The result clearly shows that the data from the PinePhone is not linear. It also shows that the Librem 5 data is not linear either, but in the opposite direction.</p> <p>These issues can be fixed though with the tonecurve calibration that's missing from the current Megapixels pictures.</p> <h2>postprocessd is not generic after all</h2> <p>What happened is that while developing postprocessd I saw that my resulting pictures were way too bright. I thought I must've had a gamma issue and added a gamma correction to the code.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>With this code added it looks way better for the PinePhone, but way worse for the Librem 5. This is all a side effect of developing it with the input of only one camera. The correct solution is not having this gamma correction and having the libraw step before it correct the raw data according to the tonecurve that's stored in the file.</p> <h2>Storing more metadata</h2> <p>The issue with adding more calibration metadata to the files is that it doesn't really fit in the camera ini file. I have debated just adding a quick hack: a setting that generates a specific gamma curve to add as the tone curve. That would fix my current issue, but to fix it once and for all it's way better to include <i>all</i> the curves generated by the calibration software.</p> <p>So what is the output of this software? Lumariver Profiler outputs .dcp files, which are "Adobe Digital Negative Camera Profile" files. I have used the profile inspection output that turns this binary file into readable json and extracted the matrices before.
It would be way easier to just include the .dcp file alongside the camera configuration files to store the calibration data.</p> <p>I have not been able to find any official file format specification for this DCP file, but throwing the file in a hex editor I saw something very familiar... The file starts with <code>II</code>. This is the byte order mark for a TIFF file. The field directly after it is not the magic number 42 though, which makes this an invalid TIFF file. It turns out that a DCP file is just a TIFF file with a modified header that does not have any image data in it. This makes the Megapixels implementation pretty easy: read the TIFF tags from the DCP and save them in the DNG (which is also TIFF).</p> <p>In practice this was not that easy, mainly because I'm using libtiff and DCP is <i>almost</i> a TIFF file. Using libtiff for DNG files works pretty well since DNG is a superset of the TIFF specification. The only thing I have to do is add a few unknown TIFF tags to the libtiff library at runtime to use it. DCP is a subset of the TIFF specification instead and it is missing some of the tags that are required by the TIFF specification. There's also no way in libtiff to ignore the invalid version number in the header.</p> <p>So I wrote my own TIFF parser for this. TIFF parsers are quite hard to write since there's an enormous amount of possibilities for storing things in TIFF files. Since DCP is a smaller subset of TIFF it's quite reasonable to parse it manually instead. A parser for the DCP metadata is around 160 lines of plain C, so that is now embedded in Megapixels. The code searches for a .dcp file associated with a specific sensor and then embeds the calibration data into the generated DNG files. If the matrices are also defined in the camera ini files then those are overwritten by the ones from the DCP file.</p> <h2>Results</h2> <p>The new calibration work is now in <a href="">megapixels#30</a> and needs to go through the testing and release process now.
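</p><p>For reference, the header sniffing that started this rabbit hole can be sketched like this (to my knowledge the version field in a DCP file is 0x4352, "CR", instead of TIFF's 42, so that value is an assumption here):</p>

```python
# Sketch: telling a DCP file apart from a plain little-endian TIFF.
# A little-endian TIFF starts with b"II" followed by the 16-bit value 42;
# DCP keeps the b"II" byte order mark but uses a different version value,
# which is why libtiff rejects it.
import struct

def sniff(header_bytes):
    order, magic = struct.unpack("<2sH", header_bytes[:4])
    if order != b"II":
        return "not a little-endian TIFF-like file"
    if magic == 42:
        return "tiff"
    if magic == 0x4352:  # assumed DCP version value, "CR"
        return "dcp"
    return "unknown"

print(sniff(b"II\x2a\x00"))  # tiff
print(sniff(b"II\x52\x43"))  # dcp
```

<p>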
There's also an upcoming release of postprocessd that removes the gamma correction.</p> <p>For the Librem 5 there's <a href="">millipixels#88</a> which adds correct color matrices for now, until that project has the DCP code added. </p> Sensors and PCB design Braam Thu, 02 Mar 2023 20:09:12 -0000<p>I do a lot of software development, enough of it to be comfortable with the process. I have worked on enough projects and made enough pieces of software that it's quite easy to quickly spin up a new project to fix a specific issue I'm having.</p> <p>Hardware design is a whole other can of worms. I have played with things like the Arduino and later the STM32 chips, the ESP8266 and the RP2040. These things are quite neat from a programmer's perspective. You write your code in C++ and the included toolchain figures out all the hard parts and flashes the board. Hardware design is also quite simple since it's mostly putting together different hardware modules and breakout boards. Only a basic understanding of the hardware buses is required to get stuff up and running.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>Projects usually look like the picture above. No resistors, capacitors or other components are needed. Just a breadboard (also optional), jumper wires and the IO modules you need to make your idea work.</p> <p>Power is dealt with by the microcontroller board, the modules already include the pull-up resistors and power regulators required, and the programmer is also included already.</p> <p>The software side is also simplified and abstracted to the point where it's harder to get your average JavaScript project working than these embedded projects. You'd never have to open the 440-page ATmega328P manual to write the firmware.
You don't even need to know about the register map for GPIO pins since things like <code>digitalWrite(pin, state)</code> exist.</p> <p>It's easy to get a complete prototype running like this, but then what?</p> <h2>Putting things in production</h2> <p>By far the easiest way to put the project in production is just shoving the breadboard in a project box and declaring it done. It works, but it's not <i>neat</i>. This works if you only need one and it's for yourself.</p> <p>When you need 5? 10? 30? of the same thing then breadboarding it becomes a lot less practical. In some cases it's easy enough to just stick the modules on a piece of protoboard and solder it together. When that doesn't scale, another option is putting the modules on a custom PCB.</p> <p>This is exactly what I did for a project. This is the first PCB I have ever designed and put into production:</p> <figure class="kg-card kg-gallery-card"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="" class="kg-image" width="600" height="450"></div><div class="kg-gallery-image"><img src="" class="kg-image" width="600" height="450"></div><div class="kg-gallery-image"><img src="" class="kg-image" width="600" height="450"></div></div></div></figure> <p> It's a PCB that <i>only</i> has pinheaders and connectors on it. This board would have a nodeMCU module and an off-the-shelf RS485 module on it. The board itself is incredibly simple but it did give me a chance to run through the complete process of going from a schematic to an actual physical product.
Since it has no actual components on it, it removed a lot of worry about whether the board would work at all.</p> <p>In this case the board was designed because I needed 15 of them, and a PCB makes this a lot more reliable and easier to maintain, with the possibility of having neat screw terminals on it for the external connections.</p> <h2>Optimizing the design more</h2> <p>The above design was simple enough to not worry about optimizing it, but another project I wanted to take on is replacing my various sensor nodes with a neater custom design.</p> <figure class="kg-card kg-gallery-card"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="" class="kg-image" width="600" height="450"></div><div class="kg-gallery-image"><img src="" class="kg-image" width="600" height="423"></div></div></div></figure> <p>These boards have been put together around various ESP8266 dev boards; most of them are nodeMCU v3 boards, some are off-the-shelf boards that I receive data from with an RTL-SDR.</p> <p>My plan is to make the one board to rule them all. A custom designed board that makes it easy to connect my existing sensors and allows a bit of extension. The design is based around an ESP-12F module since all my existing sensor code is already written for the various ESP8266 boards. It's also in stock at LCSC, which makes it a lot easier to get fabricated.</p> <p>The design goals for this custom board are:</p> <ul><li>On-board programmer for development</li> <li>Have the ESP module hooked up so it can deep sleep properly</li> <li>Have a built-in temperature sensor.
Most of my external sensor boards are already temperature/humidity sensors so it makes sense to just include it since it&#x27;s a cheap chip.</li> <li>Have a screw terminal for hooking up onewire devices</li> <li>Expose a silkscreen-marked SPI and I2C bus</li> <li>Be able to be powered with a USB Type-C cable or a single lithium cell.</li> <li>Have a built-in charger for the lithium cell.</li> <li>The board should be able to be assembled by <a href="">JLCPCB</a> and <a href="">Aisler</a></li> </ul> <p>For implementing this I mostly looked at the nodeMCU v3 schematic as a reference and also made the basic layout of the PCB similar to that board. Initially I wanted to add the very popular DS18B20 onewire temperature sensor on-board, but that got replaced with the SHT30 module, which also includes a humidity sensor and connects to the I2C bus instead.</p> <p>For the programming and USB part I included the CP2102 USB-to-serial converter chip that was also included on the nodeMCU board. It is probably overkill for this application though, since I'm not using any of the functionality of that chip besides the RX/TX pins. I dropped the auto-reset circuit since I was worried it would interfere with the deep sleep reset circuit, which I found more important.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>One simple change from the nodeMCU design is that I swapped the USB Micro-B connector for a Type-C connector, which requires a bit more PCB routing to deal with the reversible plug and two extra resistors to make it up to USB spec. The Type-C spec requires a 5.1kΩ resistor to ground on both CC pins (individually, don't cheap out and connect both CC pins to the same resistor like on the Raspberry Pi).</p> <p>Most of this design work is simply reading the datasheets for the components and following the recommendations there.
Some datasheets even have bits of recommended PCB layout in them to make the board layout easier.</p> <p>Since all the complicated ESP8266 circuitry is already handled on the SoM I used and the USB-to-serial converter is laid out, the rest is simple connectors. Except... there's the power circuitry to deal with.</p> <p>Most of the difficulty with this design is figuring out the power design. If the battery connection was not included it would've been relatively straightforward: just have a power regulator that converts the 5V from the USB connector to the 3.3V for the rest of the board. This is exactly what the nodeMCU board does.</p> <p>In my case the regulator has to deal with power coming from either USB or a single lithium cell, bringing the input voltage range from 3.4-ish to 5V. There's also a charger chip included to charge the connected cell and a bit of circuitry to switch between the battery and USB power.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"><figcaption>The block diagram for the sensor board</figcaption></figure> <p>The diagram above shows the final power design. It has the MCP1700 3.3V regulator to replace the LM1117 regulator from the nodeMCU design; the MCP part has a way lower quiescent current, which helps with the battery life. The MCP73832 is a single cell li-ion/li-po charger that requires very few external components. In this design it's programmed to charge at 400mA, which is not super fast, but the board should last some months on a single charge anyway.</p> <p>The magic part is this circuit for the power switching:</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>This receives 5V from the USB connector or the battery power (or both) and feeds them into the regulator to generate the 3.3V rail.
The MOSFET disconnects the battery power when 5V is supplied and the diode makes sure that the power of the battery won't ever flow back into the 5V part of the schematic.</p> <p>Since there's also a single analog input on the ESP8266 I thought it would be a good idea to use it to read the battery level. This is yet again a decision that added a bunch more complexity. The theory is pretty simple: you add a resistor divider to bring the 0V-4.3V of the battery into the 0V-1V range of the analog input and then you read the battery voltage with a bit of code. The issue with this is that the resistor divider will not only do the voltage scaling for the ADC but will also be a path for the battery to leak power.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>This is the part that's added to fix that. With a few extra parts the resistor divider can be completely disconnected from the battery, stopping the power leakage. The software on the ESP has to first enable the measuring circuit by toggling a GPIO, read the value with the ADC and then disable the GPIO again. Since there are already 5.1kΩ resistors on the board I'm using another one here together with a 100kΩ one. This re-use of resistor values is done because some manufacturers charge extra per distinct part model used. This brings the voltage range of the battery down to about 0-0.2V. This seemed fine at the time since the ESP has a 10-bit ADC.</p> <p>With only a fifth of the ADC range used, the ADC resolution is already brought down to slightly less than 8 bits. But batteries usually don't go down to zero volts. The normal voltage range is 3-4.2V, giving a voltage difference of 0.06V after the voltage divider and making the result slightly less than 6 bits of resolution. This does not account for ADC noise in the system yet.
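</p><p>The resolution math above can be written out as a quick sketch (same resistor values and voltage ranges as in the text):</p>

```python
# Sketch of the battery-ADC resolution estimate: the 5.1 kΩ / 100 kΩ
# divider scales the battery voltage into the 1 V ADC range, and the
# usable 3.0-4.2 V battery range only covers a small slice of the
# 10-bit ADC.
import math

r_top, r_bottom = 100e3, 5.1e3
ratio = r_bottom / (r_top + r_bottom)  # about 0.0485

v_full = 4.2 * ratio                   # about 0.204 V at a full cell
v_empty = 3.0 * ratio                  # about 0.146 V at an empty cell
span = v_full - v_empty                # about 0.058 V of the 1 V range

counts = span / 1.0 * 1024             # ADC counts covering that span
print(round(counts))                   # ~60 counts
print(round(math.log2(counts), 1))     # ~5.9 bits of effective resolution
```

<p>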
The result is that the final battery level graph is a bit choppy, but at least it functions.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"></figure> <p>The last thing left is connectors. I added a footprint for some screw terminals for the OneWire connection. This footprint is basically the same as a pinheader except it has a fancy outline. For the battery I added a JST-PH connector because it seemed the most common one for batteries.</p> <p>The I2C and SPI buses don't have standardized connectors though. But through a bit of searching I found the <a href="">pmod spec</a>. This is an open specification made by Digilent to connect modules to their FPGA development boards. It's perfect for this use case since the interface is just pinheaders with a defined pinout, and it defines the power supply as 3.3V, which I already have on the board. Some of the sensors I want are even available as pmod modules.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>The two pmod headers on the board</figcaption></figure> <h2>Getting the boards made</h2> <p>After laying out the PCB another complicated process begins: getting it actually fabricated and assembled. The board has been designed so that it can be hand-soldered if needed. No BGA parts are used and the <code>_handsolder</code> variants of the footprints, which are slightly larger, are used for the common parts.</p> <p>To avoid vendor lock-in the board is designed to be assembled by both JLCPCB and Aisler. JLCPCB is the very cheap option in this case and Aisler is quite neat since it's actually in The Netherlands. The Aisler design rules are less forgiving than JLCPCB's so I used those for the board. It mostly means that I can't use the absolutely tiny vias that JLCPCB can drill.</p> <p>For the assembly service, metadata has been added to the schematic.
For JLCPCB an <code>LCSC</code> column was added to the part data that contains the part code for ordering from LCSC. For Aisler the parts come from the other big parts warehouses instead, like Mouser. For that an <code>MPN</code> column is added that contains the manufacturer part number.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>With this data and the selected footprints the board can be assembled. I left out all the part numbers for the through-hole parts since those are either more expensive or impossible to assemble, and also pretty easy to solder manually if required on a specific board.</p> <p>To actually get it made, Aisler has a nice plugin for the PCB editor in KiCad: <a href="">Aisler Push</a>. With this it's just a single button and the KiCad file will be sent off to Aisler, where their servers generate the necessary fabrication files from it. From there it's using the Aisler website to fix up the MPN matching with actual parts from various suppliers and pressing order.</p> <p>For JLCPCB the process is more complicated. The fabrication files have to be generated manually. There's <a href="">a tutorial</a> for going through the various steps of generating the ~10 files you need; those can then be zipped up and uploaded to the website. Since the LCSC part codes are completely unique, the assembly step of this process Just Works(tm) without having to adjust anything on the website.</p> <p>If Aisler had an option to specify the exact part match in the schematic metadata instead, it would probably be the easiest option in this case, so I hope that gets figured out in the future. Two weeks later I had the board :)</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"></figure> <h2>The Firmware</h2> <p>For the firmware of this board I just re-used what I had previously written for one of the nodeMCU based sensor boards and extended it with the sleep features.
It's pretty basic firmware that does the standard WiFi connection things for the ESP module and connects to my MQTT server to push the sensor readings in a loop.</p> <p>For the battery-operated boards that wouldn't be enough though. With that firmware the battery would run out in hours. There's a great series of blog posts from <a href="">Oppoverbakke</a> that go into great detail on how to optimize deep sleep on the ESP module. Especially the last post on <a href="">avoiding WiFi scanning</a> is very helpful for reducing power use.</p> <p>To avoid as many delays as possible the WiFi access point MAC address and channel are saved into the memory of the RTC inside the module, which is the only part of the chip left powered in the deepest sleep mode. With that information the module will save around one second of time being awake and connecting. Another optimisation is using a static IP address instead of DHCP to save another second.</p> <p>I don't like having static IP addresses in my network. I have static DHCP leases for everything instead. This is why I extended the RTC connection code to do a DHCP request the first time and then save the received DHCP data in the RTC memory together with a counter to re-do DHCP after some time. This means that my board is still a fully DHCP compliant device without making a DHCP request every time it wakes up; that's what the lease time is for, after all. Thanks to this I only have to power down the board when I need to change the IP address, instead of reflashing it or building in some configuration web interface.</p> <p>During deep sleep the whole board uses 19.1µA with a half-charged battery in my tests, and the transmission is down to a bit less than a second at 70mA, but the power use of the transmissions varies quite a lot.</p> <p>I quickly put one module into production outdoors to test. It sleeps for one minute in deep sleep and doesn't have any other optimisations.
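</p> <p>The wake-up bookkeeping described above boils down to one decision: reuse the cached access point and lease data, or redo the full scan and DHCP exchange. A rough sketch of that logic in Python (the real firmware keeps this state in the ESP8266 RTC user memory; the names and the renewal interval here are illustrative assumptions):</p>

```python
def plan_connection(cache, redo_after=24):
    """Decide whether this wake-up can reuse the cached AP and lease
    info ('fast') or needs a full WiFi scan plus DHCP exchange ('full')."""
    if not cache or 'bssid' not in cache or 'ip' not in cache:
        # Nothing cached yet: take the slow path once and cache the result.
        return 'full', {'wakeups': 0}
    wakeups = cache.get('wakeups', 0) + 1
    if wakeups >= redo_after:
        # Periodically redo DHCP so the lease gets renewed well within
        # its lifetime, without asking on every single wake-up.
        return 'full', {'wakeups': 0}
    return 'fast', dict(cache, wakeups=wakeups)
```

<p>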
Looking at the data from that module it looks like it would last for about two weeks, even with the sub-zero temperatures during the night.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"></figure> <p>The battery life for the optimized version has yet to be seen. I have calculated it to be around 3 months but it would take a while to have conclusive results on that. This board uses a "3500mAh" 18650 cell, which I assume in the calculations is actually 2000mAh.</p> <h2>Next revisions</h2> <p>During testing and programming I found out I had made a few mistakes in the board design. The most problematic one is that I put zener diodes on the UART lines between the USB-to-serial converter and the ESP module. This was to prevent power from the serial lines from flowing into the CP2102 module and partially powering up that chip.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>This did not work. When I got the boards I spent a bit of time figuring out why the programmer couldn't connect. The voltage drop through the diodes I picked is probably too much for the serial communication to work. To make things worse, the diodes are completely unnecessary for this since the UART lines won't leak power in deep sleep anyway. Luckily this is easily fixable on my boards by making a solder bridge across the diodes.</p> <p>Another issue on the board is that the battery charger chip gets very hot and is too close to the temperature sensor. I also forgot to add the thermal relief specified in the layout suggestions for the part, so while charging that chip is practically constantly at 80°C, making the value of the temperature sensor rise by around 5°C. Since this only affects the board when the battery is charging it's not a critical fault.</p> <p>For cost optimization I skipped battery protection on the board. For my uses I just ordered a protected 18650 cell so it should be fine.
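</p> <p>The battery life estimates above are simple duty-cycle arithmetic: average the current draw over one sleep/transmit cycle and divide the capacity by it. A quick sanity check in Python, using the measured 19.1µA sleep current and roughly one second awake at 70mA; the one-minute wake-up interval is an assumption for the example:</p>

```python
def runtime_days(capacity_mah, sleep_ma, active_ma, active_s, interval_s):
    """Average the duty-cycled current draw and divide the capacity by it."""
    avg_ma = (active_ma * active_s + sleep_ma * (interval_s - active_s)) / interval_s
    return capacity_mah / avg_ma / 24

# Derated 2000mAh cell, waking up once a minute:
print(round(runtime_days(2000, 0.0191, 70, 1, 60)))  # prints 70 (days)
```

<p>Longer sleep intervals stretch this dramatically, since the sleep current is effectively negligible next to the transmissions.</p> <p>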
But since I'm making a second revision of the board to fix these issues anyway, I decided to include a battery protection chip this time around.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>The protection chip sits between the ground terminal of the battery and the actual board ground and disconnects the battery pack in case of overcurrent, short circuits, overcharge or undercharge.</p> <p>Another small change I made is adding an extra pinheader to the board to access the UART lines directly, since pinheaders are basically free anyway.</p> <h2>You should make your own boards</h2> <p>Making boards is fun. I learned a lot during the process and now have some important skills to use in my other projects. Can you really call yourself a full stack programmer if you haven't designed the hardware yourself? :D</p> Alpine Linux is pretty neat BraamWed, 01 Feb 2023 15:44:57 -0000<p>I've used various Linux distributions in the past, starting with a Knoppix live CD a long time ago. For a long time I was an Ubuntu user (with compiz-fusion of course), then I used Arch Linux for years thinking it was the perfect distribution. Due to postmarketOS I found out about Alpine Linux and now, after using it for some years, I think I should write a post about it.</p> <h2>Installing</h2> <p>Ubuntu has the easy graphical installer of course. Installing Arch Linux for the first time is quite an experience. I believe Arch has since added a setup wizard but I have not tried it.</p> <p>Installing Alpine Linux is done by booting a live CD into a shell and installing from there just like Arch, but it provides the <code>setup-alpine</code> shell script that runs you through the installation steps.
It's about as easy as using the Ubuntu installer if you can look past the fact that it's text on a black screen.</p> <p>A minimal Alpine installation is quite small, and that, combined with the fast package manager, makes the install process really quick.</p> <h2>Package management</h2> <p>The package management is always one of the big differentiators between distributions. Alpine has its own package manager called APK, the Alpine Package Keeper. While it's nowadays confused with the Android .apk format, it predates Android by two years.</p> <p>The package management is pretty similar to Arch Linux in some aspects. The APKBUILD package format is very similar to the PKGBUILD files in Arch and the packages support similar features. The larger difference is the packaging mentality: Arch Linux prefers to never split packages, just one .pkg.tar.zst file that contains all the features of the application and all the docs and development headers. Alpine splits out all these things into subpackages, and the build system warns when the main package contains any documentation or development files.</p> <p>For a minimal example of this let's compare the tiff library. In Alpine Linux this is split into 5 packages:</p> <ul><li><code>tiff</code>, the main package that contains the library itself [460 kB]</li> <li><code>tiff-dev</code>, the development headers [144 kB]</li> <li><code>libtiffxx</code>, the C++ bindings [28 kB]</li> <li><code>tiff-doc</code>, the documentation files [5.21 MB]</li> <li><code>tiff-tools</code>, command line tools like ppm2tiff [544 kB]</li> </ul> <p>In Arch Linux this is a single package called <code>libtiff</code> that's 6.2 MB.
Most Linux users would never need the library documentation, which takes up the most space in this example.</p> <p>The end result is that my Arch Linux installations are using around 10x the disk space my Alpine installations use, if I ignore the home directories.</p> <p>Some more differences are that Alpine provides stable releases on top of the rolling <code>edge</code> release branch. This improves reliability a lot for my machines. You wouldn't normally put Arch Linux on a production server but I found Alpine to be almost perfect for that usecase. Things like the <code>/etc/apk/world</code> file make managing machines easier. It's basically the <code>requirements.txt</code> file for your Linux installation and you don't even need to use any extra configuration management tools to get that functionality.</p> <p>There are also some downsides to <code>apk</code> though. Things I'm missing are optional packages, and when things go wrong it has some of the most useless error messages I've encountered in software: <code>temporary error (try again later)</code>. Throwing away the original error and showing "user friendly" messages usually does not improve the situation.</p> <h2>Glibc or not to glibc</h2> <p>One of the main "issues" that get raised with Alpine is that it does not use glibc. Alpine Linux is a musl-libc based distribution. In practice I don't have many problems with this since most of my software is just packaged in the distribution, so I'd never even notice that it's a musl distribution. </p> <p>Issues appear mostly when trying to run proprietary software on top of Alpine, or software that's so hard to build that in practice you're just getting the prebuilds. The solution to proprietary software is... don't use proprietary software :)</p> <p>For the cases where that's not possible there's always either Flatpak or making a chroot with a glibc distribution in it.</p> <h2>Systemd</h2> <p>Besides not using glibc there's also no systemd in Alpine.
This is one of the things I actually miss the most. I don't enjoy the enormous amount of different "best practices" for specifying init scripts and the bad documentation surrounding it. So far my best solution for creating idiomatic init scripts for Alpine is just submitting something to the repository and waiting until someone complains about style issues.</p> <p>Besides that I'm pretty happy with the tools OpenRC provides for managing services. The <code>rc-update</code> tool gives a nice concise overview of enabled boot services and the <code>service</code> tool just does what I expect. It does seem like some software is starting to depend on systemd restarting it instead of fixing its memory leaks, which sometimes causes me issues.</p> <h2>Conclusion</h2> <p>Alpine Linux is neato. I try to use it everywhere I can.</p> Rewriting my blog again BraamMon, 05 Dec 2022 18:17:29 -0000<p>For quite a while my blog has run on top of Ghost. Ghost is a blog hosting system built on top of nodejs. It provides a nice admin backend for editing your posts and it comes with a default theme that's good enough to not care about theming.</p> <p>My blog has been through some rewrites, from Drupal to Jekyll to various PHP based CMS platforms, and finally it ended up on Ghost. I like Ghost quite a lot but it has a few flaws that made me decide to replace it.</p> <ul><li>It&#x27;s nodejs. I know a lot of people like javascript for some reason but it&#x27;s quite horrific for system administration. I do not like having to add third party repositories to my system to get a recent enough nodejs build to host a simple website. I&#x27;m pretty sure this dependency on the latest and greatest is the main reason why everyone wants to stuff their project in Docker and not care about the dependencies.</li> <li>It only officially supports Ubuntu to install on (aside from Docker).
It was already annoying to get it installed on Debian, but now that I&#x27;ve moved most of my servers over to Alpine this has become even more of a pain. There&#x27;s no reason this has to integrate so deeply with the distribution that it matters what I&#x27;m running.</li> <li>It depends on systemd. This is largely part of the Ubuntu dependency above and not needing to implement something else, but from running Ghost under OpenRC I&#x27;ve noticed that it depends on the automatic restarts done by systemd when the nodejs process crashes.</li> <li>I&#x27;m just running a simple blog which is fully static content. This should not require running a permanent application server for availability of the blog, and the dependency on MySQL is also very much overkill for something that&#x27;s so read-heavy.</li> <li>I don&#x27;t particularly like the templating system for adjusting the theme. I&#x27;ve been running a fork of the casper default theme for a while that just rips out the javascript files. This breaks almost nothing on the default theme except the infinite scroll on the homepage.</li> </ul> <h2>Switching to something different</h2> <p>I could try to switch to yet another blogging platform. The issue is that I really really like writing posts in the Ghost admin backend. With a casual look at the browser dev console another solution became obvious...</p> <p>The whole Ghost admin backend is neatly packaged into four files:</p> <ul><li><code>ghost.min.js</code></li> <li><code>vendor.min.js</code></li> <li><code>ghost.min.css</code></li> <li><code>vendor.min.css</code></li> </ul> <p>Since this is a fully standalone client-side webapp I can just host those four files and then write a Python Flask application that implements the same REST API that the nodejs backend provided.</p> <p>This is exactly what I did: the whole backend is now a small Flask application that implements the Ghost API in ~500 lines of Python.
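</p> <p>The idea is that the admin client only needs an HTTP backend that returns the JSON shapes it expects. A minimal sketch of that kind of dispatch (the routes and field names here are illustrative, not the actual Ghost API, and the real implementation is a Flask app):</p>

```python
# In-memory stand-in for the storage used by the real backend.
posts = {"1": {"id": "1", "title": "Hello", "status": "draft"}}

def api(method, path, body=None):
    """Dispatch a request the way the admin client would send it."""
    post_id = path.rsplit("/", 1)[-1]
    if method == "GET" and path.startswith("/posts/"):
        return posts.get(post_id) or {"errors": ["not found"]}
    if method == "PUT" and path.startswith("/posts/") and post_id in posts:
        posts[post_id].update(body or {})  # persist the edited fields
        return posts[post_id]
    return {"errors": ["unknown route"]}
```

<p>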
The application is called Spook (the Dutch word for ghost).</p> <p>It uses an SQLite database as the storage backend for the whole system. The Flask application does not implement any of the website rendering; it just stores the state of the Ghost backend and handles media uploads. To actually get a website from this it implements a hook script that gets called whenever anything on the website changes. With that it's possible to use any static site generator to generate a static website from this dataset.</p> <h2>Reimplementing casper</h2> <p>As the static site generator I used a second Flask web application. This one uses the Frozen-Flask module to generate a static website, which is why this module is named <code>frozen_casper</code>. The generator reads the SQLite database from Spook and generates static HTML compatible with the stylesheet from the stock casper theme from Ghost. It also generates an identical RSS feed so the RSS subscribers can be migrated over with identical post unique ids.</p> <p>I did make some modifications to the generated HTML, like implementing very basic paging on the post list pages instead of relying on javascript infinite scrolling.</p> <p>The <code>spook</code> and <code>frozen_casper</code> modules together replace a complete Ubuntu Server install, a MySQL instance I had to keep feeding RAM, and a crashy nodejs web application. As of today this new system is hosting this blog :)</p> Taking a good picture of a PCB BraamSun, 27 Nov 2022 10:49:01 -0000<p>Pictures of boards are everywhere when you work in IT. Lots of computer components come as (partially) bare PCBs and single board computers are also a popular target. Taking a clear picture of a PCB is not trivial though. I suspect a lot of these product pictures are actually 3D renders.</p> <p>While updating <a href="">hackerboards</a> I noticed not all boards have great pictures available. I have some of them laying around and I have a camera...
how hard could it be?</p> <p>Definitely the worst picture is the one in the header above: taken with a phone at an angle, with the flash, in bad lighting conditions. I've taken quite a bunch of pictures of PINE64 boards, like some of the pictures in the header of the pine64 subreddit and the picture in the sidebar. I've had mixed results, but the best board pictures I've taken were made using an external flash unit. </p> <h2>The ideal setup</h2> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>So to create a great picture I've decided to make a better setup. I've used several components for this. The most important one is two external flashes controlled with a wireless transmitter. I've added softboxes to the flashes to minimize the sharp shadows usually created when using a flash. This produces quite nice board pictures with even lighting. </p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>All dust looks 1000x worse on pictures :(</figcaption></figure> <p>For all the pictures from this setup I've used a 50mm macro lens. Not only is it great for getting detail pictures of the small components, it's also the lens with the least distortion I have. Having less distortion in the lens is required to have a sharp picture all the way to the edges of the board and to not have the edges of the board look curved. The curvature can be fixed in software, but the focal plane not being straight to the corners can't be fixed.</p> <p>It's possible to get even fewer shadows on the board by using something like a ring light; while this gives slightly more clarity, I find it makes the pictures less aesthetically pleasing.</p> <p>So how to deal with the edges of the board? For a lot of website pictures you'd want a white background. I have done this by just using a sheet of paper and cleaning up the background using photo editing software. This is quite time consuming though.
The usual issue with this is that the background is white but not perfectly clipped to 100% pure white in the resulting picture. There's also the issue of the board itself casting a slight shadow.</p> <p>I took my solution for this from my 3D rendering knowledge (which is not much): you can't have a shadow on an emitter. To do this in the real world I used a lightbox.</p> <p>A lightbox is normally for tracing pictures and is quite easy to get. It doesn't give me a perfectly white background, but it gets rid of the shadows at least.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>To get this from good to perfect there's another trick though. If I take a picture without the flashes turned on but everything else on the same settings, I get a terribly underexposed picture... except for the background.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>You never notice how many holes there are in a PCB until you put it on a lightbox</figcaption></figure> <p>All I need to do to get a clean background is increase the contrast of this picture to get a perfect mask. Then in GIMP I can just overlay this on the picture with the layer mode set to lighten only.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>The final composite</figcaption></figure> <p>It's also possible to use the mask picture as the alpha channel for the color picture instead. This works great if there's a light background on the website; it shows its flaws though when the website has a dark background.</p> <p>Let's create the worst-case scenario and use a pure black background:</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>Now edges are visible on the cutouts. Due to the mismatch in light color temperature with the lightbox, the edges are also blue here.
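</p> <p>Both compositing tricks boil down to simple per-pixel operations. A sketch on tiny grayscale "images" (nested lists of 0-255 values standing in for real image data):</p>

```python
def lighten_only(board, mask):
    """GIMP's 'lighten only' layer mode: keep the brighter pixel of the
    two layers, which forces the backlit background to pure white."""
    return [[max(b, m) for b, m in zip(br, mr)] for br, mr in zip(board, mask)]

def with_alpha(board, mask):
    """The alpha-channel variant: background pixels are bright in the
    backlit shot, so the inverted mask becomes the transparency."""
    return [[(b, 255 - m) for b, m in zip(br, mr)] for br, mr in zip(board, mask)]

board = [[30, 200], [180, 40]]  # board pixels plus one grey-ish background pixel
mask = [[0, 255], [10, 250]]    # backlit shot: only the background is lit
```

<p>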
A lot of edges can be fixed by running the dilate filter in GIMP on the mask layer to make the mask crop into the board by one pixel. It makes the holes in the board too large though. To get this perfect, manual touchup is still required.</p> <h2>Automating it further</h2> <p>Now that the input data is good enough that I can make the cutout with a few steps in GIMP, it's also possible to automate this further with the magic of ImageMagick.</p> <pre><code>$ convert board.jpg \( mask.jpg \
    -colorspace gray \
    -negate \
    -brightness-contrast 0x20 \
    \) \
    -compose copy-opacity \
    -composite board.png</code></pre> <p>This loads the normal picture from board.jpg and the backlit picture as mask.jpg and composites them together into a .png with transparency.</p> <p>But it can be automated even further! I still have a bit of camera shake from manually touching the shutter button on the camera, and I need to remember to take both pictures every time I slightly nudge the device I'm taking a picture of.</p> <p>The camera I'm using here is the Panasonic Lumix GX7. One of the features of this camera is the built-in wifi. Using this wifi connection it's possible to use the atrocious Android application to take pictures and change a few settings.</p> <p>After a bit of reverse engineering I managed to create a <a href="">Python module</a> for communicating with this camera. Now I can just script these actions:</p> <pre><code>import time
from remotecamera.lumix import Lumix

# My camera has a static DHCP lease
camera = Lumix(&quot;;)
camera.init()
camera.change_setting(&#x27;flash&#x27;, &#x27;forcedflashon&#x27;)
camera.capture()
time.sleep(1)
camera.change_setting(&#x27;flash&#x27;, &#x27;forcedflashoff&#x27;)
camera.capture()</code></pre> <p>Now I can just run the script and it will take the two pictures I need.
It's probably also possible to fetch the images over wifi and automatically trigger the compositing, but it sadly requires changing the wifi mode on the camera itself between remote control and file transfer.</p> <h2>Not just for PCBs</h2> <p>This setup is not only useful for PCB pictures. It's pretty great for any product picture where the product fits on the lightbox.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>Here's the composite of a SHIFT 6mq. Pictures of the screen itself are still difficult due to the pixel pattern interfering with the pixels of the camera sensor and the display reflecting the light of the flashes. This can probably be partially fixed once I get a polarizing filter that fits this lens.</p> Finding an SBC BraamTue, 15 Nov 2022 23:24:56 -0000<p>A long long time ago on a server far away there was a website called Board-DB. This website, made in 2014, was a list of single board computers, which became popular after the whole Raspberry Pi thing happened.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>Board-DB in 2014</figcaption></figure> <p>The website itself was pretty simple, just a list of devices and a simple search box with a few filters. The magic was in the well-maintained dataset behind it though. After some time this website seemed to have gone away, and I had not thought about it for quite a while until I happened to learn that the owner of <a href=""></a> originally ran the Board-DB website.</p> <h2>Rewriting</h2> <p>So a plan formed together with the old owner of the website. A good board search website is, after all these years, still a gap in the market. There are some websites that have these features but I keep running into missing filters.
One of the things that annoys me the most about webshops and comparison sites is a lack of good filters, while they include the most useless ones, like what color the product is.</p> <p>To redesign this I wanted to go to the other extreme. If you compare two boards and there's not a single different listed spec, then something is missing. The schema for the rewritten database has 165 columns now, and I still found some things that are missing from this dataset, like detailed PCI-e data.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>The 2022 version of the Board-DB website</figcaption></figure> <p>This new website was written in a timespan of 10 months in some spare time together with Raffaele. At the time of writing it contains data about 466 boards. Not all data is perfect yet, since this is a huge research and cleanup job, but the website is certainly useful already. As the data in the backend gets cleaned up more, more filters can be added to the board list to drill down on the specs you need.</p> <h2>Architecture</h2> <p>The website is a Python Flask web application that uses SQLite as the storage backend. SQLite is perfect in this case since it's a mostly read-heavy website. The website is designed from the start to run behind a caching reverse proxy like Varnish and makes sure that all the URLs in the site generate the exact same content unless boards are changed.</p> <p>The site is also designed to work without having javascript enabled but does use javascript to improve the experience. It even works in text-mode browsers. Due to it not using any javascript or css frameworks the website is also very light.</p> <h2>The dataset</h2> <p>At the start of the project I thought implementing the faceted search would be a large technical problem to solve, but that turned out to be relatively easy. By far the hardest part of this system is adding and cleaning up the data.
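</p> <p>The faceted search really is mostly query building: every active filter becomes one condition in a WHERE clause. A sketch with sqlite3 (the table and column names are made up for the example, not the real 165-column schema):</p>

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE boards (name TEXT, ram_mb INT, ethernet_mbit INT)")
db.executemany("INSERT INTO boards VALUES (?, ?, ?)",
               [("A", 512, 100), ("B", 2048, 1000), ("C", 4096, 1000)])

def facet_search(filters):
    """Turn a dict of minimum-value filters into one SELECT statement.
    In real code the column names must come from a fixed whitelist."""
    where = " AND ".join(f"{col} >= ?" for col in filters)
    sql = "SELECT name FROM boards" + (" WHERE " + where if where else "")
    return [row[0] for row in db.execute(sql + " ORDER BY name",
                                         list(filters.values()))]
```

<p>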
The original dataset is imported from a huge spreadsheet that, with a bunch of heuristics, gets converted into more specific data columns in the SQLite database that runs the website. The original dataset did not separate out specific details like which cores a SoC has, or how many ethernet ports a board has and what speed they are.</p> <p>Parts of this data were recoverable by searching for keywords in the original description fields in the spreadsheet; other data is manually fixed up with some bulk editing tools.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>One of the hacks in the CSV importer</figcaption></figure> <p>This all ends up getting rendered nicely again on the board detail page.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>I'm quite happy with how the web frontend turned out. It has a decently detailed filter list, a usable spec listing page and a board comparison feature. Getting the project this far would've been impossible without the help of Raffaele.</p> <p>The website is now online on <a href=""></a> </p> Automated Phone Testing pt.5 setupMartijn BraamTue, 25 Oct 2022 00:55:14 -0000<p>Now that I've written all the components for the Phone Test Setup in the previous parts, it's time to make the first deployment.</p> <p>For the postmarketOS deployment I'm running the central controller and mosquitto in a container on the postmarketOS server. This will communicate with a low-power server in a test rack in my office. The controller hardware is an old passively cooled AMD board in a 1U rack case.
I sadly don't have any part numbers for the rack case since I got a few of these cases second hand with VIA EPIA boards in them.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"><figcaption>Ignore the Arduino on the left, that&#x27;s for the extra blinkenlights in the case that are not needed for this setup</figcaption></figure> <p>The specs for my controller machine are:</p> <ul><li>AMD A4-5000 APU</li> <li>4GB DDR3-1600 memory</li> <li>250GB SSD</li> <li>PicoPSU</li> </ul> <p>The specifications for the controller machine are not super important; the controller software does not do any CPU-heavy or memory-heavy tasks. The important thing is that it has some cache space for unpacking downloaded OS images and it needs to have reliable USB and ethernet. For a small setup a Raspberry Pi would be enough, for example.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>Ignore the Aperture sticker, this case is from a previous project</figcaption></figure> <p>This server now sits in my temporary test rack for development. This rack will hold the controller PC and one case of test devices. This rack case can hold 8 phones in total using 4 rack units.</p> <h2>Deploying the software</h2> <p>After running all the software for the test setup on my laptop for months, I now started installing the components on the final hardware. This is also a great moment to fix up all the installation documentation.</p> <p>I spent about a day dealing with new bugs I found while deploying the software. I found a few hardcoded values that had to be replaced with actual configuration and found a few places where error logging needed to be improved a lot. One thing that also took a bit of time was setting up Mosquitto behind an Nginx reverse proxy.</p> <p>The MQTT protocol normally runs on plain TCP on port 1883, but since this involves sending login credentials it's better to use TLS instead.
The Mosquitto daemon can handle TLS itself and with some extra certificates will run on port 8883. This has the downside that Mosquitto needs to have access to the certificates for the domain and it needs to be restarted after certbot does its thing.</p> <p>Since the TLS for the web application is already handled by Nginx running in reverse proxy mode, it's easier to just set up Nginx to do reverse proxying for a plain TCP connection. This is the config that I ended up with:</p> <div class="highlight"><pre><span></span><span class="k">stream</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="kn">upstream</span><span class="w"> </span><span class="s">mqtt_servers</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="kn">server</span><span class="w"> </span><span class="n"></span><span class="p">:</span><span class="mi">1883</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="kn">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">8883</span><span class="w"> </span><span class="s">ssl</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">mqtt_servers</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kn">proxy_connect_timeout</span><span class="w"> </span><span class="s">1s</span><span class="p">;</span><span class="w"></span> <span class="w"> </span> <span class="w"> </span><span class="kn">ssl_certificate</span><span class="w"> </span><span class="s">/etc/letsencrypt/.../fullchain.pem</span><span class="p">;</span><span class="w"> </span> <span class="w"> </span><span
class="kn">ssl_certificate_key</span><span class="w"> </span><span class="s">/etc/letsencrypt/.../privkey.pem</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kn">ssl_dhparam</span><span class="w"> </span><span class="s">/etc/letsencrypt/ssl-dhparams.pem</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="p">}</span><span class="w"></span> </pre></div> <p>Another thing that had to be done for the deployment was writing actual service files for the components. The init files now set up OpenRC in my Alpine installations to supervise the components, deal with logging and make sure the database schema migrations are run on restart.</p> <h2>Putting together the phone case</h2> <p>To neatly store the phones for the test setup I decided to use a 2U rack case, since that's just high enough to store modern phones sideways. For this I'm using a generic 2U rackmount project box with the very easy to remember product code G17082UBK. This is a very cheap plastic case with an actual datasheet.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>All the internal and external dimensions for this case are documented</figcaption></figure> <p>I used this documentation to design a tray that fits between the screw posts in this case. The tray is printed in three identical parts and each tray has three slots. I use dovetail mounts to have the phone holders slide onto this tray.</p> <p>All this is designed using OpenSCAD. I never liked 3D modelling software, but with this it's more like programming the shapes you want. This appeals a lot to me since... I'm a software developer.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"><figcaption>The tray design in OpenSCAD</figcaption></figure> <p>From this design I can generate an .stl file and send it to the 3D printer.
The tray can print without any supports and takes about an hour on my printer. So 3 hours later I had the full base of the phone holder in my rack case.</p> <p>To actually mount the phones there can be phone-specific models that grip into this baseplate and hold the phone and extra electronics. I made a somewhat generic phone mount that just needs two measurements entered to get the correct device thickness at two points. This is the holder I'm currently using for the PinePhone and the Oneplus 6.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>The baseplates here are in blue and the phone holder is the green print. This version of the phone holder is designed to hold the Raspberry Pi Pico and has a ring for managing the cables soldered to the device. The size of the PinePhone is about the largest this case can hold. It will fill up the full depth of the case when the USB-C cable is inserted and it also almost hits the top of the case.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>The PinePhone and Oneplus 6 in the test rack case</figcaption></figure> <p>In this case I can hold 8 phones on the trays and have one of the slots on the tray left over to hold a future USB hub board that will have the 16 required USB ports to use all the devices in the case.</p> <p>For small desk setups a single tray is also pretty nice for holding devices, and the phone holder itself will also stand on its own. This is great if you have one to three devices you want to hook up to your laptop for local development.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>The tray is a bit flimsy without being screwed in the rack case but holds up phones great</figcaption></figure> <h2>Running the first test job</h2> <p>So now that the case is together and the controller is deployed, it's time to run an actual test. For this first test I'll be using Jumpdrive as the test image.
This is by far the smallest image available for the PinePhone, which makes testing a lot easier. It just boots a minimal kernel and initramfs and has the greatest feature for this test: it spawns a shell on the serial console.</p> <p>Since GitHub is not a great hosting platform and the bandwidth limit for the <a href=""></a> repository has been reached, it's not possible to fetch the raw file from GitHub without being logged in, so this script uses my own mirror of the latest PinePhone Jumpdrive image.</p> <pre><code>devicetype: pine64-pinephone

boot {
    rootfs:
}

shell {
    prompt: / #
    success: Linux

    script {
        ip a
        uname -a
    }
}</code></pre> <p>This will boot Jumpdrive and then, on the root console that's available over the serial port, it will run <code>ip a</code> to show the networking info and then <code>uname -a</code> to get the kernel info. Because the success condition is <code>Linux</code> it will mark the job successful if the uname output is printed.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"></figure> <p>The serial output also shows another feature of the controller: it will set the environment before executing the script, which in this case is just the <code>CI</code> variable, and it will add <code>|| echo "PTS-FAIL"</code> to the commands so non-zero exit codes of the commands can be detected. When <code>PTS-FAIL</code> is in the serial output the task will be marked as failed. Using the <code>success:</code> and <code>fail:</code> variables in the script the success and failure text can be set.</p> <p>With these building blocks for test scripts it's now possible to implement a wide variety of test cases. 
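</p> <p>The success and failure detection described above boils down to a couple of small helpers on the controller side. Roughly like this (an illustrative Python sketch, not the actual controller code; only the <code>PTS-FAIL</code> marker comes from the post, the rest is made up):</p>

```python
# Hypothetical helpers mirroring the PTS-FAIL marker approach.

SUCCESS_MARKER = "Linux"    # from the job's `success:` variable
FAIL_MARKER = "PTS-FAIL"    # appended by the controller itself

def wrap_command(cmd):
    # Append the failure marker so a non-zero exit status becomes
    # visible in the serial output (there are no exit codes over UART).
    return f'{cmd} || echo "{FAIL_MARKER}"\n'

def classify_line(line):
    # Classify one line of serial output; None means "keep reading".
    if FAIL_MARKER in line:
        return "failed"
    if SUCCESS_MARKER in line:
        return "success"
    return None
```

<p>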
Due to Jumpdrive not having enough networking features enabled in the kernel it's not yet possible to upgrade from the serial connection to a telnet connection in the test setup. This would make the test cases a bit more reliable, since there's a very real possibility that a few bits are flipped right in the success or failure string for the testjob, marking a successful job as failed due to timeouts.</p> <p>Getting a generic test initramfs to combine with a kernel to do basic testing will be a good thing to figure out for part 6; Jumpdrive only has busybox utilities available and only has very limited platform support in the first place. </p> Trying Plasma Desktop again Braam Tue, 18 Oct 2022 23:51:04 -0000<p>So I'm trying KDE Plasma again; I hear 5.26 has many great improvements and the last time I ran KDE for more than a day was in 2014. I mainly run Gnome, Phosh and Sway on my devices and I feel like I don't use KDE enough for the amount of times I complain about it.</p> <p>So I decided to put postmarketOS Plasma Desktop on my PineBook Pro, mainly because one of my issues with KDE has been performance. I know that Gnome runs on the edge of smooth on this hardware so it's easy to compare whether KDE will be slower or faster for me there. Testing on faster hardware would only hide performance issues.</p> <h2>Installation</h2> <p>So I installed Plasma with the postmarketOS installer on the Pinebook Pro. I don't have an nvme drive in this device, just running it from the stock 64GB eMMC module. </p> <p>The only installation issue I had was the disk not resizing on first boot to fill the full eMMC size, but that's a postmarketOS bug I need to fix, not a KDE issue.</p> <hr> <p>So I'm writing the rest of this blog post from my Plasma installation :)</p> <h2>First use</h2> <p>My issue with KDE mostly is that it has many papercuts that I just don't have to deal with in the Gnome ecosystem. 
I don't want to make this sound like a big hate post on KDE so here are some good things about it first:</p> <ul><li>I like how the default theme looks in light and dark mode. I&#x27;ll probably swap out the cursor theme since it doesn&#x27;t really fit, but the rest looks nice and consistent</li> <li>The notification system is quite an improvement. Gnome has the annoying tendency to pop up the notification right at the place where I&#x27;m working: the top center of the screen. If I get a few chat messages I have to manually click a few stacked notifications away unless I want to wait a few minutes for them to time out. The KDE notifications pop up in the bottom right corner and move out of the way, also the visual timeout indicator is great to have.</li> <li>I like the set of stock wallpapers in Plasma a lot.</li> <li>The default application launcher looks way more polished than I remember. Also I&#x27;m always happy when application launchers just pop up when you tap the meta key, which is still not a feature in all desktop environments.</li> <li>That lockscreen fade-in animation is nice</li> <li>The performance has improved quite a lot. I feel like the performance of Plasma on this hardware was absolutely painful a year back when I ran it to get some photos and now it feels quite smooth in most places.</li> </ul> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>Nice notification, ignore inconsistent padding</figcaption></figure> <p>Not everything is perfect but I'm quite pleased with how this is running. I know a lot of KDE is about everything being customizable, but the thing I like with Gnome is that it's already set up the way I want out of the box. Well mostly... For some reason on every desktop I install I have to re-add the ctrl+alt+t keybind to open a terminal. 
If I remember correctly this was a thing in older Ubuntu installations and nobody ships this keybind anymore.</p> <p>Now for the papercuts I hit in the first hour:</p> <ul><li>The application launcher is slow to pop up. Feels like this should just always be there. But I have the feeling that a lot of animations have a half second delay before they start animating.</li> <li>The terminal has way too many buttons and toolbars by default, luckily easy to remove again.</li> <li>meta+up doesn&#x27;t fullscreen the application, it moves it to the upper half of the screen instead. All the other environments I use fullscreen using this keyboard combination.</li> <li>partitionmanager doesn&#x27;t seem to have a function for &quot;extend partition in all the free space after it&quot;, instead I have to calculate the new size. Not sure if this counts as a KDE issue or a third-party application issue though :)</li> <li>My wifi does not reconnect on reboot for me, but I believe this is a postmarketOS bug.</li> <li>The large &quot;peek at desktop&quot; button in the taskbar is pretty useless and way too big for its function. Windows at least made that only 5px wide.</li> <li>Performance of Firefox on Plasma seems a bit worse than when running it on Gnome (with wayland)</li> <li>The screenshot utility is quite clunky compared to what I&#x27;m used to in Gnome. I can only select what to screenshot after the fact it seems. It&#x27;s quite good as far as screenshot UIs go but not perfect yet.</li> <li>The single-click-to-activate in Dolphin, I&#x27;d rather double click on items</li> <li>Also the active control blue border around the file list in Dolphin. It does not really need to be there and it makes the interface so much messier. It shows there&#x27;s focus on the filebrowser widget, but what does that mean? 
You either have focus on a file in it and that should have the highlight or you don&#x27;t and then the widget should not have focus.</li> </ul> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"><figcaption>The Dolphin file manager with the blue border around the files</figcaption></figure> <h2>After some more use</h2> <p>One of the things I had not used yet is Discover; I'm used to installing packages using the command line. I barely use Gnome Software; when I do, it's mostly because the flatpak commandline utility is a mess to use. I tried some package management using Discover and ran into a few pros and cons. This is not only on the KDE side of course, because this heavily integrates with postmarketOS specific things.</p> <p>The main thing I noticed is that Discover is faster than Gnome Software, it's not even close. The speed at which Discover search works makes it actually a realistic option compared to launching a terminal. There are a few paper cuts too, of course. If the internet is down and you launch Discover it will show a notification for every repository configured in apk, which is 3 notifications on stock postmarketOS edge. </p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"></figure> <p>This notification is almost completely meaningless: it says transferring, but not what it's transferring, and shows no actual progress.</p> <p>If the internet is up it works pretty smoothly though. The only thing I miss in the rest of the UI is the "Launch" button that's present in Gnome Software to start the software you just installed instead of going through the app launcher.</p> <h2>Customization</h2> <p>Since Plasma is supposedly very customizable, I've been going through the settings to make it work closer to the way I want.</p> <p>First thing is fixing up some hardware configuration. 
The touchpad on the Pinebook Pro does not have separate buttons so tap-to-click needs to be enabled, and more importantly, two-finger-tap to right click. The touchpad configuration is nicely set up in the settings utility in Plasma; this was an easy change to make.</p> <p>Changing the window management behavior for the meta+up shortcut was also pretty simple. I was only slightly confused by the shortcuts settings because unchecking the checkbox below "Default shortcut" makes it not respond to my custom shortcut. But leaving the default enabled is fine since I would never hit meta+PgUp on a laptop keyboard by accident.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"></figure> <p>Changing the cursor theme has proven to be the most annoying. The options for this by default are Breeze and Breeze-light. Which is not what I want, and not what I want in white. This means it's down to the button I always dislike in these environments, the user content browser.</p> <p>The one in this case is no exception. User content is a huge mess. Names mean nothing, screenshots are mostly useless, it's hard to find some "normal" things beside all the really weird options. Usually the service that hosts the data is also pretty unreliable. After 3 search queries I started to get this error window that immediately reappears when I close it with the button, while also showing it has successfully fetched the results of the query behind it anyway.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"></figure> <p>After installing some cursor themes I found that the rendering of anything that's not Breeze is just bad. The preview in the settings window looks fine but when I actually apply my configuration the cursor is way smaller. It also only applies when hovering over the window decoration; I still get the old cursor theme on the window itself and on any newly launched windows. 
Deleting the currently active cursor theme also crashes the settings application.</p> <p>I gave up on this when I remembered that icon themes are standardized. So I installed the <code>adwaita-icon-theme</code> package from Alpine Linux and then I got the option for the adwaita cursor theme. After a reboot the new cursor was consistently applied.</p> <p>The other customisation thing I wanted to try is the general system theming. There are quite a few different theming settings available. The first one I messed with is the "Application style".</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"><figcaption>The preinstalled application styles</figcaption></figure> <p>By default the theme is set to Breeze; I switched it to Fusion because it has actual old school 3D controls like GTK used to have until they decided that custom options are forbidden on libadwaita-enabled applications. Maybe I like this theme because it's closest to old-school clearlooks :)</p> <p>Changing this theme also fixes the blue border focus weirdness in Dolphin.</p> <p>For the GTK apps I use there's a separate settings page.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>Gtk theme settings</figcaption></figure> <p>For GTK there are no preview thumbnails and the settings are very incomplete. With the default Breeze settings the GTK apps integrate more into the general theming of Plasma; with Default it will let the apps use Adwaita. With the Emacs theme... it also just uses Adwaita. <s>The option I'm missing here though is the dark variant of Adwaita</s>. After installing the gnome-themes-extra package I have the Adwaita Dark theme available that I wanted.</p> <p>I think making GTK look like Breeze is nice from a technical standpoint but by doing that the app will not look like a proper Plasma application, it will just look like it has the same colors and widgets but it will still follow the Gnome HIG, not the Plasma one. 
Also changing the widget theme to Fusion like I did will not change that for the Breeze GTK theme of course, so that introduces more inconsistency. I'd rather have the GTK apps look the way they were intended instead of half-following the Plasma styles.</p> <h2>Finding more workarounds</h2> <p>After some use I have found workarounds for some of my bigger issues. The wifi connection issue can be solved by setting the connection profile to "All users", which makes auto-connecting on startup work again.</p> <p>The annoying widgets I did not want in various places could be removed with the customisation features of Plasma, so not really a workaround but really intended functionality :D</p> <p>The only large issue I have left is the performance of the application launcher; I'm pretty sure that won't be an easy fix.</p> <h2>Conclusion</h2> <p>I've been running Plasma for a bit now on the Pinebook Pro and I think I will leave it on. It fulfills its purpose of being the frame around my terminal windows and the browser and hits most of the performance goals I have for this hardware.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"><figcaption>Green accent color! without hacking up the theme!</figcaption></figure> Automated Phone Testing pt.4 setup Martijn Braam Fri, 14 Oct 2022 17:20:51 -0000<p>To execute CI jobs on the hardware there needs to be a format to specify the commands to run. Every CI platform has its own custom format for this; most of them are based on YAML.</p> <p>My initial plan was to use YAML too for this since it's so common. YAML works <i>just</i> well enough to make it work on platforms like GitHub Actions and Gitlab CI. 
One thing that's quite apparent though is that YAML is just not a great language to put blocks of shell scripting into.</p> <div class="highlight"><pre><span></span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">busybox:latest</span><span class="w"></span>
<span class="nt">before_script</span><span class="p">:</span><span class="w"></span>
<span class="w">  </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;Before script section&quot;</span><span class="w"></span>
<span class="w">  </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;For example you might run an update here or install a build dependency&quot;</span><span class="w"></span>
<span class="w">  </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;Or perhaps you might print out some debugging details&quot;</span><span class="w"></span>
<span class="nt">after_script</span><span class="p">:</span><span class="w"></span>
<span class="w">  </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;After script section&quot;</span><span class="w"></span>
<span class="w">  </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;For example you might do some cleanup here&quot;</span><span class="w"></span>
<span class="nt">build1</span><span class="p">:</span><span class="w"></span>
<span class="w">  </span><span class="nt">stage</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">build</span><span class="w"></span>
<span class="w">  </span><span class="nt">script</span><span class="p">:</span><span class="w"></span>
<span class="w">    </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;Do your build here&quot;</span><span class="w"></span>
</pre></div> <p>Blocks of shell script are either defined as lists like the example above or using one of the multiline string formats in YAML. This works but is not very convenient.</p> <h2>Design constraints</h2> <p>There are a few things that the job format for PTS needs to solve. The main thing being that jobs are submitted for multiple devices that might behave slightly differently. Of course it's possible to use some conditionals in bash to solve this but leaning on shell scripting to fix this is a workaround at best.</p> <p>The things that I needed from the common job execution formats are a way to specify metadata about the job and a way to specify commands to run. One major difference with other CI systems is that a test job running on the target hardware involves rebooting the hardware into various modes and checking if the hardware behaves correctly. </p> <p>Here is an example of the job description language I've come up with:</p> <pre><code>device: hammerhead

env {
    ARCH: aarch64
    CODENAME: lg-hammerhead
    BOOTLOADER: fastboot
    PMOS_CATEGORY: testing
    NUMBER: 432
}

power {
    reset
}

fastboot (BOOTLOADER==&quot;fastboot&quot;) {
    flash userdata${CODENAME}/userdata.img
    boot${CODENAME}/boot.img
}

heimdall (BOOTLOADER==&quot;heimdall&quot;) {
    flash userdata${CODENAME}/userdata.img
    flash boot${CODENAME}/boot.img
    continue
}

shell {
    username: root
    password: 1234

    script {
        uname -a
    }
}</code></pre> <p>This format accepts indentation but it is not required. All nesting is controlled by braces.</p> <p>The top level data structure for this format is the Block. The whole contents of the file is a single block and in the example above the <code>env</code>, <code>power</code> etc blocks are... Blocks. 
</p> <p>Blocks can contain three things:</p> <ul><li>Another nested block</li> <li>A Definition, which is a key/value pair</li> <li>A Statement, which is a regular line of text</li> </ul> <p>In the example above definitions are used to specify metadata and environment variables and the script itself is defined as statements.</p> <p>Blocks also have the option to add a condition on them. The conditions are used by the controller daemon to select the right blocks to execute.</p> <p>This is just the syntax though; to make this actually work I wrote a lexer and parser for this format in Python. This produces the following debug output:</p> <pre><code>&lt;ActBlock act
  &lt;ActDefinition device: hammerhead&gt;
  &lt;ActBlock env
    &lt;ActDefinition ARCH: aarch64&gt;
    &lt;ActDefinition CODENAME: lg-hammerhead&gt;
    &lt;ActDefinition BOOTLOADER: fastboot&gt;
    &lt;ActDefinition PMOS_CATEGORY: testing&gt;
    &lt;ActDefinition NUMBER: 432&gt;
  &gt;
  &lt;ActBlock power
    &lt;ActStatement reset&gt;
  &gt;
  &lt;ActBlock fastboot: &lt;ActCondition &lt;ActReference BOOTLOADER&gt; == fastboot&gt;
    &lt;ActStatement flash userdata${CODENAME}/userdata.img&gt;
    &lt;ActStatement boot${CODENAME}/boot.img&gt;
  &gt;
  &lt;ActBlock heimdall: &lt;ActCondition &lt;ActReference BOOTLOADER&gt; == heimdall&gt;
    &lt;ActStatement flash userdata${CODENAME}/userdata.img&gt;
    &lt;ActStatement flash boot${CODENAME}/boot.img&gt;
    &lt;ActStatement continue&gt;
  &gt;
  &lt;ActBlock shell
    &lt;ActDefinition username: root&gt;
    &lt;ActDefinition password: 1234&gt;
    &lt;ActBlock script
      &lt;ActStatement uname -a&gt;
    &gt;
  &gt;
&gt;</code></pre> <p>Now the controller needs to actually use the parsed act file to execute the task. After parsing this is reasonably simple. Just iterate over the top level blocks and have a module in the controller that executes that specific task. A <code>power</code> module, for example, takes the contents of the power block and sends the commands to the pi. 
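</p> <p>A parser for this kind of brace-delimited format doesn't need much code. As a rough illustration (a heavily simplified sketch, not the actual PTS implementation; it treats every line as one token and ignores conditions):</p>

```python
class Block:
    """One node of an act file: a named block with mixed children."""
    def __init__(self, name):
        self.name = name
        self.children = []  # nested Blocks, ("def", key, value), ("stmt", text)

def parse(text):
    # Minimal sketch: one token per line, no condition support.
    root = Block("act")
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "}":
            stack.pop()                         # close the current block
        elif line.endswith("{"):
            block = Block(line[:-1].strip())
            stack[-1].children.append(block)
            stack.append(block)                 # descend into the new block
        elif ":" in line:
            key, value = line.split(":", 1)     # "key: value" definition
            stack[-1].children.append(("def", key.strip(), value.strip()))
        else:
            stack[-1].children.append(("stmt", line))  # plain statement
    return root
```

<p>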
Some flasher modules handle the flashing process.</p> <h2>Developer friendliness</h2> <p>The method to execute the blocks as modules is simple to implement, but something that's very important for this part is the developer friendliness. Testing is difficult enough and you don't want to have to deal with overly verbose specification languages.</p> <p>It's great that with conditions and flashing-protocol-specific blocks an act can describe how to flash on multiple devices depending on variables. But... that's a level of precision that's not needed for most cases. The <code>fastboot</code> module would give you access to run arbitrary fastboot commands, which is great for debugging, but for most testjobs you just want to get a kernel/initramfs running using whatever method the specific device supports. So one additional module is needed:</p> <pre><code>boot {
    # Define a rootfs to flash on the default partition
    rootfs: something/rootfs.gz

    # For android devices specify a boot.img
    bootimg: something/boot.img
}</code></pre> <p>This takes the available image for the device and then flashes it to the default locations for the device. This is something that's defined by the controller configuration. On the non-android devices specifying the rootfs would be enough; it would write the image to the whole target disk. This would be enough for the PinePhone for example.</p> <p>For Android devices things are different of course. There, most devices need to have the boot.img and rootfs.img produced by postmarketOS to boot. For those the rootfs can be written to either the system or userdata partition and in many cases boot.img can be loaded to ram instead of flashing. 
For test cases that can run from initramfs this would mean no wear on the eMMC of the device at all. </p> <p>With this together a minimal test case would be something like this:</p> <pre><code>device: pine64-pinephone

boot {
    rootfs:
}

shell {
    username: user
    password: 147147

    script {
        uname -a
    }
}</code></pre> <p>The <code>boot</code> module will take care of resetting the device, powering on, getting into the correct flasher mode, flashing the images and rebooting the device.</p> <p>After that it's just <code>shell</code> blocks to run actual test scripts.</p> <p>After implementing all of this in the PTS controller I made a small testjob exactly like the one above that loads the PinePhone Jumpdrive image, since that image is small and easy to test with.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"><figcaption>Jumpdrive booted on the PinePhone in the test setup</figcaption></figure> <h2>Detecting success and failure</h2> <p>One thing that's a lot easier in container test scripts than on serial-controlled real hardware is detecting the result of test jobs. There are no separate streams like stdout/stderr anymore, and there are no exit codes anymore. The only thing there is is a string of text.</p> <p>There are two solutions to this and I'm implementing both. The first one is specifying a string to search for to mark a test as successful or failed; this is the easiest solution and should work great for output of testsuites.</p> <pre><code>shell {
    username: user
    password: 147147
    success: linux

    script {
        uname -a
    }
}</code></pre> <p>The other one is moving from uart to telnet as soon as possible. If a test image just has telnetd in the initramfs and IPv6 connectivity then the controller can automatically figure out the IPv6 link local address on the phone side and connect to the device with telnet and a preset username/password. 
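</p> <p>Figuring out the link-local address is possible because interfaces traditionally derive it from the MAC address using modified EUI-64 (RFC 4291): flip the universal/local bit of the first octet and insert <code>ff:fe</code> between the two halves of the MAC. A sketch of that derivation (assuming the phone side uses EUI-64 rather than randomized addresses):</p>

```python
import ipaddress

def link_local_from_mac(mac):
    # Modified EUI-64 (RFC 4291): flip the universal/local bit of the
    # first octet and insert ff:fe between the two halves of the MAC.
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    groups = ":".join(f"{eui64[i] << 8 | eui64[i + 1]:x}"
                      for i in range(0, 8, 2))
    # Normalize to the canonical compressed form.
    return str(ipaddress.IPv6Address("fe80::" + groups))
```

<p>In practice the controller would still need to learn the MAC first (for example from the USB network interface it sees on the host side), or it could instead probe the link with the all-nodes multicast address ff02::1 and use whichever neighbour answers. 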
This is a bit harder with IPv4 connectivity due to multiple devices being connected that might have overlapping addresses.</p> <p>Once a telnet session is established something closer to a traditional CI suite can function by sending over the script and just executing it instead of automating a shell.</p> <pre><code>telnet {
    username: user
    password: 147147

    script {
        uname -a
    }
}</code></pre> <p>Besides this, the other blocks can signal a failure. The <code>boot</code> block will mark the test as failed when the device could not recognize the boot image, for example.</p> <h2>Next up</h2> <p>With this all the major components have their minimal required functionality working. The next step is building up the first rack case and deploying an instance of the central controller for postmarketOS. I've already been printing more 3D phone holders for the system. One of the prototypes of this is teased in the top image of the article :)</p> Automated Phone Testing pt.3 setup Martijn Braam Tue, 27 Sep 2022 12:32:24 -0000<p>So in the previous post I mentioned the next step was figuring out the job description language... Instead of that I implemented the daemon that sits between the hardware and the central controller.</p> <p>The original design has a daemon that connects to the webinterface and hooks up to all the devices connected to the computer. This works fine for most things but it also means that to restart this daemon in production all the connected devices have to be idle or all the jobs have to be aborted. This can be worked around by having a method to hot-reload configuration for the daemon and dealing with the other cases that would require a restart. I opted for the simpler option of just running one instance of the daemon for every connected device.</p> <p>The daemon is also written in Python, like the other tools. 
It runs a networking thread, a hardware monitoring thread and a queue runner thread.</p> <h2>Message queues</h2> <p>In order to not have to poll the webinterface for new tasks a message queue or message bus is required. There are a lot of options available to do this so I limited myself to two options I had already used: Mosquitto and RabbitMQ. These have slightly different feature sets but basically do the same thing. The main difference is that RabbitMQ actually implements a queue system where tasks are loaded into the queue and can be picked from it by multiple clients. Clients then ack and nack tasks and tasks get re-queued when something goes wrong. This essentially duplicates quite a few parts of the existing queue functions already in the central controller. Mosquitto is way simpler; it deals with messages instead of tasks. The highest level feature the protocol has is that it can guarantee a message is delivered.</p> <p>I chose Mosquitto for this reason. The throughput for the queue is not nearly high enough that something like RabbitMQ is required to handle the load. The message bus feature of Mosquitto can be used to notify the daemon that a new task is available and then the daemon can fetch the full data over plain old https.</p> <p>The second feature I'm using the message bus for is streaming the logs. Every time a line of data is transmitted from the phone over the serial port the daemon will make that an mqtt message and send it to the Mosquitto daemon running on the same machine as the webinterface. The webinterface daemon is subscribed to those messages and stores them on disk, ready to render when the job page is requested.</p> <p>With the current implementation the system creates one topic per task and the daemon sends the log messages to that topic. One feature that can be used to make the webinterface more efficient is the websocket protocol support in the mqtt daemon. 
With this it's no longer required to reload the webinterface for new log messages or fetch chunks through ajax. When the page is open it's possible to subscribe to the topic for the task with javascript and append log messages as they stream in in real time.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"><figcaption>Log messages sent over MQTT</figcaption></figure> <h2>Authentication</h2> <p>With the addition of a message bus, it's now required to authenticate to that as well, increasing the set-up complexity of the system. Since version 2 there's an interesting plugin bundled with the Mosquitto daemon: dynsec.</p> <p>With this plugin accounts, roles and acls can be manipulated at runtime by sending messages to a special topic. With this I can create dynamic accounts for the controllers to connect to the message bus, and relay that information to the controllers through the http api when they request it on startup.</p> <p>One thing missing from this is that the only way to officially use dynsec seems to be the <code>mosquitto_ctrl</code> commandline utility to modify the dynsec database. I don't like shelling out to executables to get things done since it adds more dependencies outside the package manager for the main language. The protocol used by <code>mosquitto_ctrl</code> is quite simple though: not very well documented, but easy to figure out by reading the source.</p> <h2>Flask-MultiMQTT</h2> <p>To connect to Mosquitto from inside a Flask webapplication the most common way is with the <code>Flask-MQTT</code> extension. This has a major downside though that's listed directly at the top of the <a href="">Flask-MQTT documentation</a>: it doesn't work correctly in a threading Flask application, and it also fails when hot-reload is enabled in Flask because that spawns threads. This conflicts a lot with the other warning in Flask itself, which is that the built-in webserver in Flask is not a production server. 
The production servers are the threading ones.</p> <p>My original plan was to create an extension to do dynsec on top of Flask-MQTT, but looking at the amount of code that's actually in Flask-MQTT and the downsides I would have to work around, I decided to make a new extension for Flask that <i>does</i> handle threading. The <a href="">Flask-MultiMQTT</a> is available on pypi now and has most of the features of the Flask-MQTT extension and the extra features I needed. It also includes helpers for doing runtime changes to dynsec.</p> <p>Some notable changes from Flask-MQTT are:</p> <ul><li>Instead of the list of config options like <code>MQTT_HOST</code> etc it can get the most important ones from the <code>MQTT_URI</code> option in the format <code>mqtts://username:password@hostname:port/prefix</code>.</li> <li>Support for a <code>prefix</code> setting that is prefixed to all topics in the application to have all the topics for a project namespaced below a specific prefix.</li> <li>Integrates more with the flask routing. Instead of having the <code>@mqtt.on_message()</code> decorator (and the less documented <code>@mqtt.on_topic(topic)</code> decorator) where you still have to manually subscribe, there is an <code>@mqtt.topic(topic)</code> decorator that subscribes on connection and handles wildcards exactly like flask does with <code>@mqtt.topic(&quot;number-topic/&lt;int:my_number&gt;/something&quot;)</code>.</li> <li>It adds a <code>mqtt.topic_for()</code> function that acts like <code>flask.url_for()</code> but for topics subscribed with the decorator. 
This can generate the topic with the placeholders filled in like url_for() does and also supports getting the topic with and without the prefix.</li> <li>Implements <code>mqtt.dynsec_*()</code> functions to manipulate the dynsec database.</li> </ul> <p>This might seem like overkill but it was surprisingly easy to make; even if I hadn't made a new extension, most time would have been spent figuring out how to use dynsec and working around weird threading issues in Flask.</p> <h2>Log streaming</h2> <p>The serial port data is line-buffered and streamed to an mqtt topic for the task, but this is not as simple as just dumping the line into the payload of the mqtt message and sending it off. The controller itself also logs the state of the test system and already parses UART messages to figure out where in the boot process the device is, to facilitate automated flashing.</p> <p>The log messages are sent as json objects over the message bus. Each line is annotated with the source of the message, which is <code>"uart"</code> most of the time. There are also more complex log messages that have the full details about USB plug events.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="" class="kg-image"><figcaption>The USB event generated when the postmarketOS initramfs creates the USB network adapter on the PinePhone</figcaption></figure> <p>Besides uart passthrough messages there are also inline status messages from the controller itself when it's starting the job, and flasher messages when the flasher thread is writing a new image to the device. 
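</p> <p>The envelope for such an annotated message could be as simple as this (the field names are made up for illustration; the actual schema isn't shown in this post):</p>

```python
import json
import time

def make_log_message(source, line):
    # Annotate one line of output with its origin before publishing
    # it to the task's mqtt topic. Field names are illustrative only.
    return json.dumps({
        "source": source,   # e.g. "uart", "controller" or "flasher"
        "line": line,
        "ts": time.time(),  # rough timestamp for ordering in the UI
    })
```

<p>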
This can be extended with more annotated sources, like syslog messages passed along once the system is booted, if a helper is installed in the image.</p> <p>This log streaming can also be extended with a topic for messages in the other direction; that way it would be possible to get a shell on the running device.</p> <p>With all this together the system can split up the UART logs into sections, based on hardcoded knowledge of log messages from the kernel and the bootloader, and create nice collapsible sections.</p> <figure class="kg-card kg-image-card"><img src="" class="kg-image"><figcaption>A PinePhone booting, flashing an image using Tow-Boot and then bootlooping</figcaption></figure> <h2>Running test jobs</h2> <p>The current system still runs a hardcoded script on the controller when receiving a job instead of parsing a job manifest, since I postponed the job description language. The demo above flashes the first file attached to the job, which in my case is <code>pine64-pinephone.img</code>, a small postmarketOS image. Then it reboots the phone and does nothing except pass through UART messages.</p> <p>This implementation does not yet have a way to end jobs and evaluate success/failure conditions at the end of the script. There are a few failure conditions implemented which I ran into while debugging this system.</p> <p>The first failure condition it can detect is a PinePhone bootlooping. Sometimes the bootloader crashes due to things like insufficient power, or the A64 SoC being in a weird state from the last test run. When the device keeps switching between the <code>spl</code> and <code>tow-boot</code> states it will mark the job as a bootloop and fail it. Another infinite loop, easily triggered by not inserting the battery, is the device failing directly after starting the kernel. This is what is happening in the screenshot above.
This is not something that can be detected fully automatically, since a phone rebooting is a supported case.</p> <p>To make this failure condition detectable, the job description needs a way to specify whether a reboot at a specific point is expected. Or, more generically, a way to specify which state transitions are allowed at specific points in the test run. Implementing this would remove a whole category of failures that currently require manual intervention to reset the system.</p> <p>The third failure condition I encountered was the phone not entering flashing mode correctly. If the system wants to go to flashing mode but the log starts outputting kernel messages, it will mark the job as a failure. In my case this failure was triggered because a solder joint on the PinePhone had failed, so the volume button was not held down correctly.</p> <p>Another thing that needs to be figured out is how to pass test results to the controller. Normally in CI systems failure conditions are easy: you execute the script and if the exit code is not <code>0</code> the task is marked as a failure. This works when executing over SSH, but when running commands over UART that metadata is lost. One solution would be a wrapper script that catches the return code and prints it in a predefined format to the UART so the log parser can detect it. Even better is bringing up networking on the system when possible, so tests can be executed over something better than a serial port.</p> <p>Having networking would also fix another issue: how to get job artifacts out. If job artifacts are needed and there is only a serial line, the only option is sending some magic bytes over the serial port to tell the controller it's sending a file, and then dumping the contents of the file with some metadata and encoding.
Luckily, people figured out protocols for this back when dial-up modems were a thing: XMODEM, YMODEM and ZMODEM.</p> <p>Since a network connection cannot be relied on, the phone test system will probably need to implement both the "everything runs over a serial line" codepath and the faster, more reliable methods that use networking. For tests where networking can be brought up, a helper would be needed inside the flashed test images that brings up USB networking, like the postmarketOS initramfs does, and then communicates with the controller over serial to signal its IP address.</p> <p>So the next part will be figuring out the job description (again) and making the utilities that run inside the test images to help execute them.</p>
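<p>One of those in-image utilities could be the result wrapper mentioned above. Here is a minimal sketch of both halves, assuming a made-up <code>##TESTRESULT:n##</code> marker format (the real protocol is not designed yet, so the marker, function names and placement are all illustrative):</p>

```python
import re
import subprocess

# Hypothetical marker format; the real predefined format is still to be decided.
MARKER = "##TESTRESULT:{code}##"
MARKER_RE = re.compile(r"##TESTRESULT:(\d+)##")


def run_wrapped(command):
    """Run a test command inside the image and print its exit code
    in a parseable marker. Stdout is assumed to end up on the UART."""
    result = subprocess.run(command)
    print(MARKER.format(code=result.returncode), flush=True)
    return result.returncode


def parse_result(line):
    """Controller side: extract the exit code from a UART log line,
    or return None for ordinary log output."""
    match = MARKER_RE.search(line)
    return int(match.group(1)) if match else None
```

<p>Inside the image this would be invoked as something like <code>run_wrapped(["sh", "test.sh"])</code>, while the controller's log parser feeds every UART line through <code>parse_result()</code> and ends the job once it returns a value.</p>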