<p>BrixIT Blog · Random blog posts from Martijn Braam · <a href="https://blog.brixit.nl/">https://blog.brixit.nl/</a></p>

<h1><a href="https://blog.brixit.nl/bodgeos-pt-2-running-on-real-hardware/">BodgeOS pt.2: Running on real hardware</a></h1>
<p><em>Linux · Martijn Braam · Tue, 17 Dec 2024 00:30:04 -0000</em></p>

<p>In the <a href="https://blog.brixit.nl/conjuring-a-linux-distribution-out-of-thin-air/">previous part</a> of this series I created a base Linux distribution from a running LFS system. This version only ran as a container, which has several benefits that make building the distribution a lot easier. For a simple container I didn't need:</p>
<ul><li>A service manager (systemd)</li>
<li>Something to make it bootable on x86_64 systems (grub, syslinux, systemd-boot)</li>
<li>A kernel</li>
<li>An initramfs to get my filesystem mounted</li>
<li>Filesystem utilities, since the rootfs is just a folder instead of a filesystem.</li>
</ul>
<p>A few of these are pretty easy to get running. I already have all the dependencies to build a kernel, so I built one based on the linux-lts package from Alpine Linux.</p>
<p>To make things easier for myself I limited the distribution to UEFI x86_64 systems for now. This means I don't have to mess with grub ever again and can just dump systemd-boot into my /boot folder to get a functional system. I had to build systemd anyway to have an init system for my distribution, and systemd-boot ships as part of it.</p>
<h2>The Initramfs</h2>
<p>The thing that took by far the longest was messing with the initramfs to make my test system boot. The initramfs generator is certainly one of the parts with the most distribution-specific "flavor". Every distribution invents its own solution: <code>mkinitcpio</code> for Arch Linux, <code>mkinitfs</code> for Alpine and <code>initramfs-tools</code> for Debian, to name a few.</p>
<p>I did the only logical thing and reinvented the wheel here. I'm planning to reinvent it even further! Like the above solutions my current initramfs generator is a collection of shell scripts. The initramfs is a pretty simple system after all: it has to load some kernel modules, find the rootfs, mount it and then execute the init in the real system.</p>
<p>For a very minimal system the only required thing is the <code>busybox</code> binary: it provides the interpreter for the messy shell script that brings up the system and it supplies all the base utilities. Due to my previous experiences with BusyBox modprobe in postmarketOS I decided to also move the real <code>modprobe</code> binary into the initramfs so modules load correctly. To complete it I added the real <code>blkid</code> instead of relying on the BusyBox implementation, to get support for partition labels in udev so no custom partition-label-searching code is required.</p>
<p>Getting binaries into the initramfs is super easy. The process for generating an initramfs is:</p>
<ol><li>Create an empty working directory</li>
<li>Copy the files you need from the regular rootfs into the working directory, like <code>/usr/bin/busybox</code> &gt; <code>/tmp/initfs-build/usr/bin/busybox</code></li>
<li>Add a script that functions as PID 1 in the initramfs and starts execution of the whole system</li>
<li>Run the <code>cpio</code> command against the <code>/tmp/initfs-build</code> directory to create an archive of this temporary rootfs and run that through <code>gzip</code> to generate <code>initramfs.gz</code> (sketched right below this list)</li>
</ol>
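<p>Step 4 is the same trick practically every initramfs generator ends with; a minimal sketch, assuming the working directory from the steps above (the paths are illustrative):</p>
<pre><code># pack the temporary rootfs into a newc-format cpio archive, the
# format the kernel expects for an initramfs, and compress it
cd /tmp/initfs-build
find . -print0 | cpio --null -o -H newc | gzip -9 &gt; /boot/initramfs.gz</code></pre>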
<p>Step 2 is fairly simple since I just need to copy the binaries from the host system, but those binaries also have dependencies that need to be copied to make the executables actually work. Normally this is handled by the <code>lddtree</code> utility, but I didn't feel like packaging that. It is a shell script that does a complicated task, which is never a good thing, and it depends on Python and on calling various ELF debugging utilities.</p>
<p>Instead of using <code>lddtree</code> I brought up <a href="https://harelang.org/">Hare</a> on my distribution and wrote a replacement utility for it called <a href="https://git.sr.ht/~martijnbraam/hare-bindeps">bindeps</a>. This is just a single binary that loads the ELF file(s) and spits out the dependencies without calling any other tools. This avoids the performance overhead of <code>lddtree</code>, which was always the slowest part of generating the initramfs for postmarketOS.</p>
<p>The output format is also optimized to be easily parseable in the mkinitfs shell script.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>lddtree /usr/sbin/blkid /usr/sbin/modprobe
<span class="go">/usr/sbin/blkid (interpreter =&gt; /lib64/ld-linux-x86-64.so.2)</span>
<span class="go">    libblkid.so.1 =&gt; /lib/x86_64-linux-gnu/libblkid.so.1</span>
<span class="go">    libc.so.6 =&gt; /lib/x86_64-linux-gnu/libc.so.6</span>
<span class="go">/usr/sbin/modprobe (interpreter =&gt; /lib64/ld-linux-x86-64.so.2)</span>
<span class="go">    libzstd.so.1 =&gt; /lib/x86_64-linux-gnu/libzstd.so.1</span>
<span class="go">    liblzma.so.5 =&gt; /lib/x86_64-linux-gnu/liblzma.so.5</span>
<span class="go">    libcrypto.so.3 =&gt; /lib/x86_64-linux-gnu/libcrypto.so.3</span>
<span class="go">    libc.so.6 =&gt; /lib/x86_64-linux-gnu/libc.so.6</span>
<span class="gp">$ </span>bindeps /usr/bin/blkid /usr/bin/modprobe
<span class="go">/usr/lib/ld-linux-x86-64.so.2</span>
<span class="go">/usr/lib/libblkid.so.1.1.0</span>
<span class="go">/usr/lib/libc.so.6</span>
<span class="go">/usr/lib/libzstd.so.1.5.6</span>
<span class="go">/usr/lib/liblzma.so.5.6.3</span>
<span class="go">/usr/lib/libz.so.1.3.1</span>
<span class="go">/usr/lib/libcrypto.so.3</span>
</pre></div>
<p>The bindeps utility seems to be roughly 100x faster in the few test cases I've used it in, and it outputs a format that needs no further string-mangling to be used in a shell script.
In BodgeOS mkinitfs it's used like this:</p>
<div class="highlight"><pre><span></span><span class="nv">binaries</span><span class="o">=</span><span class="s2">&quot;/bin/modprobe /bin/busybox /bin/blkid&quot;</span>
<span class="k">for</span> bin <span class="k">in</span> <span class="nv">$binaries</span> <span class="p">;</span> <span class="k">do</span>
	install -Dm0755 <span class="nv">$bin</span> <span class="nv">$workdir</span>/<span class="nv">$bin</span>
<span class="k">done</span>
bindeps <span class="nv">$binaries</span> <span class="p">|</span> <span class="k">while</span> <span class="nb">read</span> lib <span class="p">;</span> <span class="k">do</span>
	install -Dm0755 <span class="nv">$lib</span> <span class="nv">$workdir</span>/<span class="nv">$lib</span>
<span class="k">done</span>
</pre></div>
<p>The next part is the kernel modules. Kernel modules are also ELF binaries, just like the binaries I just copied over, but they sadly don't contain any dependency metadata. That metadata is stored in a separate file called <code>modules.dep</code> that has to be parsed separately. I did not bother with this and copied the solution from the initramfs generator example in LFS, which just copies hardcoded folders of modules into the initramfs and hopes it works.</p>
<p>The file format for <code>modules.dep</code> is trivial, so I really want to integrate support for it into bindeps in the future.</p>
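<p>Trivial meaning: one line per module, with the module path left of a colon and its direct dependencies space-separated after it. A quick sketch of a lookup against a stock kernel, shown purely for illustration (this is not part of mkinitfs):</p>
<pre><code># print the direct dependencies of the ext4 module straight from
# modules.dep; a real implementation would recurse over the output
grep '/ext4\.ko' /lib/modules/$(uname -r)/modules.dep | cut -d: -f2</code></pre>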
<h2>Debugging the boot</h2>
<p>It's surprisingly painful to debug a non-booting Linux system that fails in the initramfs. I wasted several hours figuring out why the kernel threw errors at the spot the initramfs should start executing, which ended up being caused by the <code>/sbin/init</code> file having the wrong shebang line at the start so it was not loadable. The kernel has no proper error message that conveys any of this.</p>
<p>After I got the initramfs to actually load and start, a lot of time was wasted on executables missing the ELF interpreter. In the example above this is the <code>/lib64/ld-linux-x86-64.so.2</code> line. The issue ended up being that I was just missing the /lib64 symlink in my initramfs. This was very hard to debug in a system without debug utilities, because nothing could execute.</p>
<p>After all that I spent even more time figuring out why I had no kernel log lines on my screen. After much annoyance this turned out to be missing options in the linux-lts kernel config I took from Alpine Linux. So instead of fixing that I took the kernel config from Arch Linux and rebuilt the linux-lts package. This fixed my kernel log output issue but also added a new one... The keyboard in my laptop wasn't working in the initramfs.</p>
<p>I never did figure out which module I was missing for that, because I fixed the rest of the initramfs script instead so it just continues on to the real rootfs where all the modules are available.</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1734393060/16f5654c63ad395d.jpg" class="kg-image"></figure>
<p>After all that I did manage to get to a login prompt though!</p>
<h2>Cleaning up</h2>
<p>After booting this up I realized it would be really handy if I actually had an <code>/etc/passwd</code> file in my system, and some more of the bare essentials, so I could actually log in and use the system.</p>
<p>This mainly involved adding missing dependencies in packages and packaging a few more files in <code>/etc</code> to make it a functional system. After the first boot the journal had a neat list of system binaries from util-linux that systemd depends on but doesn't declare explicitly, so I added those dependencies to my systemd packaging.</p>
<p>I also had to fix the issue that my newly installed system did not trust the BodgeOS repository: I was missing the keys package that installs the repository key in <code>/etc/apk/keys</code> for me. In this process I noticed that the key I built the system with was called `-randomdigits.pub` instead of being prefixed with a name. This is pretty annoying because this name is embedded in all the compiled packages, and I didn't want to ship a file with that name in my keys package.</p>
<p>There seemed to be a nifty solution though: the <code>abuild-sign</code> tool appends a signature to a tar archive, and it is normally used to sign the APKINDEX.tar.gz file that contains the package list in the repository. I decided to run <code>abuild-sign *.apk</code> in my main repository after adjusting the abuild signing settings with a correct key name.</p>
<p>Apparently this breaks .apk files: after inspection they now had two signatures in them, and neither my development LFS install nor my test BodgeOS install wanted to have anything to do with the packages anymore.</p>
<p>In the end I had to throw away my built packages and rebuild everything again from the APKBUILD files I had. Luckily this distribution is not that big yet, so a full rebuild only took about 2.5x the duration of Dark Side of the Moon.</p>
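<p>For reference, the flow that does work is to only ever sign the repository index and leave the .apk files alone; a minimal sketch, with an illustrative key path:</p>
<pre><code># build an index over all packages in the repository and sign just
# the index; the packages keep the signature they got at build time
apk index -o APKINDEX.tar.gz *.apk
abuild-sign -k ~/.abuild/bodgeos.rsa APKINDEX.tar.gz</code></pre>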
<h2>Next steps</h2>
<p>Now that I have a basic system that can boot, I continued with packaging more libraries and utilities you'd expect in a regular Linux distribution. One of the things I thought would be very neat to have is <code>curl</code>, but that has a surprising amount of dependencies. Some tools like <code>git</code> will be useful too before I can call this system self-hosting.</p>
<p>I also want to remove all the shell scripts from the initramfs generation. None of the tasks in the initramfs are really BodgeOS-specific, and most of the complications and bugs in this initramfs implementation (and the one in postmarketOS) exist because the utilities it depends on are not really intended for this job, and system bootup has a lot of race conditions that shell scripts are just not great at handling.</p>
<p>My current plan to fix that is to replace the entire initramfs with a single statically linked binary. All this logic is way neater to implement in a good programming language.</p>

<h1><a href="https://blog.brixit.nl/conjuring-a-linux-distribution-out-of-thin-air/">Conjuring a Linux distribution out of thin air</a></h1>
<p><em>Linux · Martijn Braam · Sat, 07 Dec 2024 23:08:27 -0000</em></p>

<p>I decided I had to get something with slightly more CPU power than my Thinkpad x230 for a few tasks, so I got a refurbished x280. Aside from the worse keyboard the laptop is pretty nice and light. It shipped with Windows of course, so the first thing I did was install Ubuntu on the thing to wipe that off and verify all the hardware works decently.</p>
<p>I was wondering if I should leave Ubuntu on it: it works pretty well, it's still possible to get rid of all the Snap stuff, and it's not my main machine anyway. The issue I quickly ran into though is that some software is pretty outdated. I don't want to use KiCad 7 anymore...</p>
<h2>Picking distros once again</h2>
<p>You'd think after using Linux for decades I would know what distro to put on a new machine. All the options I could think of though had annoying trade-offs I didn't want to deal with once again.</p>
<p>The three main distributions I have running on hardware I manage are Alpine Linux, Arch Linux and Debian. I like Alpine a lot, but it is quite annoying when you deal with closed-source software. Since this is my go-to laptop to take with me to outages and repairs, I need it to handle random software thrown at it relatively easily.</p>
<p>Arch Linux satisfies that requirement pretty well, but my main issue with it is pacman. If you don't religiously run upgrades every hour on the thing it will just break, because for some reason key management is still not figured out there. The installation is also usually quite big due to packages not being split.</p>
<p>Debian fixes the stability issues but comes with the trade-off that software is usually much, much older, which also leaks into the Ubuntu that's running on the laptop now. It is also internally a lot more complicated due to the way it automatically sets things up while installing, which I usually don't need.</p>
<p>There is another solution though. Just build my own!</p>
<h2>Artisanal home-grown Linux</h2>
<p>Creating a new Linux distribution is one of those things that sounds much harder than it actually is. I just hadn't done it before. I did build a small Debian-derivative distro before, just to avoid re-doing all the config for all my machines, but that's just adding an extra repository to an existing distribution. Of course I've also worked extensively on postmarketOS, and while the scope of that is a lot larger, it still is only a repository with additions on top of Alpine Linux.</p>
<p>Some of you might be familiar with this graphic of Linux distributions:</p>
<figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1733576472/image.png" class="kg-image"></figure>
<p>There are many, many derivative distributions here and way fewer distributions that are built up from nothing. And that's exactly the part I want to figure out: how do I make a distribution from scratch?</p>
<p>I have once, a long time ago, built a working Linux machine from sources using the great <a href="https://www.linuxfromscratch.org/">Linux From Scratch</a> project. I would recommend that anyone that's really into Linux internals do that at least once. Just like Arch Linux teaches you how a distribution installer works (or did, before they added an installer), the LFS book teaches you how to bootstrap a separate userspace from another Linux distribution and how to do the gcc/glibc dependency-loop build.</p>
<p>So my plan for the distribution is: create a super barebones system using the LFS book, then package up that installation using abuild and apk from Alpine Linux. This way I can basically make my own systemd/glibc distribution that is mostly like Arch Linux but uses the packaging tools and methodology of Alpine Linux.</p>
<h2>The bootstrap build</h2>
<p>To make my distribution I first have to build the Linux installation that will build the packages for the distribution. To get through this part relatively quickly I used the automated LFS installer called <a href="https://www.linuxfromscratch.org/alfs/">ALFS</a>. This basically does all the steps from the LFS book, but very quickly. My intention is to replace all of these packages anyway, so this part was not super important.
It does all the required setup though to validate that the GCC I'm using to build my distribution is sane and tested.</p>
<p>In the ALFS installer I picked the systemd option, since I didn't want to deal with OpenRC again, and ended up with a nice functional rootfs to work in. I immediately encountered a few critical things that were missing though: there was no wget or curl. I fixed this by grabbing the BusyBox binary from my host Ubuntu installation and putting that inside with only a wget symlink to it.</p>
<p>This wget was not functional in my chroot though, since it could not connect to https servers. Annoyingly, all the solutions you'll find for these errors boil down to passing <code>--no-check-certificate</code> to wget to get your files, which is unacceptable when building a distribution. After a lot of debugging with openssl configs I ended up copying over all the certificates from the host Ubuntu system as well and pointing wget to the certificate bundle with the wgetrc file.</p>
<p>The very next thing I needed was <code>abuild</code>. This is the tool from Alpine Linux that's used to build the packages. Luckily it's a few small C programs and a large shell script, so it's very easy to install into the temporary system. I also added <code>apk.static</code> to the system to be able to install the built packages.</p>
<h2>Building packages</h2>
<p>Now that I had my temporary system running I could start writing <code>APKBUILD</code> files for all the packages in my LFS system. I started with the very simplest one of course, which only provides <code>/etc/services</code> and <code>/etc/protocols</code>. No compiling and no dependencies needed.</p>
<p>For this package I made a script that built the current APKBUILD using the abuild subcommands and generated a neat local repository so apk could install the files. I ran that, and now <code>/etc/services</code> and <code>/etc/protocols</code> are exactly the same files but managed by apk and delivered in a package.</p>
<p>The reason I had to use the abuild subcommands to run separate stages, instead of just running abuild itself to do the whole thing, is that one of the first steps is installing the build dependencies. In this case I'm in the weird setup where I have all the build dependencies installed through LFS, but apk doesn't know about that, so I simply skip that step.</p>
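<p>In practice that means driving the phases one at a time, roughly like the sketch below. The phase names are abuild subcommands, but treat the exact sequence as illustrative rather than the literal BodgeOS build script:</p>
<pre><code># inside a directory with an APKBUILD: run the individual stages,
# skipping the dependency installation a normal abuild run starts with
abuild fetch unpack prepare build rootpkg
# regenerate the index of the local repository
abuild index</code></pre>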
<p>And that's how I rebuilt a lot of the LFS packages once again, but this time through my half-broken abuild installation. One of the things abuild was not happy about was the lack of the <code>scanelf</code> utility, which it uses after building a package to check which <code>.so</code> files the binaries in the package depend on. Due to this a lot of dependencies between packages were simply missing. The <code>scanelf</code> utility has enough build dependencies that it could not be one of the first few packages, so most of the packages are broken at this stage.</p>
<p>When I finally built and installed <code>scanelf</code> I ran into another issue. The packages I built after this failed at the last step, because scanelf found the .so files required by the package but the metadata of all the packages I had made before it apparently lacks this information about the included .so files. At this point I had to build all those packages for a fourth time (twice in LFS and now twice in abuild) to make the dependencies work.</p>
<p>After getting through practically everything in the base system I ended up with around 333 packages in my local repository.</p>
<p>Most of these <code>APKBUILD</code> files are a combination of the metadata header copied from the Alpine Linux APKBUILD, so I have a neat description and the correct license data, with the build steps and flags from LFS and sometimes the install step from Arch Linux.</p>
<p>This means that for example the <code>xz</code> build steps in LFS are:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>./configure --prefix<span class="o">=</span>/usr <span class="se">\</span>
            --disable-static <span class="se">\</span>
            --docdir<span class="o">=</span>/usr/share/doc/xz-5.6.3
<span class="gp">$ </span>make
<span class="gp">$ </span>make check
<span class="gp">$ </span>make install
</pre></div>
<p>And that combined with the Alpine Linux metadata headers and adjustments for packaging becomes:</p>
<div class="highlight"><pre><span></span><span class="nv">pkgname</span><span class="o">=</span>xz
<span class="nv">pkgver</span><span class="o">=</span><span class="m">5</span>.6.3
<span class="nv">pkgrel</span><span class="o">=</span><span class="m">0</span>
<span class="nv">pkgdesc</span><span class="o">=</span><span class="s2">&quot;Library and CLI tools for XZ and LZMA compressed files&quot;</span>
<span class="nv">url</span><span class="o">=</span><span class="s2">&quot;https://tukaani.org/xz/&quot;</span>
<span class="nv">arch</span><span class="o">=</span><span class="s2">&quot;all&quot;</span>
<span class="nv">license</span><span class="o">=</span><span class="s2">&quot;GPL-2.0-or-later AND 0BSD AND Public-Domain AND LGPL-2.1-or-later&quot;</span>
<span class="nv">subpackages</span><span class="o">=</span><span class="s2">&quot;</span><span class="nv">$pkgname</span><span class="s2">-doc </span><span class="nv">$pkgname</span><span class="s2">-libs </span><span class="nv">$pkgname</span><span class="s2">-dev&quot;</span>
<span class="nv">source</span><span class="o">=</span><span class="s2">&quot;https://github.com/tukaani-project/xz/releases/download/v</span><span class="nv">$pkgver</span><span class="s2">/xz-</span><span class="nv">$pkgver</span><span class="s2">.tar.xz&quot;</span>

build<span class="o">()</span> <span class="o">{</span>
	./configure <span class="se">\</span>
		--prefix<span class="o">=</span>/usr <span class="se">\</span>
		--disable-static <span class="se">\</span>
		--docdir<span class="o">=</span>/usr/share/doc/xz-<span class="nv">$pkgver</span>
	make
<span class="o">}</span>

check<span class="o">()</span> <span class="o">{</span>
	make check
<span class="o">}</span>

package<span class="o">()</span> <span class="o">{</span>
	make <span class="nv">DESTDIR</span><span class="o">=</span><span class="s2">&quot;</span><span class="nv">$pkgdir</span><span class="s2">&quot;</span> install
<span class="o">}</span>
</pre></div>
<h2>Getting the first install to run</h2>
<p>Now the hard part: finding all the issues that prevent this new installation from starting. The first thing I tried was using <code>apk.static</code> on my host system to generate a new chroot from the repository I created, just like you'd install an Alpine chroot.</p>
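<p>That bootstrap normally looks something like the sketch below; the repository URL and the base package name are placeholders, not the real BodgeOS ones:</p>
<pre><code># create a fresh rootfs from a repository the same way an Alpine
# chroot is bootstrapped: initialize the package database and
# install the base packages into the target directory
apk.static -X https://example.com/bodgeos/main -U --allow-untrusted \
    -p /tmp/bodgeos-chroot --initdb add base</code></pre>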
<p>Unfortunately this did not work, and I had to fork apk-tools and make my own adjusted version. This is mainly because apk-tools hardcodes paths that conflict with my usrmerge setup. So I now have an <code>apk.static</code> built from my fork that does not try to create <code>/var/</code> for the database before the baselayout package can create the actual filesystem hierarchy with a symlink at that spot.</p>
<p>With that fixed <code>apk.static</code> was able to finish creating an installation from my repository, but I could not chroot into it for some reason. All the binaries were broken and returned "Not found" when trying to execute them. I managed to actually enter the chroot by throwing my trusty BusyBox binary in there, but did not get any more information out of that installation.</p>
<p>After a bunch of testing, debugging, and more testing, I found out the reason was that I didn't have <code>/lib64</code> in my installation. It's required because x86_64 binaries specify /lib64/ld-linux-x86-64.so.2 as their loader. The fix is quite easy: just have the glibc package place a symlink at that spot pointing to the real ld-linux.so.</p>
<h2>Beyond the first run</h2>
<p>There are a lot of things that need to be fixed up before this is a good distribution. All the packages will need to be rebuilt again, from a distribution installed from these first-generation packages, to leave the last parts of LFS behind. There's also the system setup needed to make it bootable and maintainable: things like the keys package that installs and updates the repository keys, and adding the logo to neofetch (after packaging neofetch).</p>
<p>I've also rsync'd the repository for the distribution to a webserver so it can actually be added to installations. I've been using this now to create test chroots using my locally built patched apk-tools.</p>
<figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1733611608/image.png" class="kg-image"></figure>
<p>The repository itself also needs quite a bit of work: gcc shouldn't have been pulled in for these packages, and a bunch of the large packages need to be split up to remove the uncommon parts. The <code>glibc</code> package alone is currently 10x larger than a base Alpine Linux installation. Luckily apk is a very fast package manager and everything still installs super quickly.</p>
<h2>So why?</h2>
<p>It doesn't really make sense to do this. I wanted to do it anyway, because that's how you learn. This took about 3 days of fiddling with build scripts between other work, since it's mostly waiting on builds to finish.</p>
<p>At the very least this distribution produced this blog post, which is surprisingly one of the very few pieces of information available on bootstrapping a distribution.</p>
<p>Even with this tiny base of packages this is already a quite usable OS, since I have a kernel, a service manager and Python. That's all you'd need to build some embedded stuff, if this had an ARM64 port bootstrapped or something.
For desktop work there's still a mountain of packaging to be done before Firefox can launch; getting the basic graphics stack up and running should be relatively straightforward though, by bringing up Sway with its dependencies.</p>
<p>In the end it probably would've been easier to just add a PPA for KiCad to my Ubuntu installation :)</p>

<h1><a href="https://blog.brixit.nl/building-a-timeseries-db-for-fun/">Building a timeseries database for fun</a></h1>
<p><em>Linux · Martijn Braam · Mon, 28 Oct 2024 22:50:55 -0000</em></p>

<p>Everyone that has tried to make some nice charts in Grafana has probably come across timeseries databases like InfluxDB or Prometheus. I've deployed a few instances of these for various tasks and I've always been annoyed by how they work. But that's a consequence of having great performance, right?</p>
<p>The thing is... most of the dataseries I'm working with don't need that level of performance. If you're just logging the power delivered by a solar inverter to a Raspberry Pi, then you don't need a datastore built for 1000 datapoints per second. My experience with timeseries databases is not that performance is my issue, but that the queries I want to do, which seem very simple, are practically impossible, especially in combination with Grafana.</p>
<p>Something like showing a daily total of a measurement as a bar graph for some long-term history, with the bars aligned to the day boundary instead of 24-hour offsets from the current time. Or being able to actually query the data of a single month to get a total, instead of querying chunks of 30.5 days.</p>
<p>But most importantly, writing software is fun and I want to write something that does this for me. Not everything has to scale to every use case, from a single Raspberry Pi to a list of Fortune 500 company logos on your homepage.</p>
<h2>A prototype</h2>
<p>Since I don't care about high performance and I want to prototype quickly, I started with a Python Flask application. This is mainly because I had already written a bunch of Python glue to pump the data from my MQTT broker into InfluxDB or Prometheus, so I can just directly integrate that.</p>
<p>I decided that a SQLite database will be fine as the storage backend, and to integrate with Grafana I'll just implement the relevant parts of the Prometheus API and query language.</p>
<p>To complete it I made a small web UI for configuring and monitoring the whole thing, mainly to make it easy to adjust the MQTT topic mapping without editing config files and restarting the server.</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1730152571/image.png" class="kg-image"></figure>
<p>I've honestly probably spent way too much time writing random JavaScript for the MQTT configuration window. I had already written an MQTT library for Flask that allows using the Flask route syntax to extract data from the topic, so I reused that backend. To make that work nicely I also wrote a simple parser for the syntax in JavaScript, to visualize the parsing while you type and give you dropdowns for selecting the values.</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1730152874/image.png" class="kg-image"></figure>
<p>This is not at all related to the dataseries part, but at least it allows me to easily get a bunch of data into my test environment while writing the rest of the code.</p>
<h2>The database</h2>
<p>For storing the data I'm using the <code>sqlite3</code> module in Python.
I dynamically generate a schema based on the data that's coming in, with one table per measurement.</p>
<p>There are two kinds of devices on my MQTT broker: some send their data as a JSON blob and some just send single values to various topics.</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1730153124/image.png" class="kg-image"><figcaption>Example data from the MQTT broker</figcaption></figure>
<p>The JSON blobs are still considered a single measurement and all the top-level values get stored in separate columns. Later, in the querying stage, the specific column is selected.</p>
<p>My worst case is a bunch of ESP devices that measure various unrelated things and output JSON to the topic shown above. I have a single ingestion rule in my database that grabs <code>devices/hoofdweg/&lt;sensor&gt;</code> and dumps it in a table with columns for the various sensors, which ends up with a schema like this:</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1730153341/image.png" class="kg-image"></figure>
<p>A timestamp is stored without any consideration for timezones, since in practically all cases a house isn't located right on a timezone boundary. The tags are stored in separate columns with a <code>tag_</code> prefix and the fields in columns with a <code>field_</code> prefix. The maximum granularity of the data is also a single second, since I don't store the timestamp as a float.</p>
<p>A lot of the queries I do don't need every single datapoint though; I just need hourly, daily or monthly data. For that a second table is created with the same structure but with aggregated data:</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1730153646/image.png" class="kg-image"></figure>
<p>This contains a row for every hour with the <code>min()</code>, <code>max()</code> and <code>avg()</code> of every field; it also contains a row for every day and one for every month. This makes it possible to throw away the single-second-granularity data after a preconfigured amount of time and keep the aggregated data much longer. When querying, you explicitly select which table you want the data from.</p>
<h2>The querying</h2>
<p>To make the Grafana UI not complain too much I kept the language syntax the same as Prometheus, but implemented fewer of the features because I don't use most of them. The supported features right now are:</p>
<ul><li>Simple math queries like <code>1+1</code>; this can only do addition and is only there to satisfy the Grafana connection tester.</li>
<li>Selecting a single measurement from the database and filtering on tags using braces, like <code>my_sensors{sensor=&quot;solar&quot;}</code></li>
<li>Selecting a time granularity with brackets like <code>example_sensor[1h]</code>. This only supports <code>1h</code>, <code>1d</code> and <code>1M</code> and selects which rows are queried</li>
<li>Aggregate functions like <code>max(my_sensors[1h])</code>, which select the columns with the <code>max_</code> prefix when querying the reduced table. When selecting realtime data, the SQLite <code>max()</code> function is used instead.</li>
</ul>
<p>This is also just about enough to make the graphical query builder in Grafana work for most cases.</p>
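<p>Since the endpoints mimic Prometheus, a Grafana panel effectively issues requests like the sketch below. The port and the timestamps are made up for illustration, but <code>query</code>, <code>start</code>, <code>end</code> and <code>step</code> are the standard parameters of the Prometheus <code>query_range</code> API:</p>
<pre><code># ask for the average solar voltage over a time range, one point
# per 30 seconds
curl -G 'http://localhost:5000/api/v1/query_range' \
    --data-urlencode 'query=avg(sensors{sensor="solar", _col="voltage"})' \
    --data-urlencode 'start=1730152800' \
    --data-urlencode 'end=1730239200' \
    --data-urlencode 'step=30'</code></pre>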
<p>The other setting used for the queries is the <code>step</code> value that Grafana calculates and passes to the Prometheus API. For the reduced table this is completely ignored; for the realtime table it is converted to SQL to do aggregation across rows.</p>
<p>As an example the query <code>avg(sensors{sensor="solar", _col="voltage"})</code> gets translated to:</p>
<pre><code>SELECT
    instant,
    tag_sensor,
    avg(field_voltage) as field_voltage
FROM series_sensors
WHERE instant BETWEEN ? AND ? -- Grafana time range
    AND tag_sensor = ? -- solar
GROUP BY instant/30 -- 30 is the step value from Grafana</code></pre>
<p>To get nicely aligned hourly data for a bar chart the query simply changes to <code>avg(sensors{sensor="solar", _col="voltage"}[1h])</code>, which generates this query:</p>
<pre><code>SELECT instant, date, hour, tag_sensor, avg_voltage
FROM reduced_sensors
WHERE instant BETWEEN ? AND ? -- Grafana time range
    AND tag_sensor = ? -- solar
    AND scale = 0 -- hourly</code></pre>
<p>The reduced data is generated as a background task in the server, and it makes sure that the aggregate row for a single hour covers the datapoints that fall exactly within that hour, not shifted by the local time of the query like I currently have issues with in Grafana:</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1730154759/image.png" class="kg-image"><figcaption>The query running against the old Prometheus database</figcaption></figure>
<p>The bars in this chart don't align with the dates because this screenshot wasn't made at midnight. The data in the bars is also only technically correct when viewing the Grafana dashboard at midnight, since at other hours it selects data from other days as well. If I view this at 13:00 then I get the data from 13:00 the day before until today, which is a bit annoying in most cases and useless in the case of this chart, because the <code>daily_total</code> metric in my solar inverter is reset at night and I pick the highest value.</p>
<p>For monthly bars this issue gets worse, because it's apparently impossible to accurately get monthly data from the timeseries databases I've used.
Because I'm pregenerating this data instead of using magic intervals, this also Just Works(tm) in my implementation.</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1730155171/image.png" class="kg-image"><figcaption>The same sort of query on the Miniseries backend, hourly instead because I don't have enough demo data yet.</figcaption></figure>
<h2>Is this better?</h2>
<p>It is certainly in the prototype stage and has not had enough testing to find the weird edge cases. It does provide all the features I need to recreate my existing home automation dashboard though, and performance is absolutely fine. The next step here is to implement a feature to lie to Grafana about the date of the data, to actually use the heatmap chart to show data from multiple days as multiple rows.</p>
<p>Once the kinks are worked out in this prototype it's probably a good idea to rewrite it in something like Go, because while a lot of the data processing is done in SQLite, the first bottleneck will probably be the single-threaded nature of the webserver and the MQTT ingestion code.</p>
<p>The source code is online at <a href="https://git.sr.ht/~martijnbraam/miniseries">https://git.sr.ht/~martijnbraam/miniseries</a></p>

<h1><a href="https://blog.brixit.nl/making-a-linux-managed-network-switch/">Making a Linux-managed network switch</a></h1>
<p><em>Linux · Martijn Braam · Wed, 03 Jul 2024 14:10:04 -0000</em></p>

<p>Network switches are simple devices: packets go in, packets go out. Luckily people have figured out how to make that complicated instead and invented managed switches.</p>
<p>Usually this is done by adding a web interface for configuring the settings and seeing things like port status. On more expensive switches you even get access to some alternate interfaces like telnet and serial console ports.</p>
<p>There is a whole second category of managed switches though that people don't initially think of: the network switches inside consumer routers. These routers are little Linux devices that have a switch chip inside of them; one or more ports are internally connected to the CPU and the rest are on the outside as physical ports.</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1719959978/RB2011UiAS-160620170256_160656.png" class="kg-image"><figcaption>Mikrotik RB2011 block diagram from mikrotik.com</figcaption></figure>
<p>Here is an example of such a device that actually has this documented. I always thought that the configuration of these switch-connected ports was just a nice abstraction by the web interface, but I was surprised to learn that with the DSA and switchdev subsystems in Linux these ports are actually fully functioning "local" network ports. Since this is practically only available inside integrated routers, it's unfortunately pretty hard to play around with.</p>
<p>What is shown as a single line on this diagram is actually the connection between the SoC of the router and the switch over the SGMII bus (or maybe RGMII in this case), plus a management bus that's either SMI or MDIO. Network switches have a lot of these fun acronyms that, even with the full name written out, make little sense unless you know how all of this fits together.</p>
<p>Controlling your standard off-the-shelf switch using this system simply isn't possible, because the required connections of the switch chip aren't exposed for this.
So there's only one option left...</p>
<h2>Making my own gigabit network switch</h2>
<p>Making my own network switch can't be <i>that</i> hard, right? Those things are available for the price of a cup of coffee and are most likely highly integrated to reach that price point. Since I don't see any homemade switches around on the internet, I guess the chips for those must be pretty hard to get...</p>
<figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1719960715/image.png" class="kg-image"></figure>
<p>Nope, very easy to get. There's even a datasheet available for these. So I created a new KiCad project and started creating some footprints and symbols.</p>
<p>I'm glad there's any amount of datasheet available for this chip, since that's not usually the case for Realtek devices, but it's still pretty minimal. I resorted to finding devices that have schematics available for similar Realtek chips to figure out how to integrate it, and to reading a lot of documentation on how to implement ethernet in a design at all.</p>
<p>The implementation for the chip initially looked very complicated: it requires about 7 different power nets and has several pretty badly documented communication interfaces. After going through other implementations it seems the easiest way to power it is to just connect all the nets with overlapping voltage ranges together, which leaves you only needing a 3.3V and a 1.1V regulator.</p>
<p>The extra communication busses are for all the extra ports I don't seem to need. The switch chip I selected is the RTL8367S, which is a very widely used 5-port gigabit switch chip, but it's actually not a 5-port chip. It's a 7-port switch chip where 5 ports have an integrated PHY and two are for CPU connections.</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1719961532/image.png" class="kg-image"><figcaption>CPU connection block diagram from the RTL8367S datasheet</figcaption></figure>
<p>My plan is different though: while these CPU ports are available, nothing in the Linux switchdev subsystem actually requires the CPU connection to be to those ports. Instead I'll be connecting to port 0 on the switch with a network cable, and as far as the switchdev driver knows there's no ethernet PHY in between.</p>
<p>The next hurdle is the configuration of the switch chip: there are several configuration systems available, and the datasheet does not really describe the minimum required setup to actually get it to function as a regular dumb switch. To sum up the configuration options of the chip:</p>
<ul><li>There are 8 pins on the chip that are read when it's starting up. These pins are shared with the LED pins for the ports, which makes designing pretty annoying: switching a setting from pull-up to pull-down also requires connecting the LED in the opposite orientation.</li>
<li>There's an I2C bus that can be connected to an EEPROM chip. The pins for this are shared with the SMI bus that I require to make this chip talk to Linux though. There is a pin setting to select one of two EEPROM size ranges, but the datasheet does not further specify what this setting actually changes.</li>
<li>There's a SPI bus that supports connecting a NOR flash chip to it. This can store either configuration registers or firmware for the embedded 8051 core, depending on the configuration of the bootup pins.
The SPI bus pins are also shared with one of the CPU network ports.</li>
<li>There is a serial port available, but my guess is that it does nothing at all unless firmware is loaded into the 8051.</li>
</ul>
<p>My solution to figuring this out was to just order a board and solder connections differently until it works. I added a footprint for a flash chip that I ended up not needing, and solder jumpers for all the configuration pins. I left out all the LEDs, since making those configurable would be pretty hard.</p>
<p>The next step was figuring out how to do ethernet properly. A lot of documentation has been written about this, and it all makes it sound like gigabit ethernet requires perfect precision engineering, impedance-managed boards and a blessing from the ethernet gods themselves to work. This does not seem to match up with the reality that these switches are very, very cheaply constructed and seem to work just fine. So I decided to open up a switch to check how many of these coupling capacitors and impedance-matching planes are actually used in a real design. The answer seems to be that it doesn't matter that much.</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1719962591/image.png" class="kg-image"></figure>
<p>This is the design I have ended up with now, but it is not what is on my test PCB. I got it almost right the first time though :D</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1719962813/image.png" class="kg-image"></figure>
<p>The important part seems to be matching the skew within a pair; matching the lengths of the 4 network pairs against each other is completely useless. This is mainly because network cables don't have the same twisting rate for the 4 pairs, so their lengths are already significantly different inside the cable.</p>
<p>The pairs between the transformer and the RJ45 jack have their own ground plane that's coupled to the main ground through a capacitor. The pairs after the transformer are just on the main board ground fill.</p>
<p>What I did wrong on my initial board revision was forgetting the capacitor between the center taps of the transformer on the switch side and ground, shorting them straight to board ground and making the signals on that side referenced to board ground. This makes ethernet very much not work anymore, so I had to manually cut tiny traces on the board to disconnect that short to ground. In my test setup the capacitor just doesn't exist and all the center taps float. This seems to work just fine, but the final design does have that capacitor added.</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1720003020/fixed.JPG" class="kg-image"><figcaption>Cut ground traces on the ethernet transformer</figcaption></figure>
<p>The end result is this slightly weird gigabit switch. It has 4 ports facing one direction and one facing backwards, and it is powered over a 2.54mm pin header. I have also added a footprint for a USB Type-C connector to have an easy way to power it without bringing out the DuPont wires.</p>
<figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1720007603/IMG_20240626_221246.jpg" class="kg-image"></figure>
<h2>Connecting it to Linux</h2>
<p>For my test setup I've picked the PINE64 A64-lts board, since it has the connectors roughly in the spots where I want them.
It not being an x86 platform is also pretty important, because the configuration requires a device tree change, and you can't do that on a platform that doesn't use device trees.</p>
<p>The first required thing was rebuilding the kernel for the board, since most kernels simply don't have these kernel modules enabled. For this I enabled these options:</p>
<ul><li><code>CONFIG_NET_DSA</code> for the Distributed Switch Architecture system</li>
<li><code>CONFIG_NET_DSA_TAG_RTL8_4</code> for having port tagging for this Realtek switch chip</li>
<li><code>CONFIG_NET_SWITCHDEV</code>, the driver system for network switches</li>
<li><code>CONFIG_NET_DSA_REALTEK</code>, <code>CONFIG_NET_DSA_REALTEK_SMI</code> and <code>CONFIG_NET_DSA_REALTEK_RTL8365MB</code> for the actual switch chip driver</li>
</ul>
<p>Then the more complicated part was figuring out how to actually get this all loaded. In theory it is possible to create a device tree overlay for this and get it loaded by U-Boot. I decided not to do that and to patch the device tree for the A64-lts board instead, since I'm rebuilding the kernel anyway. The device tree change I ended up with is this:</p>
<pre><code>diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-lts.dts b/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-lts.dts
index 596a25907..10c1a5187 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-lts.dts
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a64-pine64-lts.dts
@@ -18,8 +18,78 @@
 	led {
 		gpios = &lt;&amp;r_pio 0 7 GPIO_ACTIVE_LOW&gt;; /* PL7 */
 	};
 };
+
+switch {
+	compatible = &quot;realtek,rtl8365rb&quot;;
+	mdc-gpios = &lt;&amp;pio 2 5 GPIO_ACTIVE_HIGH&gt;; // PC5
+	mdio-gpios = &lt;&amp;pio 2 7 GPIO_ACTIVE_HIGH&gt;; // PC7
+	reset-gpios = &lt;&amp;pio 8 5 GPIO_ACTIVE_LOW&gt;; // PH5
+	realtek,disable-leds;
+
+	mdio {
+		compatible = &quot;realtek,smi-mdio&quot;;
+		#address-cells = &lt;1&gt;;
+		#size-cells = &lt;0&gt;;
+
+		ethphy0: ethernet-phy@0 {
+			reg = &lt;0&gt;;
+		};
+
+		ethphy1: ethernet-phy@1 {
+			reg = &lt;1&gt;;
+		};
+
+		ethphy2: ethernet-phy@2 {
+			reg = &lt;2&gt;;
+		};
+
+		ethphy3: ethernet-phy@3 {
+			reg = &lt;3&gt;;
+		};
+
+		ethphy4: ethernet-phy@4 {
+			reg = &lt;4&gt;;
+		};
+	};
+
+	ports {
+		#address-cells = &lt;1&gt;;
+		#size-cells = &lt;0&gt;;
+
+		port@0 {
+			reg = &lt;0&gt;;
+			label = &quot;cpu&quot;;
+			ethernet = &lt;&amp;emac&gt;;
+		};
+
+		port@1 {
+			reg = &lt;1&gt;;
+			label = &quot;lan1&quot;;
+			phy-handle = &lt;&amp;ethphy1&gt;;
+		};
+
+		port@2 {
+			reg = &lt;2&gt;;
+			label = &quot;lan2&quot;;
+			phy-handle = &lt;&amp;ethphy2&gt;;
+		};
+
+		port@3 {
+			reg = &lt;3&gt;;
+			label = &quot;lan3&quot;;
+			phy-handle = &lt;&amp;ethphy3&gt;;
+		};
+
+		port@4 {
+			reg = &lt;4&gt;;
+			label = &quot;lan4&quot;;
+			phy-handle = &lt;&amp;ethphy4&gt;;
+		};
+	};
+};
 };
</code></pre>
<p>It loads the driver for the switch through the <code>realtek,rtl8365rb</code> compatible; this driver supports a whole range of Realtek switch chips, including the RTL8367S I've used in this design. I've removed the CPU ports from the documentation example and just added the definitions of the 5 regular switch ports.</p>
<p>The important part is <code>port@0</code>: this is the port that faces backwards on my switch and is connected to the A64-lts. I've linked it up to <code>&amp;emac</code>, which is a reference to the ethernet port of the computer. The rest of the ports are linked up to their respective PHYs in the switch chip.</p>
<p>At the top of the code there are also 3 GPIOs defined; these link up to SDA/SCL and reset on the switch PCB to make the communication work.
After booting up the system the result is this:</p>
<pre><code>1: lo: &lt;LOOPBACK&gt; mtu 65536 qdisc noop state DOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: &lt;BROADCAST,MULTICAST&gt; mtu 1508 qdisc noop state DOWN qlen 1000
    link/ether 02:ba:6f:0c:21:c4 brd ff:ff:ff:ff:ff:ff
3: lan1@eth0: &lt;BROADCAST,MULTICAST,M-DOWN&gt; mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 02:ba:6f:0c:21:c4 brd ff:ff:ff:ff:ff:ff
4: lan2@eth0: &lt;BROADCAST,MULTICAST,M-DOWN&gt; mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 02:ba:6f:0c:21:c4 brd ff:ff:ff:ff:ff:ff
5: lan3@eth0: &lt;BROADCAST,MULTICAST,M-DOWN&gt; mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 02:ba:6f:0c:21:c4 brd ff:ff:ff:ff:ff:ff
6: lan4@eth0: &lt;BROADCAST,MULTICAST,M-DOWN&gt; mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 02:ba:6f:0c:21:c4 brd ff:ff:ff:ff:ff:ff</code></pre>
<p>I have the <code>eth0</code> device here like normal, and then I have the 4 interfaces for the ports on the switch that I defined in the device tree. To make it actually do something the interfaces need to be brought up first:</p>
<pre><code>$ ip link set eth0 up
$ ip link set lan1 up
$ ip link set lan2 up
$ ip link set lan3 up
$ ip link set lan4 up
$ ip link
1: lo: &lt;LOOPBACK&gt; mtu 65536 qdisc noop state DOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1508 qdisc mq state UP qlen 1000
    link/ether 02:ba:6f:0c:21:c4 brd ff:ff:ff:ff:ff:ff
3: lan1@eth0: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 1500 qdisc noqueue state LOWERLAYERDOWN qlen 1000
    link/ether 02:ba:6f:0c:21:c4 brd ff:ff:ff:ff:ff:ff
4: lan2@eth0: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 1500 qdisc noqueue state LOWERLAYERDOWN qlen 1000
    link/ether 02:ba:6f:0c:21:c4 brd ff:ff:ff:ff:ff:ff
5: lan3@eth0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 02:ba:6f:0c:21:c4 brd ff:ff:ff:ff:ff:ff
6: lan4@eth0: &lt;NO-CARRIER,BROADCAST,MULTICAST,UP&gt; mtu 1500 qdisc noqueue state LOWERLAYERDOWN qlen 1000
    link/ether 02:ba:6f:0c:21:c4 brd ff:ff:ff:ff:ff:ff</code></pre>
<p>Now that the switch is up you can see I have a cable plugged into the third port. This system hooks into a lot of the Linux networking stack, so it Just Works(tm) with a lot of tooling. Some examples:</p>
<ul><li>Add a few of the lan ports into a standard Linux bridge and the switchdev system will bridge those ports together in the switch chip, so Linux doesn't have to forward that traffic (see the sketch below this list).</li>
<li>Things like <code>ethtool lan3</code> just work to get information about the link, and <code>ethtool -S lan3</code> returns all the standard statistics, which include packets that have been fully handled by the switch.</li>
</ul>
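<p>A minimal sketch of that first example, offloading a two-port bridge to the switch chip with standard iproute2 commands:</p>
<pre><code># create a bridge and enslave two switch ports; switchdev offloads
# the forwarding between lan1 and lan2 to the chip itself
ip link add name br0 type bridge
ip link set lan1 master br0
ip link set lan2 master br0
ip link set br0 up</code></pre>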
<h2>Limitations</h2>
<p>There are a few things that make this not very nice to work with. First of all the requirement of either building a custom network switch or tearing open an existing one and finding the right connections.</p>
<p>It's not really possible to use this system on regular computers/servers, since you need device trees to configure the kernel for this and most computers don't have kernel-controlled GPIO pins available to hook up a switch.</p>
<p>As far as I can find there's also no way to use this with a network port on the computer side that's not fixed: USB network interfaces don't have a device tree node to refer to for setting the conduit port.</p>
<p>There is a chance some of these limitations can be worked around. Maybe there's some weird USB device that exposes pins on the GPIO subsystem, maybe there's a way to load switchdev without being on an ARM device, but that would certainly take a bit more documentation...</p>

<h1><a href="https://blog.brixit.nl/building-a-dsmr-reading-board/">Building a DSMR reading board</a></h1>
<p><em>Electronics · Martijn Braam · Mon, 27 May 2024 10:08:00 -0000</em></p>

<p>Quite a while back I designed a small PCB for hooking up sensors to an ESP8266 module to gather data and have a nice Grafana dashboard with temperature readings. While building that setup I grabbed one of my spare ESP8266 dev boards and soldered it to the P1 port on my smart energy meter to log the power usage of the whole house.</p>
<p>For those unfamiliar with P1 and DSMR, since it's quite regional: DSMR is the Dutch Smart Meter Requirements specification. It's a document that describes the connectivity of various ports on energy meters, and it's used in the Netherlands and a few countries around it. One of the specifications within DSMR is P1, the document for the RJ12 connector for plugging in third-party monitoring tools.</p>
<h2>Electrical design</h2>
<p>The first part of the project is figuring out how to make the hardware itself work. I've decided to slightly modernize my design, so this is the first module I've built that uses an ESP32 chip instead of the older ESP8266 I have in use everywhere.</p>
<p>I specifically designed the board around the ESP32-S3-WROOM-1 module. This makes things significantly simpler than my older designs, since there's no programming circuitry required.</p>
<p>The design constraints I've used for this board are:</p>
<ul><li>Make the design hand-solderable instead of using the JLCPCB assembly service for the boards. I enjoy designing boards and I should probably get some more actual soldering experience with SMD parts.</li>
<li>Connect to the smart energy meter using off-the-shelf RJ12 cables instead of soldering wires to the board. Also include a passive P1 splitter on the board, since it's becoming more and more popular to have charging points for electric cars, which can also be hooked up to the P1 port.</li>
<li>USB-C for programming and power. I have a backup programming header footprint on the board but it shouldn't be necessary. Just like putting USB Micro-B ports on devices shouldn't be allowed anymore in 2024.</li>
</ul>
<p>The schematic is basically the absolute minimum required to get the ESP32 module up and running. Since the ESP32-S3 has native USB support, I can drop a whole bunch of parts from the schematic that are normally required for programming.</p>
<p>Another neat feature is that this chip has a fully connected I/O mux inside, which means I can pick basically any random GPIO pin and later configure in software which hardware block in the chip it hooks up to. This feature is also available in other chips like the RP2040, but there it's way more limited.
<p>Another neat feature is that this chip has a fully connected I/O mux inside which means I can basically pick any random GPIO pin and later configure in software which hardware block in the chip it hooks up to. This feature is also available in other chips like the RP2040 but there it's way more limited. There are only a few valid choices for pins for UART1 for example and there's no way to swap RX and TX on a board without using a soldering iron.</p> <p>In the DSMR design I'll only be using a single GPIO pin configured to be the RX of one of the hardware UARTs so it can receive the data.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1716744001/image.png" class="kg-image"><figcaption>The full schematic for the DSMR module</figcaption></figure> <p>The board has a jumper on it to pick how the data requests are handled. In one position the line is connected straight to 5V so the energy meter will just continuously send the data over the P1 port, which should be correct in most situations. The other mode connects the data request line to the pass-through port so the device connected after the DSMR module can select when the data should be sent, in this case the module will just be passively sniffing the traffic whenever it's sent.</p> <p>There is also a solder jumper on the board to select whether it's powered from the USB connection or from the P1 port itself. According to the P1 specifications the power supplied by most energy meters won't be enough to reliably run the ESP32 module so there is always the option to power it from the USB-C port. In most cases there will be a random device like a router nearby that has a powered USB connection to run the module.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1716744605/image.png" class="kg-image"><figcaption>The PCB for the DSMR module</figcaption></figure> <p>I managed to fit everything on a small 40x40mm PCB by having the two RJ12 connectors hang off the board. This also makes those neatly line up with the edges of the case for this board. The ESP32 module also hangs off the edge of the board, pictured at the top here. This is because the antenna part of the ESP module has a keepout area where there shouldn't be any copper on the board anyway to not disturb the WiFi signal.</p> <p>To make it fit I've also moved the power regulator and a few passives to the bottom side of the board. Normally it would be pretty expensive to do this but since I'm hand soldering these boards using both sides of the board is free.</p> <p>Another upside of handsoldering the boards is that I'm not limited to the JLCPCB parts for once and I can just use whatever random parts I can source. I've decided to get these boards made by Aisler this time so the boards are made in Germany instead of China and the boards just look great. I normally use the KiCad Aisler Push plugin in my workflow anyway just to run some sanity checks on the board design in addition to the checks run in KiCad.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1716745405/image.png" class="kg-image"><figcaption>Some PCB design notes generated by Aisler</figcaption></figure> <p>In this screenshot it's just warning me about the holes for the plastic pins in the footprint of the USB-C connector. It doesn't matter whether those are copper plated or not so I just ignore this warning. 
After receiving the boards I've also checked this and the holes did not get copper plated at all.</p> <p>I also noticed there was a <a href="https://community.aisler.net/t/introducing-design-in-europe-by-aisler-and-wurth-elektronik/3886">discount for using components from Würth Elektronik</a> on the board so naturally I took that as a challenge to see how far I can push it. It turns out that in this design the answer is pretty far: the easy part is to use WE part numbers for the passives on the board.</p> <p>The second thing I replaced is the connectors on the board. I was already pretty happy I did not have to make custom footprints for once because that's just a time consuming job. After comparing a lot of footprints it turns out that WE makes an RJ12 connector with the exact footprint I had already made. The USB-C connector is a bit more difficult since I'm used to picking the one available at JLCPCB that also happens to have a footprint already in KiCad. It turns out that the Würth Elektronik USB-C connector fits on the <code>USB_C_Receptacle_GCT_USB4105-xx-A_16P_TopMnt_Horizontal</code> footprint in KiCad. I guess there's simply not too many different ways to make spec-compliant USB-C connectors.</p> <h2>The soldering</h2> <p>So the board is relatively straightforward to solder, it doesn't have that many parts and I decided to not pick the tiniest SMT parts I could find. If you're not familiar with SMT parts, they get delivered on reels and if you order small quantities you get a smaller cut-off portion of a reel. Here's some capacitors for example:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1716749877/20240526_0008.jpg" class="kg-image"></figure> <p>These are the 22uF 0805 capacitors I've ordered for the ESP32 power rail. The rest of the passives are the slightly smaller 0603 packages. On the right of the board you can see the pads for the 0805 capacitor and the 0603 capacitor side by side. In this case the Würth capacitors came in a neat transparent piece of tube. If you order a full reel of cheap resistors they usually come on paper tape which looks like this:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1716750066/IMG_20240526_205530.jpg" class="kg-image"><figcaption>This is 5000 resistors in 0603 size</figcaption></figure> <p>This is probably one of the few times I've ever ordered 5000 of something. Even the bigger parts like the ESP32 modules come on a reel:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1716750232/IMG_20240526_210232.jpg" class="kg-image"><figcaption>This strip holds 20 cores of pure computing power :D</figcaption></figure> <p>The soldering itself was quite time consuming but not that hard. The secret is to just use a lot of flux and not use plastic tweezers that melt while soldering. </p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1716750417/20240526_0020.jpg" class="kg-image"></figure> <p>Maybe after some experience I can even get them on straight. Supposedly it's a lot easier and neater to use solder paste and stencils to do this so I also ordered the stencils for this PCB. 
I don't have the soldering paste and extra equipment for it though so that experiment will have to wait a bit longer.</p> <h2>The case</h2> <p>To have this look somewhat neat when it's finished I also need a case for the board. The need for cases has annoyed me several times already when designing PCBs. The options are either grabbing an off the shelf project box which is usually expensive and never fits exactly with the design or just 3D printing a case which involves me angrily staring at CAD software to figure out how to make things fit and match up to my PCB.</p> <p>I've already written a blog post about my solution here which is <a href="https://blog.brixit.nl/automatic-case-design-for-kicad/">TurboCase</a>. It's a tool that automatically generates a case based on the PCB file from KiCad. I wrote the software specifically because I needed it for this PCB so naturally the case generated by it works perfectly for this project :)</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1716746584/image.png" class="kg-image"><figcaption>TurboCase generated case for this board</figcaption></figure> <p>Since writing the blog post about the details on how TurboCase works I've added support for TurboCase-specific footprints. I now have a global KiCad library that holds footprints that are designed to go on the <code>User.6</code> layer to add prefab features to the final OpenSCAD file like screw holes in the case and the hole for my USB-C connector.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1716746734/image.png" class="kg-image"><figcaption>The contents of the User.6 layer of my board design</figcaption></figure> <p>Here you can see the drawing on the <code>User.6</code> layer of the DSMR module. It contains several features:</p> <ul><li>The outline of the case drawn using the graphic elements in KiCad. This functions the same way as the board outline normally works on the <code>Edge.Cuts</code> layer except it describes the outline of the inside of the case.</li> <li>Three mounting holes that fit countersunk M3 sized screws to mount the case on the wall.</li> <li>An M3-sized keyhole screw-mounting thingy so the board can be mounted in a less permanent way.</li> <li>The cube with the line in it marks a spot where turbocase makes a USB-C-shaped hole matched up perfectly to the USB-C connector on the board.</li> <li>The RJ12 connectors already cross the border of the case in KiCad so there will be holes generated for them automatically in the case based on the connector outline on the fabrication layer.</li> <li>The 3 MountingHole footprints on the PCB will be used to generate mounting posts for the PCB inside the case, the main reason I started making this tool: making sure those things align perfectly every time.</li> </ul> <p>I did a few iterations on the footprint library and did a series of test prints of the case to make sure everything is correct and I can happily report that everything just matches up perfectly.</p> <p>The few days I spent building and testing this tool narrow the whole process from KiCad to physical product down to:</p> <ul><li>Draw the outline in KiCad</li> <li>Run TurboCase on the <code>.kicad_pcb</code> file to get my OpenSCAD file.</li> <li>Export the OpenSCAD file to .STL and 3D print it (see below).</li> </ul>
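<p>That last step doesn't even need the OpenSCAD GUI, the export can be scripted with OpenSCAD's command line mode (the file names here are just examples):</p> <pre><code>$ openscad -o case.stl case.scad</code></pre>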
<p>I hope this utility is useful for more people and I would love to see how it performs with other board designs!</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1716747340/darktable.ZQFGO2.jpg" class="kg-image"></figure> <h2>The software side</h2> <p>The more annoying part of the whole process was dealing with the software. For the ESP8266 I've always used the Arduino IDE. This is mainly because I did not want to deal with the xtensa toolchain required to build software for these chips. I don't do complicated things with these modules and with the Arduino IDE they simply always just worked.</p> <p>Switching over to the ESP32 did mean that stuff stopped working automatically for me sadly, the base software worked great but I had some trouble getting the MQTT library to work. The dependency resolution in the Arduino IDE is very simple and it does not account for having different dependencies with the same include name so I tried PlatformIO for this project.</p> <p>The whole setup process for PlatformIO was pretty smooth until I got to platform selection. There's a few different variations of the ESP32-S3 module available and even a few more specific variations of the WROOM-1 module I put on my board. I did however commit the sin of picking the 4MB flash version of the board. This is the only one that doesn't have a specific config in PlatformIO and the merge request for it was denied on the basis that it would be easy to override the platform config with a json file. I did not find an easy way to do that, so instead I picked one of the random dev boards that use the WROOM-1 module with 4MB flash to get whatever changes they made to the JSON file for the board.</p> <p>Now for actually implementing the P1 protocol: This is deceptively hard. The way the protocol works is that you pull the data request line high on the connector and then the energy meter will start spewing out datagrams over the serial port on either a 1 second or 10 second interval depending on the protocol version. These datagrams look something like this:</p> <pre><code>/XMX5LGBBFG10
1-3:0.2.8(42)
0-0:1.0.0(170108161107W)
0-0:96.1.1(4530303331303033303031363939353135)
1-0:1.8.1(002074.842*kWh)
1-0:1.8.2(000881.383*kWh)
1-0:2.8.1(000010.981*kWh)
1-0:2.8.2(000028.031*kWh)
0-0:96.14.0(0001)
1-0:1.7.0(00.494*kW)
1-0:2.7.0(00.000*kW)
0-0:96.7.21(00004)
0-0:96.7.9(00003)
1-0:32.32.0(00000)
1-0:32.36.0(00000)
0-0:96.13.1()
0-0:96.13.0()
1-0:31.7.0(003*A)
1-0:21.7.0(00.494*kW)
1-0:22.7.0(00.000*kW)
0-1:24.1.0(003)
0-1:96.1.0(4730303139333430323231313938343135)
0-1:24.2.1(170108160000W)(01234.000*m3)
!D3B0</code></pre> <p>This looks to be a relatively simple line based protocol. The fields are identified by a numeric code called the "OBIS" code. This stands for OBject Identification System. Then there are values added in parentheses after it. The difficulty in parsing this is that there can be multiple values in parentheses added after each OBIS field. This would not be terrifically hard to parse if the DSMR spec had nailed down the encoding a bit more but it merely specifies that the field ends in a newline.</p> <p>In my case some of the values themselves also contain newlines so this breaks the assumption that you can parse this simply based on lines, and that combined with the variable (but unspecified) number of values means that the parser code for this becomes quite nasty.</p>
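<p>For the simple single-value fields you can get surprisingly far with naive line matching though. A quick sketch on a captured datagram, assuming it's saved as <code>datagram.txt</code>, pulling out the current power draw:</p> <pre><code>$ grep '^1-0:1.7.0' datagram.txt
1-0:1.7.0(00.494*kW)
$ sed -n 's/^1-0:1.7.0(\(.*\)\*kW)$/\1/p' datagram.txt
00.494</code></pre> <p>It's exactly the multi-value fields like the gas meter reading and the values with embedded newlines that make this approach fall apart, which is why the real parser can't just iterate over lines.</p>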
<p>Luckily I got to use one of the features I gained by switching to PlatformIO: Unit testing. The whole DSMR parser is moved to its own class in the codebase and only gets a <code>Stream</code> reference to parse the DSMR datagrams from. This means I don't have to sit with my laptop in the hallway debugging the parser and I can check if the parser works with dumps from various smart energy meters.</p> <p>This all glued together means that the DSMR module will get the data from the P1 port and then send it out over WiFi to my MQTT server with a separate topic for every field.</p>
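<p>That also makes the readings easy to check from any machine on the network. Something like this, though the broker address and the topic layout here are made up for illustration:</p> <pre><code>$ mosquitto_sub -h 10.0.0.10 -v -t 'dsmr/#'
dsmr/1-0:1.7.0 0.494
dsmr/1-0:2.7.0 0.000</code></pre>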
<p>And the result:</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1716749436/image.png" class="kg-image"><figcaption>Grafana showing the power consumption and production as measured by the smart energy meter</figcaption></figure> <p>The source for the hardware and firmware is available at <a href="https://git.sr.ht/~martijnbraam/dsmr-module">https://git.sr.ht/~martijnbraam/dsmr-module</a> and the board itself is also available on <a href="https://aisler.net/p/IWCEGPWL">https://aisler.net/p/IWCEGPWL</a></p> Automatic case design for KiCadhttps://blog.brixit.nl/automatic-case-design-for-kicad/100ElectronicsMartijn BraamWed, 15 May 2024 19:48:27 -0000<p>I don't generally get along great with CAD software with the exception of KiCad. I guess the UX for designing things is just a lot simpler when you only have 2 dimensions to worry about. After enjoying making a PCB in KiCad the annoying part for me is always getting a case designed to fit the board.</p> <p>If I'm lucky I don't need many external holes to fit buttons or connectors and if I'm really lucky the mounting holes for the board are even in sensible locations. I wondered if there was a quick way to get the positions of the mounting holes into some 3D CAD software to make the mounting posts in the right position without doing math or measuring.</p> <p>But what's even better than importing mounting hole locations? Not having to build the case at all!</p> <h2>Turbocase</h2> <p>So the solution is just several hundred lines of Python code. I've evaluated a few ways of getting data out of KiCad to mess with it and initially the <code>kicad-cli</code> tool looked really promising since it allows exporting the PCB design to several vector formats without launching KiCad. After exporting a few formats and seeing how easy it would be to get the data into Python I remembered that the PCB design files are just s-expressions, so the easiest way is just reading the .kicad_pcb file directly.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1715799912/image.png" class="kg-image"><figcaption>A snippet of the .kicad_pcb source file</figcaption></figure> <p>So with this there are several pieces of data the tool extracts for the case design:</p> <ul><li>The footprints with <code>MountingHole</code> in the name to place posts for threaded metal inserts in my 3d prints</li> <li>A case outline from a user layer. This has the same semantics as the Edge.Cuts layer except that it defines the shape of the inner edge of the case.</li> <li>Information about connectors to make holes in the case and to have placeholders in the final OpenSCAD file for easier modifications.</li> </ul> <p>Extracting the locations of the mounting holes was pretty easy to get up and running. By iterating over the PCB file the tool saves all the footprints that are mounting holes and for each of those footprints it locates the pad with the largest hole in it and saves that to make a mounting post in the case.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1715800268/image.png" class="kg-image"></figure> <p>In my test PCB I already had some nice edge-cases to deal with since the specific footprint I used has vias in it which also count as pads. The outer side of the pad is used as the diameter of the mounting post that will be generated and inside that a hole will be created that fits the bag of threaded metal inserts I happen to have here.</p> <h2>Case outlines</h2> <p>To make the actual case I initially planned to just grab the PCB outline and slightly enlarge it as a template. This turns out to have a few obvious flaws, for example the ESP32 module that I have hanging over the edge of the PCB. Grabbing the bounding box of the entire design would also be a quick fix but that would mean the case would be way too large again due to the keepout area of the ESP32 footprint. The solution is manually defining the shape of the case since this gives the most flexibility.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1715800592/image.png" class="kg-image"></figure> <p>I picked the <code>User.6</code> layer as the case outline layer since it has a neat blue color in this KiCad theme. The semantics for the case outline is the same as what you'd normally do on the <code>Edge.Cuts</code> layer except that this defines the inner wall of the case. 
The turbocase utility will then add a wall thickness around it to make the 3D model of the case.</p> <h2>Connector holes</h2> <p>What use is a case without any holes for connections? This turned out to be a more difficult issue. For the connectors I would actually need some height information and KiCad is still very much 2D. It is possible to link a 3D model to the footprint which is great to see what the final board will look like:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1715801016/image.png" class="kg-image"></figure> <p>Sadly using this 3D model data would be quite difficult for turbocase since it would require an importer for all the various supported 3D model formats and then a way to grab the outline of a slice of the model.</p> <p>So I picked the uglier but simpler solution: just give the connectors a height in some metadata and treat them as cubes. For the footprint I use the bounding box of the <code>F.Fab</code> layer which should correspond relatively closely to the size of the connector. To store the 3rd dimension I simply added a property to the connectors I wanted to be relevant to the case design:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1715801208/image.png" class="kg-image"></figure> <h2>Exporting the case</h2> <p>I decided to export the cases as OpenSCAD files. This is mostly because these are simple text files I can generate and I already have some experience with OpenSCAD design.</p> <p>A large part of the generated file is boilerplate code for doing the basic case components. After that it will export the case outline as a polygon and do the regular OpenSCAD things to it to make a 3D object.</p> <div class="highlight"><pre><span></span><span class="n">standoff_height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">5</span><span class="p">;</span><span class="w"></span> <span class="n">floor_height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1.2</span><span class="p">;</span><span class="w"></span> <span class="n">pcb_thickness</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1.6</span><span class="p">;</span><span class="w"></span> <span class="n">inner_height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">standoff_height</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pcb_thickness</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">11.5</span><span class="p">;</span><span class="w"></span> <span class="n">pcb_top</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">floor_height</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">standoff_height</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">pcb_thickness</span><span class="p">;</span><span class="w"></span> <span class="n">box</span><span class="p">(</span><span class="mf">1.2</span><span class="p">,</span><span class="w"> </span><span class="mf">1.2</span><span class="p">,</span><span class="w"> </span><span class="n">inner_height</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span
class="n">polygon</span><span class="p">(</span><span class="n">points</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[[</span><span class="mi">102</span><span class="p">,</span><span class="mi">145</span><span class="p">],</span><span class="w"> </span><span class="p">[</span><span class="mi">140</span><span class="p">,</span><span class="mi">145</span><span class="p">],</span><span class="w"> </span><span class="p">....</span><span class="w"> </span><span class="p">]);</span><span class="w"></span> <span class="p">}</span><span class="w"></span> </pre></div> <p>The pcb_thickness is also one of the variables exported from the KiCad PCB file and the <code>box(wall, bottom, height)</code> module creates the actual basic case.</p> <p>For the connectors a series of cubes is generated and those are subtracted from the generated box:</p> <div class="highlight"><pre><span></span><span class="c1">// J1 Connector_USB:USB_C_Receptacle_GCT_USB4105-xx-A_16P_TopMnt_Horizontal USB 2.0-only 16P Type-C Receptacle connector</span> <span class="n">translate</span><span class="p">([</span><span class="mf">104.15</span><span class="p">,</span><span class="w"> </span><span class="mi">119</span><span class="p">,</span><span class="w"> </span><span class="n">pcb_top</span><span class="p">])</span><span class="w"></span> <span class="w"> </span><span class="n">rotate</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">90</span><span class="p">])</span><span class="w"></span> <span class="w"> </span><span class="cp">#connector(-4.47,-3.675,4.47,3.675,3.5100000000000002);</span> <span class="c1">// J2 Connector_RJ:RJ12_Amphenol_54601-x06_Horizontal </span> <span class="n">translate</span><span class="p">([</span><span class="mi">115</span><span class="p">,</span><span class="w"> </span><span class="mf">131.9</span><span class="p">,</span><span class="w"> </span><span class="n">pcb_top</span><span class="p">])</span><span class="w"></span> <span class="w"> </span><span class="n">rotate</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">90</span><span class="p">])</span><span class="w"></span> <span class="w"> </span><span class="cp">#connector(-3.42,-1.23,9.78,16.77,11.7);</span> <span class="c1">// J3 Connector_RJ:RJ12_Amphenol_54601-x06_Horizontal </span> <span class="n">translate</span><span class="p">([</span><span class="mf">127.11</span><span class="p">,</span><span class="w"> </span><span class="mf">138.26</span><span class="p">,</span><span class="w"> </span><span class="n">pcb_top</span><span class="p">])</span><span class="w"></span> <span class="w"> </span><span class="n">rotate</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">270</span><span class="p">])</span><span class="w"></span> <span class="w"> </span><span class="cp">#connector(-3.42,-1.23,9.78,16.77,11.7);</span> </pre></div> <p>This also has some extracted metadata from the PCB to figure out what's what when editing the .scad file. 
The <code>connector</code> module is a simple helper for generating a cube from a bounding box that accounts for the origin of the connector not being in the center.</p> <p>Finally the mounting posts are added to the case as simple cylinders:</p> <div class="highlight"><pre><span></span><span class="c1">// H1</span> <span class="n">translate</span><span class="p">([</span><span class="mi">105</span><span class="p">,</span><span class="w"> </span><span class="mi">108</span><span class="p">,</span><span class="w"> </span><span class="mf">1.2</span><span class="p">])</span><span class="w"></span> <span class="w"> </span><span class="n">mount</span><span class="p">(</span><span class="mf">3.4000000000000004</span><span class="p">,</span><span class="w"> </span><span class="mf">6.4</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">);</span><span class="w"></span> <span class="c1">// H3</span> <span class="n">translate</span><span class="p">([</span><span class="mi">121</span><span class="p">,</span><span class="w"> </span><span class="mi">140</span><span class="p">,</span><span class="w"> </span><span class="mf">1.2</span><span class="p">])</span><span class="w"></span> <span class="w"> </span><span class="n">mount</span><span class="p">(</span><span class="mf">3.4000000000000004</span><span class="p">,</span><span class="w"> </span><span class="mf">6.4</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">);</span><span class="w"></span> <span class="c1">// H2</span> <span class="n">translate</span><span class="p">([</span><span class="mi">137</span><span class="p">,</span><span class="w"> </span><span class="mi">108</span><span class="p">,</span><span class="w"> </span><span class="mf">1.2</span><span class="p">])</span><span class="w"></span> <span class="w"> </span><span class="n">mount</span><span class="p">(</span><span class="mf">3.4000000000000004</span><span class="p">,</span><span class="w"> </span><span class="mf">6.4</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">);</span><span class="w"></span> </pre></div> <p>The extracted information from the mounting hole is the 3.2mm drill diameter and the 6.2mm pad diameter. The inner hole is expanded by 0.2mm in this case to make the holes work with the metal inserts.</p> <h2>Further improvements</h2> <p>There are a lot of neat things that could be added to this. The major one being a lid for the case. This would also need a bunch more configurability to deal with mounting mechanisms for the lid like screw holes or some clips.</p> <p>The system could also be extended by producing a footprint library specifically for turbocase to signify where to add specific features to the case. This could be things like cooling holes or LED holes. Maybe some fancier connector integration.</p> <p>The output from turbocase also suffers from the same issue a lot of OpenSCAD designs suffer from: it's very hard to add chamfers and fillets to make a less rectangular case. 
That would require someone with more OpenSCAD knowledge to improve the generated output.</p> <p>The source code for the turbocase tool is available at <a href="https://sr.ht/~martijnbraam/turbocase/">https://sr.ht/~martijnbraam/turbocase/</a> and the utility is available on pypi under the <code>turbocase</code> name.</p> Megapixels contributionshttps://blog.brixit.nl/megapixels-contributions/99MegapixelsMartijn BraamSat, 11 May 2024 14:45:17 -0000<p>I've been working on the code that has become libmegapixels for a bit more than a year now. It has taken several thrown-away codebases to come to a general architecture I was happy with and it has been quite a task to split off media pipeline tasks from the original Megapixels codebase.</p> <p>After staring at this code for many months I thought I'd made libmegapixels a nearly perfect little library. That's the problem with working on a codebase without anyone else looking at it.</p> <p>About two weeks ago libmegapixels and the general Megapixels 2.x codebase had its first contact with external contributors and that has put a spotlight on all the low hanging fruit in documentation and codebase issues. A great example of that is this commit:</p> <div class="highlight"><pre><span></span><span class="gh">diff --git a/src/parse.c b/src/parse.c</span><span class="w"></span> <span class="gh">index bfea3ec..93072d0 100644</span><span class="w"></span> <span class="gd">--- a/src/parse.c</span><span class="w"></span> <span class="gi">+++ b/src/parse.c</span><span class="w"></span> <span class="gu">@@ -403,6 +403,8 @@ libmegapixels_load_file(libmegapixels_devconfig *config, const char *file)</span><span class="w"></span> <span class="w"> </span> config_init(&amp;cfg);<span class="w"></span> <span class="w"> </span> if (!config_read_file(&amp;cfg, file)) {<span class="w"></span> <span class="w"> </span> fprintf(stderr, &quot;Could not read %s\n&quot;, file);<span class="w"></span> <span class="gi">+ fprintf(stderr, &quot;%s:%d - %s\n&quot;,</span><span class="w"></span> <span class="gi">+ config_error_file(&amp;cfg), config_error_line(&amp;cfg), config_error_text(&amp;cfg));</span><span class="w"></span> <span class="w"> </span> config_destroy(&amp;cfg);<span class="w"></span> <span class="w"> </span> return 0;<span class="w"></span> <span class="w"> </span> }<span class="w"></span> </pre></div> <p>A simple patch that massively improves the usability for people writing libmegapixels config files: actually printing the parsing errors from libconfig when a file could not be read. Because I generally run libmegapixels through the IDE and have all the syntax highlighting etc set up for the files I simply haven't triggered this codepath enough to actually implement this part.</p> <p>These last two weeks there have also been some significantly more complicated fixes like tracing segfault issues in Megapixels 2.x which helps a lot with getting the new codebase ready for daily use. Figuring out some API issues in libmegapixels like not correctly setting camera indexes in the returned data. Also the config files have now been updated to work with the latest versions of the PinePhone Pro kernel instead of the year old build I've been developing against.</p> <h2>Video recording</h2> <p>I've been saying for a long time that video recording on the PinePhone won't be possible, especially not to the level of support on Android and iOS due to hardware limitations. 
The only real hope for proper video recording would be that someone gets H.264 hardware encoding to work on the A64 processor.</p> <p>I can happily report that I was wrong. Pavel Machek has made significant progress in PinePhone video recording with a few large contributions: the UI bits to add video recording and a new second postprocessing pipeline for running external video encoding scripts, just like Megapixels already lets you write your own custom scripts for processing the raw pictures into JPEGs.</p> <p>Video recording is a complicated issue though, mainly due to the sheer amount of data that needs to be processed to make it work smoothly. On the maximum resolution of the sensor in the PinePhone the framerate isn't high enough for recording normal videos (unless you enjoy 15fps video files) but on lower resolutions the pipeline can run at normal video framerates. The maximum framerates from the sensor for this are 1080p at 30fps and 720p at 60fps.</p> <p>For 720p60 the bandwidth of the raw sensor data is 442 Mbps and for 1080p30 this is 497 Mbps. This is a third of the expected bandwidth because the raw sensor data is essentially a greyscale image where every pixel has a different color filter in front. This is too much data to write out to the eMMC or SD card to process later and the PinePhone also already struggles to encode 720p30 video live without even running a desktop environment.</p>
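<p>Those bandwidth numbers check out if you assume 8 bits per pixel of raw bayer data:</p> <pre><code>$ echo '1280 * 720 * 8 * 60' | bc
442368000
$ echo '1920 * 1080 * 8 * 30' | bc
497664000</code></pre> <p>A full-color image would be three bytes per pixel instead of one, which is where the factor three comes from.</p>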
<p>There are two implementations of video recording right now. One that saves the raw DNG frames to a tmpfs since RAM is the only thing that can keep up with the data rate. This should give you roughly 30 seconds of video recording capabilities and after that recording time it will take a while to actually encode the video.</p> <p>Pavel has posted an <a href="https://social.kernel.org/notice/AhFxeCMdslrRIhQjE8">example of this video recording</a> on his mastodon.</p> <p>The second way is putting the sensor in a YUV mode instead of raw data. This gives worse picture quality on the sensor in the PinePhone but the data format matches the way frames are stored in video files more closely so the expensive debayer step can be skipped while video recording. This together with encoding H.264 video with the ultrafast preset should make it just about possible to record real-time encoded video on the PinePhone.</p> <h2>Many thanks</h2> <p>It's great to see contributions to Megapixels 2 and libmegapixels. It's a big step towards getting the Megapixels 2.x codebase production ready and it's simply a lot more fun to work on a project together with other people.</p> <p>It's great to have contributors working on the UI code, the camera support fixes for devices and the many bugfixes to the internals. It's also very helpful to actually have issues created by people building and testing the code on other distributions. This already ironed out a few issues in the build system.</p> <p>There have also been some nice contributions to the Megapixels 1.x codebase, all of those should by now already have been merged into your favorite PinePhone distribution :)</p> <p>The last few Megapixels update blogposts have all been around Megapixels 2.x and the supporting libraries so none of the improvements are immediately usable by actual PinePhone{,Pro} and Librem 5 users until there is an actual release. It will take a bunch more polish until feature parity with Megapixels 1.x is reached.</p> Moving to a RTOS on the RP2040https://blog.brixit.nl/moving-to-a-rtos-on-the-rp2040/96ElectronicsMartijn BraamMon, 06 May 2024 15:58:55 -0000<p>I've been working on a bunch of small projects involving microcontrollers. Currently a lot of them are based around the Raspberry Pi Pico boards because I like the development experience of those a lot. They have a decent SDK and cheap hardware to get started and the debugger works with gdb/openocd so it just integrates in all IDEs that support that.</p> <p>One of my current projects is making a fancy hardware controller for a bunch of video equipment I use. The main things that will be controlled are two PTZ cameras (those are cameras that have motors to move them), one stationary camera and the video switching equipment all of that is hooked up to.</p> <p>Currently the control of the PTZ cameras is done with an unbranded panel that looks suspiciously like the Marshall VS-PTC-200:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1715003106/VS-PTC-200-Keyboard-PTZ-Compact-Controller.jpg" class="kg-image"><figcaption>(Image from marshall-usa.com)</figcaption></figure> <p>The performance of this controller is simply not very great, especially for the price. It was a €650 device several years ago and for that money it has very annoying squishy buttons and the cheapest analog joystick you could find. Most of the buttons are also not functional with the cameras in use since this seems to be optimized for security cameras. This connects to the cameras over an RS-485 bus.</p> <p>The second thing I want my panel to do is very basic ATEM video switcher control. Currently that's fully done using the software panel on the computer because the panels from Blackmagic Design are very expensive.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1715003565/image.png" class="kg-image"><figcaption>There&#x27;s a tiny cheaper one now though. (from blackmagicdesign.com)</figcaption></figure> <p>After a bit of designing I figured the most minimal design I can get away with is 9 buttons, the joystick and a display for the user interface. The hardware design has gone through several iterations over the last year but I now have some PCBs with the 9 RGB buttons on them, the $10 joystick that was also in the Marshall-clone panel, and to interface with the outside world it has the TP8485E to communicate with the cameras over RS-485 and a Wiznet W5500 module to communicate with the video switcher over ethernet.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1715003933/IMG_20240429_184436.jpg" class="kg-image"><figcaption>This includes a bunch of &quot;oops-the-wrong-pinout&quot; fixes...</figcaption></figure> <p>After a lot of fixing of the board I had made I now have all the hardware parts functional, but the difficult part of this project is the software.</p> <h2>Initial software</h2> <p>I first started creating the software like I do for all the RP2040 based projects. A cmake project that pulls in the pico-sdk. To make anything work at all I dedicated the second core of the pico to dealing with the Wiznet module and the first core then handles all the user interface I/O. This worked fine to blink some leds and I did implement a DHCP client that ran on the second core. 
It did make implementing the rest of the system a lot more complicated. There are simply a lot of things that need to happen at once:</p> <ul><li>Draw a user interface on the display that&#x27;s somewhat smooth</li> <li>Send out VISCA commands over the RS-485 interface</li> <li>Respond to button presses</li> <li>Keep the entire network stack alive with multiple connections</li> </ul> <p>There's a bunch of things that need to happen on the network, the first of which is some actually standards compliant DHCP support. This would require keeping track of the expire times and occasionally talking to the DHCP server to keep the lease active. The second background task is making mDNS work. The ATEM video switcher IP can be autodiscovered using DNS-SD and it would be great to also announce the existence of the control panel.</p> <p>The ATEM protocol itself is also one of the harder parts to get right, the protocol itself is pretty simple but it does involve sometimes receiving a lot of data that exceeds the buffer size of the Wiznet module and the protocol has a very low timeout for disconnection for when you stop sending UDP datagrams to the ATEM.</p> <p>This all made me decide that it's probably better to switch to an RTOS for this project.</p> <h2>FreeRTOS</h2> <p>The first project I've looked into is FreeRTOS. This is technically already bundled inside the pico-sdk but all tutorials I've found for this download a fresh copy anyway so that's what I did. FreeRTOS seems to be the simplest RTOS I've looked at from this list, the main thing it provides is the RTOS scheduler and some communication between tasks. The simplest way I can show it is with some code:</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;FreeRTOS.h&quot;</span><span class="cp"></span> <span class="n">TaskHandle_t</span><span class="w"> </span><span class="n">button_task</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span> <span class="n">TaskHandle_t</span><span class="w"> </span><span class="n">led_task</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span> <span class="n">QueueHandle_t</span><span class="w"> </span><span class="n">led_queue</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w"></span> <span class="kt">void</span><span class="w"> </span><span class="nf">buttonTask</span><span class="p">(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">param</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">get_button_pressed</span><span class="p">();</span><span class="w"></span> <span class="w"> </span><span class="n">xQueueSend</span><span class="p">(</span><span class="n">led_queue</span><span class="p">,</span><span class="w"> </span><span
class="o">&amp;</span><span class="n">state</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="p">}</span><span class="w"></span> <span class="kt">void</span><span class="w"> </span><span class="nf">ledTask</span><span class="p">(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">param</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">state</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">xQueueReceive</span><span class="p">(</span><span class="n">led_queue</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">state</span><span class="p">,</span><span class="w"> </span><span class="n">portMAX_DELAY</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">gpio_put</span><span class="p">(</span><span class="n">LED_PIN</span><span class="p">,</span><span class="w"> </span><span class="n">state</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="p">}</span><span class="w"></span> <span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="n">xTaskCreate</span><span class="p">(</span><span class="n">buttonTask</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Button&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">128</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">button_task</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="n">xTaskCreate</span><span class="p">(</span><span class="n">ledTask</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Led&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">128</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">led_task</span><span class="p">);</span><span class="w"></span> <span class="w"> </span><span class="n">vTaskStartScheduler</span><span class="p">();</span><span class="w"></span> <span class="w"> </span><span class="c1">// Code will never reach here</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"></span> <span 
class="p">}</span><span class="w"></span> </pre></div> <p>Both the buttonTask and the ledTask function will seem to run in parallel and there are a few IPC systems to move data between the various tasks. The code above is not functional but I stripped it down to get the general usage across.</p> <p>I've used this for a few days to make an enormous mess of my codebase. I have created several tasks in my test project:</p> <ul><li>The buttonsTask that polls the i2c gpio expander to check if buttons have been pressed and then puts a message on the button queue.</li> <li>The ledTask that sets the right RGB color on the right button based on messages received from the ledQueue.</li> <li>The mainTask that runs the main loop of the project that updates the state based on the button presses.</li> <li>The networkTask that communicates with the Wiznet module.</li> <li>The dhcpTask that is spawned by the networkTask when a network cable is plugged in.</li> <li>The mdnsTask that is spawned by the dhcpTask once an ip address is acquired.</li> <li>The atemTask that is spawned by the mdnsTask when it gets a response from an ATEM device.</li> <li>The viscaTask that does nothing but should send data out the RS-485 port.</li> </ul> <p>This is a lot of tasks and the hardware doesn't even do anything yet except appear on the network.</p> <p>I ran into a few issues with FreeRTOS. The main annoying one is that printf simply caused things to hang every single time which makes debugging very hard. Sure the gdb debugger works but it's not neat for dumping out DHCP traffic for example.</p> <p>FreeRTOS also doesn't seem to provide any hardware abstraction at all which means all the code I wrote to communicate with the various chips is not easily re-used.</p> <p>After a few days I created a new clean FreeRTOS project and started porting the various functionalities from the previous version over to try to get a cleaner and more manageable codebase, but I ended up giving up because debugging blind without serial output is quite annoying. I decided to look at what the alternatives have to offer.</p> <h2>Apache NuttX</h2> <p>Another seemingly popular RTOS is NuttX. This project seems a lot closer to what you'd expect from a regular operating system. It makes your microcontroller look like a unix system.</p> <p>The first thing the tutorial tells me to do is fetch the pico-sdk and set the environment variable. No problem, I already have the sdk in /usr/share and that environment variable already exists on my system. Surprisingly this made the build fail because NuttX decides that it really needs to overwrite the version.h file in my pico-sdk for which it doesn't have permissions... why...</p>
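<p>For reference, that initial setup boils down to roughly this (the exact board config name and paths here are assumptions on my part):</p> <pre><code>$ git clone https://github.com/apache/nuttx.git nuttx
$ git clone https://github.com/apache/nuttx-apps.git apps
$ cd nuttx
$ export PICO_SDK_PATH=/usr/share/pico-sdk
$ ./tools/configure.sh raspberrypi-pico:nsh
$ make -j$(nproc)</code></pre>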
<p>After doing the initial setup of building a minimal NuttX firmware for my board I connected to the serial port and was greeted by an actual shell.</p> <pre><code>nsh&gt; uptime
00:01:34 up 0:01, load average: 0.00, 0.00, 0.00
nsh&gt; uname
NuttX
nsh&gt; uname -a
NuttX 12.5.1 9d6e2b97fb May 6 2024 15:18:54 arm raspberrypi-pico</code></pre> <p>It looks like I'd just be able to write an app for this operating system and have it auto-launch on boot. Since this tries to do the Unix thing it also has a filesystem of course so the hardware has FS abstractions like <code>/dev/i2c0</code> and <code>/dev/adc0</code>. </p> <p>One thing I liked a lot was that it's built around menuconfig/Kconfig which I'm already used to for Linux development. This also means there's an actual hardware driver system and the GPIO expander chip I've used for the buttons already had a driver. The menuconfig system also allows me to configure the pin muxing of the rp2040 chip so I don't have to keep constants around with pin numbers and do a bunch of hardware setup to make my i2c bus work. I can just go into the menuconfig and tell it that i2c0 of the pico is used and that it's on two specific pins. I've also enabled the i2c testing utility as one of the apps that will be built into the firmware.</p> <pre><code>nsh&gt; i2c dev 0 79
     0 1 2 3 4 5 6 7 8 9 a b c d e f
00: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --
nsh&gt; </code></pre> <p>Well uuuuh... yup the basics aren't working. I've spent a bit of time going through the rp2040 setup code and the various i2c related menuconfig options but it seems like this just doesn't really work...</p> <p><b>Update: This actually works fine but I managed to make a config mistake that broke the i2c bus, I have gotten the board to work more now on NuttX and will write a more detailed update post in the future for this. Consider the rest of this section as most likely invalid.</b></p> <p>I also have not figured out yet how I can tell NuttX that my gpio buttons are behind the gpio expander, or how to actually link the gpio expander to my non-functional i2c bus.</p> <p>Another thing that annoyed me is that I had to re-clone the nuttx repository multiple times, simply because sometimes one of the configure.sh commands would fail which would leave the repository in an inconsistent state, and the distclean command wouldn't work because the repository was in an inconsistent state. Really the classic "configure.sh: you are already configured; distclean: you are not configured yet"</p> <p>Unix-like seems great at first glance, but I don't really want to deal with filesystem paths on a microcontroller for a pretend filesystem. I also don't need a shell in my production system, it should just run my code.</p> <h2>Zephyr</h2> <p>So next on the list is Zephyr. This provides a python utility to set up a project which should make things a bit easier, or it's a sign something is terribly overcomplicated.</p> <p>The very first thing this project does is pull in 5GB of git repositories which includes the entire HAL library for every chip under the sun. The second thing it does is for some reason mess with my user-wide cmake stuff on my system.</p>
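<p>For context, the bootstrap flow from the documentation is roughly this, and the 5GB shows up during the <code>west update</code> step:</p> <pre><code>$ pip install west
$ west init zephyrproject
$ cd zephyrproject
$ west update</code></pre>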
<p>After that the tutorial told me to install the Zephyr SDK:</p> <blockquote>The <a href="https://docs.zephyrproject.org/latest/develop/toolchains/zephyr_sdk.html#toolchain-zephyr-sdk">Zephyr Software Development Kit (SDK)</a> contains toolchains for each of Zephyr’s supported architectures, which include a compiler, assembler, linker and other programs required to build Zephyr applications.<br><br>It also contains additional host tools, such as custom QEMU and OpenOCD builds that are used to emulate, flash and debug Zephyr applications.</blockquote> <p>Yeah no thanks, I already have several perfectly fine ARM toolchains and I don't really want to either build or fetch precompiled compilers for every architecture Zephyr supports, let's see if I can get away with not installing this.</p> <p>After some messing around I figured out how to get away with it. There need to be two environment variables set for cross compiling:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span><span class="nb">export</span> <span class="nv">ZEPHYR_TOOLCHAIN_VARIANT</span><span class="o">=</span>cross-compile <span class="gp">$ </span><span class="nb">export</span> <span class="nv">CROSS_COMPILE</span><span class="o">=</span>/usr/bin/arm-none-eabi- <span class="gp">$ </span>west build -p always -b sparkfun_pro_micro_rp2040 samples/basic/blinky </pre></div> <p>One thing I also found out is that the Raspberry Pi Pico is not actually supported, only other boards that have the same SoC. No worries, these boards are practically the same. The very second issue I hit is that the blinky demo doesn't build because it requires <code>led0</code> to be defined to have something to blink.</p> <p>It turns out the Sparkfun Pro Micro RP2040 does not actually have a simple gpio led to blink but a WS2812B addressable led. </p> <p>So I started following the custom board manual which told me to copy a random other board because that's how it always goes. Maybe if you already have a meta tool to set up a project, make it create this scaffolding.</p> <p>In the end I did not manage to build for my board because it simply wouldn't start to exist as far as the build system was concerned, even after fixing all the errors and warnings in the build.</p> <h2>Conclusion</h2> <p>Well at least with FreeRTOS I managed to build some of my own application. I guess I have to follow the online instructions of replacing printf with another printf implementation and make sure to call the different function everywhere.</p> <p>I'll probably continue trying to get FreeRTOS to do the things I want since it's the only one that can simply be integrated into your own environment instead of the other way around.</p> Bootstrapping Alpine Linux without roothttps://blog.brixit.nl/bootstrapping-alpine-linux-without-root/98LinuxMartijn BraamWed, 20 Mar 2024 23:50:30 -0000<p>Creating a chroot in Linux is pretty easy: put a rootfs in a folder and run the <code>sudo chroot /my/folder</code> command. But what if you don't want to use superuser privileges for this?</p> <p>This is not super simple to fix, not only does the <code>chroot</code> command itself require root permissions but the steps for creating the rootfs in the first place and mounting the required filesystems like /proc and /sys require root as well.</p> <p>In pmbootstrap the process for creating an installable image for a phone requires setting up multiple chroots and executing many commands in those chroots. 
If you have the password timeout disabled in sudo you will notice that you will have to enter your password tens to hundreds of times depending on the operation you're doing. An example of this is shown in the long running "<a href="https://gitlab.com/postmarketOS/pmbootstrap/-/issues/2052#note_966447872">pmbootstrap requires sudo</a>" issue on Gitlab. In this example sudo was called 240 times!</p> <p>Now it is possible with a lot of refactoring to move batches of superuser-requiring commands into scripts and elevate the permissions of those with a single sudo call, but getting this down to a single sudo call per pmbootstrap command would be really hard.</p> <h2>Another approach</h2> <p>So instead of building a chroot the "traditional" way what are the alternatives?</p> <p>The magic trick to get this working is user namespaces. From the Linux documentation:</p> <blockquote>User namespaces isolate security-related identifiers and attributes, in particular, user IDs and group IDs (see <a href="https://man7.org/linux/man-pages/man7/credentials.7.html">credentials(7)</a>), the root directory, keys (see <a href="https://man7.org/linux/man-pages/man7/keyrings.7.html">keyrings(7)</a>), and capabilities (see <a href="https://man7.org/linux/man-pages/man7/capabilities.7.html">capabilities(7)</a>). A process's user and group IDs can be different inside and outside a user namespace. In particular, a process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace; in other words, the process has full privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace. </blockquote> <p>It basically allows running commands in a namespace where you have UID 0 on the inside without needing to elevate any of the commands. This does have a lot of limitations though, all of which I somehow manage to hit with this.</p> <p>One of the tools that makes it relatively easy to work with the various namespaces in Linux is <code>unshare</code>. Conveniently this is also part of <code>util-linux</code> so it's a pretty clean dependency to have.</p>
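<p>The quickest way to see what that means in practice (assuming your normal uid is 1000):</p> <pre><code>$ id -u
1000
$ unshare --user --map-root-user id -u
0</code></pre> <p>Nothing was elevated here, the process just sees a remapped uid inside the namespace.</p>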
<p>The installation step itself requires superuser privileges to set the correct permissions on the files of this new rootfs. After this there are two options for populating /dev inside this rootfs, both of which are problematic:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>mount -o <span class="nb">bind</span> /dev <span class="si">${</span><span class="nv">chroot_dir</span><span class="si">}</span>/dev <span class="go">mounting requires superuser privileges and this exposes all your hardware in the chroot</span> <span class="gp">$ </span>mknod -m <span class="m">666</span> <span class="si">${</span><span class="nv">chroot_dir</span><span class="si">}</span>/dev/full c <span class="m">1</span> <span class="m">7</span> <span class="gp">$ </span>mknod -m <span class="m">644</span> <span class="si">${</span><span class="nv">chroot_dir</span><span class="si">}</span>/dev/random c <span class="m">1</span> <span class="m">8</span> <span class="go">... etcetera, the mknod command also requires superuser privileges</span> </pre></div> <p>The steps after this have similar issues, most of them for <code>mount</code> reasons or <code>chown</code> reasons.</p> <p>There are a few namespace options in <code>unshare</code> that can be used to work around these issues. The command used to run <code>apk.static</code> in my test implementation is this:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>unshare <span class="se">\</span> --user <span class="se">\</span> --map-users<span class="o">=</span><span class="m">100000</span>,0,10000 <span class="se">\</span> --map-groups<span class="o">=</span><span class="m">100000</span>,0,10000 <span class="se">\</span> --setuid <span class="m">0</span> <span class="se">\</span> --setgid <span class="m">0</span> <span class="se">\</span> --wd <span class="s2">&quot;</span><span class="si">${</span><span class="nv">chroot_dir</span><span class="si">}</span><span class="s2">&quot;</span> <span class="se">\</span> ./apk-tools-static -X...etc </pre></div> <p>This will use <code>unshare</code> to create a new userns and change the uid/gid inside that to 0. This effectively grants root privileges inside this namespace. But that's not enough.</p> <p>If <code>chown</code> is used inside the namespace it will still fail because my unprivileged user still can't change the permissions of those files. The solution to that is the uid remapping with <code>--map-users</code> and <code>--map-groups</code>. In the example above it sets up the namespace so files created with uid 0 inside it will end up with uid 100000 on the actual filesystem; uid 1 becomes 100001 and this continues for 10000 uids. </p> <p>This again does not completely solve the issue though because my unprivileged user still can't chown those files, no matter whether it's chowning to uid 0 or 100000. To give my unprivileged user this permission the <code>/etc/subuid</code> and <code>/etc/subgid</code> files on the host system have to be modified to add a rule. This sadly requires root privileges <i>once</i> to set up this privilege.</p>
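<p>On hosts with a recent enough shadow-utils that one-time setup can also be done with <code>usermod</code> instead of editing the files by hand. A sketch matching the mapping used above:</p> <pre><code>$ sudo usermod --add-subuids 100000-109999 --add-subgids 100000-109999 martijn
$ grep martijn /etc/subuid
martijn:100000:10000</code></pre>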
<p>Either way, to make the command above work this line has to end up in those two files:</p> <pre><code>martijn:100000:10000</code></pre> <p>This grants the user with the name <code>martijn</code> the permission to use 10.000 uids starting at uid 100.000 for the purpose of userns mapping.</p> <p>The result of this is that the <code>apk.static</code> command will seem to Just Work(tm) and the resulting files in <code>${chroot_dir}</code> will have all the right permissions, only offset by 100.000.</p> <h2>One more catch</h2> <p>There is one more complication with remapped uids and <code>unshare</code> that I've skipped over in the above example to keep it clearer: the command inside the namespace most likely cannot even start.</p> <p>If you remap the uid with <code>unshare</code> you get more freedom inside the namespace, but it limits your privileges outside the namespace even further. It's most likely that the <code>unshare</code> command above was run somewhere in your own home directory. After changing your uid to 0 inside the namespace your privileges towards the outside world will be those of uid 100.000, and that uid most likely has none. If any of the folders in the path to the executable you want <code>unshare</code> to run for you inside the namespace don't have the read and execute bits set for the "other" group in the unix permissions then the command will simply fail with "Permission denied".</p> <p>The workaround used in my test implementation is to just copy the executable over to <code>/tmp</code> first and hope you at least still have permissions to read there.</p> <h2>Completing the rootfs</h2> <p>So after all that the first command from the Alpine guide is done. Now only the problems of mounting filesystems and creating files are left.</p> <p>While <code>/etc/subuid</code> does give permission to use a range of uids as an unprivileged user with a user namespace, it does not give you permission to create those files outside the namespace. So the way those files are created is basically the complicated version of <code>echo "value" | sudo tee /root/file</code>: </p> <div class="highlight"><pre><span></span><span class="gp">$ </span><span class="nb">echo</span> <span class="s2">&quot;nameserver a.b.c.d&quot;</span> <span class="p">|</span> unshare <span class="se">\</span> --user <span class="se">\</span> --map-users<span class="o">=</span><span class="m">100000</span>,0,10000 <span class="se">\</span> --map-groups<span class="o">=</span><span class="m">100000</span>,0,10000 <span class="se">\</span> --setuid <span class="m">0</span> <span class="se">\</span> --setgid <span class="m">0</span> <span class="se">\</span> --wd <span class="s2">&quot;</span><span class="si">${</span><span class="nv">chroot_dir</span><span class="si">}</span><span class="s2">&quot;</span> <span class="se">\</span> sh -c <span class="s1">&#39;cat &gt; /etc/resolv.conf&#39;</span> </pre></div> <p>This does set up and tear down the entire namespace for every file change or creation which is a bit inefficient, but inefficient is still better than impossible. Changing file permissions is done in a similar way.</p> <p>To fix the mounting issue there's the mount namespace functionality in Linux. This allows creating new mounts inside the namespace as long as you still have permissions on the source file as your unprivileged user. 
This effectively means you can't use this to mount random block devices but it works great for things like <code>/proc</code> and loop mounts.</p> <p>There is a <code>--mount-proc</code> parameter that will tell <code>unshare</code> to set up a mount namespace and then mount <code>/proc</code> inside the namespace at the right place, so that's what I'm using. But I still need other things mounted. This mounting is done as a small inline shell script right before executing the commands inside the chroot:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>unshare <span class="se">\</span> --user <span class="se">\</span> --fork <span class="se">\</span> --pid <span class="se">\</span> --mount <span class="se">\</span> --mount-proc <span class="se">\</span> --map-users<span class="o">=</span><span class="m">100000</span>,0,10000 <span class="se">\</span> --map-groups<span class="o">=</span><span class="m">100000</span>,0,10000 <span class="se">\</span> --setuid <span class="m">0</span> <span class="se">\</span> --setgid <span class="m">0</span> <span class="se">\</span> --wd <span class="s2">&quot;</span><span class="si">${</span><span class="nv">chroot_dir</span><span class="si">}</span><span class="s2">&quot;</span> <span class="se">\</span> -- <span class="se">\</span> sh -c <span class="s2">&quot; \</span> <span class="s2"> mount -t proc none proc ; \</span> <span class="s2"> touch dev/zero ; \</span> <span class="s2"> mount -o rw,bind /dev/zero dev/zero ;\</span> <span class="s2"> touch dev/null ; \</span> <span class="s2"> mount -o rw,bind /dev/null dev/null ;\</span> <span class="s2"> ...</span> <span class="go"> chroot . bin/sh \</span> <span class="go"> &quot;</span> </pre></div> <p>The mounts are created right after setting up the namespaces but before the chroot is started, so the host filesystem can still be accessed. The working directory is set to the root of the rootfs using the <code>--wd</code> parameter of <code>unshare</code> and then bind mounts are made from <code>/dev/zero</code> to <code>dev/zero</code> to create those devices inside the rootfs.</p> <p>This combines the two impossible options into something that works. <code>mknod</code> still cannot work inside namespaces because it would be a bit of a security risk, and <code>mount</code>'ing /dev gives access to way too many devices that are not needed, but the mount namespace does allow bind-mounting the existing device nodes one by one, which allows me to filter them.</p> <p>Then finally... the <code>chroot</code> command to complete the journey. This has to refer to the rootfs with a relative path and it also depends on the working directory being set by <code>unshare</code>, since host paths break with uid remapping.</p> <h2>What's next?</h2> <p>So this creates a full chroot without superuser privileges (after the initial setup) and this whole setup even works perfectly for cross-architecture chroots in combination with <code>binfmt_misc</code>. </p> <p>Compared to <code>pmbootstrap</code> this codebase does very little and there are more problems to solve. For one, all the filesystem manipulation to copy the contents of the chroot into a filesystem image that can be flashed still has to be figured out. 
This is further complicated by the mangling of the uids in the host filesystem, so they have to be remapped again while writing into the filesystem image.</p> <p>Flashing the image to a fastboot capable device should be pretty easy without root privileges, it only requires a udev rule that is usually already installed by the android-tools package on various Linux distributions. For the PinePhone flashing happens on a mass-storage device and as far as I know it is impossible to write to that without actual superuser privileges.</p> <p>The code for this is in the <a href="https://git.sr.ht/~martijnbraam/ambootstrap">~martijnbraam/ambootstrap</a> repository, hopefully in some time I'll get this to actually write a plain Alpine Linux image to a phone :D</p> <p></p> Digital audio mixer pt.2https://blog.brixit.nl/digital-audio-mixer-pt-2/97ElectronicsMartijn BraamSat, 09 Mar 2024 10:08:03 -0000<p>Since writing my <a href="https://blog.brixit.nl/building-a-digital-audio-mixer/">previous post</a> about digital audio mixing I've made some significant progress. Initially my code was running on an off-the-shelf Teensy 4.1, using only the digital input and output since that is what I could use directly with an external ADC/DAC. Shortly after writing that post I received the Teensy Audio Shield which makes the test setup a bit easier to deal with.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709912877/20240308_0054.jpg" class="kg-image"><figcaption>Teensy Audio Board Rev. D2</figcaption></figure> <p>This is a simple board that connects an NXP SGTL5000 codec chip to the Teensy. It provides a stereo input and output on the header at the top in this picture. It also has a fairly decent built-in headphone amplifier which is exposed with the 3.5mm jack at the bottom of the picture.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709913077/image.png" class="kg-image"><figcaption>SGTL5000 block diagram from the datasheet</figcaption></figure> <p>With this board I replaced the S/PDIF input and output in the code with an I²S interface, which completed this into a self-contained 2-input and 2-output audio mixer. But how to expand from here? That's more in the realm of custom hardware since the Teensy Audio Shield is practically the only board you can order for Teensy Audio.</p> <p>So a custom board... There are a few other codecs supported by the audio library. The issue is that most of the codecs in the list are EOL or not recommended for new designs. They are also mostly out of stock so I postponed this idea for a bit.</p> <h2>FOSDEM 2024</h2> <p>So roughly at this point in the process <a href="https://fosdem.org/2024/">FOSDEM</a> happened. Not only is this a very interesting open source software event but it also manages to run ~30 concurrent live streams and in-room audio mixes with an ad-hoc setup with only about a tenth of the personnel you'd expect to be needed to pull this off.</p> <p>To pull this off FOSDEM uses custom built "<a href="https://archive.fosdem.org/2020/schedule/event/videobox/">video boxes</a>" that contain half the equipment needed to run all the multimedia in every room. Two of these (identical) boxes are put in each room for the complete setup. 
This setup has evolved a lot over the years.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709914101/fosdem-2020-video-box.JPG" class="kg-image"><figcaption>FOSDEM video box from 2020</figcaption></figure> <p>This is one of the examples: the nice laser-cut video boxes used from ~2015 to 2023. These have the job of capturing the HDMI video inputs and sending off the video from the connected camera and speaker laptop to be encoded for the live-stream. This is a very nice and compact solution for deploying a room for FOSDEM, and during the event these are controlled remotely from the central operations center, allowing only a few people to monitor and manage all the video streams.</p> <p>But what about the audio? There's a microphone for the speaker and one or two microphones for audience questions but these don't hook up to the custom boxes. The room audio in most rooms is handled by a Yamaha MG10 audio desk. This is easily enough for mixing together the audio from 4 sources but has one major downside: you have to be physically present to turn the knobs to adjust anything.</p> <p>While the video is usually pretty great, when watching back the talks I've noticed there are sometimes a few audio issues like microphones that are clipping. The perfect solution for this all is a digital audio mixer that can be controlled remotely of course, but those are way larger and more expensive.</p> <p>It turns out FOSDEM is the perfect target for a 4-in 4-out audio mixer that is controlled over USB instead of physical controls. I'm very glad I managed to meet up with the FOSDEM video team, which led to...</p> <h2>FOSDEM Audio Board</h2> <p>So the FOSDEM setup has a few very interesting constraints for an audio mixer:</p> <ul><li>There are two audio mixes, one for in-room audio and one for the live-stream</li> <li>All mixes are mono, there&#x27;s not much sense in stereo for running a few microphones.</li> <li>All sources are line-level. The microphones at FOSDEM are all wireless and the receivers can output line-level signals so no need for microphone pre-amps. This massively simplifies the design of the analog inputs.</li> <li>Since no condenser microphones or phantom-powered DI boxes are used no +48V phantom power supply is required.</li> <li>The current iteration of video boxes are inside 19&quot; 1U rack cases which constrains the size of the audio mixer a lot.</li> </ul> <p>So I have practically no experience with designing audio gear. Luckily the additional constraints massively simplify the design which is great for cost optimization as well. So I did the most dangerous thing a software developer can do: I launched Kicad.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709916669/image.png" class="kg-image"><figcaption>PCB of the FOSDEM Audio Interface rev.A</figcaption></figure> <p>For the design I decided to put two of the SGTL5000 codecs on the board. It's one of the few supported codecs that are still available and they already deal nicely with line-level signals. Another great feature of these chips is that they include analog gain control, which saves me from having to implement a digitally-controlled analog gain circuit, which sounds difficult and expensive. 
Having a built-in headphone amp is also great for adding a headphone connection for monitoring in the room.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709917075/image.png" class="kg-image"><figcaption>This is not only my block diagram but also the schematic itself thanks to Kicad sub-schematics :D</figcaption></figure> <p>This is how the hardware is connected internally. There are 3 analog XLR inputs for connecting the microphone receivers and the fourth input is a 3.5mm jack that is hooked up for some simple analog mono-summing of the incoming signal. This 3.5mm jack will connect to the audio output of the HDMI capture card connected to the laptop of the presenter.</p> <p>One of the codecs also provides the 2 XLR outputs. One connects to the existing audio speakers in the room and the other connects to the audio input of the camera. The headphone connector is connected to the outputs of the second codec so that audio mix can be controlled separately in software.</p> <p>The only thing that needs to happen in addition to the schematic of the original Teensy Audio Shield is dealing with balanced signals. Sadly there isn't a "getting started with designing audio interfaces" book, but I found the brilliant website from <a href="https://sound-au.com/articles/balanced-io.htm">Elliott Sound Products</a> that has a lot of information on these circuits. The inputs and outputs have a pair of opamps to convert the signals. This is implemented with TL072 opamps because they are cheap, available and have 2 opamps in a single chip. This means the whole input circuit is a single chip and a few passives.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709917710/image.png" class="kg-image"><figcaption>A single balanced output channel on the audio board</figcaption></figure> <p>The other part that needed figuring out is the power part. This surprisingly was a lot more work than the actual analog audio handling. The whole design is powered from 5V from the USB port but there are a lot of separate voltages needed to run all the audio hardware.</p> <p>The codec chips need two 3.3V rails and one 1.8V rail to function. One of the 3.3V rails is for the digital part and one for the analog part. The opamp circuits are even more complicated because they need a positive and a negative voltage rail to function. </p> <p>The exact voltage for the opamps does not matter much, but it has to be high enough that it is always above the analog audio input levels. Because these are cheap opamps it needs a few volts extra on top of that: the TL072 cannot process signals close to the supply voltage, which results in distortion. On the other hand the output voltage needs to be below 40 volts because I'm building an audio interface and not a smoke machine. In this design the supplies are +9V and -9V which brings the total voltage on the opamp to 18V.</p> <p>To generate the positive and negative 9V rails I first generate +12V and -12V with a switching regulator and then feed those into an LDO to filter out the switching noise. After dealing with all this I now finally understand why so much of the audio gear has old-school transformers to power them: it makes it very easy to make a dual-rail supply.</p> <p>It was not easy to figure out how to get the dual rails from 5V at all. 
To start I decided to open up one of the USB powered audio interfaces I already had and see what the designers of that device did to fix this. In this case it was a Tascam US-2x2. </p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709918861/P1470213.JPG" class="kg-image"></figure> <p>This is a picture of the power supply section of that audio interface. It contains a lot of different voltage regulators to make the various rails. It has to deal with a few extra voltages compared to my design since this also has +48V phantom power. After measuring it, it turned out the main negative rail of this board is generated by the 34063 chip at the top of that picture, used as an inverting switching regulator in this case. The positive rail for the opamps in this design is generated by the tiny chip labelled U26 all the way on the bottom of the picture; I've not been able to identify this chip.</p> <p>All this together led to my initial design:</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709919148/image.png" class="kg-image"></figure> <p>The top left part of the board generates the voltages for the opamp circuits and the top right has the regulators for the codecs (and the PC input jack). This version of the power supply was not complete yet since it was lacking a few capacitors, and I found out that the inductor I selected was way too small to function correctly.</p> <p>This revision of the power supply was scrapped because with the correct inductor the power supply simply became too large for the board and I didn't want to make the board any larger since the inside of the FOSDEM box is very space constrained.</p> <p>The MC34063 is also decades-old technology by now. It's a switching regulator that runs at 100 kHz max. In this design the switching frequency would be ~60 kHz, but this results in needing a large capacitor and inductor on the board.</p> <p>In the current revision of the board this has been replaced with the TPS65130 regulator. This is a way more modern switching regulator running at 1.3 MHz instead. This chip is a bit more expensive but it generates both the positive and the negative rail with a single chip, and due to the order of magnitude larger switching frequency the inductor and capacitor can be way smaller. The end result is a more compact and cheaper power supply.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709919624/image.png" class="kg-image"></figure> <p>This is the board that was ordered as prototype. It's mostly the same as the board shown above but it has an extra header exposing a few I/O pins of the Teensy for prototyping, and the PC input connector has moved so the jack will be above the board to waste less space in the case.</p> <h2>The actual hardware</h2> <p> After waiting some days I received this partially assembled board:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709919939/PXL_20240223_094706542.jpg" class="kg-image"></figure> <p>After soldering on the connectors I powered it on and checked the voltage rails; it all seemed fine. Then, after powering it on a second time, it started making a screeching coil-whine sound and within a second the 12V generator made the bottom of the PCB too hot to touch. 
After initially working for a bit I didn't get it to generate the correct voltages again, even after restarting the board. Sometimes when powering it on it heated up again, sometimes it didn't, but for some reason the output of the +/- 12V supply was +6V/-4.5V instead.</p> <p>This issue plagued me for some days until one day I had plugged my headphones into the board while measuring things with the multimeter and suddenly the weird noise from the inputs disappeared. This happened when I had the probe of the multimeter on one of the pads of the diode for the negative supply. </p> <p>It turns out that the solder connection on that diode was not reliable and after heating up that pad with a soldering iron for a second I managed to get the board running at the right voltage again... but only sometimes. At least the board was now reliable enough that I could work a bit on the firmware, and all the functionality that didn't depend on the opamps was working great.</p> <p>This is one of the moments where it's a massive help when other people double-check your schematics. It turns out I copy-pasted a few capacitors and forgot to adjust the values. Specifically I had a capacitor in the feedback path for the switching regulator that was two orders of magnitude too large. It turns out those capacitors were optional anyway according to the datasheet, so after removing those from the power supply it already became a lot more reliable. Still, it sometimes failed to start, and unreliable equipment in a live environment is a non-starter.</p> <p>After verifying everything it turned out that there were more capacitors with wrong values and sadly these capacitors were actually required. So I took the boards to someone with actual electronics experience and also the correct gear to debug the boards. A few capacitors have been removed, a few have been added and the result is beautiful soldering work like this:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709921261/20240308_0052.jpg" class="kg-image"></figure> <p>That is a 100nF capacitor soldered on top of a 10uF capacitor to further reduce the ripple of the power supply. The board has since been adjusted to actually have these capacitors included. After an evening of messing with the board to minimize ripple the switching regulator section has turned into quite a battlefield:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709921381/20240308_0046.jpg" class="kg-image"></figure> <p>A few stacked capacitors, a pad I accidentally ripped off when removing capacitors, the power-save pins on the regulator connected to GND instead of VCC, and in the bottom left corner a beautiful stack of 3 capacitors and a 1k resistor on the output of the regulator.</p> <p>Luckily after all this the regulator started working reliably. To make it a bit nicer on my desk I also 3D printed a simple front panel for the mixer. 
I also got a random OLED panel from my parts box and connected that to the GPIO pins so I can have a display to show real-time debugging information while testing the software.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709921615/20240308_0036.jpg" class="kg-image"></figure> <p>The source Kicad files for the audio board are available at <a href="https://git.sr.ht/~martijnbraam/mixolydian-4x4">https://git.sr.ht/~martijnbraam/mixolydian-4x4</a></p> <h2>The software side</h2> <p>For the software I started off from the Arduino IDE project I had for my previous blog post. The design for the audio at FOSDEM would be simpler though: no networking is needed at the mixer, instead the USB connection would be used for control. Adding a network port to the audio mixer would mean it needs more switch ports in the box, and since the network switch ports are on the outside it would need an ugly cable coming out to the front of the box to connect it up.</p> <p>The control of the mixer would happen through the infrastructure FOSDEM already has, where the volunteers in the rooms can control the boxes with a webpage from a phone and some software on the SBC inside the box communicates with the mixer over a serial port.</p> <p>There were also a few small software issues to deal with due to the hardware. There are two SGTL5000 codecs on the board. There are two variants of this chip, a 32 and a 20 pin version, where the major difference is that the 20 pin version doesn't have any address pins for the I²C bus. Sadly the 32 pin variant wasn't really available, so the codecs are now using the same address but on different I²C busses of the Teensy. The audio library is hard-coded to have the codec on I²C0 of the Teensy so this requires a bit of patching.</p> <p>The issue with patching the Teensy audio library is that libraries in the Arduino build system are a mess and the audio library is also part of the Teensy core in the IDE instead of a separate library. After messing with it to try to make it a bit more sane I decided to convert the Arduino IDE project to a plain cmake project that pulls in the various parts of the Teensy core as git submodules and has the whole audio library vendored in. This also means it's now possible to build the firmware for the FOSDEM audio mixer without first downloading a pre-compiled ARM compiler, so it can run quickly in Alpine in CI.</p> <p>The code for the cmake-ified project is available at <a href="https://git.sr.ht/~martijnbraam/mixolydian-4x4-fw">https://git.sr.ht/~martijnbraam/mixolydian-4x4-fw</a></p> <p>I had also added an OLED panel to the board for testing so I added a bit of code that displays the audio levels of all the inputs and outputs of the board on that screen.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709922412/20240308_0040.jpg" class="kg-image"></figure> <p>This part will definitely be different on the FOSDEM boxes since these OLEDs are very small and the PCB behind it is large enough that it barely fits in the height of a 1U rack case. It is very cool to have audio level bars in realtime though, and if there is a color display it could even be usable enough to see if the levels are correct.</p> <p>There's also an initial implementation of the serial control protocol. 
The exact protocol has not been fully thought out yet but at least it does the one thing all serial protocols should do: print something useful when sending a newline.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709922665/image.png" class="kg-image"></figure> <p>When sending a newline to the interface it will print the state of the mixing matrix as percentages.</p> <p>The source code for this is of course available in the firmware repository linked above.</p> <h2>Further work</h2> <p>The audio interface has been tested with one of the wireless sets from FOSDEM, which is a Sennheiser AVX ME2 set. The audio seems to work great with this and for running FOSDEM there can simply be 3 receivers plugged into the box. </p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1709923660/avx-receiver.jpg" class="kg-image"><figcaption>Sennheiser AVX receiver plugged into the mixer</figcaption></figure> <p>But can this be simpler? The AVX system works with DECT chips so maybe it would be possible to make the world's first digital audio mixer with a built-in DECT base-station :D</p> <h2>Conclusion</h2> <p>This is certainly a lot of progress for the audio hardware part of the mixer project. While a lot of the hardware is specified for the exact requirements of a FOSDEM room, it does provide a neat base to work from when designing digital audio mixers. The parts of the hardware design are modular enough to reconfigure them for other audio needs, and now that the line-level inputs are working it will be neat to figure out a digitally controlled microphone preamp. Or have a hi-Z input for instruments. Or implement a phantom power supply.</p> <p>It's always a ton of fun to figure out new systems, and instead of learning a random new programming language it's hardware design stuff for once. I absolutely couldn't have done it without the expertise of the electrical engineers that helped me design this, especially Thea who actually knows how switching regulators work :)</p> <p>Hopefully the revision B design with the improvements from all the testing that happened with this board will be in the FOSDEM boxes at FOSDEM 2025, but I'll save that for a third part in this series of posts.</p> Fixing the Megapixels sensor linearizationhttps://blog.brixit.nl/fixing-the-megapixels-sensor-linearization/95MegapixelsMartijn BraamThu, 25 Jan 2024 22:45:44 -0000<p>Making a piece of software that dumps camera frames from V4L2 into a file is not very difficult to do, that's only a few hundred lines of C code. Figuring out why the pictures look cheap is a way harder challenge.</p> <p>For a long time Megapixels had some simple calibrations for blacklevel (to make the shadows a bit darker) and whitelevel (to make the light parts not grey), and later, after a bit of documentation studying, I added calibration matrices all the way back in <a href="https://blog.brixit.nl/pinephone-camera-pt4/">part 4</a> of the Megapixels blog series.</p> <p>The color matrix that was added in Megapixels is a simple 3x3 matrix that converts the color sensitivity of the sensor in the PinePhone to calibrated values for the rest of the pipeline. Just a simple 3x3 matrix is not enough to do a more detailed correction though. Luckily the calibration software I used produces calibration files that contain several correction curves for the camera. 
For example the HSV curve that changes the hue, saturation and brightness of specific colors.</p> <p>Even though this calibration data is added by Megapixels I still had issues with color casts. Occasionally someone mentions to me how "filmic" or "vintage" the PinePhone pictures look. This is the opposite of what I'm trying to do with the picture processing. The vintage look is because color casts that are not linear to brightness are very similar to how cheap or expired analog film rolls reproduce colors. So where is this issue coming from?</p> <p>I've taken a closer look at the .dcp files produced by the calibration software. With a bit of python code I extracted the linearization curve from this file and plotted it. It turns out that the curve generated after calibration was perfectly linear. It makes a bit of sense since this calibration software was never made to create profiles for completely raw sensor data. It was made to create small corrections for professional cameras that already produce nice looking pictures. Looks like I have to produce this curve myself.</p> <h2>Getting a sensor linearization curve</h2> <p>As my first target I looked into the Librem 5, mainly because that's the phone that currently has the most battery charge. I had hoped there was some documentation about the sensor response curves in the datasheet for the sensor. It turns out that even getting a datasheet for this sensor is problematic. So the solution is to measure the sensor instead.</p> <p>Measuring this is pretty hard though; most solutions depend on having a calibrated reference. I've thought about figuring out how to calibrate a light to produce precise brightness dimming steps and measuring the curve of the light with a colorimeter to fix any color casts of the lights. Another idea was taking pictures of a printed grayscale curve, but that has the issue that the light on the grayscale curve needs to be perfectly flat.</p> <p>But after thinking about this in the background for some weeks I had a thought: instead of producing a perfect reference grayscale gradient it's way easier to point the camera at a constant light source and then adjust the shutter speed of the camera to produce the various light levels. Instead of a lot of external factors with calibrated lights which can throw off measurements massively, I assume that the shutter speed setting in the sensor is accurate.</p> <p>The reason I can assume this is accurate is because the shutter speed setting in these phone sensors is in "lines". These cameras don't have shutters, it's all electronic shutter in the sensor. This means that if the shutter is set to 2 lines, the line being read by the sensor at that moment was cleared only 2 scanlines before. This is the "rolling shutter" effect. If the shutter is set to 4 lines instead, every line has exactly twice the amount of time to collect light after resetting. This should result in a pretty much perfectly linear way to control the amount of light to calibrate the response with.</p> <p>In the case of the Librem 5 this value can be set from 2 lines to 3118 lines, where the maximum value means that all the lines of the sensor have been reset by the time the first line is read out, giving the maximum amount of light gathering time.</p> <p>With libmegapixels I have enough control over the camera to make a small C application that runs this calibration. 
It goes through these steps:</p> <ol><li>Open the specified sensor and set the shutter to the maximum value</li> <li>Start measuring the brightness of the 3 color channels and adjust the sensor gain so that with the current lighting the sensor will be close to clipping. If on the lowest gain setting the light source is still too bright the tool will ask to lower the lamp brightness.</li> <li>Once the target maximum brightness has been hit the tool will start lowering the shutter speed in regular steps and then saving the brightness for the color channels at that point.</li> <li>The calibration data is then written to a CSV file</li> </ol> <p>The process looks something like this:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1706114157/image.png" class="kg-image"></figure> <p>This is a short run for testing where only 30 equally spaced points are measured. I did a longer run for calibration with it set to 500 points instead, which takes about 8 minutes. This is a plot of the resulting data after scaling the curves to hit 1.0 at the max gain:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1706114296/image.png" class="kg-image"></figure> <p>The response of the sensor is not very linear at all... This means that if a picture is whitebalanced on the midtones, the shadows will have a teal color cast due to the red channel having lower values. If the picture were whitebalanced to correct the darker colors instead, the brighter colors would turn magenta.</p> <p>The nice thing is that I don't have to deal with actually correcting this. This curve can just be loaded into the .dng file metadata and the processing software will apply this correction at the right step in the pipeline.</p> <h2>Oops</h2> <p>It is at this point that I figured out that the LinearizationTable DNG tag is a grayscale correction table so it can't fix the color cast. At least it will improve the brightness inconsistencies between the various cameras.</p> <p>With some scripting I've converted the measured response curve into a correction curve for the LinearizationTable and then wrote that table into some of my test pictures with <code>exiftool</code>. </p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1706221766/compare-linearizationtable.jpg" class="kg-image"></figure> <p>This is the result. The left image is a raw sensor dump from the Librem 5 rear camera that does not have any corrections at all applied except the initial whitebalance pass. On the right is the same image pipeline but with the LinearizationTable tag set in the DNG before feeding it to <code>dcraw</code>.</p> <p>The annoying thing here is that neither picture looks correct. The first one has the extreme gamma curve that is applied by the sensor so everything looks very bright. The processed picture is a bit on the dark side, but that might be because the auto-exposure was run on the first picture, causing underexposure on the corrected data.</p> <p>The issue with that though is that some parts of the image data are already clipping while they shouldn't be, and exposing the picture brighter would only make that worse.</p> <p>Maybe I have something very wrong here but at this point I'm also just guessing how this stuff is supposed to work. Documentation for this doesn't really exist. 
This is all the official documentation:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1706222220/image.png" class="kg-image"><figcaption>No, chapter 5 is not helpful</figcaption></figure> <p>Maybe it all works slightly better if the input raw data is not 8-bits, but that's a bunch more kernel issues to fix on the Librem 5 side.</p> <h2>Conclusion</h2> <p>So, not as much progress on this as I had hoped. I made some nice tools to produce data that makes pictures worse. Once the clipping in the highlights is fixed this might be very useful though, since practically everything in the DNG pipeline expects the input raw data to be linear and it just isn't.</p> <p>The <a href="https://gitlab.com/megapixels-org/libmegapixels/-/commit/f6686d7a5a176384da3b5a1eaf93985aeb29d7be">sensor measuring tool</a> is included in the libmegapixels codebase now though.</p> <p>To fix auto-exposure I also need to figure out a way to apply this correction curve before running the AE algorithms on the live view. More engineering challenges as always :)</p> <hr> <div style="width: 75%; margin: 0 auto; background: rgba(128,128,128,0.2); padding: 10px;"> <h4>Development Funding</h4> <p>The current developments of Megapixels are funded by... You! The end-users. It takes a lot of time and a lot of weird expertise to make Linux cameras work and I wouldn't have been able to do it without your support.</p> <p>The donations are being used for the occasional hardware required for Megapixels development (like a nice Standard Illuminant A for calibration) and the various other FOSS applications I develop for the Linux ecosystem. Every single bit helps to not do all this work entirely for free.</p> <a href="https://blog.brixit.nl/donations/">Donations</a> </div> <p></p> The dilemma of tagging library releaseshttps://blog.brixit.nl/the-dilemma-of-tagging-library-releases/94MegapixelsMartijn BraamSun, 14 Jan 2024 16:11:17 -0000<p>I've been working on the libmegapixels library for quite a bit now. The base of the library, configuring a V4L2 pipeline so you can get camera frames on modern ARM platforms, is pretty solid. Most of the work on the library side is figuring out the AWB/AE/AF code and how that will fit together with applications.</p> <p>Due to the AAA code not working yet and the API for how those parts will fit together not being fully defined, I've been holding off on tagging an actual release of the libmegapixels library.</p> <p>A lot of my projects, especially libraries, are written in Python so I've long enjoyed the luxury of APIs being duck-typed and having the possibility of adding optional arguments to methods in the future. Sadly in C libraries I can't get away with never defining the types for arguments that might change in the future, or with adding optional arguments.</p> <p>My original plan was to tag a release of libmegapixels together with the first 2.x release of Megapixels, since these pieces of software are intended to fit together, but after thinking about it some more (and some convincing from other people interested in the libmegapixels release) I've decided to tag a 0.1 release.</p> <p>In an ideal world I could just release code when it's fully done and tested. In this case the long time it takes to get everything ready for use would mean that potential contributors to the code will also be held back from experimenting with the codebase. 
Especially since a large part of libmegapixels is the config files it ships for specific hardware configurations. If I didn't make any releases, at some point users and developers would be forced to just ship random git commits, which is a way worse situation to be in for bug tracking.</p> <p>With this 0.1 release I want to make it possible to start writing config files for various phones and platforms to test camera pipelines. Hopefully this will also mean any issues with the configuration file format that people might hit will be figured out before I have to tag a "final" 1.x release.</p> <h2>The release</h2> <p>So the initial tagged release of <code>libmegapixels</code>:</p> <ul><li>Located at <a href="https://gitlab.com/megapixels-org/libmegapixels/-/tags/0.1.0">https://gitlab.com/megapixels-org/libmegapixels/-/tags/0.1.0</a></li> <li>Build instructions at <a href="https://libme.gapixels.me/building.html">https://libme.gapixels.me/building.html</a></li> <li>Comes with absolutely no guarantee of stability for the C API of the library</li> <li>Most likely the config file format is stable but might have small tweaks before the 1.x release</li> </ul> <p>Hopefully this will allow people to start experimenting with the codebase and generate some feedback on it, so I'm not just developing this for months and completely overfitting it to the three devices I'm testing on.</p> <p>I'm planning to make a similar release for <code>libdng</code> soon. That library is also mostly stable but I need to fix up the last parts of the API to allow reading and writing all the required metadata.</p> Megapixels 2.0: DNG loading and Autowhitebalancehttps://blog.brixit.nl/megapixels-2-0-dng-loading-and-whitebalancing/93MegapixelsMartijn BraamFri, 22 Dec 2023 01:25:46 -0000<p>After getting some nice DNG exporting code to work with libdng in the <a href="https://blog.brixit.nl/megapixels-2-0-dng-exporting/">last post</a> I decided to go mess with auto white-balancing again on the Librem 5.</p> <p>I got the Megapixels 2.x codebase to the point where it smoothly displays the camera feed on the Librem 5 and the PinePhone Pro. One of the things that Just Worked(tm) on the original PinePhone is the auto white-balance correction of the rear camera. This has never worked on the front camera of that device though, and the result of lacking AWB code is very obvious: the pictures are very green.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1703201479/gc2145.jpg" class="kg-image"><figcaption>Great example of lack of AWB on the PinePhone front camera</figcaption></figure> <p>This was very easy on the rear camera of the PinePhone. The camera module inside the phone can automatically do white-balance corrections by having an AWB algorithm running on the 8051 core inside the sensor which adjusts the analog gain of the camera. The only thing that Megapixels does on the PinePhone is turning that feature on and it just works. The front camera of the PinePhone should have a similar feature but it does not work due to the state of the Linux driver for that sensor.</p> <p>For the PinePhone Pro and the Librem 5 (and most other devices) the white-balancing is a lot harder to deal with. The sensor does not have any automatic way of dealing with this and it has to be done on the CPU side. 
For this there are two options:</p> <ul><li>Get the unbalanced camera feed and correct those frames in software while displaying them and storing the correction factors in the DNG files for the pictures that have been taken.</li> <li>Send the corrections back to the sensor instead so the camera feed is already balanced. This should lead to higher quality pictures because the use of the analog to digital converter in the sensor is more optimal. It&#x27;s also harder because now there&#x27;s latency between changing the gain and receiving the corrected data.</li> </ul> <p>But the nice thing of doing hardware support on multiple platforms is that I have to support both these cases :(</p> <p>In the case of the Librem 5 I'm implementing the first option, since the sensor driver for the rear camera on that device does not implement the necessary controls to do the second option. It's also a bit easier to get working right.</p> <h2>The Algorithm™</h2> <p>There are many ways to actually implement a white-balance algorithm. I'll be going with the simplest one: the gray-world algorithm.</p> <p>This algorithm works on the assumption that if you average out all the colors in your picture you'd get something that's roughly grey.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1703202886/grey-world.jpg" class="kg-image"></figure> <p>Intuitively you would think that nothing is ever that nicely balanced, or that colorful walls for example would skew the results massively. But as you can see in the demonstration above, the more you start to blur the picture the less saturated it will become. This works for a surprising number of pictures.</p> <p>Calculating the white-balance correction for a picture is not done by blurring it as in the demonstration above though: it's done by taking the average of all the pixels in the picture and then using the inverse of that result to set the gain.</p> <p>Another thing that's different for white-balancing is that it's the first step in the color processing pipeline. It should not be run on the final picture like this example but on the raw color values from the sensor, since there it would be the closest in the pipeline to the ADC gain applied in the sensor.</p> <h2>The Megapixels implementation</h2> <p>Taking the average of a full raw frame is not very fast. It also complicates the internal Megapixels code a lot to do extra image processing on the raw frame stage of the code. That's why in Megapixels I'm <i>not</i> running the white-balance code on the raw frames. It's instead being run on the preview feed shown to the user before taking the picture, since that way it takes advantage of the GPU debayering and scaling of the input data.</p> <p>To make this work the libmegapixels code averages an entire processed frame to get the average R, G and B values. After running this average the code is no longer dealing with a lot of data, so it's a lot easier to write quick code. To correct for the preview processing, the inverse of the color matrix is run on the result to get a value that's close to what the average of the raw data would have been before scaling and the preview color corrections.</p> <p>The result is a new R, G and B value that represents the color balance of the picture. The new gain for each color channel is then calculated as <code>1/R</code> and then normalized so the gain for the green channel is always 1.0.</p>
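<p>As a concrete example with made-up numbers: if the channel averages of a frame come out as R = 0.40, G = 0.50 and B = 0.55, the normalized gains become 0.50 / 0.40 = 1.25 for red, 1.0 for green and 0.50 / 0.55 ≈ 0.91 for blue, pushing the red channel up and the blue channel down until the average of the picture is grey.</p>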
<p>The normalization to green is done because the sensors only have controls for the red and blue gains.</p> <p>Except that on the Librem 5 there are no controls for the red and blue gains at all, so in that case the new gains are fed into the GPU shader that calculates the preview, where they will be applied as gains right after the debayering step.</p> <h2>The white-balance in the DNG output</h2> <p>With the two scenarios above there are also two cases for the DNG exporting. Either the RAW data in the DNG is already balanced or it's completely unbalanced. Luckily the DNG specification has me covered!</p> <p>When the raw image data is completely unbalanced, like it is on most professional cameras, the gains for balancing the picture are stored in the <code>AsShotNeutral</code> tag. This tells the DNG developing software the gains the camera used to display the preview and it will be available in the white-balance section of the developing software as "As Shot" or "Camera" white-balance.</p> <p>In the case where the ADC gains are manipulated to apply the white-balance this doesn't work, since the gains written to the <code>AsShotNeutral</code> tag would be 1.0 for all channels. This <i>does</i> produce the correct picture for simple cases except that the whitebalance shown for the image in the editing software would always be 5612K.</p> <p>Having the wrong whitebalance is not just an issue of metadata neatness though. Practically all the color pipeline calculations after loading the DNG file and applying the RAW white-balance are dependent on the color temperature. The metadata in the DNG stores two color matrices and two correction LUTs. The guideline for this calibration data is that one of the sets is for D65 lighting, which is basically outdoors on a cloudy day; pretty blue-ish lighting around 6500K. The second one is for "Standard Illuminant A", which is a reference tungsten light around 2856K. The developing software takes the data for both color temperatures and interpolates between the two to produce the matrices and curves for the color temperature of the white-balanced picture.</p> <p>To deal with the case where the sensor already produced white-balanced raw data using the ADC gains, the white-balance gains can be written to the <code>AnalogBalance</code> tag. This will be used to invert the white-balance gains in the sensor again before running the rest of the processing pipeline, which means the correct color temperature will be used.</p> <h2>So does it work?</h2> <p>Yeah, mostly. It could use a bit of tweaking and the calibration I'm using for the sensor is just wrong.</p> <video src="https://brixitcdn.net/white-balance-encoded.mp4" controls style="max-width: 50%"></video> <p>The video here is extremely janky, most likely due to the auto-gain in this test build being completely broken and my code being sloppy. There are a few things that need to be fixed here aside from figuring out more performance regressions:</p> <ul><li>The whitebalance code stops working when there&#x27;s not enough light and it jumps to a full green picture. At this point it should keep the old white-balance to make it less jarring.</li> <li>There needs to be smoothing applied to the whitebalance changes. It&#x27;s mostly pretty solid since this doesn&#x27;t have any latency with sensor adjustments but when the camera moves to the pumpkins you can see it being unstable.</li> </ul> <p>Overall it mostly works though. The performance is a bit more stable when there's daylight. 
Cameras simply work better when there's more light available. The various sources of artificial light here are also throwing off the camera a lot, with a lot of light coming from my monitor and some very poor quality light coming from the room lighting.</p> <h2>The SEGFAULT button</h2> <p>So Megapixels somewhat balances the pictures but the second half of the process is not something I've been able to test yet: storing DNG files with this whitebalance metadata. The Megapixels 1.x codebase had the code for saving the <code>AsShotNeutral</code> and <code>AnalogBalance</code> tags and I've re-implemented that in libdng. The issue is that in the current state of the Megapixels code pressing the shutter button just causes the whole application to segfault.</p> <p>This segfault occurs somewhere in the interaction with the color profile curves loaded from the calibration .dcp file with the libtiff library when saving through libdng. This being 3 threads deep into the Megapixels codebase makes this a bit annoying to debug, so I decided I needed to yak-shave this a bit further and add more tooling to the libdng codebase...</p> <h2>The mergedng tool</h2> <p>My solution for making this easier to debug is adding a utility in libdng that actually uses the feature to load a calibration file to append the curves to the final picture. Because I just can't stop writing code, I've implemented basic DNG reading support for this in libdng, with the <code>mergedng</code> utility as frontend.</p> <p>The functionality for this tool is pretty simple. It reads an input DNG file and takes the picture data and metadata from that. It then takes a .dcp file as second argument, which provides the calibration curves for the camera, and then merges those TIFF tags and writes out a new DNG file. This is a utility I needed anyway since I've been searching for it: it makes it easy to "upgrade" pictures taken with earlier versions of Megapixels with new calibration data from more recent .dcp files.</p> <p>Writing the code for this functionality was pretty straightforward. The .dcp loading and appending code already existed in the libdng codebase since that's the code which already causes the SEGFAULT in Megapixels when taking a picture. The extra code added to libdng is the new functions for reading a DNG file and taking that image metadata to write a new picture.</p> <p>After implementing all this and adding some unit tests for the DCP loading code I've come to the realization that... it just works...</p> <p>In this simplified codebase everything touching the data just simply works, so my original crashing issue in Megapixels is somewhere unrelated. This is where I'm at now and where I've decided to write a blog post instead of diving deep into the Megapixels codebase again :)</p> <hr> <div style="width: 75%; margin: 0 auto; background: rgba(128,128,128,0.2); padding: 10px;"> <h4>Development Funding</h4> <p>The current developments of Megapixels are funded by... You! The end-users. It takes a lot of time and a lot of weird expertise to make Linux cameras work and I wouldn't have been able to do it without your support.</p> <p>The donations are being used for the occasional hardware required for Megapixels development (like a nice Standard Illuminant A for calibration) and the various other FOSS applications I develop for the Linux ecosystem. 
Every single bit helps to make sure all this work doesn't happen entirely for free.</p> <a href="https://blog.brixit.nl/donations/">Donations</a> </div> The MNT keyboard reviewedhttps://blog.brixit.nl/the-mnt-keyboard-reviewed/92LinuxMartijn BraamTue, 19 Dec 2023 23:59:01 -0000<p>MNT Research is one of those few companies that actually releases open source hardware. Instead of just getting a <a href="https://mntre.com/documentation/reform-keyboard-v3-manual.pdf">schematic</a> with your hardware (which is great even by itself) there are the full <a href="https://source.mnt.re/reform/reform/-/tree/master/reform2-keyboard3-pcb">sources for that schematic</a>, the Kicad parts libraries, the sources for the firmware and even documentation on how to use that code.</p> <p>I received my <a href="https://shop.mntre.com/products/mnt-reform-keyboard-30">MNT Standalone Keyboard V3</a> a few days ago so I've been typing on it now for a bit. This is all happening while I'm recovering from covid, so I hope that if I read back this post in a few days it is actually somewhat coherent :)</p> <p>This being a more niche product sadly does make it a bit on the expensive side. But I must say this is by far the most solid keyboard I've owned. My main keyboard on my desktop is a Das Keyboard 4 Ultimate. It's a nice keyboard but it doesn't compare to the fully machined aluminium frame of the MNT keyboard.</p> <p>The whole keyboard is mounted on what's basically a 4mm slab of aluminium, which has a nice MNT logo machined into the bottom.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1703022648/20231219_0011.jpg" class="kg-image"></figure> <p>This makes the keyboard feel incredibly solid; even with the rest of the frame taken off it's practically impossible to bend the keyboard. The second half of the frame is the top edge that screws onto the base plate with 8 screws.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1703022719/20231219_0012.jpg" class="kg-image"></figure> <p>This is another very carefully designed aluminium part. In the close-up above you can see the opening for the USB-C connection for the keyboard and the internal cutouts for the display daughterboard with the screw mounting.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1703027123/20231219_0021.jpg" class="kg-image"></figure> <h2>The electronics</h2> <p>This keyboard is based around an Atmega32U4 microcontroller. This is the same keyboard PCB as what's shipped in the MNT Reform laptop, so there are two connectors on this board. The USB-C connector is what's exposed on the standalone keyboard and the laptop presumably uses the USB header that's beside it.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1703023510/20231219_0015.jpg" class="kg-image"></figure> <p>Beside the USB header is one of the DIP switches. SW36 is labeled "STANDALONE" here. This switches the board to use USB power instead of the 3.3V supplied by the laptop mainboard. 
The ribbon connector is the connection to the OLED display board.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1703026316/20231219_0016.jpg" class="kg-image"></figure> <p>On the left side of the display board there is an empty footprint for the standard Atmega programming header and a serial port that's used to connect to the laptop mainboard. Additionally there's a reset button and SW84, which has the confusing label "RG".</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1703026421/image.png" class="kg-image"></figure> <p>Thanks to the schematics being available in the manual it's easy to find that this is the switch to enable programming. The rest of the interesting parts are hidden somewhere below the display board, or possibly on the bottom side of the PCB. I have not taken the keyboard further apart for this review since all the information I'd ever want is already available in the schematics. The keyboard matrix itself is read out by the Atmega directly, which provides the full keyboard functionality, and the OLED display is on a small daughterboard to slightly raise it towards the front bezel.</p> <h2>Firmware</h2> <p>Since this is one of the 8-bit Atmel parts it's very easy to build firmware using the gcc-avr compiler packaged in various distributions. All the source files are stored in the <a href="https://source.mnt.re/reform/reform/-/tree/master/reform2-keyboard-fw">firmware repository</a> for the various MNT products.</p> <p>Checking the version of the firmware is pretty easy. With the circle key in the top-right corner of the keyboard the menu on the display opens. You can use the arrow keys to browse to the "System Status" option or just press the "s" key on the keyboard.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1703027504/20231220_0008.jpg" class="kg-image"></figure> <p>Which shows the hardware revision this firmware was built for and the version that was specified when building:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1703027858/20231220_0018.jpg" class="kg-image"></figure> <p>It seems like the "g" at the start of the commit hash was accidental here and it refers to commit <code>7e73483</code> in the firmware repository. This seems to be the newest tag when the keyboard was shipped so that makes sense.</p> <p>So let's change something! The key in the bottom-left corner of the keyboard is the Hyper key instead of Ctrl as you'd expect from most keyboards. The Ctrl key is moved to where the Caps Lock button sits on normal keyboard layouts, which is great for a lot of uses. I never use Hyper though, so I want to change that key to be my second Ctrl key.</p> <p>The readme specifies that the keyboard layout is defined in the various <code>matrix_*</code> files, so after reading around a bit it seems like I have to edit <code>matrix_3.h</code> for my keyboard.</p> <p>Reading the manual again I realized that doing this makes me lose access to the media keys, since those are defined as "Hyper+F*" for the various media actions. To fix that I changed the right control button into the Hyper key; this is the button with the three dots on it. 
My resulting code change:</p> <pre><code>diff --git a/reform2-keyboard-fw/matrix_3.h b/reform2-keyboard-fw/matrix_3.h
index bb72f6d..f9db133 100644
--- a/reform2-keyboard-fw/matrix_3.h
+++ b/reform2-keyboard-fw/matrix_3.h
@@ -25,7 +25,7 @@
 // Sixth row
 #define MATRIX3_DEFAULT_ROW_6 \
-  HID_KEYBOARD_SC_EXECUTE,\
+  HID_KEYBOARD_SC_LEFT_CONTROL,\
   HID_KEYBOARD_SC_LEFT_GUI,\
   HID_KEYBOARD_SC_LEFT_ALT,\
   KEY_SPACE,\
@@ -33,7 +33,7 @@ KEY_SPACE,\
   KEY_SPACE,\
   HID_KEYBOARD_SC_RIGHT_ALT,\
-  HID_KEYBOARD_SC_RIGHT_CONTROL,\
+  HID_KEYBOARD_SC_EXECUTE,\
   HID_KEYBOARD_SC_LEFT_ARROW,\
   HID_KEYBOARD_SC_DOWN_ARROW,\
   HID_KEYBOARD_SC_RIGHT_ARROW
</code></pre> <p>Now to build this there's a simple Makefile. Since I've already programmed Atmega parts on this machine I already have the compiler installed, making this very quick and easy.</p> <p>I ended up compiling with the following command:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>make <span class="nv">REFORM_KBD_OPTIONS</span><span class="o">=</span><span class="s2">&quot;-DKBD_VARIANT_3 -DKBD_MODE_STANDALONE -DKBD_FW_VERSION=\\\&quot;Martijn\\\&quot;&quot;</span> </pre></div> <p>This is straight from the readme with an additional define to set the firmware version to "Martijn". After building this I got the <code>keyboard.hex</code> file that can be flashed.</p> <p>The flashing is as simple as running the <code>flash.sh</code> script. This will instruct you to press "Circle + X" to enter flashing mode and then run the necessary commands to flash the keyboard. After running this I noticed that the delete key on the keyboard was no longer a delete key. It turns out I don't have <code>VARIANT_3</code> but instead <code>VARIANT_3_US</code>. A quick rebuild and reflash fixed that too.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1703029671/20231220_0021.jpg" class="kg-image"><figcaption>The brightness differences on the display are a camera artifact</figcaption></figure> <p>Tadaa! My own name in the firmware version field. It's super easy to mess with this firmware.</p> <h2>The keyboard itself</h2> <p>Well, the keyboard works just fine as a keyboard. Typing on it takes a few minutes to get used to compared to my normal keyboard since all the keys are slightly closer together. The split spacebar is also annoying me a bit. It turns out that the left split in the spacebar is <i>exactly</i> the spot where I normally hit the spacebar with my thumb.</p> <p>The switches are nice and clicky (but silent, I have the version with brown switches in them). Overall the keyboard just does what it needs to. The standard layout is quite unusual, but everything can be changed with open firmware so I'm confident I can get to a layout I'm 100% happy with.</p> <h2>Conclusion</h2> <p>This is an extremely solid and very compact keyboard that I can easily throw in my backpack. It being a USB-C keyboard makes it fit neatly with all the other random cables I usually take with me.</p> <p>It might be slightly more expensive than similar keyboards, but I don't know of similar keyboards with a case this rugged and the display functionality (I forgot to mention you can use HID reports from the host to write custom content to the display from your computer). The openness of this product makes the extra cost certainly worth it for me.</p> <p>I'll probably be messing with the firmware for this keyboard a bit more while I use it. There are some small things to fix, like the device reporting the name "LUFA Keyboard Demo Application" in Linux instead of a neater "MNT Keyboard" or something.</p>
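<p>I haven't checked yet where the MNT firmware defines its USB descriptors, but assuming it keeps LUFA's stock descriptor setup, that fix would be a one-line change to the product string in something like <code>Descriptors.c</code>:</p> <pre><code>-const USB_Descriptor_String_t PROGMEM ProductString =
-    USB_STRING_DESCRIPTOR(L"LUFA Keyboard Demo Application");
+const USB_Descriptor_String_t PROGMEM ProductString =
+    USB_STRING_DESCRIPTOR(L"MNT Keyboard");
</code></pre>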
There's some small things to fix like the device reporting the name "LUFA Keyboard Demo Application" in Linux instead of a neater "MNT Keyboard" or something.</p> Looking closer at the sysloghttps://blog.brixit.nl/looking-closer-at-the-syslog/91LinuxMartijn BraamMon, 11 Dec 2023 15:43:17 -0000<p>The syslog protocol, it's one of the ancient protocols in the Unix world. For a long time the logging was handled by daemons like syslog-ng and rsyslog, this has now been taken over by journald on a lot of systems. But have you ever wondered how your log messages even end up in <code>/var/log</code> in the first place?</p> <p>I've started looking into syslog implementations when building a replacement for the use of busybox syslogd in postmarketOS. In postmarketOS this daemon is configured to just send syslog messages to a in-memory buffer for logging and never store anything on disk in <code>/var/log</code>. This is mainly to make sure there's no unneeded writes to the flash storage in a lot of the old phones that are supported by postmarketOS. There's a few downsides to this logging implementation though:</p> <ul><li>No persistent logging of system messages across reboots. This would be easy to check if certain log messages were present on earlier boots when debugging.</li> <li>Completely unstructured logs while people are pretty much used to journald logging with nice filters</li> </ul> <p>As a replacement I wrote <code>logbookd</code>. It's a tiny syslog daemon that supports disk and memory logging and provides some nice filtering options to be closer to journald. The bulk of this work is handled by doing structured logging into SQLite.</p> <h2>So how does the syslog work</h2> <p>The way the syslog works is incredibly simple. The syslog daemon opens an unix domain socket at <code>/dev/log</code>. Applications connect to this socket and write log messages in the syslog format and the syslog daemon takes care of filtering those out and putting it in the various files in /var/log.</p> <p>The complication of this is that there is no real syslog protocol. There are two standards for it though. There is <a href="https://datatracker.ietf.org/doc/html/rfc3164">RFC 3164</a> and <a href="https://datatracker.ietf.org/doc/html/rfc5424">RFC 5424</a> which both describe the syslog protocol. The 3164 document was only created in 2001 and describes what various implementations are doing in the wild. It's RFC 5424 that actually nails down a specific format.</p> <p>I wrote parser for the 5424 format initially since that's the newest standard and it's by far the easiest to parse. An RFC 5424 log message looks like this:</p> <pre><code>&lt;13&gt;1 2023-12-11T14:56:59.0189+01:00 laptop test - - [timeQuality tzKnown=&quot;1&quot;] Hi</code></pre> <p>The first part here between the angular brackets is the PRI value. It encodes the logging facility and severity as one number. The least significant 3 bits encode the severity on a scale of 1-8 and the other bits encode one of the 23 facilities that are defined. Some examples of the facilities are:</p> <ul><li><code>0</code> for kernel messages</li> <li><code>1</code> for generic userspace messages</li> <li><code>2</code> for the mail system</li> </ul> <p>Most of the other numbers are for more old services like UUCP and FTP and for some numbered user-defined codes. 
<p>In the example above the 13 means facility 1 (user) and severity 5 (notice).</p> <p>The other parts of this message are, in order:</p> <ul><li>The protocol version number, which is set to <code>1</code> here.</li> <li>A timestamp with timezone for the log message.</li> <li>The <code>laptop</code> part is the hostname for the message; this will be set to <code>-</code> when NULL.</li> <li>The <code>test</code> part is the application name. This can also be <code>-</code> for NULL.</li> <li>The next field is called PROCID and is set to <code>-</code> for NULL in my case. According to the standard it might be used for the pid but it's mostly implementation defined.</li> <li>The second NULL is the MSGID. It can define a message type from the specific service and will also be NULL in most cases.</li> <li>The next part is <code>[timeQuality tzKnown=&quot;1&quot;]</code>, which is the STRUCTURED-DATA field. It can contain any implementation-defined data. This is a subset of the structured data created by the <code>logger</code> tool I used to create test messages. This field can also be just <code>-</code> for no structured data.</li> <li>Finally the actual log message. That's just <code>Hi</code> in this case.</li> </ul> <p>Writing a parser for this format is relatively straightforward. In the logbookd implementation there's a column for every one of these fields in the logging table and the message is split up according to these rules.</p> <p>There is a fatal flaw in the RFC 5424 specification though: nobody is using it. None of the log messages on my running systems are actually in this format. It looks like practically all software uses RFC 3164, which is a fancy way of saying they do whatever they want.</p> <h2>RFC 3164</h2> <p>So this is actually the true specification for syslog messages in the wild. Let's look at one of these messages:</p> <pre><code>&lt;13&gt;Dec 11 15:21:50 laptop test: Hi</code></pre> <p>It's a lot simpler! Except not really: this is just a pretty minimal message. The initial part is the same as in the RFC 5424 message and the PRI is luckily parsed the exact same way. There is no version indicator though and it does not use an ISO timestamp format.</p> <p>The more problematic issue with this format is that it does support a lot more data but it's all pretty badly defined. Even the parts shown in the example above are optional. The most minimal syslog message that is still up to this spec is just <code>Hi</code>.</p> <p>It's also somewhat valid to send messages with a badly formatted timestamp, and it's up to the syslogd to fix up the timestamp in the message. Since this is all badly defined and space separated, it's also very easy to end up parsing parts of the timestamp as the hostname.</p> <p>Since there is no official field for the pid of a process, it is usually appended to the application name in square brackets.</p> <p>The logbookd implementation is mostly based on the way these old messages are parsed in rsyslog and tries not to guess parts. This means only the timestamp, app, hostname and message fields are filled in.</p> <h2>Kernel logging</h2> <p>Not all logging in the system comes from userspace. On Linux there's also the kernel log ringbuffer that can be read from <code>/dev/kmsg</code>. Reading from this file will return all the log messages in the kernel ringbuffer and also makes it possible to stream new log messages with further reads, as in the sketch below.</p>
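<p>A nice property of <code>/dev/kmsg</code> is that every <code>read()</code> returns exactly one log record. A minimal sketch of a reader, with error handling mostly omitted:</p> <pre><code>#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;unistd.h&gt;

int main(void)
{
    /* Opening /dev/kmsg starts reading at the oldest record still in
     * the ring buffer; reading past the end blocks until the kernel
     * logs something new. */
    int fd = open("/dev/kmsg", O_RDONLY);
    if (fd &lt; 0)
        return 1;

    char record[8192];
    for (int i = 0; i &lt; 10; i++) { /* dump the first ten records */
        ssize_t len = read(fd, record, sizeof(record) - 1);
        if (len &lt;= 0)
            break;
        record[len] = '\0';
        fputs(record, stdout); /* records already end with a newline */
    }

    close(fd);
    return 0;
}</code></pre>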
<p>The log messages from the kernel are in a similar but different format compared to the syslog socket:</p> <pre><code>6,1004,5150172365,-;hid-generic 0003: hiddev96,hidraw2: USB HID device on usb-0000:00:14
 SUBSYSTEM=hid
 DEVICE=+hid:0003</code></pre> <p>The first field in the kernel message is again the PRI. This follows the same numbering as the syslog RFCs but it's not in angle brackets this time. In this case it's facility 0 (the kernel) and severity 6 (info). The second field is the KSEQ. This is a number that counts up for every log message since boot. The logbookd implementation uses this to de-duplicate the kernel log messages after opening the file, since it will always return the old kernel log messages first.</p> <p>After that comes the timestamp. Instead of something that needs string parsing this is a plain number, the time since boot in microseconds, so it's way easier to deal with. The field after the timestamp is <code>-</code>, indicating NULL; this is the flags field.</p> <p>After the semicolon the actual kernel log message starts. This is the message as it is rendered in the <code>dmesg</code> utility. After the log message there's a newline, but the record doesn't end there! The structured data is defined as indented continuation lines after the message itself and contains some easier machine-parsable data that is usually hidden in <code>dmesg</code>.</p> <h2>Systemd journald</h2> <p>So, everything changed when journald was introduced. Figuring out how this all works involves diving into the systemd source code. Systemd provides several Unix sockets related to logging in <code>/var/run/systemd/journal</code>:</p> <ul><li><code>dev-log</code> is symlinked to <code>/dev/log</code>; it receives syslog-formatted lines and writes them to the journal</li> <li><code>stdout</code> is a socket that receives logs from systemd units. This is what the <code>systemd-cat</code> command connects to. It writes a header on connection to give the application metadata and then the stdout or stderr is just connected straight to this socket.</li> <li><code>socket</code> receives the log messages in the binary journald format</li> </ul> <p>There are a few other fancy things that journald does. It is possible to filter your log messages with the <code>--boot</code> argument. If no argument is supplied it will only show messages from the current boot, and if you specify a negative number it's possible to get only the log messages from a specific previous bootup.</p> <p>The way this is done is by reading from <code>/proc/sys/kernel/random/boot_id</code>. This is a UUID that the kernel generates from random data on every bootup. These are also the values you see when you run <code>journalctl --list-boots</code>; the BOOT_ID value shown there is this UUID with the dashes removed.</p> <p>My logbookd implementation also reads the boot_id on startup and stores it with the logs, which allows filtering in the exact same way with the logread <code>-b</code> parameter.</p> <h2>Logging to a database</h2> <p>The main departure journald and logbookd make from the older syslog daemons is that they don't log to plain-text files. Journald has a custom database format the messages are stored in, and logbookd stores messages in an SQLite database.</p> <p>Structured logging to a database has a few nice upsides. The main one is being able to do way more detailed filtering than what is reasonably possible with grep.</p>
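<p>As an illustration, logging into SQLite could look roughly like the sketch below. This is not the actual logbookd schema, just a guess at its general shape based on the fields described earlier:</p> <pre><code>#include &lt;sqlite3.h&gt;
#include &lt;stdio.h&gt;

int main(void)
{
    sqlite3 *db;

    /* ":memory:" gives the memory-only logging mode; a file path
     * would give normal on-disk logging. */
    if (sqlite3_open(":memory:", &amp;db) != SQLITE_OK)
        return 1;

    const char *schema =
        "CREATE TABLE logs ("
        "  ts       INTEGER," /* timestamp of the message */
        "  boot_id  TEXT,"    /* from /proc/sys/kernel/random/boot_id */
        "  facility INTEGER,"
        "  severity INTEGER,"
        "  hostname TEXT,"
        "  app      TEXT,"
        "  pid      INTEGER,"
        "  message  TEXT);"
        "CREATE INDEX logs_ts ON logs(ts);";

    char *err = NULL;
    if (sqlite3_exec(db, schema, NULL, NULL, &amp;err) != SQLITE_OK) {
        fprintf(stderr, "schema error: %s\n", err);
        sqlite3_free(err);
    }

    sqlite3_close(db);
    return 0;
}</code></pre>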
It's a lot easier to filter on a specific date and time range in a database and due to database indexes this is still fast.</p> <p>One of the other main reasons for using SQLite in logbookd is that the implementation in postmarketOS was configured to only log to memory. Using SQLite as logging back-end meant that it's easily possible to replicate this by writing to an in-memory database which is already supported by SQLite.</p> <p>The final thing added to logbookd is the middle ground between in-memory and on-disk logging: the reduced writes mode. In this mode the syslog is written to an in-memory database but when receiving a SIGINT, SIGTERM or SIGUSR1 signal the logbookd daemon will open the on-disk database and lets SQLite do a database migration. This means that SQLite will append the write the new loglines to the disk without rewriting all the existing logs there. On bootup this database is restored again so the logging system behaves as-if it's configured to do normal on-disk logging.</p> <h2>You can use this now</h2> <p>If you're running postmarketOS edge and you have updated to the latest version your installation should've migrated to the logbookd logging daemon. The <code>logread</code> utility implements the common options the busybox logread command already had. For normal use this means that there's not much difference except that the log output from the logread utility is now colored and contains kernel logs.</p> <p>Some examples of the new things that are now possible:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>logread -b list <span class="go">ID BOOT ID FIRST ENTRY LAST ENTRY</span> <span class="go"> 0 05c3f283-3bae-4b2a-8431-210dd63310e0 Dec 11 16:33:59 Dec 11 16:34:05</span> <span class="go">-1 f3ea2fa1-6f9e-4e82-bd0b-201091fcb5b4 Dec 07 18:21:06 Dec 07 18:25:50</span> <span class="gp">$ </span>logread -b <span class="m">0</span> -n <span class="m">2</span> <span class="go">[Dec 11 16:35:09] daemon dleyna-renderer-service[18060]: Client :1.166 lost</span> <span class="go">[Dec 11 16:35:10] daemon dleyna-renderer-service[18060]: dLeyna: Exit</span> <span class="gp">$ </span>logread -b <span class="m">1</span> -n <span class="m">1</span> <span class="go">[Dec 07 18:25:18] kern kernel: perf: interrupt took too long (2531 &gt; 2500), lowering kernel.perf_event_max_sample_rate to 79000</span> <span class="gp">$ </span>logread -b -1 -u logbookd -n <span class="m">1</span> <span class="go">[Dec 07 18:25:37] syslog logbookd: Ready to process log messages</span> </pre></div> <p>Being able to see interleaved kernel and userspace messages also makes certain scenarios a lot easier to debug.</p> <p>Hopefully this makes a few things easier to debug. There's a bunch of software that also logs directly into <code>/var/log</code> in seperate files, this has not been replaced by logbookd and is also not directly query-able by this new system. For the rest of the log messages enjoy the new colors :)</p>