Test setup - BrixIT Blog — https://blog.brixit.nl/tag/test-setup/page/1

<h1>Automated Phone Testing pt.5</h1> <p>Martijn Braam — Tue, 25 Oct 2022 00:55:14 -0000 — <a href="https://blog.brixit.nl/automated-phone-testing-pt-5/">https://blog.brixit.nl/automated-phone-testing-pt-5/</a></p> <p>Now that I've written all the components of the Phone Test Setup in the previous parts, it's time to make the first deployment.</p> <p>For the postmarketOS deployment I'm running the central controller and Mosquitto in a container on the postmarketOS server. This will communicate with a low-power server in a test rack in my office. The controller hardware is an old passively cooled AMD board in a 1U rack case. I sadly don't have any part numbers for the rack case since I got a few of these cases second hand with VIA EPIA boards in them.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072523/20221019_0003.jpg" class="kg-image"><figcaption>Ignore the Arduino on the left, that&#x27;s for the extra blinkenlights in the case that are not needed for this setup</figcaption></figure> <p>The specs for my controller machine are:</p> <ul><li>AMD A4-5000 APU</li> <li>4GB DDR3-1600 memory</li> <li>250GB SSD</li> <li>PicoPSU</li> </ul> <p>The specifications for the controller machine are not super important; the controller software does not do any CPU-heavy or memory-heavy tasks. The important things are that it has some cache space for unpacking downloaded OS images and that it has reliable USB and Ethernet. For a small setup a Raspberry Pi would be enough, for example.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072523/20221024_0016.jpg" class="kg-image"><figcaption>Ignore the Aperture sticker, this case is from a previous project</figcaption></figure> <p>This server now sits in my temporary test rack for development. 
This rack will hold the controller PC and one case of test devices. The rack case can hold 8 phones in total, using 4 rack units.</p> <h2>Deploying the software</h2> <p>After running all the software for the test setup on my laptop for months, I have now started installing the components on the final hardware. This is also a great moment to fix up all the installation documentation.</p> <p>I spent about a day dealing with new bugs I found while deploying the software. I found a few hardcoded values that had to be replaced with actual configuration, and a few places where the error logging needed to be improved a lot. One thing that also took a bit of time was setting up Mosquitto behind an Nginx reverse proxy.</p> <p>The MQTT protocol normally runs on plain TCP on port 1883, but since this involves sending login credentials it's better to use TLS instead. The Mosquitto daemon can handle TLS itself and with some extra certificates it will run on port 8883. This has the downside that Mosquitto needs to have access to the certificates for the domain, and it needs to be restarted after certbot does its thing.</p> <p>Since the TLS for the web application is already handled by Nginx running in reverse proxy mode, it's easier to also have Nginx do the reverse proxying for a plain TCP connection. 
This is the config that I ended up with:</p> <div class="highlight"><pre><span></span><span class="k">stream</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="kn">upstream</span><span class="w"> </span><span class="s">mqtt_servers</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="kn">server</span><span class="w"> </span><span class="n">10.0.0.107</span><span class="p">:</span><span class="mi">1883</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span class="w"> </span><span class="kn">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span> <span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">8883</span><span class="w"> </span><span class="s">ssl</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">mqtt_servers</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kn">proxy_connect_timeout</span><span class="w"> </span><span class="s">1s</span><span class="p">;</span><span class="w"></span> <span class="w"> </span> <span class="w"> </span><span class="kn">ssl_certificate</span><span class="w"> </span><span class="s">/etc/letsencrypt/.../fullchain.pem</span><span class="p">;</span><span class="w"> </span> <span class="w"> </span><span class="kn">ssl_certificate_key</span><span class="w"> </span><span class="s">/etc/letsencrypt/.../privkey.pem</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="kn">ssl_dhparam</span><span class="w"> </span><span class="s">/etc/letsencrypt/ssl-dhparams.pem</span><span class="p">;</span><span class="w"></span> <span class="w"> </span><span class="p">}</span><span class="w"></span> <span 
class="p">}</span><span class="w"></span> </pre></div> <p>Another thing that had to be done for the deployment was writing actual service files for the components. The init scripts now set up OpenRC in my Alpine installations to supervise the components, deal with logging and make sure the database schema migrations are run on restart.</p> <h2>Putting together the phone case</h2> <p>To neatly store the phones for the test setup I decided to use a 2U rack case, since that's just high enough to store modern phones sideways. For this I'm using a generic 2U rackmount project box with the very easy to remember product code G17082UBK. This is a very cheap plastic case that comes with an actual datasheet.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072523/image-11.png" class="kg-image"><figcaption>All the internal and external dimensions for this case are documented</figcaption></figure> <p>I used this documentation to design a tray that fits between the screw posts in this case. The tray is printed in three identical parts and each part has three slots. I use dovetail mounts to have the phone holders slide onto this tray.</p> <p>All this is designed using OpenSCAD. I never liked 3D modelling software, but with this it's more like programming the shapes you want. This appeals a lot to me since... I'm a software developer.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072523/image-12.png" class="kg-image"><figcaption>The tray design in OpenSCAD</figcaption></figure> <p>From this design I can generate an .stl file and send it to the 3D printer. The tray prints without any supports and takes about an hour on my printer. 
So 3 hours later I have the full base of the phone holder in my rack case.</p> <p>To actually mount the phones there are phone-specific holders that grip into this baseplate and hold the phone and the extra electronics. I made a somewhat generic phone mount that just needs two measurements entered to get the correct device thickness at two points. This is the holder I'm currently using for the PinePhone and the OnePlus 6.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072523/20221024_0027.jpg" class="kg-image"></figure> <p>The baseplates here are in blue and the phone holder is the green print. This version of the phone holder is designed to hold the Raspberry Pi Pico and has a ring for managing the cables soldered to the device. The PinePhone is about the largest device this case can hold. It fills up the full depth of the case when the USB-C cable is inserted and it also almost hits the top of the case.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072524/20221024_0035.jpg" class="kg-image"><figcaption>The PinePhone and OnePlus 6 in the test rack case</figcaption></figure> <p>In this case I can hold 8 phones on the trays, with one of the slots left over for a future USB hub board that will have the 16 USB ports required to use all the devices in the case.</p> <p>For small desk setups a single tray is also pretty nice to hold devices, and the phone holder itself will also stand on its own. 
This is great if you have one to three devices you want to hook up to your laptop for local development.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072524/20221024_0040.jpg" class="kg-image"><figcaption>The tray is a bit flimsy without being screwed into the rack case but holds up phones great</figcaption></figure> <h2>Running the first test job</h2> <p>Now that the case is together and the controller is deployed, it's time to run an actual test. For this first test I'll be using Jumpdrive as the test image. This is by far the smallest image available for the PinePhone, which makes testing a lot easier. It just boots a minimal kernel and initramfs and, best of all for this test, it spawns a shell on the serial console.</p> <p>Since GitHub is not a great hosting platform and the bandwidth limit for the <a href="https://github.com/dreemurrs-embedded/Jumpdrive">https://github.com/dreemurrs-embedded/Jumpdrive</a> repository has been reached, it's not possible to fetch the raw file from GitHub without being logged in, so this script uses my own mirror of the latest PinePhone Jumpdrive image.</p> <pre><code>devicetype: pine64-pinephone

boot {
    rootfs: http://brixitcdn.net/pine64-pinephone.img.xz
}

shell {
    prompt: / #
    success: Linux
    script {
        ip a
        uname -a
    }
}</code></pre> <p>This will boot Jumpdrive and then, on the root console that's available over the serial port, it will run <code>ip a</code> to show the networking info and <code>uname -a</code> to get the kernel info. 
Because the success condition is <code>Linux</code>, it will mark the job as successful if the uname output is printed.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072524/image-13.png" class="kg-image"></figure> <p>The serial output also shows another feature of the controller: it will set up the environment before executing the script, which in this case is just the <code>CI</code> variable, and it will append <code>|| echo "PTS-FAIL"</code> to the commands so non-zero exit codes can be detected. When <code>PTS-FAIL</code> appears in the serial output the task will be marked as failed. Using the <code>success:</code> and <code>fail:</code> variables in the script the success and failure text can be set.</p> <p>With these building blocks for test scripts it's now possible to implement a wide variety of test cases. Due to Jumpdrive not having enough networking features enabled in the kernel it's not yet possible to upgrade from the serial connection to a telnet connection in the test setup. That would make the test cases a bit more reliable, since there's a very real possibility that a few bits get flipped right in the success or failure string of the test job, marking a successful job as failed due to timeouts.</p> <p>Getting a generic test initramfs to combine with a kernel for basic testing will be a good thing to figure out for part 6; Jumpdrive only has busybox utilities available and only has very limited platform support in the first place. </p>

<h1>Automated Phone Testing pt.4</h1> <p>Martijn Braam — Fri, 14 Oct 2022 17:20:51 -0000 — <a href="https://blog.brixit.nl/automated-phone-testing-pt-4/">https://blog.brixit.nl/automated-phone-testing-pt-4/</a></p> <p>To execute CI jobs on the hardware there needs to be a format to specify the commands to run. Every CI platform has its own custom format for this, most of them based on YAML.</p> <p>My initial plan was to use YAML too for this since it's so common. 
YAML works <i>just</i> good enough to make it work on platforms like GitHub Actions and Gitlab CI. One thing that's quite apparent though is that YAML is just not a great language to put blocks of shell scripting into.</p> <div class="highlight"><pre><span></span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">busybox:latest</span><span class="w"></span> <span class="nt">before_script</span><span class="p">:</span><span class="w"></span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;Before script section&quot;</span><span class="w"></span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;For example you might run an update here or install a build dependency&quot;</span><span class="w"></span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;Or perhaps you might print out some debugging details&quot;</span><span class="w"></span> <span class="nt">after_script</span><span class="p">:</span><span class="w"></span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;After script section&quot;</span><span class="w"></span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;For example you might do some cleanup here&quot;</span><span class="w"></span> <span class="nt">build1</span><span class="p">:</span><span class="w"></span> <span class="w"> </span><span class="nt">stage</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">build</span><span class="w"></span> <span class="w"> </span><span class="nt">script</span><span class="p">:</span><span 
class="w"></span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">echo &quot;Do your build here&quot;</span><span class="w"></span> </pre></div> <p>Blocks of shell script are either defined as lists, like the example above, or using one of the multiline string formats in YAML. This works, but is not very convenient.</p> <h2>Design constraints</h2> <p>There are a few things that the job format for PTS needs to solve. The main one is that jobs are submitted for multiple devices that might behave slightly differently. Of course it's possible to use some conditionals in bash to solve this, but leaning on shell scripting to fix this is a workaround at best.</p> <p>What I needed from the common job execution formats is a way to specify metadata about the job and a way to specify commands to run. One major difference with other CI systems is that a test job running on the target hardware involves rebooting the hardware into various modes and checking whether the hardware behaves correctly. </p> <p>Here is an example of the job description language I've come up with:</p> <pre><code>device: hammerhead

env {
    ARCH: aarch64
    CODENAME: lg-hammerhead
    BOOTLOADER: fastboot
    PMOS_CATEGORY: testing
    NUMBER: 432
}

power {
    reset
}

fastboot (BOOTLOADER==&quot;fastboot&quot;) {
    flash userdata http://example.com/${CODENAME}/userdata.img
    boot http://example.com/${CODENAME}/boot.img
}

heimdall (BOOTLOADER==&quot;heimdall&quot;) {
    flash userdata http://example.com/${CODENAME}/userdata.img
    flash boot http://example.com/${CODENAME}/boot.img
    continue
}

shell {
    username: root
    password: 1234
    script {
        uname -a
    }
}</code></pre> <p>This format accepts indentation but it is not required; all nesting is controlled by braces.</p> <p>The top level data structure for this format is the Block. The whole contents of the file is a single block, and in the example above the <code>env</code>, <code>power</code> etc. blocks are... 
Blocks. </p> <p>Blocks can contain three things:</p> <ul><li>Another nested block</li> <li>A Definition, which is a key/value pair</li> <li>A Statement, which is a regular line of text</li> </ul> <p>In the example above definitions are used to specify metadata and environment variables, and the script itself is defined as statements.</p> <p>Blocks also have the option of a condition on them. The conditions are used by the controller daemon to select the right blocks to execute.</p> <p>This is just the syntax though; to make this actually work I wrote a lexer and parser for this format in Python. This produces the following debug output:</p> <pre><code>&lt;ActBlock act
    &lt;ActDefinition device: hammerhead&gt;
    &lt;ActBlock env
        &lt;ActDefinition ARCH: aarch64&gt;
        &lt;ActDefinition CODENAME: lg-hammerhead&gt;
        &lt;ActDefinition BOOTLOADER: fastboot&gt;
        &lt;ActDefinition PMOS_CATEGORY: testing&gt;
        &lt;ActDefinition NUMBER: 432&gt;
    &gt;
    &lt;ActBlock power
        &lt;ActStatement reset&gt;
    &gt;
    &lt;ActBlock fastboot: &lt;ActCondition &lt;ActReference BOOTLOADER&gt; == fastboot&gt;
        &lt;ActStatement flash userdata http://example.com/${CODENAME}/userdata.img&gt;
        &lt;ActStatement boot http://example.com/${CODENAME}/boot.img&gt;
    &gt;
    &lt;ActBlock heimdall: &lt;ActCondition &lt;ActReference BOOTLOADER&gt; == heimdall&gt;
        &lt;ActStatement flash userdata http://example.com/${CODENAME}/userdata.img&gt;
        &lt;ActStatement flash boot http://example.com/${CODENAME}/boot.img&gt;
        &lt;ActStatement continue&gt;
    &gt;
    &lt;ActBlock shell
        &lt;ActDefinition username: root&gt;
        &lt;ActDefinition password: 1234&gt;
        &lt;ActBlock script
            &lt;ActStatement uname -a&gt;
        &gt;
    &gt;
&gt;</code></pre> <p>Now the controller needs to actually use the parsed act file to execute the task. After parsing, this is reasonably simple: just iterate over the top level blocks and have a module in the controller that executes that specific block. 
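That iteration over top-level blocks can be sketched roughly like this. The class layout, the module registry and all names here are hypothetical illustrations, not the actual PTS controller code:

```python
# Rough sketch of dispatching parsed act blocks to controller modules.
# Block shape and registry are made up for illustration.

class Block:
    def __init__(self, name, condition=None, statements=None):
        self.name = name
        self.condition = condition    # e.g. ("BOOTLOADER", "fastboot") or None
        self.statements = statements or []

def run_act(blocks, env, modules):
    """Execute every top-level block whose condition matches the environment."""
    executed = []
    for block in blocks:
        if block.condition is not None:
            key, expected = block.condition
            if env.get(key) != expected:
                continue  # e.g. skip the heimdall block on a fastboot device
        handler = modules.get(block.name)
        if handler is None:
            raise ValueError(f"no module for block '{block.name}'")
        handler(block, env)
        executed.append(block.name)
    return executed
```

With a `modules` dict like `{"power": power_module, "fastboot": fastboot_module}` the controller stays a thin loop and each module only needs to understand its own block type.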
A <code>power</code> module that takes the contents of the power block and sends the commands to the Pi, and some flasher modules to handle the flashing process.</p> <h2>Developer friendliness</h2> <p>The method of executing the blocks as modules is simple to implement, but something that's very important for this part is developer friendliness. Testing is difficult enough and you don't want to have to deal with overly verbose specification languages.</p> <p>It's great that with conditions and flashing-protocol specific blocks an act can describe how to flash multiple devices depending on variables. But... that's a level of precision that's not needed in most cases. The <code>fastboot</code> module gives you access to run arbitrary fastboot commands, which is great for debugging, but for most test jobs you just want to get a kernel/initramfs running with whatever method the specific device supports. So one additional module is needed:</p> <pre><code>boot {
    # Define a rootfs to flash on the default partition
    rootfs: something/rootfs.gz

    # For android devices specify a boot.img
    bootimg: something/boot.img
}</code></pre> <p>This takes the available image for the device and flashes it to the default locations for that device, which are defined by the controller configuration. On the non-Android devices specifying the rootfs would be enough; it would write the image to the whole target disk. This would be enough for the PinePhone for example.</p> <p>For Android devices things are different of course. There most devices need the boot.img and rootfs.img produced by postmarketOS to boot. For those the rootfs can be written to either the system or userdata partition, and in many cases the boot.img can be loaded into RAM instead of being flashed. 
For test cases that can run from the initramfs this would mean no wear on the eMMC of the device at all.</p> <p>With this together a minimal test case would be something like this:</p> <pre><code>device: pine64-pinephone

boot {
    rootfs: https://images.postmarketos.org/bpo/v22.06/pine64-pinephone/phosh/20221012-1246/20221012-1246-postmarketOS-v22.06-phosh-18-pine64-pinephone.img.xz
}

shell {
    username: user
    password: 147147
    script {
        uname -a
    }
}</code></pre> <p>The <code>boot</code> module will take care of resetting the device, powering on, getting into the correct flasher mode, flashing the images and rebooting the device.</p> <p>After that it's just <code>shell</code> blocks that run the actual test scripts.</p> <p>After implementing all of this in the PTS controller I made a small test job exactly like the one above that loads the PinePhone Jumpdrive image, since that image is small and easy to test with.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072521/image-5.png" class="kg-image"><figcaption>Jumpdrive booted on the PinePhone in the test setup</figcaption></figure> <h2>Detecting success and failure</h2> <p>One thing that's a lot easier in container test scripts than on serial-controlled real hardware is detecting the result of test jobs. There are no separate streams like stdout/stderr anymore, and there are no exit codes anymore. The only thing there is, is a stream of text.</p> <p>There are two solutions to this and I'm implementing both. The first one is specifying a string to search for to mark a test as successful or failed. This is the easiest solution and should work great for the output of test suites.</p> <pre><code>shell {
    username: user
    password: 147147
    success: linux
    script {
        uname -a
    }
}</code></pre> <p>The other one is moving from UART to telnet as soon as possible. 
If a test image just has telnetd in the initramfs and IPv6 connectivity, then the controller can automatically figure out the IPv6 link-local address on the phone side, connect to the device with telnet and log in with a preset username/password. This is a bit harder with IPv4 connectivity, since multiple connected devices might have overlapping addresses.</p> <p>Once a telnet session is established, something closer to a traditional CI suite can function by sending over the script and just executing it, instead of automating a shell.</p> <pre><code>telnet {
    username: user
    password: 147147
    script {
        uname -a
    }
}</code></pre> <p>Besides this the other blocks can signal a failure. The <code>boot</code> block will mark the test as failed when the device does not recognize the boot image, for example.</p> <h2>Next up</h2> <p>With this, all the major components have their minimal required functionality working. The next step is building up the first rack case and deploying an instance of the central controller for postmarketOS. I've already been printing more 3D phone holders for the system. One of the prototypes is teased in the top image of the article :)</p>

<h1>Automated Phone Testing pt.3</h1> <p>Martijn Braam — Tue, 27 Sep 2022 12:32:24 -0000 — <a href="https://blog.brixit.nl/automated-phone-testing-pt-3/">https://blog.brixit.nl/automated-phone-testing-pt-3/</a></p> <p>So in the previous post I mentioned the next step was figuring out the job description language... Instead of that I implemented the daemon that sits between the hardware and the central controller.</p> <p>The original design has a daemon that connects to the webinterface and hooks up to all the devices connected to the computer. This works fine for most things, but it also means that to restart this daemon in production all the connected devices have to be idle or all the jobs have to be aborted. This can be worked around by having a method to hot-reload the configuration of the daemon and dealing with the other cases that would require a restart. 
I opted for the simpler option of just running one instance of the daemon for every connected device.</p> <p>The daemon is also written in Python, like the other tools. It runs a networking thread, a hardware monitoring thread and a queue runner thread.</p> <h2>Message queues</h2> <p>In order to not have to poll the webinterface for new tasks, a message queue or message bus is required. There are a lot of options available for this, so I limited myself to two options I had already used: Mosquitto and RabbitMQ. These have slightly different feature sets but basically do the same thing. The main difference is that RabbitMQ actually implements a queue system where tasks are loaded into the queue and can be picked from it by multiple clients. Clients then ack and nack tasks, and tasks get re-queued when something goes wrong. This essentially duplicates quite a few parts of the existing queue functions already in the central controller. Mosquitto is way simpler; it deals with messages instead of tasks. The highest level feature the protocol has is that it can guarantee a message is delivered.</p> <p>I chose Mosquitto for this reason. The throughput for the queue is not nearly high enough that something like RabbitMQ is required to handle the load. The message bus feature of Mosquitto can be used to notify the daemon that a new task is available, and the daemon can then fetch the full data over plain old HTTPS.</p> <p>The second feature I'm using the message bus for is streaming the logs. Every time a line of data is transmitted from the phone over the serial port, the daemon turns that into an MQTT message and sends it to the Mosquitto daemon running on the same machine as the webinterface. The webinterface daemon is subscribed to those messages and stores them on disk, ready to render when the job page is requested.</p> <p>With the current implementation the system creates one topic per task and the daemon sends the log messages to that topic. 
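The daemon side of that can be sketched in a few lines; the topic layout below is a hypothetical example, not the actual PTS naming, and `client` stands for any paho-mqtt-style object with a `publish()` method:

```python
# Sketch: each line read from the serial port becomes one MQTT message on a
# topic unique to the running task. Topic layout is made up for illustration.

def log_topic(task_id):
    return f"pts/task/{task_id}/log"   # hypothetical topic layout

def forward_serial_line(client, task_id, line):
    # Strip the line ending; the subscriber side re-assembles the log view.
    client.publish(log_topic(task_id), line.rstrip("\r\n"))
```

Because the task id is part of the topic, a subscriber interested in one job can subscribe to just that topic instead of filtering a shared stream.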
One feature that can be used to make the webinterface more efficient is the websocket protocol support in the MQTT daemon. With this it's no longer required to reload the webinterface for new log messages or fetch chunks through AJAX. When the page is open it's possible to subscribe to the topic for the task with JavaScript and append log messages as they stream in in real time.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072520/image-9.png" class="kg-image"><figcaption>Log messages sent over MQTT</figcaption></figure> <h2>Authentication</h2> <p>With the addition of a message bus, it's now required to authenticate to that as well, increasing the set-up complexity of the system. Since version 2 there's an interesting plugin bundled with the Mosquitto daemon: dynsec.</p> <p>With this plugin accounts, roles and ACLs can be manipulated at runtime by sending messages to a special topic. With this I can create dynamic accounts for the controllers to connect to the message bus, and the controllers can request those credentials through the HTTP API on startup.</p> <p>One thing missing from this is that the only official way to use dynsec seems to be the <code>mosquitto_ctrl</code> commandline utility. I don't like shelling out to executables to get things done, since it adds more dependencies outside the package manager for the main language. The protocol used by <code>mosquitto_ctrl</code> is quite simple though: not very well documented, but easy to figure out by reading the source.</p> <h2>Flask-MultiMQTT</h2> <p>To connect to Mosquitto from inside a Flask web application the most common way is the <code>Flask-MQTT</code> extension. 
This has a major downside though, listed directly at the top of the <a href="https://flask-mqtt.readthedocs.io/en/latest/">Flask-MQTT documentation</a>: it doesn't work correctly in a threading Flask application, and it also fails when hot reload is enabled in Flask, because that spawns threads. This conflicts a lot with the other warning in Flask itself, which is that the built-in webserver in Flask is not a production server. The production servers are the threading ones.</p> <p>My original plan was to create an extension to do dynsec on top of Flask-MQTT, but looking at the amount of code that's actually in Flask-MQTT and the downsides I would have to work around, I decided to make a new extension for Flask that <i>does</i> handle threading. <a href="https://pypi.org/project/Flask-MultiMQTT/">Flask-MultiMQTT</a> is available on PyPI now and has most of the features of the Flask-MQTT extension plus the extra features I needed. It also includes helpers for doing runtime changes to dynsec.</p> <p>Some notable changes from Flask-MQTT are:</p> <ul><li>Instead of the list of config options like <code>MQTT_HOST</code> etc. it can get the most important ones from the <code>MQTT_URI</code> option in the format <code>mqtts://username:password@hostname:port/prefix</code>.</li> <li>Support for a <code>prefix</code> setting that is prepended to all topics in the application, to have all the topics for a project namespaced below a specific prefix.</li> <li>It integrates more with the Flask routing. 
Instead of the <code>@mqtt.on_message()</code> decorator (and the less documented <code>@mqtt.on_topic(topic)</code> decorator) where you still have to subscribe manually, there is an <code>@mqtt.topic(topic)</code> decorator that subscribes on connection and handles wildcards exactly like Flask does with <code>@mqtt.topic(&quot;number-topic/&lt;int:my_number&gt;/something&quot;)</code>.</li> <li>It adds a <code>mqtt.topic_for()</code> function that acts like <code>flask.url_for()</code> but for topics subscribed with the decorator. This can generate the topic with the placeholders filled in like url_for() does, and it also supports getting the topic with and without the prefix.</li> <li>Implements <code>mqtt.dynsec_*()</code> functions to manipulate the dynsec database.</li> </ul> <p>This might seem like overkill but it was surprisingly easy to make; even if I hadn't made a new extension, most time would have been spent figuring out how to use dynsec and dealing with weird threading issues in Flask.</p> <h2>Log streaming</h2> <p>The serial port data is line-buffered and streamed to an MQTT topic for the task, but this is not as simple as just dumping the line into the payload of the MQTT message and sending it off. The controller itself also logs the state of the test system and already parses UART messages to figure out where in the boot process the device is, to facilitate automated flashing.</p> <p>The log messages are sent as JSON objects over the message bus. Each line is annotated with the source of the message, which is <code>"uart"</code> most of the time. 
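Such an annotated message could look roughly like the sketch below; the exact field names are my guess based on the description above, not the real PTS wire format:

```python
import json

# Guessed shape for the annotated log messages: the MQTT payload is a JSON
# object carrying the source of the line next to the line itself.

def annotate_line(source, line):
    return json.dumps({"source": source, "line": line})

def parse_message(payload):
    msg = json.loads(payload)
    return msg["source"], msg["line"]
```

Keeping the annotation in the payload rather than the topic means one subscriber can render UART lines, controller status and flasher messages in a single ordered stream.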
There are also more complex log messages that have the full details about USB plug events.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072520/image-8.png" class="kg-image"><figcaption>The USB event generated when the postmarketOS initramfs creates the USB network adapter on the PinePhone</figcaption></figure> <p>Besides UART passthrough messages there are also inline status messages from the controller itself when it's starting the job, and flasher messages when the flasher thread is writing a new image to the device. This can be extended with more annotated sources, like passing syslog messages along once the system is booted and a helper is installed.</p> <p>This log streaming can also be extended with a topic for messages in the other direction; that way it would be possible to get a shell on the running device.</p> <p>With this all together, the system can split up the logs over UART into sections based on hardcoded knowledge of log messages from the kernel and the bootloader, and create nice collapsible sections. </p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072520/image-10.png" class="kg-image"><figcaption>A PinePhone booting, flashing an image using Tow-Boot and then bootlooping</figcaption></figure> <h2>Running test jobs</h2> <p>The current system still runs a hardcoded script on the controller when receiving a job, instead of parsing a job manifest, since I postponed the job description language. The demo above flashes the first file attached to the job, which in my case is <code>pine64-pinephone.img</code>, a small postmarketOS image. Then it reboots the phone and does nothing except pass through UART messages.</p> <p>This implementation does not yet have a way to end jobs and have success/failure conditions at the end of the script. 
A few failure conditions that I ran into while debugging this system are already implemented.</p> <p>The first failure condition it can detect is a PinePhone bootlooping. Sometimes the bootloader crashes due to things like insufficient power, or the A64 SoC being in a weird state from the last test run. When the device keeps switching between the <code>spl</code> and <code>tow-boot</code> states it will be marked as a bootloop and the job fails. Another infinite loop, easily triggered by not inserting the battery, is the device failing directly after starting the kernel. This is what is happening in the screenshot above. This is not something that can be detected fully automatically, since a phone rebooting is a supported case.</p> <p>To make this failure condition detectable the job description needs a way to specify whether a reboot at a specific point is expected, or, more generically, which state transitions are allowed at specific points in the test run. Implementing this would remove a whole category of failures that currently require manual intervention to reset the system.</p> <p>The third failure condition I encountered was the phone not entering the flashing mode correctly. If the system wants to go to flashing mode but the log starts outputting kernel messages, it will be marked as a failure. In my case this failure was triggered because a solder joint on the PinePhone failed, so the volume button was not held down correctly.</p> <p>Another thing that needs to be figured out is how to pass test results to the controller. Normally in CI systems failure conditions are easy: you execute the script and if the exit code is not <code>0</code> the task is marked as a failure. This works when executing over SSH, but when running commands over UART that metadata is lost. One solution would be a wrapper script that catches the return code and prints it in a predefined format to the UART so the log parser can detect it.
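As a sketch of that idea: the marker format below (<code>##pts-exit:N##</code>) is entirely made up for illustration; only the wrapper-script approach itself comes from the text.

```python
import re

# Hypothetical marker a wrapper script on the device would print as the
# last line of every command, e.g. by appending `echo "##pts-exit:$?##"`.
EXIT_MARKER = re.compile(r"##pts-exit:(\d+)##")

def parse_exit_code(line):
    """Return the exit code if this UART line is an exit marker, else None."""
    match = EXIT_MARKER.search(line)
    return int(match.group(1)) if match else None

def task_failed(log_lines):
    """A task fails when any wrapped command reported a nonzero exit code."""
    codes = (parse_exit_code(line) for line in log_lines)
    return any(code not in (None, 0) for code in codes)
```

The marker would have to be chosen so it cannot plausibly appear in normal kernel or application output.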
Even better is bringing up networking on the system if possible, so tests can be executed over something better than a serial port.</p> <p>Having networking would also fix another issue: how to get job artifacts out. If job artifacts are needed and there is only a serial line, the only option is sending some magic bytes over the serial port to tell the controller a file is coming, and then dumping the contents of the file with some metadata and encoding. Luckily, people already figured out such protocols back when dial-up modems were a thing: XMODEM, YMODEM and ZMODEM.</p> <p>Since a network connection cannot be relied on, the phone test system would probably need to implement both the "everything runs over a serial line" codepath and the faster and more reliable methods that use networking. For tests where networking can be brought up, a helper would be needed inside the flashed test images that brings up USB networking like the postmarketOS initramfs does and then communicates with the controller over serial to signal its IP address.</p> <p>So the next part will be figuring out the job description format (again) and making the utilities used inside the test images to help execute the tests.</p> Automated Phone Testing pt.2https://blog.brixit.nl/automated-phone-testing-pt-2/6320662180cfb74dab7b6b35Test setupMartijn BraamFri, 16 Sep 2022 12:30:37 -0000<p>In <a href="/automated-phone-testing-pt-1/">part 1</a> I hooked up a PinePhone to a Raspberry Pi Pico and wrote the first iteration of the firmware for that setup. For testing I had used one USB port for the Pi Pico, one port for the PINE64 serial cable and one for the PinePhone itself.</p> <p>By using 3 USB ports per device I would run out of USB ports on the control computer pretty fast. This is why I picked the Pi Pico: it's not only cheap and available, but is also able to emulate multiple USB serial ports at the same time.
This required a rework of how the USB protocol was implemented.</p> <p>The "default" way to use the USB serial port on the Pico is to call <code>printf()</code> to write data to stdout in the firmware and configure the build scripts to enable the USB UART. This makes the Pico create a USB CDC ACM serial port on the USB port, hooked into stdout/stdin. To add a second ACM interface none of this automated setup can be used at all.</p> <p>The second revision of the firmware replaces the <code>printf</code>/<code>getchar</code> usage with direct use of the TinyUSB library. With this I can manually define all the USB descriptors I need and hook them up to the parts of the firmware where I need them. With my own descriptors the device no longer shows up as a Raspberry Pi Pico in <code>lsusb</code>; instead it's now a "postmarketOS Test Device". </p> <p>Ignoring all the extra setup code to get TinyUSB running, the changes to the existing parts of the firmware are mostly replacing <code>getchar</code> with <code>tud_cdc_n_read</code>, which accepts the port number to read from, and replacing <code>printf</code> with <code>tud_cdc_n_write_str</code>. An additional piece of the firmware now reads bytes from the second USB CDC port and writes them to the hardware uart1 on the Pi Pico, and vice versa. With this and a tiny extra cable I fully replaced the need for the USB-to-serial adapter, freeing up one USB port per test device.
Since having someone sit beside the test rack to pull out the USB cable is not very practical, I needed a way to interrupt the power lines in the USB cable going to the phone. This is slightly more complicated than wiring some test pads directly to the pins of the Pico, so to not hold up the software development I made a test PCB that allows me to control the power of a USB cable:</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072519/PXL_20220910_134605102.jpg" class="kg-image"><figcaption>Power control board, revision 1</figcaption></figure> <p>This is a piece of protoboard put in the middle of one of the red PINE64 USB cables. It has a relay that interrupts the 5V line inside this USB cable; the data lines are passed through directly. It also includes the circuitry to control this relay from the Pi Pico when hooked up to the 2 pins on the left of the board. For those who know electronics: the diode missing in this picture is on the bottom of the board.</p> <p>The final version in the test setup won't be a separate board, and it will also not use a relay. Relays have a limited number of operations and they make noise, and having mechanical parts in a test setup is not good for reliability in any case. The relay will be replaced by a solid-state method of switching power, most likely a MOSFET.</p> <p>This board is hooked up to ground and GPIO 6 of the Pi Pico and makes the <code>p/P</code> command work for switching the phone power.</p> <h2>The central controller</h2> <p>Now let's look all the way at the other end of this project: the webapplication that will control it all. This is the interface presented to the developers who will use the test setup.</p> <p>This piece of software is comparable to something like <a href="https://validation.linaro.org/">Lava</a> from Linaro.
It has a webinterface to submit and view builds and an overview of the devices. The things I'm doing differently compared to Lava are not using Jinja2 templates for configuration and having a passthrough mode for scheduling a live connection to a device in the test rack.</p> <p>The implementation of the central controller is partially modelled after the way <a href="https://builds.sr.ht/">Sourcehut Builds</a> works. It will mostly act like a regular CI system, but backed by physical hardware instead of VMs or containers. It also aims to integrate some of the features that make sr.ht builds nice to work with, like being able to get a shell into the test environment so you don't have to keep submitting slightly changed CI jobs to debug things, and having nice split logs for the CI tasks. </p> <p>The first thing to look at is how devices are defined in the test system. The information is intentionally kept minimal. There's a definition of a device type, which is just a name, chipset and architecture. In the case of postmarketOS this name would be the postmarketOS device code like <code>pine64-pinephone</code>.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072519/image.png" class="kg-image"><figcaption>Excerpt of the database schema</figcaption></figure> <p>For every device in the test rack a row in the device table is created. This row can have an additional hardware revision number, like <code>1.2B</code> for a PinePhone, and contains information about which controller it is connected to and which user is the maintainer of this device.</p> <p>This by itself is not enough for all the required test scheduling. There needs to be a way to specify extra information that can be queried in the test jobs or used for filtering devices.
For this there's another part in the schema:</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072519/image-1.png" class="kg-image"></figure> <p>There is a <code>variable</code> table that is an arbitrary key-value store for extra data. This data can be used for selecting which devices to run specific tasks on, and the values will also be available in things like environment variables. The <code>variable</code> table defines data at three different levels. If <code>device_id</code> and <code>devicetype_id</code> are unset in a row, it is a global variable for the whole test system. Then variables can be defined at the device type level, and again at the device level. The job specification itself also contains a fourth level of variables that will be defined in the job description format.</p> <p>When a job is run, all the variables for the current device are looked up and combined in a cascading system: device type variables override global variables with the same name, and the same goes for device variables and job variables. This data is then appended to the job that is submitted to the controller and executed.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072519/image-2.png" class="kg-image"></figure> <p>Some examples of variables in my current test setup are <code>PMOS_CATEGORY</code>, which sets the postmarketOS device classification category the device is in, making it possible to run a test job on all the devices in the main or community category. Another is the <code>DTB</code> variable that sets the device tree name for the specific devices.
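The cascade itself amounts to an ordered dictionary merge. A minimal sketch, where the variable names come from the examples above but the values and the function are my own illustration:

```python
def resolve_variables(global_vars, devicetype_vars, device_vars, job_vars):
    """Merge the four variable levels; later levels override earlier ones."""
    resolved = {}
    for level in (global_vars, devicetype_vars, device_vars, job_vars):
        resolved.update(level)
    return resolved

env = resolve_variables(
    {"PMOS_CATEGORY": "main"},            # global default
    {"DTB": "sun50i-a64-pinephone"},      # device type level
    {"DTB": "sun50i-a64-pinephone-1.2"},  # per-device override
    {"PMOS_CATEGORY": "community"},       # job level wins
)
```

The merged result is what would end up as environment variables in the executed job.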
This is something that would be a device-type variable for a lot of devices, but is overridden at the device level for the PinePhone due to hardware revisions that have separate upstream device tree files.</p> <h2>The user facing parts</h2> <p>The webapplication has three levels of authentication: guests, developers and admins. Visitors that are not signed in get the guest role and can see the status dashboard and all the job results, similar to how <a href="https://build.postmarketos.org/">build.postmarketos.org</a> lets guests see the state of the postmarketOS build system.</p> <p>Accounts are created for developers, which grants the permission to submit new jobs in the system. For all the other administration tasks there's the admin role, which allows linking controllers, defining devices and creating new accounts.</p> <p>The device editing part of the administration interface is already shown in the screenshot above. The admin panel gives access to all the settings in the application, so it's not necessary to have a shell on the server running it to do administration tasks.</p> <p>Guests visiting the webapplication will land on the dashboard:</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072519/image-3.png" class="kg-image"><figcaption>The central controller dashboard page</figcaption></figure> <p>This shows all the currently running jobs, of which there are none since the controller is not implemented yet. It also gives some quick statistics on the side about the overall system.</p> <p>More detailed information is shown on the Jobs and Devices pages. The devices page shows a list of the defined device types and the active registered devices. In my case the device naming tells me which rack case the device is in and the slot number.
</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072519/image-4.png" class="kg-image"><figcaption>The public device list page</figcaption></figure> <p>And finally there's the jobs page. It shows a list of all queued, running and completed jobs. The internal terminology for the overall system is: jobs are the manifests submitted by the user, and tasks are generated from the job for every matching device. If the job's filters match exactly one device it will generate a single task, which might have subtasks for test output legibility. </p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072520/image-5.png" class="kg-image"><figcaption>The full job list on the public side</figcaption></figure> <p>A future improvement would be showing the status of all the tasks in a job in this listing. For logged-in developers there's the option to submit a new job in the sidebar. This links to the job creation page. This is all very similar to the workflow in <a href="https://builds.sr.ht">builds.sr.ht</a>, which this is based on.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072520/image-6.png" class="kg-image"><figcaption>The job submission page</figcaption></figure> <p>Jobs are submitted using a code editor page. All the information for a job is part of the job manifest, so it's easy to copy and paste parts of job configurations. Documentation has to be written once the job description format is stable.</p> <p>Finally there's the job detail page.
This shows the job result in real time.</p> <figure class="kg-card kg-image-card kg-width-wide"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072520/image-7.png" class="kg-image"><figcaption>The job detail page</figcaption></figure> <p>The contents of the job page will mainly be filled by text blocks streamed from the controller. Just like sr.ht there's an option to download the manifest for a job and to jump to the job submission form with the current manifest loaded.</p> <h2>Controllers</h2> <p>The part that hooks all the pieces of the system together is the controller software. This will run on a machine in the test rack and will communicate with the central controller to fetch jobs. The controller will be implemented as a daemon and will be controlled using a command line utility. The API between the central controller and the rack controller has not been fully defined yet. Most of the tasks will use a regular REST HTTP API, but the test results will need a more complicated data streaming setup.</p> <p>Part of this communication is the possibility to get a shell on a device that will emulate a regular SSH session. Since in the design the rack controller is not port-forwarded, this would need to be a reverse shell, and it needs to be proxied on the central controller to allow developers to get to that shell from their own machine. This is a bit of a complication, but it would make quite a few things a lot simpler, like running kernel bisections remotely or even doing remote development in general.</p> <h2>Up to part 3</h2> <p>The next part of this system is figuring out what is required in the job description format. Submitting a single script to a single phone is easy; dealing with differences between devices is the hard part.
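The jobs/tasks split described earlier already hints at one way to handle those differences: the job's filters select devices by their variables, and one task is generated per match. A minimal sketch under that assumption (the device names and the exact filter semantics are my own illustration):

```python
def generate_tasks(job_filters, devices):
    """Return one task (here: just the device name) per matching device.

    `devices` maps a device name to its resolved variables; a device
    matches when every filter key/value pair appears in its variables.
    """
    return [
        name
        for name, variables in devices.items()
        if all(variables.get(key) == value for key, value in job_filters.items())
    ]

devices = {
    "rack1-slot1": {"type": "pine64-pinephone", "PMOS_CATEGORY": "main"},
    "rack1-slot2": {"type": "oneplus-enchilada", "PMOS_CATEGORY": "community"},
}
tasks = generate_tasks({"PMOS_CATEGORY": "main"}, devices)
```

An empty filter would match every registered device, which is how a job could target the whole rack.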
Also, as a sneak peek into the future parts: the header image shows the 3D-printed slots in the rack case the phones will slide into.</p> Automated Phone Testing pt.1https://blog.brixit.nl/automated-phone-testing-pt-1/62eb0dbe87c35a5ee6af369cPhonesMartijn BraamThu, 04 Aug 2022 11:14:24 -0000<p>Testing things is one of the most important aspects of a Linux distribution, especially one for phones where people rely on it being in a state where calls are possible.</p> <p>For postmarketOS the testing is largely done manually. There are CI jobs running that verify that packages build correctly, but that's as far as the automated part goes. For releases like service packs there is quite a manual test process that involves upgrading the installation and checking that the device still boots and that the upgraded applications actually work.</p> <p>With the growth of postmarketOS the things that need to be tested have grown massively. If there is a change that affects multiple devices, then getting an image with the changes running on all the target devices takes the majority of the time, and for big releases it often involves getting other device maintainers to try it on their devices.</p> <h2>Automation</h2> <p>Automated testing for the devices postmarketOS supports is not very easy. Phones don't have the interfaces to fully automate a test process. Getting a new image flashed on a phone usually takes a few steps of holding the right key combinations and plugging in the USB cable at the right moment. To complicate things further, this process also differs significantly between devices.</p> <p>The goals for the automated testing are quite ambitious.
The plan is to get as many devices as possible in a server rack, hooked up to a computer, with wires soldered to the phones to trigger the key combinations and control the boot process.</p> <p>On the software side this requires an interface to let multiple developers schedule test jobs on the devices they need, and an interface to keep track of the state of all the connected hardware. This is quite a lot like the scheduling and interface of a regular CI system, and large parts of this system will be modelled after how GitLab CI works. </p> <p>This whole system will consist of many components:</p> <ul><li>A central webapplication that keeps track of all the available hardware and schedules jobs. The webapplication does not contain any implementation details about hardware except for the names.</li> <li>A server application that connects to the central webapplication and registers connected devices. This application has the responsibility of tracking the state of devices and asks the central server for new jobs when a device is free. There can be many instances of this server application, so there can be multiple test racks maintained by different developers.</li> <li>An application that is spawned for every job and actually executes the testcase and parses the output from the serial port of the device.</li> <li>A piece of hardware that can press the buttons on the phone and deal with plugging in the power at the right moments. This hardware should be mostly a generic PCB that can deal with the most common interfaces for devices. For devices with special requirements a new board can be made that controls them.</li> <li>A case design to hold many phones in a relatively dense configuration. It is estimated that around 8 phones can fit in 2U of rack space with a generic enclosure and some 3D-printed holders.</li> </ul> <p>Splitting the software up this way should make this test setup scalable.
The most complicated parts seem to be the central webapplication, which should present a nice webinterface and make it easy to run a test job on multiple devices, and the runner application, which actually needs to deal with the hardware-specific implementation details.</p> <p>Since this is quite a setup to build, I've decided to start as small as possible: first get a test running by making some prototype hardware and a prototype runner application that only supports the PinePhone.</p> <h2>The initial test hardware</h2> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072516/P1080468.jpg" class="kg-image"></figure> <p>For the initial test hardware I'm using an off-the-shelf Raspberry Pi Pico on a breadboard. Initial design revisions were based around an Atmel ATmega32U4 to implement a USB-to-serial adapter and a second serial port for hardware control. Due to the chip shortage the popular Atmel parts are practically impossible to get.</p> <p>The Pi Pico has a microcontroller that is able to emulate multiple USB serial adapters just like the ATmega32U4, and after dealing with getting the SDK running it is actually quite easy to write firmware for.</p> <p>For this initial revision I'm only running a single USB serial device on the Pi Pico, since the PinePhone has an easily accessible serial port with a PINE64 debug adapter. For controlling the buttons I have soldered a few wires to various test points on the PinePhone PCB.</p> <p>The buttons on the PinePhone normally work by shorting a signal to ground. This is easily emulated with a microcontroller by configuring the GPIO the test point is connected to as an input, so the pin on the Pico presents a high impedance that doesn't influence the PinePhone.
To press a button, the GPIO can be set to output 0 so the signal is connected to ground.</p> <p>After some testing with the Pico this works great; it took a while to figure out that the Raspberry Pi Pico enables internal pull-down resistors by default on the GPIOs, even when they are set to input. This caused the phone to think all the buttons were held down all the time.</p> <h2>Control protocol</h2> <p>To actually control the buttons from a computer, a protocol is needed for sending commands to the Pi Pico. After first coming up with a custom protocol, I got pointed to <a href="https://github.com/andersson/cdba">cdba</a>. This is a tool for booting images on boards, which is exactly what I need.</p> <p>This tool is designed to work with some specific control boards which I don't have, but the protocol used for those boards is incredibly simple.</p> <p>Every command is a single character written to the serial port. For enabling power to the board a <code>P</code> is sent; for disabling the power a <code>p</code> is sent instead. This uppercase/lowercase system is also followed for holding down the power button (<code>B/b</code>) and the button required to get into flasher mode (<code>R/r</code>). </p> <p>This is the protocol I implemented in my first iteration of the firmware. The nice thing is that the hardware should also work with cdba, at least if it is a fastboot device.</p> <p>The code at this point is <a href="https://paste.sr.ht/~martijnbraam/aef5008538a141f7d80c5c719e304b9789470bde">in this paste</a>.</p> <h2>Test application</h2> <p>To verify this design I wrote a small test application in Python.
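The host side of this single-character protocol is small enough to sketch in a few lines of Python. This wrapper is my own illustration, not code from the firmware or from cdba:

```python
# Map (control, pressed/enabled) to the command character from the
# cdba-style protocol described above.
COMMANDS = {
    ("power", True): b"P", ("power", False): b"p",
    ("power_button", True): b"B", ("power_button", False): b"b",
    ("flasher_button", True): b"R", ("flasher_button", False): b"r",
}

def command_byte(control, state):
    """Look up the single byte to write to the Pico's serial port."""
    return COMMANDS[(control, state)]

def send(port, control, state):
    """Write one command to anything with a write() method,
    e.g. a pyserial Serial object."""
    port.write(command_byte(control, state))
```

Because each command is one byte with no framing, the firmware side only needs a switch statement on the received character.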
It connects to two serial ports and takes a PinePhone disk image to boot into.</p> <p>The code used for the first successful flash is <a href="https://paste.sr.ht/~martijnbraam/6e534069f77f3ded925a85c03768382178c8a469">in this paste</a>.</p> <figure class="kg-card kg-image-card"><img src="https://blog.brixit.nl/image/w1000//static/files/blog.brixit.nl/1670072517/Screenshot-from-2022-08-04-03-10-22.png" class="kg-image"></figure> <p>This application does multiple things. It first connects to the serial port of the Raspberry Pi Pico and resets all the state. Then it holds the power button of the PinePhone for 15 seconds to force the phone into a known state.</p> <p>It also connects to the PINE64 UART debug adapter and reads the serial debug logs of Tow-Boot and the kernel. By looking for specific lines in the serial output it knows where in the boot process the phone is, and it uses that to hold the volume button to get Tow-Boot into USB Mass Storage mode. </p> <p>It then simply dd-s the disk image onto the eMMC of the phone and restarts the device. Now the phone will boot into this new installation, because this time the volume button is not held down while booting.</p> <p>The things that need to be implemented after this are detecting when the device is booted and logging in on the serial console so it can run the test script.</p> <p>This iteration of the script also hardcodes a lot of sleep times and device paths. Hardcoding the path to the block device works somewhat reliably on my laptop, but it will fail in production when multiple devices are connected and booting in a random order. This can be fixed by specifying which USB ports the phone and test board are plugged into instead, and using udev to figure out which block device and serial device belong to which phone.</p> <h2>Next steps</h2> <p>The most important thing to figure out next in the application is designing a job description system so developers can write testcases to run.
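The line-matching approach can be sketched as a tiny state tracker. The marker strings below are approximations of the Tow-Boot banner, the kernel hand-off and the login prompt, not necessarily the exact strings the script matches on:

```python
# Ordered markers: the first matching marker decides the new stage.
BOOT_MARKERS = [
    ("tow-boot", "Tow-Boot"),       # bootloader banner on the UART
    ("kernel", "Starting kernel"),  # U-Boot style hand-off message
    ("booted", "login:"),           # getty prompt on the serial console
]

def detect_stage(line, current_stage):
    """Advance the tracked boot stage when a known marker appears."""
    for stage, marker in BOOT_MARKERS:
        if marker in line:
            return stage
    return current_stage

stage = None
for line in ["Tow-Boot 2021.10", "Starting kernel ...", "pine64-pinephone login:"]:
    stage = detect_stage(line, stage)
```

A stage tracker like this is also where the timing-sensitive steps hook in, such as holding the volume button while in the bootloader stage.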
Using this setup, quirks in the boot process can also be ironed out, like the application only flashing the phone correctly half the time because it sometimes boots the OS instead of getting into the bootloader.</p> <p>I have also already written some major parts of the central webapplication, which deals with registering devices and the data about them that can be used as variables in the test jobs. </p> <p>Once those parts integrate, it will be important to get a second device up and running in the test rig, like the OnePlus 6, to avoid overfitting the design to the PinePhone.</p>