Chapter 9. Technical readings

Table of Contents

Determining input lag of USB-devices
Determining firmware base address
Preparing a disassembly for reassembly
Code transformations done within IDA
Further transformations and Assembly

Determining input lag of USB-devices

Unlike PS/2 keyboards, which report key presses immediately, USB devices have to be polled by the host for new data at fixed intervals. This introduces a lag that depends on the polling interval.
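A simple model makes the effect concrete. Assuming the host polls at exact multiples of the interval and a press is reported by the next poll, the added delay is spread evenly between zero and one full interval, i.e. half an interval on average. The sketch below ignores the device's own scanning and debouncing time; the function names are illustrative only.

```python
import math


def poll_delay(press_time_ms: float, interval_ms: float) -> float:
    """Delay until the next host poll at or after press_time_ms.

    Simplified model: polls happen exactly at multiples of interval_ms and a
    press is reported by the first poll that is not earlier than the press.
    """
    next_poll = math.ceil(press_time_ms / interval_ms) * interval_ms
    return next_poll - press_time_ms


def mean_poll_delay(interval_ms: float, samples: int = 1000) -> float:
    """Average delay over press times spread evenly across one interval."""
    total = 0.0
    for i in range(samples):
        # (i + 0.5) / samples keeps every press strictly between two polls.
        press = (i + 0.5) / samples * interval_ms
        total += poll_delay(press, interval_ms)
    return total / samples


for interval in (1, 2, 4, 8):
    print(f"{interval} ms polling: {mean_poll_delay(interval):.2f} ms added on average")
```

Under this model, going from 8 ms to 1 ms polling removes about 3.5 ms of average lag; anything the device adds on its own is not affected.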


In this example, a Roccat Ryos MK keyboard with firmware 1.19 and a Roccat Kone mouse with firmware 1.41 are used for testing. Both devices use 3.3 V ARM MCUs.

If there are no other requirements, choose the left mouse button or a WASD key for testing. After choosing a suitable push button, locate its pins. This can be challenging for rubber-dome keyboards but is much easier for through-hole "mechanical" switches. Next, find out which pin is the positive and which the negative pole. For the 3-pin push buttons used in mice, you additionally have to identify the two pins that close when the button is pressed. Measure the voltage between the pins of the opened button and check that it drops to zero when pressed. Solder two cables on and reassemble the device so it can be used for testing.

The capture device is a Raspberry Pi 2 B with firmware 4.1.9-v7+ #819.

Compared to Arduino-based solutions, its advantage is that it is also ARM based and uses 3.3 V logic, so the signals from the devices directly match the requirements of the RasPi GPIOs. It is also much faster and runs a complete Linux kernel with broad USB support, whereas the Arduino USB host libraries don't support hubs, which would prevent testing keyboards with internal hubs like the Roccat Ryos MK Pro. As Windows 10 runs on this model, it can also be used to compare operating systems.

The minimal wiring requires only one additional diode (Figure 9.1, “Wiring between USB-device and Raspberry Pi”).

Figure 9.1. Wiring between USB-device and Raspberry Pi



To make sure Raspbian uses the right polling rate for all HID devices, edit /boot/cmdline.txt and add usbhid.mousepoll=X, where X is 1, 2, 4 or 8 milliseconds, corresponding to 1000, 500, 250 and 125 Hz respectively.
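A minimal sketch of that edit in Python, assuming cmdline.txt holds a single line of space-separated kernel parameters, as on Raspbian. The function names are hypothetical; the helper replaces any existing usbhid.mousepoll entry instead of appending a second one.

```python
from pathlib import Path

# usbhid.mousepoll interval in ms -> resulting polling rate in Hz
INTERVAL_TO_HZ = {1: 1000, 2: 500, 4: 250, 8: 125}


def set_mousepoll(cmdline: str, interval_ms: int) -> str:
    """Return the kernel command line with usbhid.mousepoll set to interval_ms.

    Any existing usbhid.mousepoll=... parameter is dropped first, so the
    option never appears twice; all other parameters keep their order.
    """
    if interval_ms not in INTERVAL_TO_HZ:
        raise ValueError(f"interval must be one of {sorted(INTERVAL_TO_HZ)} ms")
    params = [p for p in cmdline.split() if not p.startswith("usbhid.mousepoll=")]
    params.append(f"usbhid.mousepoll={interval_ms}")
    return " ".join(params) + "\n"


def update_cmdline_file(interval_ms: int, path: Path = Path("/boot/cmdline.txt")) -> None:
    """Rewrite cmdline.txt in place (needs root; takes effect after a reboot)."""
    path.write_text(set_mousepoll(path.read_text(), interval_ms))
```

For example, update_cmdline_file(1) selects 1000 Hz polling on the next boot.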

The actual global polling rate can be read from /sys/module/usbhid/parameters/mousepoll.
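Reading that value back can be sketched as follows. The function name is hypothetical; the path is the one given above, and a value of 0 (the module default) means that no override is active and the device's own interval is used.

```python
from pathlib import Path
from typing import Optional

MOUSEPOLL_PATH = Path("/sys/module/usbhid/parameters/mousepoll")


def read_polling_rate(path: Path = MOUSEPOLL_PATH) -> tuple[int, Optional[float]]:
    """Return (interval in ms, rate in Hz) of the global usbhid polling setting.

    The rate is None when the interval is 0, i.e. no global override is set.
    """
    interval_ms = int(path.read_text().strip())
    if interval_ms == 0:
        return 0, None
    return interval_ms, 1000.0 / interval_ms
```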

Package revtools-0.0.8 contains the Python script input_lag/. It contains code for the two devices mentioned above as a starting point. All it does is record the first falling edge of the GPIO pin and calculate the time until the corresponding USB report arrives. Instead of a complex debouncing routine, a simple wait of half a second ensures a valid starting point for the next event. To get valid results you should record at least 50 events, which are summarized at the end.
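The post-processing step can be sketched like this. It is not the code from revtools: it assumes that GPIO edges and USB reports have already been timestamped with the same clock (in seconds), and that the Stddev [%] column in the result tables is the standard deviation relative to the mean. All names are hypothetical.

```python
import statistics


def pair_lags(edges: list[float], reports: list[float]) -> list[float]:
    """Match each GPIO falling edge to the first USB report at or after it.

    Timestamps are in seconds from the same clock; returns lags in ms.
    Reports arriving before an edge (e.g. key release reports) are skipped.
    """
    lags = []
    reports = sorted(reports)
    j = 0
    for edge in sorted(edges):
        while j < len(reports) and reports[j] < edge:
            j += 1  # report belongs to an earlier event
        if j == len(reports):
            break  # no report for this or any later edge
        lags.append((reports[j] - edge) * 1000.0)
        j += 1
    return lags


def summarize(lags_ms: list[float]) -> dict[str, float]:
    """Min, mean, relative stddev (% of mean) and max of the measured lags."""
    mean = statistics.mean(lags_ms)
    return {
        "min": min(lags_ms),
        "mean": mean,
        "stddev_pct": statistics.pstdev(lags_ms) / mean * 100.0,
        "max": max(lags_ms),
    }
```

With fewer than the recommended 50 events, the mean and especially the deviation are not very meaningful.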

To get the best timings, avoid high system load, for example caused by moving the mouse or doing other things in parallel. For the best results, you might want to boot into runlevel 3 and run the test with the highest priority: sudo ionice -c 2 -n 0 nice -n -20 python -r 50.


As Table 9.1, “Input lag of Roccat Ryos MK”, shows, the keyboard is equally responsive at all tested polling rates, while the lag of the mouse (Table 9.2, “Input lag of Roccat Kone”) can be halved at the maximum rate, although the standard deviation is quite large.

Table 9.1. Input lag of Roccat Ryos MK

Polling interval [ms] | Min [ms] | Mean [ms] | Stddev [%] | Max [ms]

Table 9.2. Input lag of Roccat Kone

Polling interval [ms] | Min [ms] | Mean [ms] | Stddev [%] | Max [ms]

Determining firmware base address

Cortex-M3 based microcontroller units (MCUs) are popular, for example, in increasingly powerful USB gaming input devices such as mice and keyboards. In-Application Programming (IAP) is the method of choice for providing customers with hassle-free firmware updates. As IAP only writes part of the flash memory, the base address of the update image is not the start address of the flash memory. To resolve absolute addresses during disassembly, this base address has to be determined.

My solution tries to determine it automatically by matching vector table entries against special subroutines.

The solution relies on some assumptions:

  • Firmware images for IAP usually have their own vector table which gets activated after the fixed code decides to do so.

  • The layout and length of the vector table are producer/model specific, but the first word contains the Main Stack Pointer (MSP) that points to RAM. The second word contains the address of the reset handler which points to Flash. The other entries are zero or also point to a location in Flash.

  • As the Cortex-M platform only executes Thumb instructions, all handler pointers (except null pointers) have the Least Significant Bit (LSB) set.

  • IAP writes and vector table start addresses are usually restricted to fixed boundaries, for example 256 bytes. The base address therefore has to be a multiple of this boundary.

  • Some of the Interrupt Service Routines (ISR) in the vector table tend to be one of two easily identifiable types:

    • null-subroutines that return immediately.

      BX LR

    • loop-subroutines that contain just an unconditional branch to itself, resulting in an endless loop when executed.

      loop: B loop
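Both routine types consist of a single 16-bit Thumb instruction with a fixed encoding: BX LR is 0x4770 and a branch to itself is 0xE7FE. A minimal sketch, independent of IDA, that locates them in a raw image could look like this. The function name is hypothetical, and it also reports matches inside larger functions or data, which the later scoring has to tolerate.

```python
import struct

BX_LR = 0x4770   # Thumb "BX LR": null-subroutine that returns immediately
B_SELF = 0xE7FE  # Thumb "B .": loop-subroutine branching to itself


def find_special_subroutines(image: bytes) -> dict[int, str]:
    """Map image offsets of null- and loop-subroutines to their kind.

    Thumb instructions are halfword aligned and stored little-endian,
    so the image is scanned in 2-byte steps.
    """
    found = {}
    for offset in range(0, len(image) - 1, 2):
        (halfword,) = struct.unpack_from("<H", image, offset)
        if halfword == BX_LR:
            found[offset] = "nullsub"
        elif halfword == B_SELF:
            found[offset] = "loop"
    return found
```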

Package revtools contains code implementing the base address estimation using IDAPython. File IDA/python/ contains the following related functions:

cortex_m3.estimate_vector_table_length(walker, startaddress, endaddress).  Tries to estimate the length of the variable-length vector table. The first 32-bit entry is the initial stack pointer and has to point to RAM; all other entries are either NULL or point into flash memory with the LSB set. Can be used on loader_input_t or memory by using one of the Walker classes defined in
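The same checks can be expressed on a raw image without IDA. This standalone sketch is not the revtools function: instead of a walker it takes the image bytes plus assumed RAM and flash address ranges, and it allows the initial MSP to equal the end of RAM, since the stack usually starts at the top and grows downwards.

```python
import struct


def estimate_vector_table_length(image: bytes, ram_start: int, ram_end: int,
                                 flash_start: int, flash_end: int) -> int:
    """Count the leading 32-bit words that look like a Cortex-M vector table.

    Returns the number of entries, or 0 if the image does not start with
    at least a plausible MSP and reset handler.
    """
    def thumb_ptr_in_flash(word: int) -> bool:
        # Code pointers have bit 0 (the Thumb bit) set.
        return (word & 1) == 1 and flash_start <= word < flash_end

    count = 0
    for offset in range(0, len(image) - 3, 4):
        (word,) = struct.unpack_from("<I", image, offset)
        if count == 0:
            ok = ram_start < word <= ram_end  # initial MSP
        elif count == 1:
            ok = thumb_ptr_in_flash(word)  # reset handler
        else:
            ok = word == 0 or thumb_ptr_in_flash(word)
        if not ok:
            break
        count += 1
    return count if count >= 2 else 0
```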

cortex_m3.estimate_base_offset(walker, vector_table_length, boundary).  Tries to estimate the base offset of the image using the aforementioned assumptions. Parameter boundary should be the most restrictive of the IAP sector alignment and the vector table start address alignment. Can be used on loader_input_t or memory by using one of the Walker classes defined in
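The scoring idea behind the base estimation can be sketched like this, again independent of IDA and with hypothetical names. For every candidate base aligned to boundary, each handler pointer is converted to an image offset; the candidate that maps the most pointers exactly onto a null- or loop-subroutine wins. A score of zero, or several candidates with the same score, means the assumptions did not hold well for this image.

```python
from collections.abc import Container


def estimate_base_offset(vectors: list[int], special: Container[int], image_size: int,
                         boundary: int, flash_start: int, flash_end: int) -> tuple[int, int]:
    """Return (base, score) for the most plausible load address of the image.

    vectors: vector table entries (index 0 is the initial MSP and is ignored).
    special: image offsets of null-/loop-subroutines.
    """
    # Skip the MSP and null entries, clear the Thumb bit of handler pointers.
    handlers = [v & ~1 for v in vectors[1:] if v]
    best_base, best_score = flash_start, -1
    first = -(-flash_start // boundary) * boundary  # round up to the boundary
    # Only bases at which the whole image still fits into flash are tried.
    for base in range(first, flash_end - image_size + 1, boundary):
        score = sum(1 for h in handlers if (h - base) in special)
        if score > best_score:  # keep the lowest base on ties
            best_base, best_score = base, score
    return best_base, best_score
```

The vectors can be read with struct.unpack_from(f"<{n}I", image) once the table length n is known.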

cortex_m3.analyze_vector_table(start, vector_table_length).  Analyses the vector table, marking target functions for analysis and creating meaningful symbols. Operates on memory.
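For the naming part, the Cortex-M3 architecture fixes the meaning of the first 16 vector slots. A minimal sketch that derives symbol names from a vector table might look like this. The CMSIS-style handler names and the generic IRQ<n>_Handler names for device interrupts are assumptions, not necessarily the symbols revtools creates.

```python
# Cortex-M3 system exception names by vector index (index 0 is the initial MSP).
SYSTEM_VECTORS = {
    1: "Reset_Handler", 2: "NMI_Handler", 3: "HardFault_Handler",
    4: "MemManage_Handler", 5: "BusFault_Handler", 6: "UsageFault_Handler",
    11: "SVC_Handler", 12: "DebugMon_Handler", 14: "PendSV_Handler",
    15: "SysTick_Handler",
}


def vector_symbols(vectors: list[int]) -> dict[int, str]:
    """Suggest a symbol name for every handler address in a vector table.

    Reserved slots (7-10, 13) are skipped because some vendors store other
    data there. Entries sharing one handler keep the first name assigned.
    """
    names = {}
    for index, entry in enumerate(vectors):
        if index == 0 or entry == 0:
            continue
        name = SYSTEM_VECTORS.get(index)
        if name is None:
            if index < 16:
                continue  # reserved slot
            name = f"IRQ{index - 16}_Handler"
        names.setdefault(entry & ~1, name)  # clear the Thumb bit
    return names
```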

The loaders in the IDA/loaders directory can be used as examples.

Preparing a disassembly for reassembly

The disassembly IDA produces is primarily meant to be human readable. As most software is written in higher-level languages, IDA allows higher-level constructs like structures, which aid in understanding the code but which assemblers don't necessarily understand. Therefore, some transformations are needed before such a disassembly can be reassembled, even without changes to the code.

Package revtools contains code that tries to fix, or at least assist with, most of the steps that can be automated.

Code transformations done within IDA

First, create and analyse an IDA database. File IDA/python/ contains functions for IDA that help identify and fix problematic code. The context information IDA provides is extremely valuable here.

cortex_m3.fix_assembly().  This function fixes things that can be automated and can be called multiple times on the same database.

  • Adds assembler directives that set the CPU and architecture for reassembly.

  • IDA hides referenced but uninteresting functions like nullsubs. As these are needed for reassembly, they are expanded. Unreferenced functions, on the other hand, are collapsed, which removes them from the exported code and frees valuable memory in the resulting image. Care has to be taken, as outside references may point to labels inside such a function.

  • IDA adds alignments at the end of segments. These are hidden so that these areas are treated as unused memory.

  • IDA displays [base, index] references as [base, (target - base)]; these are changed to the simple index representation. Also, if it can be determined that a 32-bit immediate is a memory reference, its target gets a symbol.

  • Thumb instructions are not wide enough to contain 32-bit immediates. Such constants are stored in so-called literal pools and loaded relative to the PC. The actual literal pools are hidden and replaced with the .ltorg directive, so that the assembler can insert its own literal values at these positions.

  • IDA shows jump tables as numerical constants. These are changed to display a shifted difference between labels in the format ((target - start) >> 1), which is generally how a programmer would have written them.
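The jump table rewrite can be illustrated with a small formatter. It assumes TBB/TBH tables that directly follow the branch instruction, so the branch base equals the table start and each entry holds the forward distance in halfwords. The function and its parameters are hypothetical.

```python
def format_jump_table(table_label: str, table_addr: int, targets: list[int],
                      labels: dict[int, str], halfword: bool = True) -> list[str]:
    """Render TBB/TBH jump table entries as label differences.

    labels maps target addresses to the symbol names used in the disassembly.
    """
    directive = ".hword" if halfword else ".byte"
    limit = 0xFFFF if halfword else 0xFF
    lines = [f"{table_label}:"]
    for target in targets:
        distance = target - table_addr
        # Entries are unsigned halfword counts: forward, even and in range.
        if distance < 0 or distance % 2 or (distance >> 1) > limit:
            raise ValueError(f"target {target:#x} cannot be encoded relative to {table_addr:#x}")
        lines.append(f"    {directive} (({labels[target]} - {table_label}) >> 1)")
    return lines
```

Writing the entries as label differences keeps the table valid when code between the table and its targets changes size.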

cortex_m3.hint_assembly().  Reveals things that can't be resolved automatically and have to be fixed manually. This function is meant for an iterative process. To mark handled issues add the tag 'HINT_OK' to the regular comment of the address.

  • If IDA creates code without a function, the code may in fact be unused, or it may be called via a pointer array. You have to determine which is the case, then either undefine the code or make it a function, with or without references.

  • All absolute addresses have to be symbolized for the code and data to be relocatable. IDA's analysis and fix_assembly() already try to determine whether a 32-bit value is a memory reference or just a constant. The values reported here have to be checked and resolved manually.

  • The targets of indirect jumps have to be identified.
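The check for suspicious constants, combined with the HINT_OK convention, can be sketched as below. The inputs are assumed to have been collected from the database already (the IDA API calls are omitted), and the memory regions are placeholders for the device's flash, RAM and peripheral ranges. All names are hypothetical.

```python
def unresolved_constants(constants: dict[int, int], comments: dict[int, str],
                         regions: list[tuple[int, int]]) -> list[int]:
    """List addresses holding unsymbolized 32-bit values that look like pointers.

    constants: address -> raw 32-bit value still shown as a number.
    comments:  address -> regular comment; 'HINT_OK' marks a reviewed issue.
    regions:   (start, end) address ranges, end exclusive.
    """
    hits = []
    for addr, value in sorted(constants.items()):
        if "HINT_OK" in comments.get(addr, ""):
            continue  # already checked manually
        target = value & ~1  # code pointers carry the Thumb bit
        if any(start <= target < end for start, end in regions):
            hits.append(addr)
    return hits
```

With flash starting at address 0, as on NXP's LPC17xx parts, every small integer also looks like a flash address, which is why these reports need manual review.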

cortex_m3.generate_inc(file).  IDA's export format is not compatible with the GNU assembler. This function exports enums and structure member offsets as simple constants the GNU assembler can handle. Structure sizes in particular have to be added as well.
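A minimal sketch of such an include generator, assuming the enum values, member offsets and structure sizes have already been read from IDA. The Name_member and Name_size naming scheme is hypothetical; .equ and the @ line comment are standard GNU assembler syntax for ARM.

```python
def generate_inc(enums: dict[str, dict[str, int]],
                 structs: dict[str, tuple[list[tuple[str, int]], int]]) -> str:
    """Emit .equ lines for enum members, struct member offsets and struct sizes.

    structs maps a structure name to ([(member, offset), ...], size in bytes).
    """
    lines = []
    for enum_name, members in enums.items():
        lines.append(f"@ enum {enum_name}")
        for member, value in members.items():
            lines.append(f".equ {member}, {value:#x}")
    for struct_name, (members, size) in structs.items():
        lines.append(f"@ struct {struct_name}")
        for member, offset in members:
            lines.append(f".equ {struct_name}_{member}, {offset:#x}")
        lines.append(f".equ {struct_name}_size, {size:#x}")
    return "\n".join(lines) + "\n"
```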

Further transformations and Assembly

Folder firmware contains files for exporting the prepared assembler source and include files from IDA, transforming and editing them further, reassembling them and producing a binary firmware image ready to be flashed onto the device.  The Makefile uses this script to export the code and include files from an IDA database.

  • IDA sizes the align directives it creates to the largest possible alignment. All of these alignments are resized to 2.

  • The assembler has no knowledge of higher-level constructs like structures. IDA's use of the sizeof() operator therefore has to be replaced by a constant. While cortex_m3.generate_inc(file) creates the constants for structure sizes, this code fixes the assembly to use them.

This is an example linker script for NXP's LPC1753FBD80 MCU, which is used, for example, in the Ryos MK keyboard range.
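The two source rewrites listed above (resizing alignments and replacing sizeof()) are plain text substitutions. A sketch with regular expressions, assuming the export writes alignments as `.align N` on their own line and sizes as sizeof(Name), and assuming the include file defines a constant Name_size for every structure (a hypothetical naming scheme):

```python
import re

# [ \t] instead of \s so a match never runs across line breaks.
ALIGN_RE = re.compile(r"^([ \t]*)\.align[ \t]+\d+[ \t]*$", re.MULTILINE)
SIZEOF_RE = re.compile(r"sizeof\((\w+)\)")


def transform_asm(source: str) -> str:
    """Resize all .align directives to 2 and replace sizeof(Name) by Name_size."""
    source = ALIGN_RE.sub(r"\1.align 2", source)
    return SIZEOF_RE.sub(r"\1_size", source)
```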

Makefile.  Starting with a FIRMWARE.orig.idb, the files FIRMWARE.orig.asm and are created. These are further transformed into FIRMWARE.asm and, which can then be edited to implement the desired changes. The final FIRMWARE.asm and are then assembled into FIRMWARE.o, which is linked to FIRMWARE.elf using the linker script. Finally, the .text section is extracted in binary format, which is the form the firmware updater expects.
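The assemble, link and extract steps can be sketched as GNU toolchain calls driven from Python. The arm-none-eabi- prefix, the linker script name and the FIRMWARE.bin output name are assumptions. No -mcpu option is passed because the exported source already carries its CPU and architecture directives.

```python
import subprocess

TOOLCHAIN = "arm-none-eabi-"  # assumed GNU Arm toolchain prefix


def build_commands(name: str, linker_script: str) -> list[list[str]]:
    """Assemble, link, then extract .text as a raw binary for the updater."""
    return [
        [TOOLCHAIN + "as", "-o", f"{name}.o", f"{name}.asm"],
        [TOOLCHAIN + "ld", "-T", linker_script, "-o", f"{name}.elf", f"{name}.o"],
        [TOOLCHAIN + "objcopy", "-O", "binary", "-j", ".text", f"{name}.elf", f"{name}.bin"],
    ]


def build(name: str = "FIRMWARE", linker_script: str = "lpc1753.ld") -> None:
    for cmd in build_commands(name, linker_script):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```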