Unlike PS/2 keyboards, which report keypresses immediately, USB devices have to wait for the host to request new information at fixed intervals. This introduces lag that depends on the polling interval.
In this example a Roccat Ryos MK keyboard with firmware 1.19 and a Roccat Kone with firmware 1.41 are used for testing. Both devices use 3.3 V ARM MCUs.
If there are no other requirements, the left mouse button or one of the WASD keys is a natural choice for testing. After choosing a suitable push-button, locate the corresponding pins; this can be challenging on rubber dome keyboards but is much easier with through-hole "mechanical" switches. Then determine which pin is the positive and which the negative pole. For the 3-pin push-buttons used in mice you additionally have to identify the two pins that close when the button is pressed. Measure the voltage between the pins with the button open and check that it drops to zero when pressed. Solder a wire to each of the two pins and reassemble the device so it can be worked with.
The capture device is a Raspberry Pi 2 B with firmware 4.1.9-v7+ #819.
The advantage over Arduino solutions is that the Raspberry Pi is also a 3.3 V ARM system, so the signals from the devices match the requirements of its GPIOs without level conversion. It is also much faster and runs a complete Linux kernel with wide USB support, while the Arduino's USB host libraries don't support hubs, which would prevent testing keyboards with internal hubs like the Roccat Ryos MK Pro. As Windows 10 runs on this model, it can also be used for operating system comparisons.
The minimal wiring just needs an additional diode to work (Figure 9.1, “Wiring between USB-device and Raspberry Pi”).
To make sure Raspbian uses the right polling rate for all HID devices, edit /boot/cmdline.txt and add usbhid.mousepoll=X, where X is 1, 2, 4 or 8 milliseconds, corresponding to 1000, 500, 250 and 125 Hz respectively. The actual global polling rate can be read from /sys/module/usbhid/parameters/mousepoll.
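The relation between the interval and the rate can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and the regular expression assumes the parameter appears as a whitespace-separated token on the kernel command line.

```python
import re

# Path taken from the text above.
SYSFS_PATH = "/sys/module/usbhid/parameters/mousepoll"


def interval_to_hz(interval_ms):
    """Convert a polling interval in milliseconds to a rate in Hz."""
    if interval_ms <= 0:
        raise ValueError("interval must be a positive number of milliseconds")
    return 1000 // interval_ms


def mousepoll_from_cmdline(cmdline):
    """Return the usbhid.mousepoll value of a kernel command line, or None."""
    match = re.search(r"(?:^|\s)usbhid\.mousepoll=(\d+)(?=\s|$)", cmdline)
    return int(match.group(1)) if match else None


def read_active_interval(path=SYSFS_PATH):
    """Read the interval the usbhid module is currently using."""
    with open(path) as handle:
        return int(handle.read().strip())
```

Comparing the result of read_active_interval() with the value parsed from /boot/cmdline.txt after a reboot confirms that the setting was picked up.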
Package revtools-0.0.8 contains the Python script input_lag/input_lag.py, which includes code for the two devices mentioned above as a starting point. All it does is record the first falling edge on the GPIO pin and calculate the time difference until the corresponding USB report comes in. Instead of a complex debouncing routine, a simple wait of half a second ensures a valid starting point for the next event. To get valid results you should record at least 50 events, which are summarized at the end.
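The summary step can be sketched as follows. This is not the code of input_lag.py; the function names are hypothetical, and the use of the sample standard deviation expressed as a percentage of the mean is an assumption based on the "Stddev [%]" column of the tables below.

```python
import math


def lag_ms(edge_time_s, report_time_s):
    """Lag between the GPIO falling edge and the USB report, in milliseconds."""
    return (report_time_s - edge_time_s) * 1000.0


def summarize(lags_ms):
    """Return min, mean, relative standard deviation and max of the lags."""
    count = len(lags_ms)
    if count < 2:
        raise ValueError("need at least two events")
    mean = sum(lags_ms) / count
    # Sample variance (divides by count - 1); an assumption, see lead-in.
    variance = sum((lag - mean) ** 2 for lag in lags_ms) / (count - 1)
    return {
        "min": min(lags_ms),
        "mean": mean,
        "stddev_percent": 100.0 * math.sqrt(variance) / mean,
        "max": max(lags_ms),
    }
```

With only a handful of events the relative deviation is dominated by single outliers, which is one reason for recording at least 50 events.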
To get the best timings, avoid high system load during the test, for example load caused by moving the mouse or running other things in parallel. For best results you might want to boot into runlevel 3 and run the test with the highest priority:
sudo ionice -c 2 -n 0 nice -n -20 python input_lag.py -r 50
As you can see in Table 9.1, “Input lag of Roccat Ryos MK”, the keyboard has the same responsiveness at all tested polling rates, while the mean lag of the mouse (Table 9.2, “Input lag of Roccat Kone”) is cut in half at the maximum rate, although the deviation is quite large.
Table 9.1. Input lag of Roccat Ryos MK
Polling interval [ms] | Min [ms] | Mean [ms] | Stddev [%] | Max [ms] |
---|---|---|---|---|
1 | 8.11 | 8.63 | 4 | 9.29 |
2 | 8.08 | 8.66 | 5 | 10.0 |
4 | 8.02 | 8.53 | 4 | 9.18 |
8 | 8.08 | 8.57 | 4 | 9.26 |
Table 9.2. Input lag of Roccat Kone
Polling interval [ms] | Min [ms] | Mean [ms] | Stddev [%] | Max [ms] |
---|---|---|---|---|
1 | 1.28 | 3.83 | 38 | 7.08 |
2 | 0.65 | 4.23 | 38 | 7.12 |
4 | 1.22 | 5.45 | 42 | 9.55 |
8 | 2.30 | 7.67 | 36 | 13.4 |
Cortex-M3 based Microcontroller Units (MCU) are popular, for example, in increasingly powerful USB gaming input devices like mice and keyboards. In-Application Programming (IAP) is the method of choice to provide unproblematic firmware updates for customers. As IAP only writes part of the flash memory, the base address of the update image is not the start address of flash memory. For disassembly this base address has to be found so that absolute addresses can be resolved.
My solution tries to provide an automatic resolution by means of matching vector table entries with special subroutines.
The solution relies on some assumptions:
Firmware images for IAP usually have their own vector table which gets activated after the fixed code decides to do so.
The layout and length of the vector table are producer/model specific, but the first word contains the Main Stack Pointer (MSP) that points to RAM. The second word contains the address of the reset handler which points to Flash. The other entries are zero or also point to a location in Flash.
As Cortex-M cores execute only Thumb instructions, all code pointers (except null pointers) have the Least Significant Bit (LSB) set.
IAP writes and vector table start addresses are usually restricted to multiple-byte boundaries, for example 256 bytes. So the base address has to be a multiple of this boundary.
Some of the Interrupt Service Routines (ISR) in the vector table tend to be one of two easily identifiable types:
null-subroutines that return immediately.
BX LR
loop-subroutines that contain just an unconditional branch to itself, resulting in an endless loop when executed.
loop: B loop
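Taken together, these assumptions suggest a simple search, sketched below. This is not the code from revtools: the function names, the default boundary and the flash parameters are illustrative assumptions. For every candidate base address on the boundary, the sketch checks that all handlers fall inside the image and counts how many of them land on a BX LR (Thumb encoding 0x4770) or a branch-to-self (0xE7FE).

```python
import struct

BX_LR = b"\x70\x47"   # Thumb BX LR (0x4770), little endian
B_SELF = b"\xfe\xe7"  # Thumb B to itself (0xE7FE), little endian


def count_stub_hits(image, handlers, base):
    """Return the number of handlers hitting a stub, or None if any handler
    falls outside the image for this base."""
    hits = 0
    for handler in handlers:
        offset = (handler & ~1) - base  # clear the Thumb bit
        if not 0 <= offset < len(image) - 1:
            return None
        if image[offset:offset + 2] in (BX_LR, B_SELF):
            hits += 1
    return hits


def estimate_base(image, vector_count, boundary=0x100,
                  flash_start=0x0, flash_size=0x80000):
    """Return (base, hits) for the candidate base with the most stub hits."""
    vectors = struct.unpack_from("<%dI" % vector_count, image, 0)
    # Skip the MSP; keep only entries with the Thumb bit set (drops nulls).
    handlers = [vector for vector in vectors[1:] if vector & 1]
    best = (None, 0)
    for base in range(flash_start, flash_start + flash_size, boundary):
        hits = count_stub_hits(image, handlers, base)
        if hits is not None and hits > best[1]:
            best = (base, hits)
    return best
```

A result with only one or two hits should be treated with suspicion, since two-byte patterns can match by chance.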
Package revtools contains code implementing the base address estimation using IDAPython. File IDA/python/cortex_m3.py contains the following related functions:
cortex_m3.estimate_vector_table_length(walker, startaddress, endaddress). Tries to estimate the length of the variable-length vector table. The first of the 32-bit entries is the initial stack pointer and has to point to RAM; all other entries are either NULL or point into flash memory with the LSB set. Can be used on loader_input_t or memory by using one of the Walker classes defined in util.py.
cortex_m3.estimate_base_offset(walker, vector_table_length, boundary). Tries to estimate the base offset of the image, using the aforementioned assumptions. Parameter boundary should be the most restrictive condition of IAP sector alignment or vector table start address. Can be used on loader_input_t or memory by using one of the Walker classes defined in util.py.
cortex_m3.analyze_vector_table(start, vector_table_length). Analyses the vector table, marking target functions for analysis and creating meaningful symbols. Operates on memory.
The loaders in the IDA/loaders directory can be used as examples.
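The vector table length heuristic described above can be illustrated outside of IDA. The sketch below is a standalone approximation and not the cortex_m3 implementation: it works on a plain bytes object instead of a Walker, and the function name, the range parameters and the entry limit are assumptions.

```python
import struct


def guess_vector_table_length(image, ram_range, flash_range, max_entries=256):
    """Count the leading 32-bit words that look like vector table entries.

    ram_range and flash_range are (start, end) address tuples.
    """
    ram_start, ram_end = ram_range
    flash_start, flash_end = flash_range
    available = min(max_entries, len(image) // 4)
    if available == 0:
        return 0
    words = struct.unpack_from("<%dI" % available, image, 0)
    # The first entry is the initial stack pointer; it may point to the
    # address just past the end of RAM, so the upper bound is inclusive.
    if not ram_start <= words[0] <= ram_end:
        return 0
    length = 1
    for word in words[1:]:
        is_null = word == 0
        is_thumb_pointer = bool(word & 1) and flash_start <= word < flash_end
        if not (is_null or is_thumb_pointer):
            break  # first word that cannot be a vector ends the table
        length += 1
    return length
```

Code that directly follows the table can look like a valid pointer by coincidence, so the result is an upper estimate that should be compared with the vector count documented for the MCU.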
The disassembly IDA produces is primarily meant to be human readable. As most software is written in higher-level languages, IDA allows the use of higher-level constructs like structures that aid in understanding the code, but that assemblers don't necessarily understand. This means that some transformations are needed before such a disassembly can be reassembled, even without changes to the code.
Package revtools contains code that tries to correct or at least help with most automatable things.
First you have to create and analyse an IDA database. File IDA/python/cortex_m3.py contains functions for IDA that help in identifying and fixing problematic code. The context information IDA provides is extremely valuable here.
cortex_m3.fix_assembly(). This function fixes things that can be automated and can be called multiple times on the same database.
Adds assembler directives to set the cpu and arch information for reassembly.
IDA hides referenced but uninteresting functions like nullsubs. As these are needed for reassembly these functions are expanded. Also unreferenced functions get collapsed which removes them from the exported code to free valuable memory in the resulting image. Care has to be taken as it's possible outside references point to labels inside a function.
IDA adds alignments at the end of segments. These are hidden so that these areas are treated as unused memory.
IDA displays [base, index] references as [base, (target - base)], which gets changed to simple index representation. Also if it can be determined that a 32 bit immediate is a memory reference the target gets a symbol.
Thumb instructions are too narrow to contain 32-bit immediates. These constants are stored in so-called literal pools and are loaded relative to the PC. The actual literal pools are hidden and replaced with the .ltorg directive so that the assembler can insert its own literal values at these positions.
IDA shows jump tables as numerical constants. This is changed to display a shifted difference between labels in the format ((target - start) >> 1), which is generally how a programmer would have written it.
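The arithmetic behind the ((target - start) >> 1) notation is easy to verify. The following sketch uses made-up addresses and function names; it assumes that start is the address the table entries are relative to and that every target is halfword aligned, as Thumb code is.

```python
def encode_jump_table(start, targets):
    """Turn absolute branch targets into halfword offsets from start."""
    entries = []
    for target in targets:
        distance = target - start
        if distance < 0 or distance % 2:
            raise ValueError("target 0x%x is not a forward, even offset" % target)
        entries.append(distance >> 1)
    return entries


def decode_jump_table(start, entries):
    """Recover the absolute targets from the stored halfword offsets."""
    return [start + (entry << 1) for entry in entries]
```

Because the entries are expressed as label differences, the assembler recomputes them when code between start and a target grows or shrinks, which a numerical constant cannot do.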
cortex_m3.hint_assembly(). Reveals things that can't be resolved automatically and have to be fixed manually. This function is meant for an iterative process. To mark handled issues add the tag 'HINT_OK' to the regular comment of the address.
If IDA creates code that does not belong to a function, the code may in fact be unused or may be called via a pointer array. You have to determine which is the case, then either undefine the code or make it a function, with or without references.
All absolute addresses have to be symbolized for the code and data to be relocatable. IDA's analysis and fix_assembly() do their job of determining whether a 32-bit value is a memory reference or just a constant. The values reported here have to be checked and resolved manually.
The targets of indirect jumps have to be identified.
cortex_m3.generate_inc(file). IDA's export format is not compatible with the GNU assembler. This function exports enums and structure member offsets as simple constants the GNU assembler can handle. In particular, constants for the sizes of structures have to be added.
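What such an include file might look like can be sketched without IDA. The code below is not generate_inc(): the input dictionaries, the naming scheme (struct_member for offsets, struct_size for the size) and the function name are hypothetical. It only relies on the GNU assembler's .equ directive.

```python
def generate_inc_lines(enums, structs):
    """Render enums and structure layouts as GNU assembler constants.

    enums:   {enum_name: {member_name: value}}
    structs: {struct_name: (size, [(member_name, offset), ...])}
    """
    lines = []
    for enum_name in sorted(enums):
        lines.append("/* enum %s */" % enum_name)
        members = sorted(enums[enum_name].items(), key=lambda item: item[1])
        for member, value in members:
            lines.append(".equ %s, %d" % (member, value))
    for struct_name in sorted(structs):
        size, members = structs[struct_name]
        lines.append("/* struct %s */" % struct_name)
        for member, offset in members:
            lines.append(".equ %s_%s, %d" % (struct_name, member, offset))
        # The size constant is what a sizeof() expression would be mapped to.
        lines.append(".equ %s_size, %d" % (struct_name, size))
    return lines
```

Prefixing member names with the structure name avoids clashes between structures that share member names.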
Folder firmware contains files that allow you to export prepared assembler source and include files from IDA and to further transform, edit, reassemble and output a binary firmware image ready to flash the device with.
ida_gen_assembly.py. The Makefile uses this script to export the code and include files from an IDA database.
fix_assembly.py.
IDA sizes its created align directives to the biggest alignment possible. These alignments are all resized to 2.
The assembler has no knowledge of higher language constructs like structures. IDA's usage of the sizeof() operator therefore has to be transformed into usage of a constant. While cortex_m3.generate_inc(file) creates the constant for structure size, this code fixes the assembly to use them.
lpc1752FBD80.lds. That's an example linker script for NXP's LPC1753FBD80 MCU used for example in the Ryos MK keyboard range.
Makefile.
Starting with a FIRMWARE.orig.idb, the files FIRMWARE.orig.asm and FIRMWARE.orig.inc are created. These are further transformed into FIRMWARE.asm and FIRMWARE.inc, which can then be edited to implement the desired changes.
The final FIRMWARE.asm and FIRMWARE.inc are then used to create a FIRMWARE.o, which gets linked using the linker script to FIRMWARE.elf.
Finally the .text section is extracted in binary format, which is the form the firmware updater expects.
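The sizeof() transformation that fix_assembly.py applies between the .orig files and the editable files can be pictured as a plain text substitution. The sketch below is an assumption, not the script's actual code: the exact spelling of the operator in IDA's output and the NAME_size constant naming are hypothetical.

```python
import re

# Matches sizeof(NAME) with optional whitespace; assumed spelling.
SIZEOF_PATTERN = re.compile(r"\bsizeof\s*\(\s*(\w+)\s*\)")


def replace_sizeof(line):
    """Replace every sizeof(NAME) in a line with the constant NAME_size."""
    return SIZEOF_PATTERN.sub(r"\1_size", line)


def transform(lines):
    """Apply the substitution to all lines of an assembler source."""
    return [replace_sizeof(line) for line in lines]
```

The substitution only works if the include file defines a matching constant for every structure whose size is referenced.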