I've been working on firmware for a board with numerous analog inputs, analog outputs, diagnostics collection, and I/O with an Ethernet module. An STM32F3 microcontroller was chosen for the job, with ST's MX code generator providing the HAL and FreeRTOS (a big help in bringing up the board faster).
After initial development and getting all of these components running, I decided to take a peek at how efficiently the MCU was operating. The easiest way to do this was with the "idle hook" function provided by FreeRTOS (a feature common to most RTOSes) and an output pin. This initial spin of the board had all of the unused GPIO pins brought out to test pads, so I picked one (PA12) and made a simple idle hook that sets the pin high, sleeps via ARM's wait-for-interrupt (WFI) instruction, then sets the pin low again once the MCU wakes up:
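void vApplicationIdleHook(void)
{
    HAL_GPIO_WritePin(GPIOA, GPIO_PIN_12, GPIO_PIN_SET);   /* pin high: about to sleep */
    asm("wfi");                                            /* sleep until the next interrupt */
    HAL_GPIO_WritePin(GPIOA, GPIO_PIN_12, GPIO_PIN_RESET); /* pin low: awake again */
}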
Success with this method requires three things:
- Having an interrupt-driven system (often the case with an RTOS)
- Using peripherals in an interrupt-driven manner
- Writing RTOS tasks/threads to avoid being CPU-intensive
A bonus would be a "tickless" RTOS, which sleeps or idles for as long as possible rather than periodically waking up the processor with every tick.
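For reference, FreeRTOS only calls the idle hook when it is enabled in FreeRTOSConfig.h, and tickless mode is a switch in the same file. A minimal sketch of the two settings, assuming a stock FreeRTOS port (the code generator normally writes this file, so the values may already be present or exposed through its configuration screens):

```c
/* FreeRTOSConfig.h (excerpt) */
#define configUSE_IDLE_HOOK      1  /* kernel calls vApplicationIdleHook() from the idle task */
#define configUSE_TICKLESS_IDLE  1  /* optional: suppress tick interrupts while no task is ready */
```

Whether tickless idle is worth enabling depends on the port and on how the tick timer is clocked, so treat the second line as something to experiment with rather than a requirement.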
Measurement
After flashing the MCU, I hooked up a probe to my USB oscilloscope and measured the positive duty time of this idle signal. To my surprise (and delight!) the MCU was idling 97% of the time! At this point I also checked on my compiler's optimization level, finding it stuck at -Og (minimal optimizing in exchange for a good debugging experience). Bumping that up to -O2 brought the idle time towards 98.5%.
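If your scope reports pulse widths rather than a duty cycle, the idle figure is just the time the pin spent high divided by the length of the capture window. A small sketch of that arithmetic in integer math; the function name `idle_time_permille` and its parameters are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Idle time in tenths of a percent (per mille), from the total time the
 * idle pin was high within a capture window. Both arguments must use the
 * same time unit. The result is rounded to the nearest whole step. */
uint32_t idle_time_permille(uint32_t high_time, uint32_t window)
{
    assert(window > 0u && high_time <= window);
    return (uint32_t)(((uint64_t)high_time * 1000u + window / 2u) / window);
}
```

For example, 970 µs high within a 1000 µs window gives 970, which is 97.0%.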
Know your microcontroller
The STM32 family is stuffed with useful peripherals, and each one can take over work that would otherwise cost CPU time.
Instead of measuring analog inputs one-by-one, the ADC peripheral can be set up to collect a sequence of conversions given a single trigger. A DMA channel can be hooked to the ADC to copy and store these conversions into a buffer of memory that wraps around with each sequence. Finally, the ADC and DMA can be triggered by a timer peripheral to make the whole operation continuous and autonomous. Simply copy the conversions out of the buffer when you need to use them; doing the copy from the DMA's interrupt handler ensures it only happens once new data is in.
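The copy-on-completion step can be sketched in portable C. This is an illustration, not the board's actual code: the channel count, buffer names, and `adc_sequence_complete` are hypothetical. With ST's HAL, a function like this would typically be called from `HAL_ADC_ConvCpltCallback` after starting the conversions with `HAL_ADC_Start_DMA`.

```c
#include <stdint.h>

#define ADC_CHANNEL_COUNT 4u  /* hypothetical sequence length */

/* Circular buffer the DMA channel would write into. It is volatile
 * because hardware changes it behind the compiler's back. */
volatile uint16_t adc_dma_buffer[ADC_CHANNEL_COUNT];

/* Stable copy for the rest of the firmware to read. */
uint16_t adc_snapshot[ADC_CHANNEL_COUNT];

/* Counts completed sequences so readers can tell when data is fresh. */
volatile uint32_t adc_snapshot_count;

/* Call from the DMA transfer-complete interrupt, once per finished sequence. */
void adc_sequence_complete(void)
{
    for (uint32_t i = 0u; i < ADC_CHANNEL_COUNT; ++i) {
        adc_snapshot[i] = adc_dma_buffer[i];
    }
    adc_snapshot_count++;
}
```

Because the copy runs only when a full sequence has landed, the CPU never spins waiting on a conversion, and the snapshot always holds values from a single sequence.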
The system needs to process these inputs with a good bit of math. Fortunately, the formulas could be coerced into integer operations to minimize CPU time. If floating-point is necessary for you, be sure to choose a microcontroller with a hardware floating-point unit; this can cut tens to hundreds of cycles per mathematical operation down to a handful.
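As an example of what coercing a formula into integer operations can look like, here is a sketch that converts a raw ADC count to millivolts without touching floating point. The 12-bit resolution and 3.3 V reference are assumptions for illustration, not this board's actual values:

```c
#include <assert.h>
#include <stdint.h>

#define ADC_FULL_SCALE 4095u  /* assumed 12-bit conversion */
#define VREF_MV        3300u  /* assumed 3.3 V reference, in millivolts */

/* Convert a raw ADC count to millivolts using integer math only.
 * Multiplying before dividing keeps precision, and adding half the
 * divisor first rounds the result to the nearest millivolt. */
uint32_t adc_to_millivolts(uint32_t raw)
{
    assert(raw <= ADC_FULL_SCALE);
    return (raw * VREF_MV + ADC_FULL_SCALE / 2u) / ADC_FULL_SCALE;
}
```

The largest intermediate value is 4095 * 3300, which fits comfortably in 32 bits, so no wider type is needed here.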
Know your RTOS
RTOSes generally come with plenty of signaling and synchronization tools that can maximize your code's efficiency. Don't be afraid to simply scroll through the documentation to find out what utilities are available. These often have little impact on code or data size, so there's no reason not to use them.
Multi-threading leads to cleaner code too. This board's Ethernet module (which certainly offloads a lot of work) needs to be constantly polled over SPI to check if any requests need handling. By isolating this routine to its own RTOS thread, the code is concise, understandable, and quick to return to sleep if no work needs to be done.
No need to waste energy
The microcontroller here is running as fast as it can at 64 MHz. Why? Because I didn't want to run into any issues during bring-up by going too slow. Now that I have the idle measurement, I can go ahead and try dropping the clock rate.
Slower speeds mean:
- Less power consumed
- Fewer or zero flash wait states (more efficiency)
- Slower signal sampling (more efficiency; the ADC is also just running far faster than necessary)
- Less signal noise (potentially, always test and measure to be sure)
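To make the wait-state point concrete, here is a sketch that maps a core clock to the number of flash wait states. The thresholds (24 MHz and 48 MHz) are an assumption based on what the STM32F3 reference manual lists for this family, so check the manual for your exact part; `flash_wait_states` is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Flash wait states needed for a given HCLK frequency.
 * Thresholds are assumed from the STM32F3 reference manual:
 * 0 WS up to 24 MHz, 1 WS up to 48 MHz, 2 WS up to 72 MHz. */
uint32_t flash_wait_states(uint32_t hclk_hz)
{
    assert(hclk_hz > 0u && hclk_hz <= 72000000u);
    if (hclk_hz <= 24000000u) {
        return 0u;
    }
    if (hclk_hz <= 48000000u) {
        return 1u;
    }
    return 2u;
}
```

Under these assumptions, 64 MHz needs two wait states, 32 MHz needs one, and 8 MHz needs none.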
Halving to 32 MHz brought the idle time down by perhaps half a percent. Through iteration, I managed to get all the way down to 8 MHz; this meant the PLL system could also be powered down. Testing showed no loss in performance, and idle time still measures above 90%.
The microcontroller's datasheet gives hints towards potential power savings. For the drop from 64 MHz to 8 MHz, we should see:
- A drop from 24.5 mA to 3.5 mA while active, an 85.7% reduction
- A drop from 5.7 mA to 0.7 mA while sleeping, an 87.7% reduction
Awesome!
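The active and sleep figures can be combined with the measured idle fraction to estimate the average core current. This is a rough sketch using the datasheet numbers quoted above; it ignores peripherals and everything else on the board, and `average_current_ua` is a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Weighted average of active and sleep current, in microamps.
 * idle_per_mille is the idle time in tenths of a percent (985 = 98.5%). */
uint32_t average_current_ua(uint32_t active_ua, uint32_t sleep_ua,
                            uint32_t idle_per_mille)
{
    assert(idle_per_mille <= 1000u);
    return (active_ua * (1000u - idle_per_mille) +
            sleep_ua * idle_per_mille) / 1000u;
}
```

With the 64 MHz figures and 98.5% idle, `average_current_ua(24500, 5700, 985)` gives 5982 µA, or about 6.0 mA. With the 8 MHz figures and a conservative 90% idle, `average_current_ua(3500, 700, 900)` gives 980 µA, or about 1.0 mA. Actual consumption should still be measured on the board.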