“Top Ten” is a well-tested catchphrase that can be attached to restaurants in London, tall buildings around the world and, well, just about anything. So rest assured, there is also a list of the Top Ten Bugs in Firmware. Written by industry veteran Michael Barr, former editor-in-chief of Embedded Systems Programming magazine and author of three books on the subject, it not only lists the bugs but also suggests design patterns to avoid them.
Good design patterns are invaluable, but mistakes are always made. When it comes to debugging, we could not help noticing that at least five of these bugs can be found and analyzed using an RTOS tracing tool such as Percepio Tracealyzer.
In true David Letterman style, Barr starts at #10 and counts down, so here goes:
10. Jitter
When a task in your system is supposed to execute at regular intervals, for instance to read an analog-to-digital converter every 10 milliseconds, the system becomes sensitive to random delays, also known as jitter. If that 10 ms interval degrades to 10±2 ms, the precision of your calculations will degrade accordingly.
To minimize jitter, Barr says, you need to fine-tune your task priorities and maybe use timer interrupts for the most sensitive code. We concur, and Tracealyzer can be quite helpful when it comes to locating jitter in your system.
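As a rough illustration of what "locating jitter" means in numbers, the sketch below computes the worst early and late deviation from the nominal period, given a list of activation timestamps for a periodic task. The names `jitter_t` and `measure_jitter` are hypothetical, and the timestamps are assumed to come from a free-running 32-bit microsecond timer; nothing here reflects how Tracealyzer itself is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: worst-case deviations from the nominal period. */
typedef struct {
    int32_t min_dev_us;  /* most negative deviation (period too short) */
    int32_t max_dev_us;  /* most positive deviation (period too long)  */
} jitter_t;

/* timestamps_us: activation times of one periodic task, in microseconds,
 * assumed to be read from a free-running 32-bit timer. */
jitter_t measure_jitter(const uint32_t *timestamps_us, size_t count,
                        uint32_t nominal_period_us)
{
    assert(count >= 2);  /* need at least one complete period */
    jitter_t j = { INT32_MAX, INT32_MIN };

    for (size_t i = 1; i < count; i++) {
        /* Unsigned subtraction stays correct across timer wrap-around. */
        uint32_t period = timestamps_us[i] - timestamps_us[i - 1];
        /* Reinterpret the difference as signed (two's complement assumed). */
        int32_t dev = (int32_t)(period - nominal_period_us);
        if (dev < j.min_dev_us) j.min_dev_us = dev;
        if (dev > j.max_dev_us) j.max_dev_us = dev;
    }
    return j;
}
```

Fed with activations at 0, 10, 22, 30 and 40 ms and a nominal period of 10 ms, this reports deviations of -2000 and +2000 microseconds, which is the 10±2 ms case described above.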
9. Incorrect priority assignment
Setting suitable task priorities is critical for the performance and reliability of an RTOS-based system, as a task running at too high priority may cause unacceptable delays in other tasks.
Michael Barr points to Rate Monotonic Analysis (RMA), a formal method for assigning task priorities in a system with fixed priorities and preemptive scheduling. This, however, requires that you have sufficient information about your tasks’ behaviors, and that your tasks behave according to the assumptions of this analysis method. According to this article by Jack Ganssle, RMA is rarely used in practice.
Finding the right combination of task priorities that works well under all circumstances can be very difficult, unless you have a good tracing tool that shows you what is going on in your RTOS. Tracealyzer allows you to inspect the execution times, execution patterns and the resulting response times of your tasks. This allows you to assess and optimize your priority assignment to achieve faster response times and more reliable behavior.
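To make the idea behind RMA a little more concrete, here is a minimal sketch of the classic rate-monotonic rule (the shorter the period, the higher the priority) together with the Liu and Layland utilization test, U ≤ n(2^(1/n) − 1). It assumes independent periodic tasks with known worst-case execution times and deadlines equal to their periods, which is exactly the kind of information the method requires. The type `rm_task_t` and the function `rm_assign` are hypothetical names, not taken from Barr's article or any RTOS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical task descriptor for the analysis. */
typedef struct {
    const char *name;
    unsigned period_us;  /* T: activation period                   */
    unsigned wcet_us;    /* C: worst-case execution time           */
    int priority;        /* assigned below; larger = more urgent   */
} rm_task_t;

/* Sort by descending period, so the shortest period ends up last. */
static int by_period_desc(const void *a, const void *b)
{
    const rm_task_t *x = a, *y = b;
    return (x->period_us < y->period_us) - (x->period_us > y->period_us);
}

/* Assigns rate-monotonic priorities and returns true if the task set
 * passes the Liu & Layland utilization bound. */
bool rm_assign(rm_task_t *tasks, int n)
{
    assert(n > 0);
    qsort(tasks, (size_t)n, sizeof tasks[0], by_period_desc);

    double u = 0.0;  /* total CPU utilization, sum of C/T */
    for (int i = 0; i < n; i++) {
        tasks[i].priority = i;  /* longest period gets priority 0 */
        u += (double)tasks[i].wcet_us / tasks[i].period_us;
    }

    /* U <= n(2^(1/n) - 1) is equivalent to (1 + U/n)^n <= 2,
     * which avoids calling pow() from the math library. */
    double p = 1.0;
    for (int i = 0; i < n; i++)
        p *= 1.0 + u / n;
    return p <= 2.0;
}
```

Note that the utilization bound is sufficient but not necessary: a task set that fails this test may still meet all its deadlines, which is one more reason to check actual response times in a trace.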
8. Priority inversion
The central idea underlying an RTOS with a fixed-priority scheduler is that a high-priority task should be scheduled ahead of one with lower priority, but a lot of things can go wrong when two or more tasks need to coordinate their work with a shared resource such as a global data area or a peripheral device.
One of these can-go-wrong things is priority inversion, where a low-priority task inadvertently blocks a task with higher priority. This too can be rather easily avoided, if you are aware of this pitfall. But if you notice occasional delays in the responsiveness of your system, it might be because of priority inversion. With Tracealyzer you can spot such delays by plotting the response times of your tasks. To see the cause of any extreme values in this plot, just double-click to show the corresponding task execution trace.
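The text above does not name a specific remedy; one that many RTOS mutex implementations offer is priority inheritance, where the task holding a mutex temporarily runs at the priority of the highest-priority task waiting for it. The following toy, tick-based simulation is a sketch under simplified assumptions (three tasks, one mutex, each mutex-using task spends its whole job inside the critical section). All names are hypothetical. It returns the response time of the high-priority task with and without inheritance.

```c
#include <assert.h>
#include <stdbool.h>

enum { LOW, MED, HIGH, NUM_TASKS };

typedef struct {
    int base_prio;    /* larger number = higher priority            */
    int release;      /* tick at which the task becomes ready       */
    int work;         /* remaining ticks of CPU time                */
    bool uses_mutex;  /* whole job runs inside the critical section */
} sim_task_t;

/* Returns the response time of HIGH in ticks (finish - release). */
int simulate(bool priority_inheritance)
{
    sim_task_t t[NUM_TASKS] = {
        [LOW]  = { 1, 0, 4, true  },
        [MED]  = { 2, 2, 5, false },
        [HIGH] = { 3, 1, 2, true  },
    };
    int owner = -1;  /* index of the mutex holder, -1 if free */

    for (int tick = 0; tick < 100; tick++) {
        int run = -1, run_prio = -1;
        for (int i = 0; i < NUM_TASKS; i++) {
            if (t[i].release > tick || t[i].work == 0)
                continue;  /* not released yet, or already finished */
            if (t[i].uses_mutex && owner != -1 && owner != i)
                continue;  /* blocked on the mutex */
            int prio = t[i].base_prio;
            if (priority_inheritance && i == owner) {
                /* Boost the owner to the priority of its waiters. */
                for (int j = 0; j < NUM_TASKS; j++)
                    if (j != i && t[j].uses_mutex && t[j].release <= tick &&
                        t[j].work > 0 && t[j].base_prio > prio)
                        prio = t[j].base_prio;
            }
            if (prio > run_prio) { run = i; run_prio = prio; }
        }
        if (run < 0)
            continue;  /* idle tick */
        if (t[run].uses_mutex) {
            assert(owner == -1 || owner == run);
            owner = run;  /* take (or keep) the mutex */
        }
        if (--t[run].work == 0) {
            if (owner == run) owner = -1;  /* release the mutex */
            if (run == HIGH) return tick + 1 - t[HIGH].release;
        }
    }
    return -1;  /* HIGH never finished within the simulated window */
}
```

In this scenario, `simulate(false)` yields 10 ticks because the medium-priority task preempts the low-priority mutex holder while the high-priority task waits, and `simulate(true)` yields 5 ticks. That difference is the kind of outlier a response-time plot makes visible.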
7. Deadlock
A deadlock is a circular dependency between two or more tasks. For example, if Task 1 has already acquired resource A and is blocked waiting for resource B, while Task 2 has previously acquired B and is blocked waiting for A, neither task will ever wake up. A clear indication that you may have a deadlock problem is when multiple tasks suddenly stop executing, although no higher-priority tasks are running. Again, this is something that Tracealyzer can show.
If you want to avoid deadlocks, the first thing to note is that a deadlock can only occur if a task tries to hold on to two resources at the same time. So: structure your code so that no task ever holds more than one shared resource at a time, and this kind of deadlock cannot occur. If this is not possible, there are several other design patterns that can help you; see Michael Barr’s article or our Tracealyzer article for more suggestions.
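The circular dependency can be expressed as a cycle in a "wait-for" chain: a task waits for a resource, which is owned by another task, which in turn waits for another resource, and so on. The sketch below follows that chain for one task and reports whether it leads back to the starting point. The arrays `waits_for` and `owner` and the function `task_in_wait_cycle` are hypothetical bookkeeping for illustration; a real RTOS stores this information differently.

```c
#include <assert.h>
#include <stdbool.h>

#define NONE (-1)

/* waits_for[t]: resource that task t is blocked on, or NONE
 * owner[r]:     task currently holding resource r, or NONE
 * Returns true if `task` is part of a circular wait. */
bool task_in_wait_cycle(int task, const int *waits_for, const int *owner,
                        int num_tasks)
{
    assert(task >= 0 && task < num_tasks);
    int cur = task;

    /* A chain without repetition can visit at most num_tasks tasks. */
    for (int steps = 0; steps < num_tasks; steps++) {
        int res = waits_for[cur];
        if (res == NONE)
            return false;  /* chain ends at a task that is not blocked */
        cur = owner[res];
        if (cur == NONE)
            return false;  /* resource is free, so no deadlock here */
        if (cur == task)
            return true;   /* back at the start: circular wait */
    }
    /* The chain ran into a cycle that does not include `task`;
     * `task` is stuck behind it but not itself part of the cycle. */
    return false;
}
```

With Task 0 holding resource 0 and waiting for resource 1, and Task 1 holding resource 1 and waiting for resource 0, the function returns true for both tasks, matching the A/B example above.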
6. Memory leak
Dynamic memory allocation is typically not recommended for embedded software, but it is sometimes used anyway, for reasons good or bad. The catch is that if you use it, you have to ensure that every allocated block of memory is freed once it is no longer in use. If you miss this in some corner case, you have a memory leak and will eventually run out of memory. And remember: even if you have banned dynamic memory allocation in your project, you might have third-party software libraries or external development teams that use it without your knowledge.
A memory leak is especially dangerous if it only occurs occasionally, as a “slow” memory leak is easily missed during functional testing but may cause critical errors after some time in a deployed unit. Given the long-running nature of many embedded systems, combined with the deadly or spectacular failures that some safety-critical systems may have, this is one bug you definitely don’t want in your firmware.
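A simple way to make such a corner case visible is to route allocations through thin wrappers that count the blocks still in use. The sketch below uses the standard C `malloc` and `free`; the wrapper names and the deliberately buggy `process_message` function are hypothetical and only serve to show how a rarely taken error path leaks one block each time it runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static long live_blocks;  /* blocks allocated but not yet freed */

void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

void tracked_free(void *p)
{
    if (p == NULL)
        return;
    assert(live_blocks > 0);  /* catches a free without a matching alloc */
    live_blocks--;
    free(p);
}

long tracked_live_blocks(void)
{
    return live_blocks;
}

/* Hypothetical message handler with a leak in a corner case. */
int process_message(size_t len)
{
    char *buf = tracked_malloc(64);
    if (buf == NULL)
        return -1;
    if (len > 64)
        return -1;  /* BUG (deliberate): buf is never freed on this path */
    /* ... normal processing would go here ... */
    tracked_free(buf);
    return 0;
}
```

If the count returned by `tracked_live_blocks()` keeps growing while the system is otherwise in a steady state, that is a strong hint of a leak, even when each individual leak is small.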
Tracealyzer allows for monitoring RTOS calls for dynamic memory allocation and can highlight suspected memory leaks, as shown in this page from the User Manual.
Barr’s bugs number 1 through 5 are listed in a separate article, which is also a very interesting read but without the same immediate connection to Tracealyzer.