We collect examples of how Tracealyzer has been of useful to our customers and have recreated similar issues to illustrate the benefits of our Tracealyzer tools for embedded software developers.
In this case, a customer had an issue with a periodic task not running as expected, here called SamplerTask. This was the highest priority task in the system and should execute every 5 ms to read a sensor. But they found significant timing variations in the data sampling. Why?
With Tracealyzer, they soon found the Actor Instance Graph, showing runtime metrics of individual task executions, as shown below. There are several such graphs, one for each supported metric, where each data point represents one execution of the task. The X-axis shows its start time of the instance and the Y-axis shows the specific metric, in this case “Periodicity – from Ready”, i.e., the time in between task activations.
As you can see in the above screenshot, the SamplerTask is not executing strictly every 5 ms as intended. Usually it does, but there are a cases with unusually long intervals in between activations, over 6.5 ms.
So what is going on at these instances? If we add a horizontal trace view, synchronized and with the other tasks enabled, we see that the ControlTask (dark green) is running in this interval. Is this causing the problem? But SamplerTask (orange) has higher priority, so its should preempt ControlTask. So what is going on here?
Perhaps the ControlTask is disabling interrupts for a few milliseconds and thereby also the kernel tick interrupt? We can find out in the main trace view, which allows us to inspect the tick events. Let’s have a look…
By double-clicking on the data point, the corresponding interval is shown in the main trace view. This view provides a vertical time-line, with events presented as horizontal text labels. The OS tick events should occur every 1 ms in this system, but there is a gap of about 2.5 ms, starting at 2.018.000.
So it seems plausible that ControlTask is disabling interrupts, probably to protect access to a shared resource, like a file system. But SamplerTask is not using that resource, so is it really necessary to block it by disabling interrupts? Let’s check how the other task, ServerTask, is handling this.
In the lower right corner, you find the View Filter, where we have now enabled “Mutex” type kernel calls, and also User Events – generated by logging calls added to the application code. Here we can see that the ServerTask (bright green) is using a Mutex for protecting access to the file system, but ControlTask makes no such calls, even though the logging indicates a similar operation in both tasks (“Updating file”). So this can probably be solved by changing ControlTask to use the mutex (FS_Mutex) instead of disabling interrupts. So let’s see what happens if we change this. In the below screenshot you see that SamplerTask is now running periodically every 5 ms, as intended.
When looking at the detailed trace (below), we verify that ControlTask is now using FS_Mutex and the OS ticks are running periodically, even in the critical section, and lets SamplerTask preempt as intended. Problem solved!
Tracealyzer provides over 20 interactive views of the runtime world, connected in clever ways. It is available for several leading real-time operating systems as well as for Linux.