This Tracealyzer user developed a networked system containing a TCP/IP stack, a flash file system, and an RTOS, running on an ARM Cortex-M4 microcontroller. The system contained several RTOS tasks, including a server-style task that responds to network requests and a log file spooler task (unrelated to Tracealyzer).
The response time on network requests had often been an issue, and when testing their latest build, the system responded even slower than before. So now they really wanted to figure this out!
But when comparing the code of the previous and new versions, they could not see any obvious reason for the slower response time of the server task. There were some minor changes due to refactoring, but no significant features had been added. However, since other tasks had higher scheduling priority than the server task, there could be many other causes for the increased response time. They therefore decided to use Tracealyzer to compare the runtime behavior of the two versions and see the differences.
They recorded traces of both versions under similar conditions and began the comparison at the highest level of abstraction, i.e., the Statistics Report (see below). This report can display CPU usage, number of executions, and scheduling priorities, as well as metrics like execution time and response time, calculated for each execution of every task and interrupt.
Earlier version (left) / New version (right)
As expected, the Statistics Report revealed that response times are higher in the new version, about 50% higher on average. The execution times of the Server task were, however, quite similar, only about 7% higher in the new version. So the main cause of the higher response time must be interference from other tasks.
To find out what is causing this difference, we can simply click on the extreme values in the Statistics Report. This focuses the main trace view on the corresponding locations, so we can see the details. By opening two parallel instances of Tracealyzer, one for each trace, we can compare and see the differences, as illustrated below.
Earlier version (left) / New version (right)
Since the application server task performs several services, two User Events, labeled “ServerLog”, have been added to mark the points where the specific requests are received and answered. The zoom levels are identical, so we can clearly see the higher response time in the new version. We can also see that the Logger task preempts the Server task 11 times, compared to only 6 times in the earlier version, which is a significant difference. Moreover, the Logger task runs at a higher priority than the Server task, so every logging call preempts the Server task.
So it seems new logging calls have been added in the new version, causing the Logger task to interfere more with the Server task. To see what is logged, let’s add a User Event in the Logger task to show the messages in the trace view. Perhaps some of them can be removed to improve performance?
Now we see that other tasks also generate logging messages that affect the Server task’s response time, for instance the ADC_0 task. To see all tasks sending messages to the Logger task, we can use the Communication Flow view, as illustrated below.
The Communication Flow view is a dependency graph showing a summary of all operations on message queues, semaphores and other kernel objects. Here we have generated this view for the entire trace, but it is also possible to generate this view for a selected interval (and likewise for the Statistics Report). For instance, we can see how the Server task interacts with the TCP/IP stack. Note the interrupt handler named “RX_ISR” that triggers the Server task using a semaphore, i.e., when there is new data on the Server socket, and the TX task for transmitting over the network.
But back to the Logger task: the Communication Flow reveals five tasks that send logging messages. By double-clicking on the “LoggerQueue” node in the graph, the Kernel Object History view opens and shows all operations on this message queue.
As expected, we can see that the Logger task receives messages frequently, one at a time, and blocks after each message, as indicated by the “red light”.
But is this really a good design? It is probably not necessary to write the logging messages to file one by one. If we raised the scheduling priority of the Server task above that of the Logger task, the Server task would not be preempted as frequently and could therefore respond faster. The logging messages would be buffered in LoggerQueue until the Server task (and other high-priority tasks) had completed. Only then would the Logger task resume and process all buffered messages in a batch.
Let’s try that. The screenshot below shows the Server task instance with the highest response time, after raising its scheduling priority above that of the Logger task.
The highest response time is now just 5.4 ms instead of 7.5 ms, which is even faster than in the earlier version (5.7 ms) despite the added logging. This is because the Logger task no longer preempts the Server task; instead it processes all pending messages in a batch after the Server task has finished. Here we also see “event labels” for the message queue operations. As expected, there are several “xQueueSend” calls in sequence, without blocking (which would show as red labels) or task preemptions. There are still preemptions by the ADC tasks, but these no longer cause extra activations of the Logger task. Problem solved!
The screenshot below shows LoggerQueue after the priority change. The right column shows how the messages are buffered in the queue, allowing the Server task to respond as fast as possible; the logging messages are then processed in a batch.
Tracealyzer provides over 20 interactive views of the runtime world, connected in clever ways. It is available for several leading real-time operating systems as well as for Linux.