www.percepio.com

Tracealyzer for Linux

Introduction

Tracealyzer for Linux is a runtime diagnostics tool for Linux-based software systems. Tracealyzer gives developers unprecedented insight into the runtime behavior, which reduces troubleshooting time and improves software quality, performance and reliability. Complex software problems that might otherwise take hours or days to solve can often be understood quickly with Tracealyzer, in many cases in a tenth of the time otherwise required. The troubleshooting time saved quickly pays back the cost of the tool. Moreover, the improved software quality that results from using Tracealyzer reduces the risk of defective software releases that damage customer relations.

The insight provided by Tracealyzer also helps you find opportunities for optimizing your software. For example, your software might have unnecessary resource conflicts, which are "low-hanging fruit" for optimization: a minor change can give a significant improvement in real-time responsiveness and user-perceived performance. By using Tracealyzer, software developers can reduce their troubleshooting time and thereby spend more time developing valuable new features. This means a general increase in development efficiency and a better ability to deliver high-quality embedded software within budget. Percepio's leading Tracealyzer technology has been developed since 2004.

Tracealyzer for Linux visualizes traces from LTTng and supports LTTng v2.x (both the kernel and user-space tracers) as well as older LTTng versions.

Percepio is a Wind River Premium Software Partner, and Tracealyzer for Linux supports the integrated LTTng recorder in Wind River Linux from version 5 onwards.

Tracealyzer provides more than 20 interlinked views of the runtime behavior, including task scheduling and timing, interrupts, interaction between tasks, and user events generated by your application. Tracealyzer can be used side by side with a traditional debugger and complements the debugger view with a higher-level perspective, ideal for understanding complex errors where the debugger's perspective is too narrow.

Tracealyzer is more than just a viewer. It contains several advanced analyses, developed since 2004, that help you comprehend the trace data faster. For instance, it connects related events, which allows you to follow messages between tasks and to find the event that triggers a particular task instance. Moreover, it provides various higher-level views, such as the Communication Flow graph and the CPU Load graph, which make it easier to find anomalies in a trace.

Tracealyzer does not depend on additional trace hardware, which means that it can be used in deployed systems to capture rare errors that are otherwise hard to reproduce.

System Overview

The Tracealyzer solution depends on LTTng, a well-established trace monitoring framework for Linux. Tracealyzer supports LTTng v2.x, both the Kernel Tracer and the User-Space Tracer, as well as late versions of the older LTTng v0.x. The LTTng User-Space Tracer (UST) is very flexible: tracepoints can be inserted anywhere in an application, and even on library calls without modifying the library source code.

We divide user-space events from LTTng UST into two main classes: service calls correspond to structured instrumentation of library functions, e.g., malloc or free, while user events correspond to tracepoints intended for more or less temporary debug logging anywhere in the application code. The LTTng kernel tracer generates service calls automatically based on the default system call (syscall) instrumentation, while Tracealyzer must be configured in order to visualize other service calls logged using LTTng UST.
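The distinction between the two event classes can be illustrated with a minimal Python sketch that sorts UST events by name. It assumes a hypothetical configuration set, `SERVICE_CALL_EVENTS`, listing the event names to treat as service calls, and assumes that anything not configured is shown as a user event. The `lttng_ust_libc:malloc` and `lttng_ust_libc:free` names are those emitted by LTTng's libc wrapper library; `my_app:debug_message` is a made-up user event. This is not Tracealyzer's configuration format.

```python
from dataclasses import dataclass

# Hypothetical configuration: UST event names to treat as service calls.
# The lttng_ust_libc:* names come from LTTng's libc wrapper; the set itself
# is an illustrative assumption, not Tracealyzer's configuration format.
SERVICE_CALL_EVENTS = frozenset({
    "lttng_ust_libc:malloc",
    "lttng_ust_libc:free",
})


@dataclass(frozen=True)
class UstEvent:
    name: str          # full LTTng event name, "provider:event"
    timestamp_ns: int


def classify(event: UstEvent) -> str:
    """Return "service_call" for configured library-call events, else "user_event"."""
    return "service_call" if event.name in SERVICE_CALL_EVENTS else "user_event"


if __name__ == "__main__":
    trace = [
        UstEvent("lttng_ust_libc:malloc", 1_000),
        UstEvent("my_app:debug_message", 1_500),  # hypothetical user event
        UstEvent("lttng_ust_libc:free", 2_000),
    ]
    for ev in trace:
        print(f"{ev.timestamp_ns:>6} {ev.name:<24} {classify(ev)}")
```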

An LTTng trace is not a single file, but a directory of files. To open an LTTng trace in Tracealyzer, select the metadata file.
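Because a trace is a directory, it can contain more than one metadata file. An LTTng v2.x session directory typically holds one sub-trace per tracing domain (for example `kernel/` and `ust/uid/<uid>/64-bit/`), each with its own `metadata` file next to its stream files. The Python sketch below lists them. `find_metadata_files` is a hypothetical helper, not part of Tracealyzer or LTTng, and the demo layout is an assumption that only mimics a typical session.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_metadata_files(trace_root: str | Path) -> list[Path]:
    """Return every 'metadata' file below an LTTng trace directory, sorted."""
    root = Path(trace_root)
    return sorted(p for p in root.rglob("metadata") if p.is_file())


if __name__ == "__main__":
    # Build a small fake session directory to show what the search returns.
    with tempfile.TemporaryDirectory() as tmp:
        session = Path(tmp) / "my-session"
        for sub in ("kernel", "ust/uid/1000/64-bit"):
            domain = session / sub
            domain.mkdir(parents=True)
            (domain / "metadata").write_text("/* CTF metadata */")
            (domain / "channel0_0").write_bytes(b"")  # a stream file
        for path in find_metadata_files(session):
            print(path.relative_to(session))
```

Each path found this way is a candidate for opening in Tracealyzer.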

The Tracealyzer PC application runs on Microsoft Windows and on Linux, the latter using the Mono runtime environment for .NET applications.

For more information about LTTng, please refer to the LTTng website (lttng.org), especially its documentation section.

Using Tracealyzer

Tracealyzer provides several graphical views which give different perspectives of the runtime behavior, based on a recording of scheduling, interrupts and system calls. You may also choose to include custom user-space instrumentation using LTTng UST.

This section gives a quick overview of the features and how the tool can be used. More detailed information on the various features is available in later sections.

The main trace view presents all recorded information on a vertical timeline. This view is complemented by over 20 additional views providing high-level overviews or focused views from different perspectives. Task scheduling is presented using color-coded rectangles, where the color helps to identify the actor. By actor we mean a thread of execution: a task or an interrupt. The actor colors are taken from a color scheme based on the actor name. The default color scheme is the natural light spectrum, going from red to blue, with light grey for the idle task (swapper). The colors used depend on the number of actors in the trace, and the color scheme can be changed via View menu > Trace View Settings > Set Color Scheme.
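The idea of a name-based color scheme on the red-to-blue spectrum can be sketched in Python. This is not Tracealyzer's actual algorithm: the sketch assumes that actors are sorted by name and spread evenly over the hues from red to blue (so the colors depend on how many actors the trace contains), and that idle threads are recognized by names starting with "swapper" (recent Linux kernels name the per-CPU idle threads `swapper/0`, `swapper/1`, and so on). `actor_colors` and `is_idle` are hypothetical helpers.

```python
import colorsys

LIGHT_GREY = (0.83, 0.83, 0.83)  # RGB components in the range 0..1


def is_idle(name: str) -> bool:
    # Assumption: idle threads are named "swapper" or "swapper/<cpu>".
    return name.startswith("swapper")


def actor_colors(actor_names):
    """Map each actor name to an RGB tuple on the red-to-blue spectrum."""
    actors = sorted({n for n in actor_names if not is_idle(n)})
    colors = {}
    last = max(len(actors) - 1, 1)
    for i, name in enumerate(actors):
        # In HSV, hue 0 is red and hue 2/3 is blue.
        hue = (2 / 3) * i / last
        colors[name] = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    for name in actor_names:
        if is_idle(name):
            colors[name] = LIGHT_GREY
    return colors


if __name__ == "__main__":
    names = ["sshd", "kworker/0:1", "swapper/0", "myapp"]
    for actor, rgb in actor_colors(names).items():
        print(f"{actor:<12} {tuple(round(c, 2) for c in rgb)}")
```

Spreading the hues by rank rather than hashing the name keeps neighboring colors as distinct as possible for a given actor count, at the cost of an actor's color changing when actors are added to or removed from the trace.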

The execution of tasks and interrupts can be visualized using several view modes, which you can quickly switch between to get the best clarity for each situation.

On single-core systems, the available view modes are:

Gantt View Mode: Shows one column per task and interrupt. This is best for spotting execution patterns, rare tasks or interrupts. This is the default visualization mode for smaller single-core systems with up to 20 tasks. (Shortcut Key "g")
Merged View Mode: Shows all tasks and interrupts in a single column, with sideways indents to show preemption and blocking. This gives a more compact display compared to Gantt View Mode and the best sense of execution order and preemption hotspots. This is the default visualization mode for single-core systems with more than 20 tasks. (Shortcut Key "m")
Split View Mode: Shows tasks and interrupts in two columns, with indents like in the Merged View Mode. This removes the "noise" from interrupts by presenting them separately. (Shortcut Key "s")

On multi-core (SMP) systems, the available view modes are:

Gantt View Mode: Shows one column per task and interrupt, all CPU cores together. This is best for spotting execution patterns, rare tasks or interrupts. (Shortcut Key "g")
Gantt/CPU View Mode: Shows one column per task and interrupt, separated in groups per CPU. This view mode is best for studying task migration between CPU cores. (Shortcut Key "c")
Merged View Mode: Shows all tasks and interrupts in a single column per core, with sideways indents to show preemption and blocking. This gives the best sense of execution order. This is the default visualization mode for multi-core/SMP systems. (Shortcut Key "m")
Flat View Mode: Shows tasks and interrupts in a single column per core, with no indents. This is useful for systems with many CPU cores, to get a more compact display. (Shortcut Key "f")


Kernel events such as system calls are displayed as text labels on the right side of the scheduling trace. The labels are color coded depending on the type and status of the operation.

Clicking on an actor, system call or user event shows information about it in the Actor Information display, as illustrated above. This is a tree structure containing both general statistics of the actor and information about the selected instance. Some nodes can be double-clicked to navigate the trace view accordingly. You can follow the execution of the selected actor using the "Previous Instance" and "Next Instance" buttons below the Actor Information display.

Double-clicking on an actor, system call or user event opens a focused view listing all related events. For a system call, this is the Object History View, which shows all system calls on the selected kernel object, e.g., a message queue, semaphore or mutex. A similar view is available for actors, showing all instances of the actor. Double-clicking on a user event opens the user event log, which lists all recorded user events.


Additional Views

Below are a few examples of other views and dialogs that complement the trace view. There are over 20 views in total. All views are connected to the main trace view or another relevant view, which allows you to switch perspectives without losing focus.


To locate a particular task or interrupt in the trace view, open the Finder (Shortcut Key "Ctrl-f"). In the Finder dialog, you can also specify filters on timing properties, such as response time, to quickly find the extreme cases.
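The kind of timing filter offered by the Finder can be illustrated with a short Python sketch. The `Instance` record and the `find_instances` helper are hypothetical. The sketch assumes that each actor instance has already been reconstructed from the trace, with a start time (when it became ready) and an end time (when it completed or blocked), and defines the response time as their difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    actor: str
    start_ns: int  # when the instance became ready, e.g. on wakeup
    end_ns: int    # when it completed or blocked again

    @property
    def response_time_ns(self) -> int:
        return self.end_ns - self.start_ns


def find_instances(instances, actor, min_response_ns=0):
    """Instances of `actor` with response time >= min_response_ns, worst first."""
    hits = [inst for inst in instances
            if inst.actor == actor and inst.response_time_ns >= min_response_ns]
    return sorted(hits, key=lambda inst: inst.response_time_ns, reverse=True)


if __name__ == "__main__":
    trace = [Instance("logger", 0, 400), Instance("logger", 1_000, 4_000),
             Instance("net_rx", 500, 600), Instance("logger", 5_000, 5_900)]
    for inst in find_instances(trace, "logger", min_response_ns=800):
        print(inst.start_ns, inst.response_time_ns)
```

Sorting worst first puts the extreme cases at the top of the result list.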


The Object History View allows you to track a particular kernel object. If you have message queue operations instrumented using LTTng UST, you can even follow the buffered messages from send to receive, or vice versa. Double-clicking in this view highlights the event in the main trace view.
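Following a buffered message from send to receive amounts to pairing each receive with the send that produced the message. A minimal Python sketch follows. It assumes that each queue delivers messages in FIFO order and that events arrive as hypothetical `(timestamp, kind, queue, task)` tuples in time order. `match_messages` is an illustrative helper, not Tracealyzer's implementation.

```python
from collections import defaultdict, deque


def match_messages(events):
    """Pair each receive with the oldest not-yet-received send on the same queue.

    events: iterable of (timestamp, kind, queue, task) tuples in time order,
    where kind is "send" or "receive". Returns a list of (send, receive) pairs.
    Assumes FIFO queues. Receives with no earlier send in the trace (e.g. the
    message was sent before recording started) are left unmatched.
    """
    pending = defaultdict(deque)  # queue name -> sends waiting to be received
    pairs = []
    for event in events:
        _, kind, queue, _ = event
        if kind == "send":
            pending[queue].append(event)
        elif kind == "receive" and pending[queue]:
            pairs.append((pending[queue].popleft(), event))
    return pairs


if __name__ == "__main__":
    trace = [
        (10, "send", "cmd_q", "ui"),
        (12, "send", "cmd_q", "ui"),
        (15, "receive", "cmd_q", "worker"),
        (30, "receive", "cmd_q", "worker"),
    ]
    for send, recv in match_messages(trace):
        print(f"sent by {send[3]} at {send[0]} -> received by {recv[3]} at {recv[0]}")
```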

The CPU Load Graph displays the CPU usage per task and interrupt over time. It can be zoomed independently of the main trace view and can also be used to navigate the trace view by double-clicking in the graph.
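One way to derive per-actor CPU usage over time from scheduling events is sketched below in Python. It is an illustration, not Tracealyzer's implementation. It assumes a single CPU, a hypothetical list of `(timestamp_ns, next_actor)` context switches (for example extracted from LTTng `sched_switch` events, whose `next_comm` field names the incoming thread), and fixed-size time windows. Time before the first recorded switch is ignored.

```python
from collections import defaultdict


def cpu_load(switches, trace_end_ns, window_ns):
    """Fraction of each time window that each actor spent running on one CPU.

    switches: (timestamp_ns, next_actor) context switches in time order.
    Returns {window_index: {actor: fraction_of_window}}.
    """
    busy = defaultdict(lambda: defaultdict(int))  # window -> actor -> ns
    # An actor runs from its switch-in until the next switch or the trace end.
    ends = [t for t, _ in switches[1:]] + [trace_end_ns]
    for (start, actor), end in zip(switches, ends):
        t = start
        while t < end:
            window = t // window_ns
            seg_end = min(end, (window + 1) * window_ns)
            busy[window][actor] += seg_end - t
            t = seg_end
    return {w: {a: ns / window_ns for a, ns in per_actor.items()}
            for w, per_actor in sorted(busy.items())}


if __name__ == "__main__":
    switches = [(0, "A"), (50, "B"), (150, "swapper/0")]
    for window, shares in cpu_load(switches, trace_end_ns=200, window_ns=100).items():
        print(window, shares)
```

Splitting each running interval at window boundaries lets a long-running actor contribute correctly to every window it overlaps.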

Example Uses

Some of the possible uses of Tracealyzer are described below.

Tracealyzer helps you find out what is really going on in your system. For instance, if a task never gets to execute, you can see why. Perhaps higher-priority tasks are using all the CPU time, so that the scheduler never lets the lower-priority tasks execute. The CPU Load Graph reveals this.

Another reason for a "missing task" might be that the triggering message or semaphore signal is never sent or is sent to the wrong destination, or that the timeout is too short, so that the send fails. The Communication Flow graph shows an overview diagram of all IPC events, which can reveal if a message is sent to the wrong message queue. The Finder can be used to find individual IPC events, e.g., a message sent from a particular task to a particular message queue, and has a filter for quickly finding operations that failed due to timeouts.
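The two analyses mentioned above, an overview of which task sends to or receives from which queue and a filter for timed-out operations, can be sketched in Python over a hypothetical `IpcEvent` record. The record layout and the `timed_out` flag are assumptions for illustration and do not reflect Tracealyzer's internal representation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class IpcEvent:
    task: str
    op: str                 # "send" or "receive"
    queue: str
    timed_out: bool = False


def communication_flow(events):
    """Count graph edges: task -> queue for sends, queue -> task for receives."""
    edges = Counter()
    for e in events:
        if e.timed_out:
            continue  # a failed operation transferred no message
        if e.op == "send":
            edges[(e.task, e.queue)] += 1
        elif e.op == "receive":
            edges[(e.queue, e.task)] += 1
    return edges


def failed_by_timeout(events):
    """Finder-style filter: operations that failed because of a timeout."""
    return [e for e in events if e.timed_out]


if __name__ == "__main__":
    trace = [IpcEvent("sensor", "send", "data_q"),
             IpcEvent("control", "receive", "data_q"),
             IpcEvent("sensor", "send", "log_q", timed_out=True)]
    for (src, dst), count in communication_flow(trace).items():
        print(f"{src} -> {dst}: {count}")
    print(failed_by_timeout(trace))
```

A send edge pointing at an unexpected queue in this graph corresponds to the "wrong destination" case described above.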

If the system is too slow, or if you simply want to measure its performance, you can study the statistics report for a performance overview: task response times and average CPU usage. If the performance is not satisfactory, the Actor Instance graph reveals the instances with high response times. The trace view then shows you the reason: the actual execution time of the task may be too long, or the delay may be caused by interference from other tasks or interrupts. This allows you to focus your optimization efforts on the true cause, which can save you a lot of time.
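Telling a long execution time apart from interference comes down to simple arithmetic on one instance. A minimal Python sketch follows. It assumes that the instance does not block partway through, so any time between becoming ready and finishing that is not spent running is time lost to other tasks or interrupts. The function name and arguments are hypothetical.

```python
def analyze_instance(ready_ns, done_ns, running_intervals):
    """Split an instance's response time into own execution and interference.

    ready_ns: when the instance became ready to run.
    done_ns: when it finished its work.
    running_intervals: (start_ns, end_ns) slices where it held the CPU.
    Assumes the instance never blocks in between, so all non-running time
    is caused by other tasks or interrupts.
    """
    response = done_ns - ready_ns
    execution = sum(end - start for start, end in running_intervals)
    return {
        "response_ns": response,
        "execution_ns": execution,
        "interference_ns": response - execution,
    }


if __name__ == "__main__":
    # Ready at 0, waited until 20, ran 20-50, preempted 50-70, ran 70-100.
    print(analyze_instance(0, 100, [(20, 50), (70, 100)]))
```

A large `interference_ns` relative to `execution_ns` points at other tasks or interrupts rather than at the instance's own code as the place to optimize.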

Requirements

Tracealyzer requires:


Copyright Percepio AB 2014, all rights reserved.