How Many Bugs Do You Miss?

Mar 12, 2019

Computer scientist Edsger Dijkstra once said, “Program testing can be used to show the presence of bugs, but never to show their absence.” However much developers test their software, they just can’t prove that the system is free of bugs. And missed bugs are pretty common – Jack Ganssle writes that about 95% of all bugs introduced during embedded software development are found, meaning that 5% are missed and remain in the production firmware. And since even great programmers (top 1%) introduce around 11 defects per KLOC (1,000 lines of code), missing 5% of the bugs is quite significant.

Assuming a 100 KLOC application with 20 defects per KLOC (2,000 defects in total) and 5% of the bugs missed, you end up with 100 bugs in your shipped product. Some may be harmless or very unlikely to ever cause any trouble, but you just can’t know. Your system may appear to work just fine, as the bugs you miss are probably related to unexpected scenarios and corner cases. But once the system is exposed to large numbers of real-life use cases, these bugs may cause all sorts of trouble. A famous example is NASA’s Mars Pathfinder mission, which nearly failed due to a software issue. In this case, the problem was actually analyzed and fixed thanks to remote diagnostics and update capability.
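The estimate above is simple enough to write out as a small helper. This is only a sketch of the arithmetic; the function name and parameters are made up for illustration, and the inputs are rough assumptions rather than measured values.

```c
#include <stdint.h>

/* Rough estimate of the number of defects that remain in shipped code.
 * kloc             - code size in thousands of lines
 * defects_per_kloc - assumed defects introduced per 1,000 lines
 * missed_percent   - assumed share of defects not found before release
 * (Hypothetical helper, named for this example only.) */
uint32_t estimate_shipped_bugs(uint32_t kloc,
                               uint32_t defects_per_kloc,
                               uint32_t missed_percent)
{
    /* 64-bit intermediate avoids overflow for large inputs. */
    uint64_t introduced = (uint64_t)kloc * defects_per_kloc;
    return (uint32_t)((introduced * missed_percent) / 100u);
}
```

With the figures used here, estimate_shipped_bugs(100, 20, 5) gives 100. Plugging in the 11 defects per KLOC quoted for top programmers still leaves 55 bugs in the same 100 KLOC application.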

Once a product is deployed, it can be extremely difficult to get any useful information about which issues actually occur. In practice, development teams rely on their customers to report issues, a responsibility the customers have not agreed to and thus can’t be expected to fulfill. With a connected IoT device though, developers can leverage a new service, the Device Firmware Monitor (DFM), to report issues during testing or in the field. Let’s look at what the DFM is and how it can help developers.

What is the Device Firmware Monitor?

The Device Firmware Monitor makes development teams aware of issues in their deployed devices and lets them retrieve trace data, so that the team can analyze the problem and identify the root cause. Once the cause is identified, teams can quickly provide a fix for the software and patch it using over-the-air updates before most users are affected by the issue.

The DFM can be thought of as a software “flight recorder” that leverages cloud connectivity. A small trace recorder library is installed in the code base and records the software behavior to a RAM ring buffer, based on code instrumentation in the RTOS kernel and other relevant APIs. When the system misbehaves, an error message and the trace data that has been recorded in the background can be transmitted (directly or after a reboot) through a communication interface such as Wi-Fi or Ethernet to a cloud service that stores a report and notifies the developer. The developer can then open this trace data in Tracealyzer to review what was happening in the system leading up to the error, and reproduce those events on the bench so that the issue can be resolved quickly. A general overview of how this works can be seen in the image below:

[Image: DFM via AWS]
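To make the “flight recorder” idea concrete, here is a minimal sketch of a RAM ring buffer that always keeps the most recent events. It is not Percepio’s trace recorder library; the type names, function names, event layout and buffer size are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAPACITY 8u /* assumed size; a real buffer would be far larger */

/* Hypothetical event record; a real recorder defines its own format. */
typedef struct {
    uint32_t timestamp;
    uint16_t event_id;
    uint16_t param;
} trace_event_t;

typedef struct {
    trace_event_t events[TRACE_CAPACITY];
    size_t head;  /* index of the next slot to write */
    size_t count; /* number of valid events, at most TRACE_CAPACITY */
} trace_buffer_t;

void trace_init(trace_buffer_t *tb)
{
    tb->head = 0;
    tb->count = 0;
}

/* Store one event, overwriting the oldest one once the buffer is full. */
void trace_record(trace_buffer_t *tb, uint32_t timestamp,
                  uint16_t event_id, uint16_t param)
{
    trace_event_t *slot = &tb->events[tb->head];
    slot->timestamp = timestamp;
    slot->event_id = event_id;
    slot->param = param;
    tb->head = (tb->head + 1u) % TRACE_CAPACITY;
    if (tb->count < TRACE_CAPACITY) {
        tb->count++;
    }
}

/* Copy up to out_len of the most recent events into out, in
 * chronological order. Returns the number of events copied. */
size_t trace_snapshot(const trace_buffer_t *tb,
                      trace_event_t *out, size_t out_len)
{
    size_t n = (tb->count < out_len) ? tb->count : out_len;
    size_t start = (tb->head + TRACE_CAPACITY - n) % TRACE_CAPACITY;
    for (size_t i = 0; i < n; i++) {
        out[i] = tb->events[(start + i) % TRACE_CAPACITY];
    }
    return n;
}
```

In this sketch, trace_record() would be called from the instrumentation points, and trace_snapshot() from the error path to collect the events leading up to the problem before they are handed to the upload code.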

What types of bugs can DFM detect?

The Device Firmware Monitor can detect a wide range of potential problems within the device firmware. First, developers can set up alerts for typical issues such as failed assertions or when a fault handler is invoked. They can also set up custom triggers that detect issues such as timeouts, stack overflows or other problems that might occur in a real-time embedded system. Developers can customize the firmware to detect and report only what they consider to be issues. This may also include warnings, e.g. that the stack usage has exceeded 95%.
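As an illustration of what such custom triggers might look like in firmware, the sketch below checks for the 95% stack-usage warning and for a timeout. The alert codes and function names are hypothetical and not part of the DFM API; how the stack usage and tick count are obtained depends on the RTOS in use.

```c
#include <stdint.h>

/* Hypothetical alert codes for this example. */
typedef enum {
    ALERT_NONE = 0,
    ALERT_STACK_WARNING,
    ALERT_STACK_OVERFLOW,
    ALERT_TIMEOUT
} alert_code_t;

#define STACK_WARNING_PERCENT 95u

/* Classify the stack usage of a task.
 * used_bytes and stack_size_bytes are assumed to come from the RTOS. */
alert_code_t check_stack_usage(uint32_t used_bytes, uint32_t stack_size_bytes)
{
    if (used_bytes > stack_size_bytes) {
        return ALERT_STACK_OVERFLOW;
    }
    /* Compare used/size against 95% without floating point. */
    if ((uint64_t)used_bytes * 100u >
        (uint64_t)stack_size_bytes * STACK_WARNING_PERCENT) {
        return ALERT_STACK_WARNING;
    }
    return ALERT_NONE;
}

/* Report a timeout if more than limit_ticks have elapsed.
 * Unsigned subtraction keeps the result correct across tick wraparound. */
alert_code_t check_timeout(uint32_t start_ticks, uint32_t now_ticks,
                           uint32_t limit_ticks)
{
    uint32_t elapsed = now_ticks - start_ticks;
    return (elapsed > limit_ticks) ? ALERT_TIMEOUT : ALERT_NONE;
}
```

Any result other than ALERT_NONE would then be passed to whatever reporting function the firmware uses to send the error message and trace data.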

When an issue is detected, the error message and the trace data are uploaded to the cloud service, which stores them and notifies the developer about the issue. One nice thing about the Device Firmware Monitor is that if a team has 1,000 devices in the field all reporting the same bug, they aren’t notified 1,000 times. Instead, they are notified once and informed that there have been 1,000 detections of this unique issue so far. This automatic classification keeps the developer’s inbox from being overwhelmed by multiple devices reporting the same bug. It also ensures that each unique issue is noticed, even if there is only a single report of it among a large volume of other reports.
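The “notify once, count every detection” behavior can be sketched as a small lookup table keyed by an issue signature. This is an assumption about how such classification could work, not a description of the DFM cloud service; the signature (for instance a hash of the error message and its code location), the table size and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ISSUES 16u /* assumed limit, for this example only */

typedef struct {
    uint32_t signature;  /* hypothetical ID that is equal for identical bugs */
    uint32_t detections; /* number of reports received for this issue */
} issue_entry_t;

typedef struct {
    issue_entry_t entries[MAX_ISSUES];
    size_t used;
} issue_table_t;

/* Register one incoming report.
 * Returns true if the developer should be notified (first report of
 * this issue), false if it only increases the detection count. */
bool issue_report(issue_table_t *t, uint32_t signature)
{
    for (size_t i = 0; i < t->used; i++) {
        if (t->entries[i].signature == signature) {
            t->entries[i].detections++;
            return false;
        }
    }
    if (t->used < MAX_ISSUES) {
        t->entries[t->used].signature = signature;
        t->entries[t->used].detections = 1u;
        t->used++;
    }
    /* An issue not seen before always triggers a notification, even if
     * the table is full and the issue cannot be tracked. */
    return true;
}

/* Number of detections recorded for a signature, or 0 if unknown. */
uint32_t issue_detections(const issue_table_t *t, uint32_t signature)
{
    for (size_t i = 0; i < t->used; i++) {
        if (t->entries[i].signature == signature) {
            return t->entries[i].detections;
        }
    }
    return 0u;
}
```

Feeding 1,000 reports with the same signature into issue_report() produces a single notification, while issue_detections() returns 1,000 for that issue. A single report with a different signature still triggers its own notification.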

Conclusions

In our connected world, there is no longer a need to rely on the end customer to report when a device isn’t working as expected in the field. Using the Device Firmware Monitor, development teams can make sure they are alerted as soon as issues are detected, in the field or on the bench, with meaningful diagnostics that allow the bug to be resolved quickly. This is a capability that won’t just improve the quality of embedded software but will also allow early adopters to get ahead of their competition. In the next post, we will dive deeper into the Device Firmware Monitor and look at an example use case showing how it might be used on a device.


Interested in evaluating DFM? Let us know!

If you are interested in evaluating Percepio DFM, apply for early access, which will be available in Fall 2019, by sending an email to info@percepio.com.