Percepio DevAlert™ – Cloud-based Error Reporting and Remote Diagnostics for Deployed IoT Firmware
(Formerly known as Device Firmware Monitor/DFM.)
Let’s face it – we can never be certain that any software is free of bugs. At release, embedded software often contains 3-5 bugs per 1,000 lines of source code, i.e., bugs that have been missed despite all verification efforts. It’s practically impossible to test every possible usage scenario and code path – there are simply far too many. You can always spend more time and money on verification, but you can never know when all bugs have been found, and at some point you need to stop testing and start shipping.
Even if you are testing all code according to best practices and everything seems to work perfectly, the requirements and test cases probably don’t cover all the ways your customers will use the product in practice. The real test often comes when thousands of people start using your product, in ways you never anticipated. Will it stand the test of reality?
Missed bugs irritate your customers, damage your reputation and hurt sales. In some cases, bugs may even lead to accidents, product recalls and legal action. The rise of Internet-of-Things (IoT) makes firmware quality assurance even more challenging, but IoT also provides a new remedy – over-the-air (OTA) software updates.
The faster you can fix missed bugs and push out an OTA update, the fewer customers will be affected, and for less time. If you are fast enough, most customers won’t even notice it. However, you cannot fix bugs that you are not aware of, and the reaction time is critical.
FreeRTOS creator Richard Barry presents Percepio DFM (DevAlert) at Embedded World 2019.
Automatic Feedback Within Seconds
Enter Percepio DevAlert, a ground-breaking new cloud service for IoT product organizations that provides awareness of firmware problems in deployed devices and speeds up resolution. When a firmware issue has been detected, DevAlert notifies the developers within seconds and provides diagnostic information about the issue, including a trace for Percepio Tracealyzer. This shows what was going on in the code when the error occurred, making it far easier to understand the problem and quickly find a solution.
“A game-changer in that it enables instant feedback from systems deployed in the field, to ensure your firmware quality is constantly improving.”
Jack Ganssle, Principal Consultant, TGG
“Percepio DevAlert is early to the market and original. IoT developers need this sort of direct feedback from their deployed systems.”
William E. Lamie, President, Express Logic
Without automatic feedback, you rely on your end users to report any issues, a responsibility they have not agreed to. You might then not hear about the issues until it’s too late, when many customers have already been affected. Moreover, your end users can’t be expected to provide sufficiently detailed information for you to quickly reproduce, debug and solve the problem. A vague error report like “it just crashed” may require weeks of guesswork until you find a likely cause, and even then, you still don’t know if you really solved the right problem. Imagine how much troubleshooting time could be saved if you instead had access to detailed diagnostic information about every issue in the production software.
Percepio DevAlert is designed to leverage existing secure solutions for cloud connectivity, storage and OTA updates. It initially supports Amazon Web Services (AWS IoT Core), Amazon FreeRTOS and ThreadX, but support for additional platforms is planned and can be provided on request.
The information flow starts in the error handling code of the IoT device, such as sanity checks and fault exception handlers. By calling the DevAlert firmware agent from these locations, firmware issues are uploaded as alerts to the customer’s cloud account. An alert may include an error message and any other information relevant to the specific issue, such as software state variables and hardware registers. Depending on the severity of the issue, the alert is either uploaded directly or after a device restart, once the cloud connection has been restored.
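The flow described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the DevAlert agent API: the names report_alert, flush_retained_alert and upload_alert, the alert layout and the two severity levels are all hypothetical, and the upload is replaced by a counter so the sketch can run on a host machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All names here are hypothetical; this is not the DevAlert agent API. */
typedef enum { SEVERITY_RECOVERABLE, SEVERITY_FATAL } severity_t;

typedef struct {
    severity_t severity;
    char       message[48];   /* short error message */
    uint32_t   state[4];      /* e.g. state variables or register values */
    uint8_t    state_count;
} alert_t;

/* Stand-in for memory that survives a restart (no-init RAM, flash, ...). */
static alert_t retained_alert;
static bool    retained_valid = false;

/* Counts uploads so the behavior can be observed without a cloud link. */
static unsigned uploads_sent = 0;

/* Stand-in for the upload over the device's existing cloud connection. */
static void upload_alert(const alert_t *alert)
{
    (void)alert;
    uploads_sent++;
}

/* Called from sanity checks and fault exception handlers. */
void report_alert(severity_t severity, const char *message,
                  const uint32_t *state, uint8_t count)
{
    assert(message != NULL);

    alert_t alert;
    memset(&alert, 0, sizeof alert);
    alert.severity = severity;
    strncpy(alert.message, message, sizeof alert.message - 1);

    if (state == NULL) count = 0;
    if (count > 4) count = 4;
    if (count > 0) memcpy(alert.state, state, count * sizeof state[0]);
    alert.state_count = count;

    if (severity == SEVERITY_FATAL) {
        /* Keep the alert across the restart; a real handler would now
           reset the device. */
        retained_alert = alert;
        retained_valid = true;
    } else {
        upload_alert(&alert);   /* connection still usable: send at once */
    }
}

/* Called after a restart, once the cloud connection has been restored. */
void flush_retained_alert(void)
{
    if (retained_valid) {
        upload_alert(&retained_alert);
        retained_valid = false;
    }
}
```

In this sketch, a failed sanity check would call report_alert with SEVERITY_RECOVERABLE and upload at once, while a fault exception handler would pass SEVERITY_FATAL and rely on flush_retained_alert running early in the next boot.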
The alert also includes a trace of the most recent software events prior to the error, which is recorded automatically by the DevAlert target agent. This tracing technology builds on 15 years of experience in RTOS tracing and is 4-8x more memory efficient than traditional RTOS tracers – only 4 KB is needed to store a trace with up to 1,000 software events. The efficient trace encoding is important for three reasons: it allows us to collect traces of sufficient length even from memory-constrained IoT systems, it cuts the upload time to a fraction of a second, and it reduces the cloud-side operational costs.
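The figure of 4 KB for up to 1,000 events works out to roughly 4 bytes per event. The sketch below shows one way such a budget could be met, using a ring buffer of packed 32-bit records that always holds the most recent events. The field layout (8-bit event code, 8-bit object id, 16-bit time delta) and all function names are assumptions made for illustration; the actual DevAlert trace encoding is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAPACITY 1024u          /* 1024 events x 4 bytes = 4 KB */

static uint32_t trace_buf[TRACE_CAPACITY];
static uint32_t trace_head  = 0;      /* next slot to write */
static uint32_t trace_count = 0;      /* valid events, at most TRACE_CAPACITY */

/* Pack one event into 32 bits (hypothetical layout):
   bits 31..24 event code, bits 23..16 object id, bits 15..0 time delta. */
static uint32_t trace_pack(uint8_t code, uint8_t object, uint16_t dt)
{
    return ((uint32_t)code << 24) | ((uint32_t)object << 16) | (uint32_t)dt;
}

/* Record an event, overwriting the oldest one when the buffer is full. */
void trace_record(uint8_t code, uint8_t object, uint16_t dt)
{
    trace_buf[trace_head] = trace_pack(code, object, dt);
    trace_head = (trace_head + 1u) % TRACE_CAPACITY;
    if (trace_count < TRACE_CAPACITY) {
        trace_count++;
    }
}

/* Copy the most recent events, oldest first, into out.
   Returns the number of events copied (at most max_events). */
size_t trace_snapshot(uint32_t *out, size_t max_events)
{
    assert(out != NULL);

    size_t n = trace_count < max_events ? trace_count : max_events;
    uint32_t start =
        (trace_head + TRACE_CAPACITY - (uint32_t)n) % TRACE_CAPACITY;

    for (size_t i = 0; i < n; i++) {
        out[i] = trace_buf[(start + (uint32_t)i) % TRACE_CAPACITY];
    }
    return n;
}
```

An error handler would call trace_snapshot once and attach the result to the alert. Because old events are overwritten in place, the memory cost stays fixed at 4 KB no matter how long the device has been running.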
Alerts from the DevAlert target agent are uploaded to the customer’s cloud service, which is configured to store the alerts and also to notify the Percepio DevAlert Classification Engine. This cloud service is the core of the DevAlert solution and is a fully managed service, running in Percepio’s AWS cloud account. It is responsible for classification, statistics and notifications to the developers. It also offers configuration options, e.g., under what conditions notifications should be sent and where to send them.
When the developers receive a notification about a new issue, they can access the alerts and traces directly from Percepio Tracealyzer. The DevAlert Dashboard in Tracealyzer shows the recent alerts and allows for high-level analytics, e.g., whether a certain issue was fixed by your latest firmware update. The trace for any alert can be opened from the same dashboard.
Your Information Is Secure
The software trace never leaves the customer’s cloud account. Only an anonymous signature of the issue is provided to the Percepio DevAlert Classification Engine, and this information is completely transparent and configurable for the customer. Furthermore, all communication and storage are protected using best practices for authentication and encryption.
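To make the idea of an anonymous issue signature concrete, here is a minimal sketch in C. It assumes that a signature can be derived by hashing only fields that describe the issue (firmware version, error code, fault address) while leaving out anything that identifies the device. The struct, the choice of fields and the use of a 32-bit FNV-1a hash are illustrative assumptions, not a description of how DevAlert computes its signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical issue record; the field choice is an assumption. */
typedef struct {
    uint32_t device_id;      /* identifies the device: never hashed */
    uint32_t firmware_ver;
    uint16_t error_code;
    uint32_t fault_address;
} issue_t;

/* One pass of 32-bit FNV-1a over a byte range, continuing from hash h. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;          /* FNV prime */
    }
    return h;
}

/* Signature built from issue-describing fields only. Fields are hashed
   one by one so that struct padding never enters the hash. */
uint32_t issue_signature(const issue_t *issue)
{
    assert(issue != NULL);

    uint32_t h = 2166136261u;    /* FNV offset basis */
    h = fnv1a(h, &issue->firmware_ver,  sizeof issue->firmware_ver);
    h = fnv1a(h, &issue->error_code,    sizeof issue->error_code);
    h = fnv1a(h, &issue->fault_address, sizeof issue->fault_address);
    return h;
}
```

With this construction, the same bug reported by two different devices yields the same signature, which is what allows duplicates to be grouped without the classifier ever seeing a device identity or a trace.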
Since DevAlert focuses on the behavior of the device software and does not identify the device users, it is not subject to privacy legislation such as GDPR.
Cloud-Side Operational Costs
DevAlert does not generate any data traffic unless an issue is detected, so if you don’t have any missed bugs in your code, there is no DevAlert activity that drives cost. You need to store the alerts for some time, but given the small amount of data per alert (typically 5 KB) and today’s cheap cloud storage, this cost is negligible, especially when compared to the value of the information provided. Say that you have a large fleet of one million devices with a lot of firmware issues, averaging one alert per device per week at 5 KB per alert. This would produce about 260 GB per year. Storing this data for a year would cost about $72, assuming Amazon S3 standard storage. But since most data can be deleted after a short time, e.g., duplicates of the same issue from different devices, the storage needed can be reduced to a fraction of this level.
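The estimate above can be reproduced with a few lines of C. The sketch assumes decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes), 52 weeks per year, and a storage price of $0.023 per GB per month, which is an assumed S3 Standard rate and will vary by region and over time. It also bills the full yearly volume for all 12 months, so it is an upper bound for data that accumulates gradually. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of alert data produced by the whole fleet in one year. */
uint64_t yearly_alert_bytes(uint64_t devices,
                            uint64_t alerts_per_device_per_week,
                            uint64_t bytes_per_alert)
{
    return devices * alerts_per_device_per_week * 52u * bytes_per_alert;
}

/* Cost in USD of keeping stored_bytes in storage for 12 months. */
double yearly_storage_cost_usd(uint64_t stored_bytes, double usd_per_gb_month)
{
    assert(usd_per_gb_month >= 0.0);

    double gigabytes = (double)stored_bytes / 1e9;   /* decimal GB */
    return gigabytes * usd_per_gb_month * 12.0;
}
```

For one million devices, one alert per device per week and 5,000 bytes per alert, yearly_alert_bytes returns 260,000,000,000 bytes (260 GB), and yearly_storage_cost_usd at the assumed rate gives 260 x 0.023 x 12 = 71.76, i.e. about $72.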
Sending out OTA updates in response to a DevAlert notification can be somewhat more costly, at least for large device fleets, but this must be compared to the alternative cost of letting a serious bug remain unfixed – damaged customer experience, reduced product sales, or even accidents and legal action. For minor issues, it may be sufficient to include the bug fix in the next planned update.
Does this sound interesting? Please contact us at email@example.com to learn more and get started.