RTOS applications rarely fail because a single task misbehaves. Instead, problems emerge from interactions that are often invisible at the code level. The source may show you which task uses which synchronisation object, but it cannot show you when these interactions occur or how much their timing varies. On multi-core systems, additional sources of non-determinism further complicate the picture.
As a result, systems can appear stable during testing only to fail in the field in ways that are nearly impossible to reproduce.