Detecting Stack Overflows (Part 2 of 2)

In Part 1 of this two-part series, we looked at what stack overflows are and how to determine the size of a task stack. Now we turn to detecting stack overflows. A number of techniques can be used: some rely on hardware, while others are implemented entirely in software. They are presented from most to least preferable, based on how likely each one is to detect an overflow. As we will see shortly, hardware support is preferable because it detects a stack overflow almost as soon as it happens, which helps avoid the strange behaviors an undetected overflow causes and makes them faster to diagnose.

Hardware stack overflow detection mechanisms generally trigger an exception handler. The exception handler typically saves the current program counter (PC) and possibly other CPU registers onto the current task’s stack. Of course, because the exception occurs when we are attempting to access data outside of the stack, the handler would overwrite some variables or another stack in your application, assuming there is RAM beyond the base of the overflowed stack.

In most cases, the application developer will need to decide what to do about a stack-overflow condition. Should the exception handler place the embedded system in a known safe state and reset the CPU, or simply do nothing? If you decide to reset the CPU, you might figure out a way to store the fact that an overflow occurred and which task caused the overflow so you can notify a user upon reset.
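One way to remember an overflow across a reset is sketched below. It is only an illustration: the names OvfRecord, ovf_record_save() and ovf_record_take() are hypothetical, and the record is an ordinary static variable here. On a real target it would have to live in a RAM region that the startup code does not clear, which is toolchain-specific and not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OVF_MAGIC 0x4F564621u            /* arbitrary marker value (assumption) */

typedef struct {
    uint32_t magic;                      /* OVF_MAGIC when a record is pending  */
    char     task_name[16];              /* name of the offending task          */
} OvfRecord;

/* Assumption: on real hardware this object is placed in no-init RAM so it
 * survives the reset. Here it is a plain static for illustration only. */
static OvfRecord ovf_record;

/* Called from the overflow exception handler, just before resetting the CPU. */
void ovf_record_save(const char *task_name)
{
    assert(task_name != NULL);
    strncpy(ovf_record.task_name, task_name, sizeof(ovf_record.task_name) - 1);
    ovf_record.task_name[sizeof(ovf_record.task_name) - 1] = '\0';
    ovf_record.magic = OVF_MAGIC;
}

/* Called early after reset. Returns 1 and copies the task name if an overflow
 * was recorded, then clears the record so it is reported only once. */
int ovf_record_take(char *name_out, size_t name_size)
{
    assert(name_out != NULL && name_size > 0);
    if (ovf_record.magic != OVF_MAGIC) {
        return 0;
    }
    strncpy(name_out, ovf_record.task_name, name_size - 1);
    name_out[name_size - 1] = '\0';
    ovf_record.magic = 0;
    return 1;
}
```

After ovf_record_save("MotorCtrl"), the first call to ovf_record_take() returns 1 and yields the task name; a second call returns 0 because the record has been consumed.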

Technique 1: Using a stack limit register

Some processors (unfortunately, very few of them) have simple yet highly effective SP overflow-detection registers. This feature will, however, be available on processors based on the ARMv8-M CPU architecture. When the CPU’s SP goes below (or above, depending on the direction of stack growth) the value set in this register (let us call it the SP_Limit register), an exception is generated. The drawing in Figure 2 shows how this works.

Figure 2 – Using a Stack Limit Register to Detect Stack Overflows

  1. The SP_Limit register is loaded by the context switch code of the kernel when the task is switched in.
  2. The location where the SP_Limit points to could be at the very base of the stack or, preferably, at a location that would allow the exception handler enough room to save enough registers on the offending stack to handle the exception.
  3. As the stack grows, if the SP register ever goes below the SP_Limit, an exception is generated. As we have seen, when your code calls a function and uses local variables, the SP register can easily be positioned outside the stack upon entry of a function. One way to reduce the likelihood of this happening is to move the SP_Limit further away from the stack base address.

µC/OS-III® was designed from the get-go to support CPUs with a stack limit register. Each task contains its own value to load into the SP_Limit, and this value is placed in the task control block (TCB). The value of the SP_Limit register used by the CPU’s stack overflow detection hardware needs to be changed whenever μC/OS-III performs a context switch.

The sequence of events to do this must be performed in the following order:

  1. Set SP_Limit to 0. This ensures the SP is never below the SP_Limit register. Note that I assumed here that the stack grows from high memory to low memory, but the concept works in a similar fashion if the stack grows in the opposite direction.
  2. Load the SP register.
  3. Get the value of the SP_Limit that belongs to the new task from its TCB. Set the SP_Limit register to this value.

The SP_Limit register provides a simple way to detect stack overflows. Unfortunately, I know of very few CPUs that implement this feature. The only one that comes to mind is the Infineon 80C166/167.

Technique 2: Using an MPU – Stacks are contiguous

Many of the current processors are equipped with a memory protection unit (MPU), which typically monitors the address bus to see if your code is allowed to access certain memory locations or I/O ports. MPUs are relatively simple devices to use but are somewhat complex to set up. However, if all you want to do is detect stack overflows, then an MPU can be put to good use without a great deal of initialization code. The MPU is already on your chip, meaning it is available at no extra cost to you, so why not use it? In the discussion that follows, we will set up an MPU region that says, “If ever you write to this region, the MPU will generate an exception.”

One way to set up your stacks is to locate all of the stacks together in contiguous memory, starting the stacks at the base of RAM, and locating the C stack as the first stack at the base of RAM, as shown in Figure 3.

Figure 3 – Locating Task Stacks Contiguously

As the kernel context switches between tasks, it moves a single MPU “protection window” (I will call it the “RED Zone”) from task to task, as shown in Figure 4. Note that the RED Zone is located below the base address of each of the stacks. This allows you to make use of the full stack area before the MPU detects an overflow.

Figure 4 – Moving the RED Zone During Context Switches

As shown, the RED Zone can be positioned below the stack base address. The size of the RED Zone depends on a number of factors. For example, the size of the RED Zone on the MPU of a Cortex-M CPU must be a power of 2 (32, 64, 128, 256, etc.). Also, stacks must be aligned to the size of the RED Zone. On processors based on the ARMv8-M architecture, this restriction has been removed, and MPU region size granularity is 32 bytes. The larger the RED Zone, the more likely we can detect a stack overflow when a function call allocates large arrays on the stack. However, locating RED Zones below the stack base address has other issues. For one thing, you cannot allocate buffers on a task’s stack and pass that pointer to another task, because it is possible that the allocated buffer would be overlaid by the RED Zone, thus causing an exception. However, allocating buffers on a task’s stack is not good practice anyway, so getting slapped by an MPU violation is a kind punishment.

You may also ask: “Why should the C stack be located at the start of RAM?” Because in most cases, once multitasking has started, the C stack is never used again and its RAM is thus lost. Overflowing into RAM that is no longer used might not be a big deal but, technically, it should not be allowed. Having the C stack’s RAM there simply provides room for the CPU registers that get pushed onto the offending task’s stack during an MPU exception sequence.

Technique 3: Using an MPU – Stacks are noncontiguous

If you are not able to allocate storage for your task stacks in contiguous memory as outlined in the previous section, then the MPU needs to be used differently. What we can do here is reserve a portion of RAM toward the base of the stack and generate an exception if anything gets written in that area. The kernel would reconfigure the MPU during a context switch to protect the new task’s stack. This is shown in Figure 5.

Figure 5 – Locating the RED Zone Inside a Task's Stack

Again, the size of the RED Zone depends on a number of factors. As previously discussed, for the MPU on a Cortex-M CPU (except for ARMv8-M), the size must be a power of 2 (32, 64, 128, 256, etc.). Also, stacks must be aligned to the size of the RED Zone. The larger the RED Zone, the more likely we can detect a stack overflow when a function call allocates large arrays on the stack. However, in this case, the RED Zone takes away storage space from the stack because, by definition, a write to the RED Zone will generate an exception, and thus cannot be performed by the task. If the size of a stack is 512 bytes (i.e., 128 stack entries for a 32-bit wide stack), a 64-byte RED Zone would consume 12.5% of your available stack, and thus leave only 448 bytes for your task, so you might need to allocate larger stacks to compensate.
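The cost of an in-stack RED Zone is simple arithmetic, sketched here with a hypothetical helper, redzone_cost(). The overhead is reported in tenths of a percent so that the calculation stays in integers.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t usable_bytes;          /* stack bytes left for the task       */
    uint32_t overhead_tenths_pct;   /* RED Zone share, in tenths of 1%     */
} RedzoneCost;

/* Computes how much of a stack is consumed by a RED Zone located inside it. */
RedzoneCost redzone_cost(uint32_t stack_bytes, uint32_t zone_bytes)
{
    RedzoneCost cost;

    assert(stack_bytes > 0u && zone_bytes <= stack_bytes);
    cost.usable_bytes        = stack_bytes - zone_bytes;
    cost.overhead_tenths_pct = (zone_bytes * 1000u) / stack_bytes;
    return cost;
}
```

For the example above, redzone_cost(512, 64) gives 448 usable bytes and an overhead of 125 tenths of a percent, i.e., 12.5%.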

As shown in Figure 6, if a function call “skips over” the RED Zone by allocating local storage for an array or a large data structure, then the code might never write in the RED Zone and thus bypass the stack overflow detection mechanism altogether. In other words, if the RED Zone is too small, foo() might use only i and array[0] through array[5], none of which happens to overlap the RED Zone.

Figure 6 – Bypassing the RED Zone

To avoid this, local variables and arrays should always be initialized:

void    foo (void)
{
        int i;
        int array[20];

        for (i = 0; i < 20; i++) {     // Write every element so the RED Zone gets touched
            array[i] = 0;
        }
        // ... rest of the function's code ...
}

Technique 4: Software-based RED zones

µC/OS-III has a built-in RED Zone stack overflow detection mechanism, but it is implemented in software. This software-based approach is enabled by setting OS_CFG_TASK_STK_REDZONE_EN to DEF_ENABLED in os_cfg.h. When enabled, µC/OS-III creates a monitored zone at the end of a task's stack, which is filled upon task creation with a special value. The actual value is not that critical, and we used 0xABCD2345 as an example (but it could be anything). However, it is wise to avoid values that could be used in the application such as zero. The size of the RED Zone is defined by OS_CFG_TASK_STK_REDZONE_DEPTH. By default, the size of the RED Zone is eight CPU_STK elements deep. The effectively usable stack space is thus reduced by eight stack entries. This is shown in Figure 7 (below).

µC/OS-III checks the RED Zone at each context switch. If the RED Zone has been overwritten or if the stack pointer is out of bounds, µC/OS-III informs the user by calling OSRedzoneHitHook(). The hook allows the user to gracefully shut down the application, since at this point, the stack corruption may have caused irreversible damage. The hook, if defined, must ultimately call CPU_SW_EXCEPTION() or otherwise stop µC/OS-III from proceeding with corrupted data.
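The general idea of a software RED Zone can be sketched as follows. This is not µC/OS-III's source code: stk_t, redzone_init() and redzone_ok() are hypothetical names, stk_t stands in for CPU_STK, and the stack is assumed to grow toward lower addresses so that the RED Zone occupies the lowest entries of the stack array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REDZONE_DEPTH 8u            /* entries in the RED Zone (assumed default) */
#define REDZONE_FILL  0xABCD2345u   /* example fill value from the text          */

typedef uint32_t stk_t;             /* stand-in for CPU_STK (assumption)         */

/* Fills the RED Zone when the task is created. */
void redzone_init(stk_t *stk_base, size_t stk_entries)
{
    assert(stk_base != NULL && stk_entries > REDZONE_DEPTH);
    for (size_t i = 0; i < REDZONE_DEPTH; i++) {
        stk_base[i] = REDZONE_FILL;
    }
}

/* Run at each context switch. Returns 1 when the stack pointer is within
 * bounds and every RED Zone entry still holds the fill value, 0 otherwise. */
int redzone_ok(const stk_t *stk_base, size_t stk_entries, const stk_t *sp)
{
    assert(stk_base != NULL && sp != NULL);
    if (sp < stk_base + REDZONE_DEPTH || sp > stk_base + stk_entries) {
        return 0;                   /* stack pointer out of bounds */
    }
    for (size_t i = 0; i < REDZONE_DEPTH; i++) {
        if (stk_base[i] != REDZONE_FILL) {
            return 0;               /* RED Zone was overwritten    */
        }
    }
    return 1;
}
```

When redzone_ok() returns 0, the caller would invoke its overflow hook. Note that the check costs up to REDZONE_DEPTH comparisons per context switch.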

Since the RED Zone is typically small, it is crucial to initialize local variables, large arrays or data structures upon entry of a function to detect the overflow using this mechanism.

The software RED Zone is attractive because it is portable to any CPU architecture. The drawback is that checking it consumes potentially valuable CPU cycles at every context switch.

Figure 7 – Software-Based RED Zone

Technique 5: Determining the actual stack usage at run-time

Although not actually an automatic stack overflow detection mechanism, determining the ideal size of a stack at run-time is highly useful and is a feature available in µC/OS-III. Specifically, you would allocate more stack space than is anticipated to be used for the stack, then monitor and possibly display actual maximum stack usage at run-time. This is fairly easy to do. First, the task stack needs to be cleared (i.e., filled with zeros) when the task is created. You should note that we could have used a different value than zero. Next, a low-priority task walks the stack of each created task, from the bottom toward the top, counting the number of zero entries. When the task finds a nonzero value, the process is stopped and the usage of the stack can be computed (in number of stack entries used or as a percentage). Then, you can adjust the size of the stacks (by recompiling the code) to allocate a more reasonable value (either increase or decrease the amount of stack space for each task). For this to be effective, however, you need to run the application long enough and under stress for the stack to grow to its highest value. This is illustrated in Figure 8.
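The stack walk itself is short. The sketch below is not the µC/OS-III implementation: stk_t, stack_free_entries() and stack_used_percent() are hypothetical names, the stack is assumed to have been zero-filled at task creation, and it is assumed to grow toward lower addresses so that the unused entries sit at the start of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t stk_t;   /* stand-in for the kernel's stack entry type */

/* Counts the zero entries from the far end of the stack up to the first
 * nonzero entry, i.e., the entries that were never written. */
size_t stack_free_entries(const stk_t *stk_base, size_t stk_entries)
{
    size_t free_cnt = 0;

    assert(stk_base != NULL);
    while (free_cnt < stk_entries && stk_base[free_cnt] == 0u) {
        free_cnt++;
    }
    return free_cnt;
}

/* Maximum stack usage observed so far, as a whole-number percentage. */
unsigned stack_used_percent(const stk_t *stk_base, size_t stk_entries)
{
    size_t used;

    assert(stk_entries > 0);
    used = stk_entries - stack_free_entries(stk_base, stk_entries);
    return (unsigned)((used * 100u) / stk_entries);
}
```

For a 100-entry stack in which the task has written the top 30 entries, stack_free_entries() returns 70 and stack_used_percent() returns 30. One caveat of a zero fill is that a legitimate zero stored at the deepest point of the stack is counted as free, which is one reason a different fill value may be preferred.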

Figure 8 – Determining Actual Stack Usage at Run-Time

µC/OS-III provides a function, OSTaskStkChk(), that determines the stack usage of a task at run-time. In fact, µC/OS-III’s statistics task, OS_StatTask(), calls this function for each created task every 1/10th of a second. This is what µC/Probe displays, as described in my article: “Exploring µC/OS-III’s Built-In Performance Measurements.”

Summary

This article described different techniques to detect stack overflows. Stack overflows can occur in both single-threaded and multithreaded environments. Even though we can detect overflows, there is typically no way to safely continue execution after one occurs and, in many cases, the only recourse is to reset the CPU or halt execution altogether. However, before taking such a drastic measure, it is recommended that your code bring your embedded system to a known and safe state. For example, you might turn off motors and actuators, open or close valves, and so on. Even though you are in a shutdown state, you might still be able to use kernel services to perform this work.
