Detecting Stack Overflows (Part 1 of 2)

In a previous blog post, I showed how you can display the stack usage of kernel-based applications using µC/Probe™. In this two-part series, I will describe the importance of sizing your stacks at design time and checking task stacks at run-time to catch stack overflows. In this first part, I explore how to determine the size of task stacks; in Part 2, I will go into ways to detect overflows.

How Do You Determine the Size of a Task Stack?

In a kernel-based application, each task requires its own stack. The size of the stack required by a task is application-specific. It is possible to manually figure out the stack space needed by adding up:

  • The memory required by all function-call nesting. For each function-call hierarchy level:
    • Depending on the CPU architecture, one pointer for the return address of a function call. Some CPUs save the return address in a special register reserved for that purpose (often called the link register). However, if the function calls another function, the link register must first be saved, so it is wise to assume that the pointer is pushed onto the stack anyway.
    • The memory required by the arguments passed in those function calls. Arguments are often passed in CPU registers but again, if a function calls other functions, the register contents will be saved onto the stack anyway. I would thus highly recommend that you assume arguments are passed on the stack for the purpose of determining the size of a task’s stack.
    • Storage of local variables for those functions
    • Additional stack space for state-saving operations inside the functions
  • The storage for a full CPU context (depends on the CPU) plus FPU registers as needed
  • The storage of another full CPU context for each nested ISR (if the CPU does not have a separate stack to handle ISRs)
  • The stack space needed for local variables used by those ISRs.

Adding all this up is a tedious chore, and the resulting number is only a minimum requirement. You would most likely not size the stack that precisely, because you need to leave room for “surprises.” The number you come up with should probably be multiplied by a safety factor, possibly 1.5 to 2.0. The calculation also assumes that the exact path of the code is known at all times, which is not always possible. Specifically, when calling a function such as printf(), it might be difficult or nearly impossible to even guess how much stack space printf() will require. Indirect function calls through tables of function pointers can be just as problematic. Generally speaking, start with a fairly large stack space and monitor the stack usage at run-time to see how much stack space is actually used after the application runs for a while. For more information, please read “Exploring µC/OS-III’s Built-In Performance Measurements.”
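To make the bookkeeping concrete, here is a minimal sketch in C of the sum described above, followed by the safety factor. The structure, its field names, and both functions are hypothetical and exist only for illustration. The sketch also simplifies by assuming that every call level costs the same number of bytes, which a real call tree will not do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical breakdown of the items listed above; all sizes in bytes. */
typedef struct {
    size_t call_depth;        /* deepest function-call nesting of the task        */
    size_t return_addr_size;  /* one pointer per call level                       */
    size_t args_per_call;     /* arguments, assumed to be passed on the stack     */
    size_t locals_per_call;   /* local variables per call level (worst case)      */
    size_t scratch_per_call;  /* state-saving operations inside the functions     */
    size_t cpu_context_size;  /* full CPU context, plus FPU registers if used     */
    size_t isr_nesting;       /* nested ISR levels (0 with a separate ISR stack)  */
    size_t isr_locals;        /* local variables per ISR level                    */
} stack_budget_t;

/* Minimum requirement: call chain + one CPU context + nested ISRs. */
size_t stack_min_bytes(const stack_budget_t *b)
{
    size_t per_call = b->return_addr_size + b->args_per_call
                    + b->locals_per_call + b->scratch_per_call;
    size_t calls    = b->call_depth * per_call;
    size_t isrs     = b->isr_nesting * (b->cpu_context_size + b->isr_locals);

    return calls + b->cpu_context_size + isrs;
}

/* Apply a safety factor given in percent (150 = 1.5x, 200 = 2.0x),
   then round up to the stack alignment required by the CPU.        */
size_t stack_alloc_bytes(const stack_budget_t *b,
                         unsigned int safety_percent,
                         size_t align)
{
    assert(safety_percent >= 100u);
    assert(align > 0u);

    size_t padded = (stack_min_bytes(b) * safety_percent + 99u) / 100u;

    return ((padded + align - 1u) / align) * align;
}
```

For example, a task with 5 call levels of 60 bytes each, a 68-byte CPU context, and 2 nested ISR levels with 16 bytes of locals each needs at least 300 + 68 + 168 = 536 bytes. With a 1.5 safety factor and 8-byte alignment, that becomes 808 bytes. These figures are made up for the example and do not describe any particular CPU.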

Also, avoid writing recursive code, because stack usage is typically nondeterministic with this type of code.

Some compilers/linkers, such as Keil and IAR, are clever enough to provide this information in a link map. Specifically, for each function, the link map indicates the worst-case stack usage. However, these tools will not account for indirect calls (i.e., function pointers) or assembly language routines. GCC has partial support: its -fstack-usage option reports per-function stack usage but not a call graph. This information clearly enables you to better evaluate stack usage for each task. It is still necessary to add the stack space for a full CPU context plus another full CPU context for each nested ISR (if the CPU does not have a separate stack to handle ISRs), plus whatever stack space is needed by those ISRs. Again, allow for a safety net and multiply this value by some factor.

If your kernel monitors stack usage at run-time, then it is a good idea to display that information and keep an eye on your stacks while developing and testing the product. Stack overflows are common and can lead to some curious behaviors. In fact, whenever someone mentions that his or her application behaves “strangely,” insufficient stack size is the first thing that comes to mind.
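One common way to measure stack usage at run-time is to fill the stack with a known pattern before the task starts, then count how many entries still hold that pattern. The sketch below illustrates the idea in C. It is not the code of any particular kernel: the function names and the fill value are hypothetical, and it assumes a stack of 32-bit entries that grows from high memory to low memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN  0xA5A5A5A5u   /* arbitrary, hypothetical marker value */

/* Fill the whole stack with the marker before the task runs for the first time. */
void stack_fill(uint32_t *stack, size_t n_words)
{
    assert(stack != NULL);

    for (size_t i = 0u; i < n_words; i++) {
        stack[i] = STACK_FILL_PATTERN;
    }
}

/* Count the entries that still hold the marker, starting at the lowest address.
   With a stack that grows downward, these are the entries never written so far. */
size_t stack_free_words(const uint32_t *stack, size_t n_words)
{
    assert(stack != NULL);

    size_t n_free = 0u;

    while ((n_free < n_words) && (stack[n_free] == STACK_FILL_PATTERN)) {
        n_free++;
    }
    return n_free;
}

/* Peak usage as a percentage of the allocated stack. */
size_t stack_used_percent(const uint32_t *stack, size_t n_words)
{
    if (n_words == 0u) {
        return 0u;
    }
    return ((n_words - stack_free_words(stack, n_words)) * 100u) / n_words;
}
```

The result is a high-water mark, so it only reflects the code paths that have actually run. It can also under-report usage when a function reserves stack space that it never writes, such as a large local array that is left untouched.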

What are Stack Overflows?

Just so we are on the same page, below is a description of what a stack overflow is. For the sake of discussion, it is assumed here that stacks grow from high memory to low memory. Of course, the same issue occurs when the stack grows in the other direction. Refer to Figure 1.

Figure 1 – Stack Overflow


  1. The CPU’s stack pointer (SP) register points somewhere inside the stack space allocated for a task. The task is about to call the function foo(), as shown below:
    
    void    foo (void)
    {
            int i;
            int array[10];

            /* Code */
    }
  2. Calling foo() causes the CPU to save the return address of the caller onto the stack. Of course, that depends greatly on the CPU and the compiler.
  3. The compiler then adjusts the stack pointer to make room for the local variables. Unfortunately, at this point, we have overflowed the stack (the SP points outside the storage area assigned for the stack), and just about anything foo() does will corrupt whatever data is beyond the stack base. Depending on the code flow, the array might never be used, in which case the problem would not be immediately apparent. However, if foo() calls another function, there is a high likelihood that this will cause something outside the stack to be touched.
  4. So, when foo() starts to execute code, the SP has an offset of 48 bytes from where it was prior to calling foo(): one entry for the return address, one for i, and ten for array, assuming a stack entry is 4 bytes wide.
  5. We typically do not know what resides here. It could be the stack of another task, it could be variables, data structures or an array used by the application. Overwriting whatever resides here can cause strange behaviors: Values computed by another task may not be what you expected and could cause decisions in your code to take the wrong path, or your system may work fine under normal conditions but then fail. We just do not know, and it is actually quite difficult to predict. In fact, the behavior can change each time you make changes to your code.
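The arithmetic behind these steps can be sketched in C as follows. This is an illustration only: the macro and function names are hypothetical, the stack is assumed to grow from high memory to low memory with 4-byte entries, and lowest_addr stands for the lowest address that still belongs to the task's stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_ENTRY_BYTES  4u
/* Frame of foo(): return address + i + array[10] = 12 entries = 48 bytes. */
#define FOO_FRAME_BYTES    ((1u + 1u + 10u) * STACK_ENTRY_BYTES)

/* Returns true if moving the SP down by frame_bytes would take it below
   the lowest address allocated to the stack, which is an overflow.      */
bool frame_overflows(uintptr_t sp, uintptr_t lowest_addr, uint32_t frame_bytes)
{
    assert(sp >= lowest_addr);    /* step 1: the SP starts inside the stack */

    return (sp - lowest_addr) < frame_bytes;
}
```

With only 32 bytes left between the SP and the lowest stack address, calling foo() overflows by 16 bytes; with 48 bytes or more left, the frame still fits. The check says nothing about what sits below the stack, which is exactly why the consequences are so hard to predict.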

This article looked at how to determine the size of a task stack and at what stack overflows are. Part 2 (coming March 14) will look at the different stack overflow detection methods.
