Why write a book about debugging real-time systems? Good question, I’m glad you asked. A lot has been written about debugging real-time systems or embedded systems, but what has been written has not, to the best of my knowledge, been collected into one resource, such as a book.
After having taught embedded system design for many years, I’ve come to the conclusion that we are failing as teachers. Our students can write a program in assembly, C, C++, C#, some Arduino dialect, or Verilog, and get it to compile. However, when problems crop up, as they invariably do, students lack the diagnostic skills to analyze the problems, zero in on the possible causes in a systematic way, and then find and fix the bugs. I hope to address this issue in the chapters to come.
What I observe with depressing regularity is that students take the “shotgun approach”: they try a bunch of changes at once and hope for the best. Even more disturbing, rather than try to find and fix a problem, students will just throw away their code or their prototype and start all over again, hoping against hope that this will fix the problem.
You might assume that when the students of today become the engineers of tomorrow and are totally immersed in product design, they will have developed the debugging skills they need to do their job in the most effective manner. I’ve learned that this assumption does not hold true.
Before I became an academic, I worked in industry, creating design and debug tools for embedded systems designers. In particular, I designed and led teams that designed logic analyzers, in-circuit emulators, and performance analyzers. These were, and in many cases still are, complex instruments designed to solve complex problems. Just learning to use one of these instruments effectively can be a chore, and many engineers don’t feel the desire to invest the time required.
Maybe you’ve been there yourself. Do you do a mental cost/benefit analysis before deciding whether to invest the time to wade through a set of manuals, or do you just dive in and hope for the best? One of the most brilliant engineers I ever worked with, John Hansen, made an observation that came to be known as Hansen’s Law:
If a customer doesn’t know how to use a feature, the feature doesn’t exist.
So, as vendors of these complex and expensive debugging tools, we certainly own a good part of the problem. We have not been able to effectively transfer technology to a user in a way that allows the user to take full advantage of the power of the tool to solve a problem.
Here’s another example. I remember this one vividly because it led me to think in a whole new way about technology and how to transfer it. We’ll come back to it in a later chapter, but this is a good time to introduce the problem. It involves logic analyzers. For many years, the logic analyzer has been one of the premier tools for real-time system analysis. There’s some evidence that this dominance may be changing, but for now, we’ll assume the logic analyzer still holds a position of prominence.
Suppose you are trying to debug a complex, real-time system with many high-priority tasks running in parallel. Stopping the processor to “single step” through your code is not an option, although many excellent debuggers are task-aware, so they may be able to single step in a particular task without stopping other tasks from running at full speed.
The logic analyzer sits between the processor or processors and the rest of the system, and records in real time the state of every address bit, data bit, and status bit output from the processor, along with the inputs to the processor, as they occur. Once the buffer or recording memory is full, the engineer can then trace through it and see exactly what transpired during the time interval of interest.
But how do you define the time interval of interest? Your memory buffer is not infinitely large, and your processor is clipping along at tens to hundreds of millions of bus cycles every second. This is where the logic analyzer really shines as a tool. The user can define a sequence of events through the code, very much like a sequence of states in a finite state machine. In some logic analyzers, these states can be defined in terms of the high-level C++ code with which the engineer is accustomed to programming.
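To make the finite-state-machine analogy concrete, here is a minimal sketch of a multilevel trigger feeding a shallow circular trace buffer. It is an illustration only, not how any particular analyzer is implemented. The names `TriggerLevel`, `capture`, `make_stream`, and `LEVELS`, the addresses, and the three-level sequence are all hypothetical, chosen just for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# One bus cycle as the analyzer would see it: (address, data, status).
BusCycle = Tuple[int, int, str]


@dataclass
class TriggerLevel:
    """One level of the trigger sequence: a condition and a repeat count."""
    matches: Callable[[BusCycle], bool]
    count: int = 1  # matching cycles required before advancing a level


def capture(cycles: Iterable[BusCycle],
            levels: List[TriggerLevel],
            depth: int) -> Tuple[Optional[int], List[BusCycle]]:
    """Run the trigger sequencer over a stream of bus cycles.

    Returns (trigger_index, trace). The trace holds at most `depth`
    cycles, ending with the trigger cycle. If the sequence never
    completes, trigger_index is None and the trace holds the last
    `depth` cycles seen.
    """
    trace = deque(maxlen=depth)  # shallow circular buffer: old cycles fall off
    level = 0                    # current position in the trigger sequence
    seen = 0                     # matches counted so far at this level

    for index, cycle in enumerate(cycles):
        trace.append(cycle)
        if levels[level].matches(cycle):
            seen += 1
            if seen == levels[level].count:
                level += 1
                seen = 0
                if level == len(levels):
                    return index, list(trace)  # final level satisfied: trigger
    return None, list(trace)


# Hypothetical three-level sequence: enter a routine, see a loop read
# three times, then catch a suspicious write.
LEVELS = [
    TriggerLevel(lambda c: c[0] == 0x1000 and c[2] == "write"),
    TriggerLevel(lambda c: c[0] == 0x2000 and c[2] == "read", count=3),
    TriggerLevel(lambda c: c[0] == 0x3000 and c[1] == 0xFF),
]


def make_stream() -> List[BusCycle]:
    """Build a made-up bus trace with the event of interest buried inside."""
    stream: List[BusCycle] = [(0x0100 + i, 0x00, "read") for i in range(20)]
    stream.append((0x1000, 0x01, "write"))       # satisfies level 1
    for _ in range(3):
        stream.append((0x2000, 0x55, "read"))    # counts toward level 2
        stream.append((0x0200, 0x00, "read"))    # unrelated traffic
    stream.append((0x3000, 0xFF, "write"))       # satisfies level 3
    stream.extend((0x0300 + i, 0x00, "read") for i in range(10))
    return stream


if __name__ == "__main__":
    index, trace = capture(make_stream(), LEVELS, depth=8)
    print("triggered at cycle", index)
    for address, data, status in trace:
        print(f"{address:04X} {data:02X} {status}")
```

The point of the sketch is the trade-off: with a precise enough sequence, even an eight-entry buffer ends exactly at the cycle of interest, whereas a simple trigger would need a far deeper buffer and a lot of manual searching.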
If the user gets the sequence of states correctly defined, the logic analyzer will trigger (capture) the trace at just the right time in the code sequence to show where the fault occurs. Here is where it gets interesting. At the time, the HP (now Keysight) logic analyzers had relatively small trace buffers but very sophisticated state machines. The design philosophy was that the user didn’t need a deep trace buffer because she could zero in on exactly the point where the problem occurs. In fact, the state machines on the logic analyzers were eight levels deep. Here’s an example of how you might use it. Refer to Fig. 1 below.
This example is three levels deep, and each level offered many options for defining the state or the number of times a loop might run. What we discovered was that our customers rarely, if ever, tried to set up a trigger condition beyond two levels. What they didn’t like about the product was that our trace buffer was too shallow (5000 states). They preferred simple triggering with deep memory to complex triggering with shallow memory. This was pretty consistent across the customer visits we conducted.
What’s the point? Instead of using the powerful triggering capability of the logic analyzer, the engineers we spoke with preferred to take the path of least resistance. They preferred to manually wade through a long trace listing to find the event of interest rather than learn how to set up a complex trigger condition. In other words, they would trade a complex tool for a less capable but easier-to-use one.
You could argue that the engineers were just lazy, but I think that’s the wrong perspective. I’m sure that if we, the tool designers, could have invented a more intuitive and user-friendly interface between the engineer trying to solve a difficult debugging problem and the tool itself, the engineer would have gone for the best solution every time and his debugging skills would have improved in the process.
Why did I bring up this example? I wanted to mention it up front because debugging real-time systems is often very difficult and engineers need to use complex tools in order to bring high-quality products to market in a timely manner. If this book contributes to learning how to use tools or sensitizes the reader to dig deeper into the user manual, then this book will have served its purpose.
Let’s discuss this book. The initial focus is on the student: not just those who want to enter the field of embedded systems design, but all Electrical Engineering, Computer Science, or Computer Engineering students who wish to improve their skills in debugging their designs. Also, you will be able to casually mention during a job interview that you’ve made the effort to go beyond just doing projects and can bring defect-free projects to completion. That tells the interviewer that you are entering the job market with a skillset beyond what your graduating peers might have.
For the experienced engineer who is already a practitioner and wishes to hone his or her skillset, I hope this book will provide you with a roadmap to tools and techniques that you may not be aware of, or to more efficient ways to solve the problems that seem to crop up on an ongoing basis.
In researching and writing the book, I decided that application notes and white papers were the very best sources of information on specific categories of bugs. Having written more than a few of these articles myself, I am pretty confident that this was a good decision.
If you think about it, it becomes obvious. Companies are constantly polling their customer base for design and debug problems that invariably appear as the technology advances. These customer problems are the driving force for the creation of tools that will solve the problems, or at least point to the source of the problems.
Once the tool is invented, its potential value must be explained to the customer base. This leads to presentations at conferences, technical articles in industry publications, and application notes that link the problem, the source of the problem, the tool, and the solution in a way that the engineer can internalize and use to justify the purchase to upper management.
While these articles are clearly self-serving for the companies generating them, they are also valuable resources that provide the best up-to-date and practical information that engineers need. For me, they became my principal resource for this book.
We’ll start by examining the debugging problem itself. What is the nature of real-time systems that makes debugging them so unique? At this point you might say, “Duh, it’s real time.” But that’s only a part of the problem. For a large fraction of embedded systems, the fixed hardware, reprogrammable hardware, firmware, operating system, and application software will probably be unique, or at least a large fraction of the design will be.
Even with products that are evolutionary, rather than revolutionary, there may be many new elements that must be integrated into the overall design. So, the problem is more than simply real time versus not real time. The problem is really the number of variables in the system AND the fact that the systems must run in real time as well.
After we scope the problem, we’ll turn our attention to an overall strategy of how to debug hardware and software. We’ll look at best practices and general strategies. It will also be useful to consider testability issues, because debugging a system that wasn’t designed to be debugged in the first place can be a challenge.
From strategies, we’ll turn our attention to tools and techniques. We’ll look at some classic problems and look for methods to solve them.
Next, we’ll look at what kind of support silicon manufacturers provide in the form of on-chip debugging and performance resources to help their customers bring new designs to market.
The final section of the book will cover serial protocols and how to debug them. According to some experts in the field, more and more debugging problems are related to serial data movement, and fewer lend themselves to classical debugging techniques.
I’ve tried to make this an easy read rather than a pedantic tome. I’ve put a lot of personal anecdotes in because they can help make a point and they’re fun to read and fun to write. I took my lead here from Bob Pease’s classic book, Troubleshooting Analog Circuits. Many senior engineers remember Bob’s column in Electronic Design Magazine, “What’s all this….”
His book and these columns were classic reads and I am unashamedly borrowing his conversational style for this book as well. As an aside, my favorite “What’s all this….” column was about his analysis of these ultrapricy speaker cables that were cropping up, claiming audio advantages over simple lamp cord wires. Bob does a rigorous analysis that is both fun to read and educational at the same time. More to the point, he pretty well puts the audio superiority claims to bed.
I’ve also noticed that I haven’t strictly adhered to keeping subject material isolated in the appropriate chapters. You’ll find that some examples may appear in different chapters. That’s by design, rather than me having a “senior moment.” It comes from teaching. I often repeat and review material in order to place it in a different context, rather than simply lecturing on it and moving on. So, if you see a discussion of common real-time operating system (RTOS) bugs in the chapter on common software defects and then run into it again in the chapter on debugging real-time operating systems, don’t say I didn’t warn you.