
6: The hardware/software integration phase

Abstract

This chapter is a detailed examination of the hardware/software integration phase. It covers each phase of the integration process and looks at possible defects that can be introduced, or are introduced, in that phase. It also considers performance issues as the product begins to be tested against its design specifications. Finally, it examines compliance issues that will inevitably come to light during final product validation.

Keywords

CodeTEST; TUV; FPGA; Synthesis; MIPS; EEMBC; Validation; RF; Integration; Compliance

Introduction

The hardware/software (HW/SW) integration phase is the point in the project when untested software and untested hardware are brought together for the first time. Power is applied and either the hardware catches fire, nothing happens, or, miracle of miracles, there are signs of life. Of course, there are gradations within this nightmare scenario, but the key characteristic, the one major issue faced by developers of embedded systems, is that you have more unknown variables than in most other categories of product design.

Consider the problem without my doomsday scenario. A software developer, writing applications for a PC or smart phone, has a standard platform with known APIs. In this scenario, bugs are overwhelmingly due to errors in the application code being written at the time. Now consider the same scenario when the hardware platform is not standard and has not been thoroughly tested and wrung out. This is the issue we face when we are developing new embedded systems and mitigating this issue is the bread and butter that keeps embedded tool vendors alive.

The HW/SW integration diagram

The classic model of the embedded system lifecycle is shown in Fig. 6.1. I’m certain that if we extracted this graph from every tool marketeer’s slide deck, we could create one massive coffee table book. Fundamentally, they all tell the same story:


Fig. 6.1 Classic embedded systems life cycle model.

Phase 1: Product specification: This is where the idea for a product comes from. It might be from the marketing department, from sales, from R&D,a from a customer, or from a competitor. Wherever the germ of the idea comes from, this is the phase where it is fleshed out and a set of specifications is derived. Many companies will also create a list of “musts” and “wants.” The “musts” list represents the features that are needed for it to be successful and the “wants” are features that would be added if time and/or resource pressures permit.

This is also the phase where the concepts of “internal specification” and “external specification” are developed. The external specification is what the customer sees. These are the specifications that you may need to validate with customers through various forms of market research, such as a focus group or through customer visits.b

The internal specification is the roadmap for how the product will be designed. It includes details about the processor, the memory, clock speed, hardware and software tools, customer use environment (office, home, outdoors, military, hospital, industrial, etc.), manufacturing cost goals, development schedule, and anything else that the design team needs to begin work on the project.

Like most of the other phases to follow, Phase 1 is iterative. There is, or should be, a fluidity that allows the various project stakeholders to provide input. Stakeholders are any groups that may be impacted by the product or that must execute on some part of it. For example, I was once working on a project and, during one of the regularly scheduled reviews, both manufacturing and QA let it be known that a particular semiconductor vendor was no longer viable due to manufacturing shortages and higher than normal part failures in other products. Because I wasn’t planning to use that vendor’s parts, it wasn’t an issue, but it could have been. Most of the time, any well-respected vendor’s parts are acceptable and usually aren’t called into question except when the part is only available from that manufacturer. A sole-sourced part generally gets special attention because, as should be obvious, if that part goes away, the product must be redesigned or obsoleted, and products in the field might not be repairable.

Many products designed for industrial or military applications have useful lifetimes of 25 years or more. Products that are intended to be used in these applications need to have special provisions for extended life support. One situation that I was aware of when I worked for a semiconductor manufacturer was when an aerospace company decided to use our product in an avionics application. In order to get the contract to supply them with processors, we had to do a number of production runs ahead of any sales of the part, and then put those processors away in a safe place should we stop production of the part anytime in the future.

What about debugging Phase 1? Does it make any sense at all? Can you have bugs in the specification of something? I would advocate for a more general definition of debugging embedded systems to include any defect that arises in any phase of the product development life cycle. That definition should include a process misstep, an erroneous assumption, bad marketing data, and judgment errors because we are human and product definition is not an exact science.

Let’s consider some possible defects in the definition phase of an embedded microprocessor system. I’ll just cover a few that I am aware of or was personally involved in.

The case of the nonstandard hard disk drive interface

Hewlett-Packard introduced its Vectra Portable CS computer in 1987, and it contained a floppy disk drive and a 20 MB, 3.5″ hard drive. The company had high hopes for the portable, but it failed in the market. Here’s the description from the HP Computer Museum [2]:

The Vectra Portable CS was the portable version of the Vectra CS. The Portable CS had a large LCD screen as well as CGA adaptor for use with an external monitor. The Portable was offered in two mass storage configurations: dual 3.5 inch (1.44 MB) floppy disc drives - P/N D1001A, or a floppy disc drive and a 20 MB hard disc drive - P/N D1009A. The Portable CS did not succeed due to its large size (much larger than the Portable Plus), relatively high price and non-standard media (3.5 inch discs).

The Portable Vectra CS was introduced on September 1, 1987. It was discontinued on May 1, 1989.

The company had high hopes of selling the hard drive as an OEM product to other computer manufacturers because it was one of the first hard drives available in the 3.5″ form factor. Rather than adopt the industry-standard integrated drive electronics (IDE) interface, developed by Compaq and Western Digital in 1986, they designed an interface that used a 40-pin connector but was otherwise entirely different. As a result, the drive was never adopted by other manufacturers, and HP discontinued manufacturing it soon after. The lesson? Standards matter.

The last gasp of the vector display

The Colorado Springs Division of HP produced oscilloscopes and OEM displays for other HP divisions. These were vector displays, so called because they drew images on the screen by writing a line, or vector, from one point to another. Text was created in the same way.

We were convinced that vector displays, despite the fact that they were expensive, would never be replaced by raster displays such as those in a TV or computer monitor because of the phenomenon called “the jaggies”; raster displays had jaggies and vector displays did not. The jaggy was the slight stair-step effect that was apparent in any line on the screen that was not exactly horizontal or exactly vertical. Even as smaller and less-expensive raster displays became more prevalent, our display group continued to push products with vector displays, even as the market for these displays dried up. The lesson? Don’t continue to go with old technology when all indicators point to its demise.

Underpowered emulator card cage

I can recall another poor decision that resulted in a failed product. It involves the HP 64000 product family. The original HP 64000 was a standalone workstation. Because most of our customers were migrating to UNIX workstations and HP had just purchased a workstation manufacturer (Apollo), we felt it necessary to develop a product that would interface with a UNIX workstation. The product was a card cage box with a control card and power supply that would take the existing HP 64000 personality cards. A decision was made to use an 8-bit MC6809 microprocessor to control the box rather than the much more powerful 16/32-bit MC68000. The reasoning was that putting an MC68000 in the box might allow a user to run UNIX on the card cage itself. Because of this decision, the box was so underpowered that performance was terrible, with long delays in data transfer and response.

Feature creep and the big customer

This is the classic marketing story about the sales or marketing engineer who returns from a visit to a major account and insists that a new feature must be added to the current product under development because that client wants it. This usually results in “feature creep”: the R&D team is uncertain about the product it is actually trying to invent, so it covers its rear end (CYA) by adding a lot of features that really aren’t necessary and just add cost and time to the development schedule.

For 5 years while I was a project manager at HP, I served on the Project Management Council, a group sponsored by HP’s Corporate Engineering. We met several times a year and tried to come up with initiatives that could disseminate the best practices within the many worldwide divisions of HP.c One of the Project Management Council membersd undertook a study of several successful and failed projects. The study highlighted 10 areas that were deemed crucial to the product’s success.

Figs. 6.2 and 6.3 show the results of the study. These images were sanitized and were taken from my lectures for a course, The Business of Technology, that I taught for a number of years.


Fig. 6.2 Chart of six projects that failed in the market due to defects in the extended design’s initial project activities [3].

Fig. 6.3 Chart of six projects that were successful in the market. Note how the extended design team carried out almost all the activities not done by the project teams in Fig. 6.2 [3].

The data speaks for itself. The key factor here is the data from the first row of the matrix, “User’s needs understanding.”

While all the successful projects showed that they had attempted to have a good understanding of what their customers needed, only two of the six project teams for the failed projects attempted to do the same thing.

We can’t say for certain that these deficiencies directly led to the marketplace failure of projects A through F in the same way that we can point to a circuit design flaw or a coding error and assign a causal relationship to the defect. The reason is that we would be trying to assign blame as if these deficiencies were defects, and that is questionable at best. However, continuing the development process with a design flaw that results from questionable decisions in the specification phase is still a flaw. Fortunately, finding the defect is easy. Nobody buys the product. Unfortunately, the defect can only be fixed by replacing the defective product with a better one.

Phase 2: HW/SW Partitioning: Partitioning is the process of deciding what is going to be done in hardware and what will be done in software. This is not always an easy decision. In order to introduce the concept to my students, I often use this example. “How many of you are gamers?” I ask. I’ll get 1 or 2 students in a class of 25 to raise their hands. I then ask them how much they paid for the graphics card in their game machine. Typically, it’s in the $500 range.

Then, I ask them, “Why pay so much?” The answer: because they need the graphics capability to play games. I then follow up with, “Can’t you play games with a $50 graphics card?” The answer is no, because the game would be so slow that it would be unplayable. My last question is, what’s the difference? Both cards can play the game, but one card is unacceptably slow compared with the other. The difference is that the faster card uses dedicated hardware to accelerate the game algorithm while the slower card must do it in software. In a nutshell, that’s what partitioning is all about.

Once the product features are specified, partitioning takes over and is arguably the most important part of the design process because how the embedded system is partitioned will drive all the hardware and software development to follow.

Of course, there are trade-offs that have to enter into any partitioning decision. Here’s what we discuss in my class on microprocessor system design:

Advantages of a hardware solution

  •  Can yield speed increases of 10 ×, 100 ×, or greater.
  •  Requires less processor complexity, so the overall system is simpler.
  •  Less software design time required.
  •  Unless a hardware bug is catastrophic, workarounds might be doable in software.

Disadvantages of a hardware solution

  •  Additional HW cost to the bill of materials.
  •  Additional circuit complexity (power, board space, RFI).
  •  Potentially large nonrecurring engineering (NRE) charges (~$100 K +).
  •  Potentially long development cycle (3 months).
  •  Little or no margin for error.
  •  IP royalty charges.
  •  Hardware design tools can be very costly ($50–100 K per seat).

Advantages of a software solution

  •  No additional impact on materials costs, power requirements, circuit complexity.
  •  Bugs are easily dealt with, even in the field.
  •  Software design tools are relatively inexpensive.
  •  Not sensitive to sales volumes.

Disadvantages of a software solution

  •  Performance relative to a hardware solution is generally far inferior.
  •  Additional algorithmic requirements may force the need for more processing power:
    •  Bigger, faster processor(s).
    •  More memory.
    •  Bigger power supply.
  •  RTOS may be necessary (royalties).
  •  More uncertainty in software development schedule.
  •  Performance goals may not be achievable in the time available.
  •  Larger software development team adds to development costs.

Of course, modern FPGAs have embedded cores, typically one or more ARM processor cores. In a way, this represents the best of both worlds as a partitioning environment. With the correct software development tools, it is entirely possible to design our embedded system starting from only one development environment and partitioning entirely within that environment.

If we think of an algorithm in a more generalized way, we can place a software-only implementation at one end of a scale and a hardware-only implementation at the other end. Along the scale is a slider that enables us to continuously change the partitioning of the design between the two possibilities.

This is possible because custom hardware generally implies an FPGA or custom ASIC. These devices are designed using hardware description languages such as Verilog or VHDL. Given that we are using software to design hardware, it isn’t much of a stretch to make the leap to a single design methodology that incorporates simultaneous software and hardware design.

Phase 3: Iteration and implementation: Nane et al. [4] did a survey and evaluation of high-level synthesis (HLS) tools for FPGA development. High-level synthesis enables a designer to code an algorithm in C or C++, and the output of the compiler is Verilog or VHDL. Fig. 6.4 is a reproduction of part of the table. The partitioning aspect of these tools is that they enable the designer to code key algorithms without needing to consider how each algorithm will be implemented. The next step is to compile the algorithm using the HLS tools, which provides the information necessary to make informed decisions about partitioning the design.


Fig. 6.4 Currently available high-level synthesis tools. From R. Nane, V.-M. Sima, C. Pilato, J. Choi, B. Fort, A. Canis, Y.T. Chen, H. Hsiao, S. Brown, F. Ferrandi, J. Anderson, K. Bertels, A survey and evaluation of FPGA high-level synthesis tools, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 35(10) (2016) 1591.

For no other reason than I’m familiar with Synopsys CAD tools, I looked into their Synphony C HLS tool. In a white paper (Eddington [5]), Synopsys discusses how the Synphony C compiler supports higher levels of abstraction and blurs the line between hardware and software. Its key capabilities are:

  •  Enables exploration from a single sequential C/C ++ algorithm.
  •  Provides a balance of design abstraction and implementation direction to build efficient hardware.
  •  Easy partitioning of the programmable and nonprogrammable hardware.

Another feature of the Synphony C compiler is the inclusion of a tool called an Architectural Analyzer. The tool enables a user to import unmodified C/C ++ code and then try possible optimizations and see the performance trade-offs that result.
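To make the idea concrete, here is a minimal sketch, purely my own illustration and not taken from the Synopsys documentation, of the kind of small, sequential C kernel that might be handed unmodified to an HLS flow; the function name and tap count are made up.

#define NUM_TAPS 16   /* hypothetical filter length */

/* A small FIR kernel: the inner loop is the sort of construct an HLS tool
   can unroll and pipeline into hardware, or a conventional compiler can
   simply compile when the partitioning "slider" stays on the software side. */
int fir_filter(const int sample[NUM_TAPS], const int coeff[NUM_TAPS])
{
    int acc = 0;
    for (int i = 0; i < NUM_TAPS; i++) {
        acc += sample[i] * coeff[i];   /* multiply-accumulate */
    }
    return acc;
}

The point of the exercise is that the same loop can be explored as software on the embedded core or as pipelined hardware in the fabric, with the analyzer reporting the trade-offs.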

However, these tools are not for everyone. They are complex and costly, and it takes a dedicated investment in time and resources to master their use. That said, if you can devote the time to purchase, learn, and use them, then defects that might otherwise be introduced during the partitioning process can be avoided.

This is critically important if the hardware will be an ASIC because of the high up-front cost of ASIC design and fabrication, but also because of the time involved in manufacturing a custom integrated circuit. For an FPGA-based design, this is not the case because it can be reprogrammed at any point in time. The disadvantages of the FPGA relative to the ASIC are apparent in several areas. Singh discusses these differences in an Internet article [6], and they may be summarized as follows:

  •  The FPGA is reconfigurable while the ASIC is permanent and cannot be changed.
  •  The barrier to entry for an FPGA design is very low while the cost and effort to do an ASIC is very high.
  •  The key advantage of the ASIC is that of cost in high volumes where the NRE and manufacturing costs can be amortized.
  •  The ASIC can be fine-tuned to lower the total power requirements while this is generally not possible with an FPGA.
  •  The ASIC can have a much higher operating frequency than an FPGA because the internal routing paths of the FPGA will limit the operating frequency.
  •  An FPGA can have only limited amounts of analog circuitry on the chip while an ASIC can have complete analog circuitry.
  •  The FPGA is the better choice for products that might need to be field upgraded.
  •  The FPGA is the ideal platform for prototyping and validating a design concept.
  •  The FPGA designer doesn’t need to focus on all the design issues inherent in an ASIC design, so the designer can just focus on getting the functionality correct.

It is also interesting to note that in today’s climate of hacking and cybersecurity, an FPGA can represent a security risk. If a malevolent entity such as a rogue government can gain access to a key part of a country’s telecomm infrastructure, such as a data switch, then a change to the FPGA configuration could easily be more difficult to detect than the same hack implemented in software.

Let’s consider debugging in the context of cybersecurity. When an embedded device is hacked, then we have a software or hardware defect in the product. This defect can be considered to be the vulnerability of the device to hacking, or it can be the hack itself. Many of the techniques we would use to find and fix a bug are the same whether you are trying to find a vulnerability in a device or whether the flaw is manifested as a bug in normal operation. So, while the use of an FPGA in released products brings many benefits, there is a class of infrastructure-critical devices that can be compromised in ways that are very hard to detect. Not impossible to detect, but very hard.

As the hardware and software teams begin to start their separate design processes, we can assume that the boundaries between the requirements, as championed by the stakeholders in marketing, the hardware team, and the software team, are still rather fluid. What the customer may need, or claim to need, versus the price point, competition, needed technology, and all the other factors is generally under constant scrutiny and debate. At some point, these issues must be resolved, and the features and boundaries are frozen and agreed upon.

We all are familiar with the term “feature creep.” There is even a Dilbert cartoon with an ogre-like character called the Feature Creep whose sole function is to tell the engineers to add more features whenever they think they’re done.

Feature creep results whenever:

  •  The product definition is weak.
  •  A marketing engineer just got back from visiting a big customer.
  •  The competition just introduced a new product that usurps your proposed new product.

The worst thing to do is panic, decide to put on a bandage and add some more features, and then not adjust the project schedule. I recall a presentation by an R&D project manager at an HP Project Management Conferencee on just this phenomenon. The manager’s suggested “best practice” was that, at the start of every status update meeting, he would ask the assembled engineers and other stakeholders whether anything had changed in terms of features. If anything had changed, the manager declared the project on hold until the issue could be resolved and a new schedule generated. At the next meeting, the impact of the requested change was assessed, and the decision was made to add the feature and slip the schedule, or not.

By doing this, the manager forces all the stakeholders to do a reality check. If the feature is worth the risk, or the product would be noncompetitive without revising the specs, then a new schedule is put in place and the new features are added to the schedule and to the design. Of course, this also has the effect of unfreezing the partitioning of HW and SW, so the ripple effect can be quite significant. The key point is that a feature adjustment made after the initial partitioning is completed is an increasingly serious perturbation to the project schedule, and all the stakeholders need to assume ownership of decisions that might impact it.

Once again, we need to take the broad view of debugging during this phase of the project. Debugging here could involve running performance tests on the processor of interest. This could be accomplished using an evaluation board from the silicon manufacturer and some representative code that would mimic the actual loading on the processor.

The Embedded Microprocessor Benchmark Consortium (EEMBC)f is an organization made up of member companies from specific application disciplines, such as automotive, office equipment, and avionics, along with silicon manufacturers and tool vendors.

According to their web site:

EEMBC benchmark suites are developed by working groups of our members who share an interest in developing clearly defined standards for measuring the performance and energy efficiency of embedded processor implementations, from IoT edge nodes to next-generation advanced driver-assistance systems.

Once developed in a collaborative process, the benchmark suites are used by members to obtain performance measurements of their own devices and by licensees to compare the performance of various processor choices for a given application. Recently developed EEMBC benchmark suites are also used throughout the community of users as an analysis tool that shows the sensitivity of a platform to various design parameters.f

Benchmarks that mirrored the actual algorithms a particular user segment would typically run became a necessity when compiler manufacturers realized that they could improve sales by optimizing their compilers for the most common benchmark in use at the time, the MIPS benchmark. The MIPS benchmark is actually derived from the Dhrystone benchmark and was referenced to the old Digital Equipment Corporation VAX 11/780 minicomputer. The 11/780 could run 1 million instructions per second, or 1 MIPS, and it could execute 1757 loops of the Dhrystoneg benchmark in 1 s. The Dhrystone benchmark was a simple C program that compiled to approximately 2000 lines of assembly code and was independent of any O/S services. If your microprocessor could execute 1757 Dhrystone loops in 1 s, it was a 1 MIPS machine.
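For readers who have never run it, here is a minimal sketch of how a Dhrystone-style “VAX MIPS” rating is derived. The work loop below is a made-up placeholder for the real Dhrystone body; only the 1757 loops-per-second reference point comes from the benchmark lore described above.

#include <stdio.h>
#include <time.h>

#define VAX_11_780_LOOPS_PER_SEC 1757.0   /* the 1 MIPS reference machine */

static volatile long sink;                /* volatile so the loop is not optimized away */

/* Placeholder standing in for the real Dhrystone loop body. */
static void run_benchmark_loops(long iterations)
{
    for (long i = 0; i < iterations; i++) {
        sink = i * 3 + 1;
    }
}

int main(void)
{
    const long iterations = 50000000L;

    clock_t start = clock();
    run_benchmark_loops(iterations);
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0) {
        seconds = 1e-9;                   /* guard against a too-coarse clock */
    }

    double loops_per_sec = (double)iterations / seconds;
    printf("%.0f loops/s => %.2f VAX MIPS\n",
           loops_per_sec, loops_per_sec / VAX_11_780_LOOPS_PER_SEC);
    return 0;
}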

As soon as the compiler vendors started to tweak their compilers to optimize for the MIPS benchmark, the acronym changed to:

  • Meaningless indicator of performance for salesmen

The EEMBC consortium was driven by Markus Levy, the technical editor of EDN magazine. He brought users and vendors together to form the core group. The first “E” in EEMBC originally stood for EDN; although the affiliation with the magazine was later dropped, the extra “E” was kept because the name had already become so widespread.

The EEMBC benchmarks provide a consistent set of algorithms that can be used for relative performance measurement. Taken in isolation, a processor benchmark may not be very useful because it is subject to the compiler being used, the optimization level, the cache utilization, and whether the evaluation board accurately reflects the processor and memory system of the actual product. These evaluation boards were called “hot boards” because they were designed to run with the fastest clock and lowest latency memory.

Other factors that could negate the results included RTOS issues such as priority level and processor task utilization. However, at the very least, system architects and designers would have a significantly more relevant code suite to use to predict processor performance for a given application.

To demonstrate the effect of the interplay between hardware and software performance, and why the EEMBC benchmarks are so valuable, I show the graph in Fig. 6.5 to my classes [7].


Fig. 6.5 Relative performance of the TMS320C64x DSP processor running the EEMBC Telemark benchmark. Courtesy of EEMBC.

The three columns represent EEMBC scores on the EEMBC Telemark Benchmark. This benchmark is one of a set of benchmarks that make up the TeleBench suite of benchmarks. This suite of benchmarks allows users to [8]:

approximate the performance of processors in modem and related fixed-telecom applications.h

The leftmost column shows the score for the TMS320C64x DSP processor compiled without any optimization switches enabled: 19.5. When various optimization strategies are employed, particularly those that can take advantage of the architecture of the processor, the benchmark score dramatically improves. Specifically, the TMS320C64x has 2 identical groups of 4 functional units and 2 identical banks of 32 32-bit general-purpose registers.

With the compiler able to aggressively take advantage of these architectural features, the benchmark score jumps to 379.1, which is an improvement of more than 19 times. In column 3, the code is hand optimized at the assembly language level, resulting in an improvement in the benchmark score by almost another factor of 2. The total improvement from out of the box to hand-crafted in assembly language is 32 ×.

To put this performance data in context, this is a potential bug that will only become apparent during validation testing, when the system loading begins to stress the processor’s ability to handle it. The scenario might be that deadline failures are observed during testing and the engineers begin the process of debugging the code. However, there is nothing wrong with the algorithms themselves, as the defect lies in the decision of how to compile the code.

It might have been something as simple as the need to turn off optimizations during testing, and then someone forgot to revise the makefile to turn the optimizations back on. That’s certainly happened before.

The key takeaway here is that these performance issues need to be resolved sooner rather than later. In fact, all the compiler issues should be part of the internal specification document that will define the development environment of the project.

Phase 4: Detailed HW/SW Design: This is the phase that everyone is most familiar with because this is the phase where the hardware and software bugs are predominantly introduced into the project. However, I hope that I’ve at least sensitized you to the reality that defects can be introduced far earlier in the process due to poor decisions in processor selection or design partitioning. While clearly not the same as a missing trace on a PCB, a project decision such as the poor choice of a vendor for a critical part can have the same level of schedule impact as chasing down an elusive hardware glitch.

My next favorite defect is the hardware bug workaround. This typically occurs later in the process when hardware and software are brought together for the first time. If the hardware is an FPGA, then it is generally a nonissue. If it is a custom ASIC, then it is a big problem. This is where the “fix it in software” solution is brought into the equation.

Now the advantage of hardware acceleration is lost and more of the burden of maintaining the desired level of performance is put upon the software team because the part of the hardware algorithm that does not work properly must be repaired/replaced/augmented/etc. in software.

In the ideal case, software and hardware are incrementally integrated during the development process using the techniques described in earlier chapters. For example, as software modules are completed, there should be a test scaffolding available to exercise the module. At the lowest level of driver software, the code that must directly manipulate the hardware, the need for early integration is greatest. You want to catch errors as soon as possible, not have to find and fix the defects later when the stakes are much greater and time is of the essence.

The same process of incremental integration is also critical for the hardware team. Again, in the ideal case, the low-level software drivers will be available to exercise the hardware, either for real hardware or through simulation techniques such as coverification or cosimulation. Here, the drivers are used to exercise the hardware while the ASIC is still HDL code.

Tracing defects in this phase is a lot simpler because the number of variables that can be the root cause of the problem is smaller and more manageable. Also, following good design processes, such as running simulations and validating designs with formal design reviews, is worth its weight in gold and will filter out many potential defects before they make it to production.

Glitches are the bugs that give us nightmares. A glitch is usually so infrequent that we would be lucky if we detected it during this phase. A software glitch, such as a priority inversion within the RTOS or a stack overflow, might not show up until the system is fully loaded and operating under actual conditions, rather than unit testing. In my opinion, a hardware glitch is even more of a challenge because of the number of possibilities.

My first introduction to hardware glitches occurred while I was a graduate student. We were using a high-voltage pulse circuit that used a mercury-wetted relay to generate a 0–5 kV pulse with a 1 ns rise time. Located in the same room as the pulse generator was our minicomputer (I hope this doesn’t date me too badly) that controlled the experiment. A bundle of cables exited the minicomputer rack and encircled the lab, going to various sensors and detectors. Everything worked properly until we fired the pulse generator while the minicomputer was logging data from some of the more remote sensors. Then the program would crash.

Because graduate students have an infinite amount of time for their thesis and with no endpoint in sight, I set about trying to figure out what was going on. I’ll spare you the details of how I eventually found the source of the problem and cut right to the chase. The source of the problem was the radiated energy from the pulse generator whenever the pulse amplitude got above about 2 kV. This was picked up by the ground shields on the various cables, and they were transmitting enough energy back to the minicomputer power supply that we could see a ground bounce of several volts.

This is the part where I learned about optical isolation. I rebuilt the data acquisition hardware to isolate the data logging and signal conditioning from the digital I/O to the computer. The minicomputer ground was then isolated in the equipment rack. Problem solved.

Detecting glitches is what keeps oscilloscope and logic analyzer manufacturers in business.i Manufacturers such as Tektronix and Keysight provide a wealth of application and instructional data for students and engineers alike. While researching glitches, I came upon an interesting Keysight videoj that discusses how to use the fast Fourier transform (FFT) capability built into their oscilloscopes to detect potential sources of glitches. I thought this was rather clever, so I watched the video and learned that in modern high-speed digital systems, crosstalk between components is becoming an ever more common problem. The video showed a clock signal with some high-frequency noise riding on it. I would have normally dismissed this as ground bounce, but when the FFT was enabled, it showed that the high-frequency noise was at 19 MHz and was due to crosstalk in the circuit. Crosstalk can lower noise thresholds and make infrequent glitches more likely. My approach would likely have been to stay in the time domain and try to set up the oscilloscope for a single capture glitch detection trace.

When a glitch occurs inside an FPGA, detecting it becomes more challenging. One method is to incorporate a glitch detection circuit within the FPGA along with the hardware algorithm being implemented [9]. The article points out that simulation can only go so far, because whether a glitch will or will not occur in a particular routing of the FPGA may also depend upon the FPGA implementation. Assertions within Verilog are another common way to detect glitches if they occur.

A digital glitch detector circuit is described in a US patent for a logic analyzer [10]; it appears easy to implement in an HDL and can detect either a positive- or negative-going glitch within a clock cycle. Thus, during testing of your FPGA circuit, this glitch detector can be added to the FPGA with very few extra resources and can provide a registered output signal indicating that a glitch has occurred on the data. This type of detector can easily be added to the design and removed once the design is verified.

Suffice it to say that when you suspect that you have a glitch in the circuit, either because you see it (rarely) or you see its effect upon the circuit (most likely), the best approach is to start taking copious notes and head for the Internet. Students and experienced engineers alike can benefit from the infinite resource that is the Internet. A few hours of directed research can provide a wealth of information and insights into what could be causing the glitch and techniques for finding it.

The takeaway from this part of the embedded system development life cycle is that integration of hardware, software, test software, turn-on software, and anything else that will be needed when the final hardware and software are brought together should not occur in one Big Bang. As much as possible, it should be an incremental process with notetaking along the way so that when something goes awry, and we know that it will, you can trace back through what you know about the problem. While engineers are loath to document, and I freely admit to that rap, keeping a paper trail will ultimately save time.

Perhaps it takes more time on the current project, but if you get in the habit and make ongoing documentation part of your process, then Murphy’s Law guarantees that you will need it in the future.

Phase 5: HW/SW Integration: HW/SW integration was the traditional debugging phase. This is where tool vendors like me focused our product offerings and debugging solutions. In the classic model, a model that sold a lot of in-circuit emulators and logic analyzers, untested software meets untested hardware and may the best team win. The classic model also describes how the hardware team throws the embedded system “over the wall” to the software team and as soon as they hear the board hit the floor, they move on to the next project and are no longer available to the software team. The imagery is powerful, and it makes a good story, especially in the slide deck of a good sales engineer.

I think we’re better at it today than we were 20 years ago, but I don’t have any real data to back me up there. I think we’ve learned a lot about effective development processes, and, as I’ve pointed out, there are tools today that enable HW/SW integration to occur earlier in the embedded life cycle and in a more incremental fashion. Let’s look at the HW/SW integration problem in some depth and see how debugging fits into it.

For this discussion, we’ll assume that the only software of interest is the low-level drivers and the board support package (BSP) that an RTOS vendor, or the development team, must create if an RTOS will be used. The key challenge is reducing the number of variables that are in play when untested hardware meets untested software. Therefore, the debugging strategy should be based on eliminating as many of these key variables as possible so that the rest of the system integration becomes more tractable.

Step #1: Processor to memory interface: Whether the memory system is internal, external, or mixed, the interface between the processor core and the memory must be stable, or this is as far as you get. Memory decoding must be properly configured to identify memory regions, wait states must be set, timing margins must be measured and recorded, and so on. If the memory is static RAM, this process is relatively straightforward. For dynamic RAM, the challenges are greater.

Traditionally, this was the real power of the in-circuit emulator (ICE), because the ICE could still function even if the target system memory interface was not working properly. The reason is that the ICE processor could execute out of its own local memory, so a test program could run on the ICE and perform read and write tests to memory on the target board. This way, you could write a tight loop that reads and writes various memory locations and observe the signal fidelity and timing margin using an oscilloscope.

Perhaps it’s unthinkable to you to imagine that the hardware team would turn over a board in such an untested state to the software team, so perhaps I’m being overly dramatic here. In any case, the memory interface is one of the first tests that needs to be performed.
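As a sketch of what such a first test might look like, here is an illustrative walking-ones read/write loop; the base address and region size are hypothetical and would come from your own board’s memory map.

#include <stdint.h>

#define TEST_BASE  ((volatile uint32_t *)0x20000000u)  /* hypothetical external RAM */
#define TEST_WORDS 1024u

/* Walk a single set bit through every word in the region, writing and
   reading back, while timing and signal fidelity are watched on a scope. */
static uint32_t memory_walking_ones_test(void)
{
    uint32_t failures = 0;

    for (uint32_t i = 0; i < TEST_WORDS; i++) {
        for (uint32_t bit = 0; bit < 32u; bit++) {
            uint32_t pattern = 1u << bit;
            TEST_BASE[i] = pattern;              /* write */
            if (TEST_BASE[i] != pattern) {       /* read back and compare */
                failures++;
            }
        }
    }
    return failures;   /* nonzero points to decode, wait-state, or timing trouble */
}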

Step #2: Programming the hardware registers: Properly initializing the hardware registers, whether on the processor or microcontroller or in the custom devices, can be a black hole for time. There is a semiconductor company that will remain nameless that would double or triple the functionality packed into the on-chip register set. Deciphering exactly how to properly initialize the registers from the hardware manual can drive an engineer to tears. This is especially true of a new chip, where one of the variables is the typo density of the user’s manual.

One of the first companies to address this problem was Aisys. I crossed paths with them when I was responsible for third-party development tool support for AMD embedded processors. To the best of my knowledge, Aisys is no longer in business but their premier product, Driveway, was a software tool that would automatically create the driver code for popular microcontrollers at the time based upon a graphical and table-driven input specification.

Driveway was very expensive (more than $20 K per seat), but its value proposition was the time it saved in the product development cycle, reducing the time it took to write and debug the driver software from months to days or weeks.

Similar products are available (for free) today. For example, the Peripheral Driver Generator from Renesas Electronics is a free download that can be used to generate driver code. To quote Renesas [11]:

The Peripheral Driver Generator is a utility that assists a product developer in creating various built-in peripheral I/O drivers of a microcomputer and the routines (functions) to initialize those drivers by eliminating the developer having to do manual coding. All the necessary source codes are prepared by the Peripheral Driver Generator according to user settings, so that the development time and development cost can be greatly reduced.

Other semiconductor companies have similar offerings. NXP Semiconductors offers SPIGen, a free SPI bus code generator that can adapt to a wide variety of SPI protocol specifications [12].

These tools take much of the headache out of creating the initialization and driver code for a wide array of embedded microcontrollers. Another factor that should not be underestimated is the sheer volume of code examples of every possible description available on the Internet. I think the most significant thing my students learn from taking my microprocessor class is how to find code and application examples online. They can’t believe that I actually encourage them to use the code they find on the web, just so long as they cite their sources and give credit for assistance they receive.

Getting the driver code just right is one of the more challenging aspects of the HW/SW integration process. Having just one bit set wrong in one of the configuration registers can prevent the system from functioning, leaving you to wonder whether you have a hardware or a software fault to deal with. Either way you’re correct, because it is both a hardware fault and a software fault. For example, the NXP ColdFire MCF5206e microcontroller has 108 peripheral registers controlling all the I/O and memory bus functionality of the device [12]. A one-bit typo in one of the subfields of a register will easily cause the processor-to-memory interface to fail. Fortunately, this microcontroller has extensive on-chip debug support (see next chapter), enabling the designers to debug the processor without a functional memory system. Using the on-chip debug resources, the memory-mapped peripheral control registers can be read and modified.
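To illustrate how a single mis-set bit can look in code, here is a minimal sketch of the kind of memory-mapped register initialization this step is about; the peripheral, addresses, and bit positions are entirely hypothetical and not taken from the ColdFire manual or any other real part.

#include <stdint.h>

#define UART0_BASE  0x40001000u                                   /* hypothetical peripheral */
#define UART0_BAUD  (*(volatile uint32_t *)(UART0_BASE + 0x00u))
#define UART0_CTRL  (*(volatile uint32_t *)(UART0_BASE + 0x04u))

#define CTRL_ENABLE     (1u << 0)
#define CTRL_8BIT       (1u << 2)
#define CTRL_PARITY_EN  (1u << 3)   /* set this one by mistake and nothing talks */

static void uart0_init(uint32_t baud_divisor)
{
    UART0_BAUD = baud_divisor;
    /* Read-modify-write so unrelated bits in the register are preserved. */
    UART0_CTRL = (UART0_CTRL & ~CTRL_PARITY_EN) | CTRL_8BIT | CTRL_ENABLE;
}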

With on-chip control of the CPU through the debug resources, it is a straightforward process to load the turn-on and test code necessary to check the external memory interface, measure signal fidelity and bus timing, and generally verify that the hardware is ready to start taking the rest of the driver software, followed by the application software. If an RTOS is going to be used, then the board support package (BSP) would be installed at this point. The BSP drivers might make use of, or replace, the low-level drivers that you would need to create when not using an RTOS.

The key to success in the HW/SW integration phase is to proceed in deliberate steps, keeping notes along the way. Start from the most basic assumption that nothing works, other than the fact that the board probably won’t catch fire when powered up (although this might be an erroneous assumption), and then test from the most basic functions to the more complex ones, always keeping notes and maintaining a checklist of what tests to run next. Remember, your task is to reduce the number of possible variables that can be causing bugs to some manageable number. This is the key challenge of the initial turn-on of an embedded system’s hardware and software.

Again, in the ideal case, the hardware team has already tested the hardware to the extent that they feel confident that the system is ready for the application software. If there are remaining hardware bugs, they’ll be the corner cases that weren’t tested for or missed communications between the teams, leading to errors in the device drivers. Also, marginal timing issues won’t surface here because the hardware is being tested in a room temperature environment. It won’t be until the system is subjected to thermal, humidity, and mechanical stresses that other hardware weaknesses become visible. Also, it won’t be until the next phase when the system validation tests are being run that radio frequency (RF) testing will uncover out-of-compliance RF emissions and possible random errors due to crosstalk. Lucky you.

Phase 6: Acceptance testing and validation: Full disclosure. I hated this phase of the development cycle. I couldn’t wait for it to be over. Environmental testing was the worst. We called it “shake and bake.” The product was put on a vibration table and vibrations were introduced until resonance was hit. A strobe lamp was synchronized to the vibration table’s frequency so you could see the components being distorted. I hated it. It was like watching my child being tortured.

Then came temperature and humidity cycling. This played havoc with any high-voltage circuitryk and we could hear the arcing in the chassis. It sounded like someone was cracking a whip. These tests were internal to HP and represented our validation of the robustness of the design.

I once asked the HP compliance engineer why they took the temperature up to 100°C when no lab instrument would ever get that hot. The reason was that this was the benchmark temperature for the interior of the trunk of a dark blue car in Phoenix in the summer. The HP instrument didn’t have to run at that temperature, only survive the heat exposure sitting in the trunk of a field sales engineer’s car. It was a good thing that we tested it, because the plastic front grill on our ICE unit sagged from the heat and we had to use a different formulation for the plastic. Ditto for the cold temperature cycle, only this time it is Alaska in the winter.

The other series of tests were necessary for compliance with standards agencies such as the FCC and UL, and for the compliance agencies in Europe and Asia. For example, in Germany it is the TUV Rheinland that is responsible for electromagnetic compliance testing. Germany is an interesting situation because large factories would be located in small towns with residential housing butted right up to the building. RF interference could easily override broadcast TV and radio. TUV cars with rotating antennas would drive through the town and measure the RF emissions as they drove. If they detected RF amplitudes over the legal limit, a factory could be shut down.

While this really isn’t the appropriate section to discuss RF issues, it is probably as good a place as any. Even though this book is ostensibly about debugging, I would like to include some best practices for RF design that I learned over the years. This is not intended to be a complete treatise on RF design techniques, just a few thoughts that are easy to swallow and be sensitized to. Also, I already had a few slides on the topic because I teach it in my microprocessor design class, so it is easy to include.

Earlier, I talked about timing margins, but I didn’t talk much about clock speed. Both are relevant here because they are tied up with RF issues. In embedded design, the general rule is to run the clock as slowly as possible and still accomplish the task at hand. In a PC, clock speed is a marketing tool, the faster the better. Overclocking, anyone?

The slower the clock, the less power is consumed. This is due to the CMOS technology that is used in modern microdevices. Also, slower parts are less costly than faster parts. With respect to RF, there are two related effects. The higher the clock speed, the greater the energy that is in the harmonics of the waveform. For a good square wave clock, it is easy to have energy out to the fifth harmonic.
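The first-order reason is the familiar CMOS dynamic power relation, P ≈ a·C·V²·f. The sketch below simply works the arithmetic with made-up capacitance and activity numbers to show that halving the clock roughly halves dynamic power; it is an illustration, not a measurement.

#include <stdio.h>

/* First-order CMOS dynamic power: activity factor * switched capacitance
   * supply voltage squared * clock frequency. The values below are invented. */
static double dynamic_power_watts(double activity, double cap_farads,
                                  double vdd_volts, double freq_hz)
{
    return activity * cap_farads * vdd_volts * vdd_volts * freq_hz;
}

int main(void)
{
    printf("200 MHz: %.3f W\n", dynamic_power_watts(0.2, 5e-9, 1.2, 200e6));
    printf("100 MHz: %.3f W\n", dynamic_power_watts(0.2, 5e-9, 1.2, 100e6));
    return 0;
}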

The rising and falling edges also have an effect upon RF for logic other than the clock. If you are using medium-speed devices and you run into timing issues where the worst-case propagation delay in one part violates the minimum set-up time requirement of another part, it might be tempting to just replace the offending part with a faster one. The faster part will generally have shorter rise and fall times, and a faster rise time means more harmonics.

An article in EDN magazine [13] provides this simple rule of thumb that relates the rise time of a square wave to the effective bandwidth of the signal.

  • Bandwidth (in GHz) = 0.35/rise time (in nanoseconds).

The article goes on to say,

Bandwidth is the highest sine wave frequency component that is significant in a signal. Because of the vagueness of the term “significant,” unless detailed qualifiers are added, the concept of bandwidth is only approximate.

Bandwidth is a figure of merit of a signal to give us a rough feel for the highest sine wave frequency component that might be in the signal. This would help guide us to identify the bandwidth of a measurement instrument needed to measure it, or the bandwidth of an interconnect needed to transport.

From the RF perspective, bandwidth tells us about the RF frequencies that we will have to deal with and manage.
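Worked out for a couple of representative edge rates, the rule of thumb looks like this; the snippet is purely illustrative.

#include <stdio.h>

/* EDN rule of thumb: bandwidth in GHz = 0.35 / rise time in nanoseconds. */
static double bandwidth_ghz(double rise_time_ns)
{
    return 0.35 / rise_time_ns;
}

int main(void)
{
    printf("2 ns edge   -> ~%.2f GHz of significant spectral content\n", bandwidth_ghz(2.0));
    printf("0.5 ns edge -> ~%.2f GHz of significant spectral content\n", bandwidth_ghz(0.5));
    return 0;
}

In other words, swapping in a part with a 0.5 ns edge roughly quadruples the spectral content you have to manage compared with a 2 ns edge.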

Therefore, if you do switch to the faster logic, do so with the knowledge that you are tempting fate, or at least Murphy’s Law. Anyway, here is a list, in no particular order of precedence, of some of my general rules for good RF design practices:

  •  Use spread-spectrum clock oscillators to spread the RF energy out over a range of frequencies [14].
  •  Avoid current loops.
  •  Shield the clock lines on inner layers of the PC board, or run parallel guard traces.
  •  Avoid long clock lines.
  •  Avoid long bus runs on a board.
  •  Avoid logic with fast edges: the ALS family is preferable to the FCT family if the propagation delays are acceptable.
  •  Use RF suppression (ferrite) cores on cables.
  •  Shield locally, rather than the entire chassis.
  •  Run at the slowest acceptable clock speed.
  •  Terminate long traces in their characteristic impedance.

A few comments are in order here. It is generally much more cost effective to shield signals at the source than to have to come back after the fact and figure out how to shield an entire chassis. How long should a trace be before you need to add termination? We discussed this a bit in an earlier chapter, but it might surprise you to learn how short “a long trace” actually is. For a signal with a rise time of 500 ps, the longest unterminated trace should be less than approximately 1.67 in. [15]. Longer unterminated traces reduce noise immunity and generate crosstalk and RF energy that needs to be suppressed.

Most of the time we ignore signal termination in digital systems because we have a much wider noise margin than the analog folks do. However, depending upon the inherent advantages of digital systems to overcome poor electronic design practices is just asking for trouble.

What happens when we discover a hardware defect at this stage of the process? PCB issues are relatively straightforward to deal with. You fix the bug and manufacture new boards. Sometimes, if the fix is small enough, you do some rework of the board. The manufacturing folks really dislike this solution, but when the time crunch hits, this may be the only solution. Various companies had various policies about reworking PC boards. One of my former employers had the “five green wires” rule. More than five pieces of rework and you did a new PC board. Of course, we were talking about small volumes, less than 100 units per month. I wouldn’t expect this rule to be too popular at mainstream electronics manufacturers.

If the hardware bug was in an FPGA, then fixing the defect is usually no more difficult than fixing a software bug. If the bug is in an ASIC, then fixing the bug could be much more involved. This is when the entire design could possibly unravel because the first thought will be, “Ok, just fix it in software. Do a workaround.” But the very reason we have hardware is to accelerate the algorithm and reduce the demands on the processor.

This is going to take us right back to partitioning decisions. If our system design strategy is to depend upon custom hardware to do the heavy lifting and the microprocessor to do the communications and housekeeping, then a flaw in the ASIC may be impossible to fix in software without greatly crippling overall system performance. Conversely, if our design strategy is to wring the last little bit of performance out of the software, to the extent that we are hand-crafting the compiler output in assembly language, then we would be less likely to have to deal with a hardware fix because the hardware is not the critical part of the equation.

This is also the phase where we are stress testing the product and if it will be in a mission-critical application, testing it to the proper compliance levels that the certifying agencies require (FAA, FDA). For example, one of the most well-known requirements documents is DO-178C, Software Considerations in Airborne Systems and Equipment Certification [16].

Applied Microsystems Corporation, AMC, developed a software analysis tool called CodeTEST that contained built-in test suites and report generators for certifying software to the requirements of the certifying agencies, DO-178Bl being one example.

I was responsible for the CodeTEST product line for a while during a 4-year stint at AMC. I left AMC to go into academia at the University of Washington just months before the company shut its doors and dissolved. CodeTEST was sold to Metrowerks, the software tools company. Metrowerks was then acquired by Motorola, who then spun off the semiconductor business to Freescale, which later somehow became NXP. Whew! The CodeTEST product line got lost in the shuffle around 2003–04.

CodeTEST was a combination hardware and software profiling tool. The software was used to preprocess the software for profiling, then the hardware tool was used to collect the profiling data in real time and with very minimal code intrusion. Then, the software took over again, postprocessed the data, and put it in the proper format for analysis or for certification.

The preprocessing involved placing “tags” at various locations in the code, such as function entry and exit points or program branches. These tags were simple “data writes” to a specific memory location or block of memory that was allocated to the CodeTEST hardware tool. The value of the data and the memory address provided the required information about where the tag came from. All the tags were time-stamped, and the data were buffered in the CodeTEST hardware and then sent to the host computer in burst packets when possible.
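As a rough illustration of the tagging idea, and emphatically not the actual CodeTEST implementation, the instrumentation might look something like the following; the tag-port address and the encoding of the tag values are invented for this sketch.

#include <stdint.h>

/* A reserved address decoded by the external probe hardware; every write to
   it appears on the bus, gets time-stamped, and is buffered by the tool. */
#define TAG_PORT (*(volatile uint32_t *)0x0FFF0000u)   /* hypothetical */

#define TAG_FUNC_ENTRY(id)  (TAG_PORT = 0x10000000u | (uint32_t)(id))
#define TAG_FUNC_EXIT(id)   (TAG_PORT = 0x20000000u | (uint32_t)(id))

static int scale_sample(int x)
{
    TAG_FUNC_ENTRY(42);            /* the preprocessor would insert these automatically */
    int result = x * 3 + 7;
    TAG_FUNC_EXIT(42);
    return result;
}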

If you are having a case of déjà vu, don’t fret. I have discussed this technology twice before in this book. I mentioned it as a performance measurement technique using logic analyzers and as an accessory tool for HP emulators. CodeTEST differed because it attempted to supply a total solution, rather than pieces, the way the other products did. The preprocessing was transparent to the software developers because the magic occurred in the “makefile” where all compilation and linking was invoked to build the software image. The CodeTEST preprocessor was invoked here to add the appropriate tags to the source code before compilation took place.

With respect to compliance testing, the biggest advantage of CodeTEST was its ability to show how well the validation software testing was actually meeting the requirements for code certification. Code coverage was one of the most difficult requirements to meet because there are so many possible code paths in a program of any reasonable complexity. In fact, there have been various statistical calculations that show the number of distinct paths through a program is greater than the number of stars in the known universe. This sobering fact makes it a real challenge to design test software that will prove to the FAA that there are no dead spots lurking in the code that have never been tested and will pop up at the most inopportune times.m

The HP 67000 family had an extra bit in its emulation memory that was put there for code coverage measurements. Every time that memory location was accessed (hit), the bit was set. The number of set bits could be counted, and we would be able to easily figure out how well the test code was actually testing the product. We actually used this ourselves to test our emulators and it was surprising how low our coverage numbers were when we first started testing.
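The arithmetic behind that coverage number is simple. Here is an illustrative sketch that counts “hit” bits in an ordinary bitmap in RAM, rather than in a real extra bit per emulation-memory location; the map size is made up.

#include <stdint.h>

#define MAP_WORDS 1024u   /* hypothetical: one bit per instrumented location */

/* Return the percentage of locations that were touched at least once. */
static double coverage_percent(const uint32_t hit_map[MAP_WORDS])
{
    uint32_t hits = 0;

    for (uint32_t i = 0; i < MAP_WORDS; i++) {
        uint32_t w = hit_map[i];
        while (w != 0u) {          /* count set bits in this word */
            hits += w & 1u;
            w >>= 1;
        }
    }
    return 100.0 * (double)hits / (double)(MAP_WORDS * 32u);
}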

As I recall (please don’t quote me on this), our requirement was 85% coverage and initial tests usually resulted in 40% coverage.

In addition to using automated tools, we depended upon various forms of black box and gray box tests. “Abuse testing,” as it was known, was something every engineer had to do, and it was written into our schedules. We were typically assigned to test someone else’s product, not our own, because we knew where our own product’s warts were. The idea was simple: break the code so the product would freeze up or do something wrong. When a bug occurred through abuse testing, it was categorized at various levels. The highest levels were “critical” and “serious.” When those bugs occurred, the testing stopped, and the bug report was sent to the designer to be fixed. This reset the clock and testing started all over again from the beginning. In order to release the product, there had to be no serious or critical defects found within some number of hours of testing (for the sake of argument, 10 hours).

The abuse testing was augmented by keystroke recording so that the engineer didn’t have to restart testing by hand. The keystrokes were played back to the point of the original failure and the engineer took off from there. My particular favorite was the “falling asleep at the keyboard” test where I would put my head down on the keyboard and just let keys autorepeat for 5 or 10 min while I took a quick nap.

Phase 7: Product release, maintenance, and upgrade: The product has been signed off for release, and marketing and sales are wound up and ready to go. All the swag has been purchased with the company logo, ready for the next conference, and now the most important person in the company takes over. “Who is that?” you ask. According to a consultant at a seminar I attended, the most important person in the company is the hourly employee in the shipping department or on the loading dock who puts the boxes in the delivery van, the first step in the new product’s journey to your customer.

Now begins the real defect testing. Everything up to now has been sterile and controlled, but now, the masses will take over and they will do things that were never imagined by the designers. Today, with FLASH memory holding the operational code, bug fixing is simply a software download. I think we’re all familiar with this process. I also suspect that our ability to fix bugs in the field has mitigated the need to find and fix all the bugs in the factory. Has this made us sloppier? I don’t know. However, it has made upgrading easier. Remember all those features that we wanted to add during the specification phase but were ruled out? With customer feedback and social media, we have our market research just one click away to tell us what we need to fix and add to the product. No more focus groups needed. Of course, this is a gross oversimplification, but modern technology has certainly changed how we deliver “the whole product” to the customers. Typos in user manuals are easily fixed, and the PDF files of the manuals are updated on the web site. Paper manuals shipped with the product are a thing of the past.

Wrapping up this chapter, I think the key message here is that the process of integrating embedded software with the hardware should be an incremental process, rather than a major event at the back end of the development cycle. There are tools and processes available today that make incremental integration a straightforward process.

While I’m writing this chapter as if the reader is the design engineer, my real target is the students who will be entering the field in a year or less. Real engineering is not accomplished by pulling a few all-nighters before the project is due. Well, maybe sometimes we need to pull an all-nighter, but in any sane organization, that is the exception, not the rule. For the student who is out there interviewing for her first EE job, what is going to sell you to the company and make you stand above the other job seekers is how well you can present yourself as ready to step in and be productive from the first day in the R&D lab.

Imagine that you are being interviewed by an R&D manager and you are asked to discuss your senior project. Your response:

  • Well, my group was tasked with designing an autoranging LCR meter. We started by surveying the existing products in the market and researching the available technologies. We next fleshed out the feature set we wanted to have and that we thought we could achieve in the time we had available.
  • Our next task was to partition the design into hardware and software and map out the major functional blocks and interfaces between these blocks. A big part of this initial design was to choose the right processor and software tools as well as search for as many design examples as possible. I also visited a local engineering company and showed them our front-panel mock-ups and got their feedback.
  • Once we were satisfied with our starting point, we developed the specification documentation, the test plan, the validation plan, and the initial project schedule. As we began developing the hardware and software, we had to make some changes to our specs, which then impacted the partitioning and the schedule, but we froze the design shortly after that.
  • Our team did periodic code inspections, trading off with other teams, and we had a formal hardware design review before releasing the board to fabrication. We also wrote test software to simulate the hardware and the software was exercised against the hardware APIs until we had real hardware to test on.
  • Our LCR meter worked as designed. We came within 3 days of our scheduled completion date. The PCB required one patch, due to a typo in the data sheet for the LCD display we bought. Oh, by the way, here it is. We did a three-dimensional printing of the case. It will run for a year on three AAA batteries.