CHAPTER 21. Moving beyond 8-bit – Designing Embedded Systems with PIC Microcontrollers, 2nd Edition

CHAPTER 21. Moving beyond 8-bit

A survey of larger PIC microcontrollers
For many years Microchip Technology was the pillar of the 8-bit embedded world. While all other microcontroller manufacturers were moving up to 16- or 32-bit, Microchip made a virtue of staying loyal to the smaller size. They explored the minimalist microcontroller in the 10 and 12 Series, and developed something fast and sophisticated in the 18 Series.
But 8-bit is, well, just 8-bit, and once you get past single-bit variables like switches or LEDs, or low-precision numbers or simple calculations, then you do feel the need for more bits. So Microchip took the plunge and, with truly impressive speed, offered us both 16- and 32-bit microcontrollers, entering the world of Digital Signal Processing as well.
The aim of this chapter is for you to get an overview of the wealth of possibilities that are on offer with the larger PIC microcontrollers. In a single chapter it is impossible to go into anything like the detail that we did with the 8-bit devices, so the emphasis is on overview. There is, however, more attention given to the 16-bit devices, as these represent the most direct upgrade path from the 18 Series.
At the end of this chapter you will have gained an overview understanding of:
• The 16-bit PIC microcontrollers.
• The dsPIC digital signal controllers.
• The 32-bit PIC microcontrollers.
It is worth mentioning that in moving to these larger microcontrollers we take a major step forward in the computer science which is applied; the 32-bit core, for example, is very sophisticated indeed. Therefore we will touch on, sometimes only in passing, a number of new and clever computer concepts. In so doing, you may wish to do some background reading about these topics, by accessing one of the several good texts on computer design. Reference 21.1 is a classic in the field, and is suggested as a reading option.

21.1. The main idea – why we need more than 8-bit

Table A6.4 shows the memory sizes required to represent C variables. It is only char that is single-byte, and this of course only gives an integer range of 0 to 255, or −128 to +127. Most numbers we use will need a bigger range than this, and many will need something much bigger. Section 11.5.1 introduces briefly the idea of floating-point arithmetic, essential for representing wide-ranging fractional numbers. This requires 4 bytes to represent a single number, and all calculations will be with multi-byte numbers. Attempting such calculations on an 8-bit machine is of course possible, but it involves the stitching together of numerous instructions. This in turn results in slow calculations, and further delays as large numbers are sent over limited data buses. With the requirements for more sophisticated data manipulation in embedded systems, coupled with the ready availability of complex semiconductors, the move to 16- or 32-bit can be an attractive one. A progression route is now available from Microchip. At the time of writing they offer 16-bit microcontrollers with the prefix PIC24-, 16-bit ‘Digital Signal Controllers’ with the prefix dsPIC-, and 32-bit microcontrollers with the prefix PIC32-. We start by looking at the 16-bit devices.
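To see what 'stitching together numerous instructions' means in practice, the following C sketch adds two 32-bit numbers one byte at a time, passing the carry along by hand, much as an 8-bit CPU must do with a chain of add-with-carry instructions. The function name add32_bytewise is invented for illustration; it is a model of the idea, not code taken from any Microchip library.

```c
#include <stdint.h>

/* Add two 32-bit values using only 8-bit additions and a carry flag,
   mimicking the work an 8-bit ALU has to do. Hypothetical helper. */
uint32_t add32_bytewise(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    unsigned carry = 0;

    for (int i = 0; i < 4; i++) {
        uint8_t byte_a = (uint8_t)(a >> (8 * i));
        uint8_t byte_b = (uint8_t)(b >> (8 * i));
        unsigned sum = (unsigned)byte_a + byte_b + carry;  /* 0..511 */

        result |= (uint32_t)(sum & 0xFFu) << (8 * i);      /* keep low byte */
        carry = sum >> 8;                                  /* carry to next byte */
    }
    return result;  /* the carry out of the top byte is discarded */
}
```

Four add steps and three carry hand-overs are needed for a single sum; a 32-bit core does the same in one instruction, and a 16-bit core in two.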

21.2. A 16-bit PIC overview

There are four closely-related 16-bit families of microcontrollers offered by Microchip. These are summarised in Table 21.1. There are two microcontroller families, the PIC24F and the PIC24H, and two Digital Signal Controller families, the dsPIC30F and the dsPIC33F. The table shows that these are distinguished by, among other things, operating speed and power-supply voltage. Just one of the families, the dsPIC30F, has a traditional 5 V supply; all others drop to a 3.6 V maximum. The ability to migrate easily between devices is a big advantage of the Microchip portfolio. Therefore it is encouraging to know that features of the 8-bit microcontrollers, notably the 18F Series, can readily be found in these 16-bit devices. Both the PIC24F and PIC24H microcontrollers form natural progression routes from the PIC 18 Series.
TABLE 21.1 Comparison of 16-bit PIC microcontroller characteristics
DMA, direct memory access; DSP, digital signal processing; MIPS, million instructions per second.
Shared features (all four families): Same core instruction set, same peripheral set, flash program memory, same development tools, universal bit manipulation, single-cycle multiply, 32/16 and 16/16 divide support, optimised for C language, nanoWatt technology.
Both dsPIC families: ‘DSP engine’ added to PIC24 CPU, with DSP instructions added to instruction set.

| 16-bit PIC family | Distinctive features | Memory |
| --- | --- | --- |
| PIC24F | Low cost, low power, 16 MIPS at 3.3 V, 2.0 V to 3.6 V operation, packages from 28 to 100 pins. | To 256K program, to 16K data. |
| PIC24H | 40 MIPS at 3.3 V, DMA, dual-port RAM, 3.0 V to 3.6 V operation, packages from 18 to 100 pins, compatible pin-outs with PIC24F. | To 256K program, to 16K data. |
| dsPIC30F | 30 MIPS at 3.3 V, 2.5 V to 5.5 V operation, packages from 18 to 80 pins. | To 144K program, to 8K data, data EEPROM. |
| dsPIC33F | 40 MIPS at 3.3 V, 3.0 V to 3.6 V operation, packages from 18 to 100 pins, compatible pin-outs with dsPIC30F. | To 256K program, to 30K data. |
The labelling convention of the 24 Series is shown in Figure 21.1. There is rather more useful information embedded here, compared to an 8-bit device, but it does make for a long name. The device illustrated has a program memory size of 64 Kbytes and a pin count of 44. A 28-pin version, the PIC24FJ64GA002, is also available. We take the 44-pin device, the PIC24FJ64GA004, as an opening example, to gain an overview of the family, in the next few pages. See Ref. 21.2 for the full data sheet; Ref. 21.3 is useful for those migrating from the 18 Series.
Figure 21.1
Coding of 16-bit PIC microcontrollers

21.3. The PIC24F family

The block diagram of the PIC24FJ64GA004, our example device, is shown in Figure 21.2. While the complexity is not to be ignored, it is reassuring to see a structure not entirely dissimilar from Figure 7.2 and Figure 13.2. One can even dare to look back to Figure 1.13 and see some similarities with that tiny device. Peripherals lie across the bottom of the diagram and up the right-hand side, all linked by the 16-bit data bus. The program memory lies towards the top left, with its Program Counter a little above it to the right; this forms a 23-bit address. An alternative address can be derived from the ‘PSV and Table Data Access’ block to its left. Data memory lies top right. Its 16-bit address is formed from Read and Write Address Generation Units (AGUs). The CPU, made up of ALU, register array, multiplier and divide support, is seen middle-right. Finally, oscillator and power management functions are placed middle left. Let's turn first to the CPU to get some more detail.
Figure 21.2
Block diagram of the PIC24FJ64GA004 (supplementary labels in shaded boxes added by the author)

21.3.1. The CPU

The 16-bit nature of the microcontroller is ultimately defined by the 16-bit ALU and data bus, seen in the block diagram. In this we see that instead of just one Working (W) register, there are now sixteen, all of 16-bit. An alternative view of the CPU is given by the Programmer's Model, in Figure 21.3. This represents the registers that the programmer works with the most. Any W register can hold either an address or data, and it is up to the instruction to determine which way its contents are used. Some registers have important secondary functions, for example as multiplier or divider operands, or Stack Pointer. The Status register also appears in the figure. Most of its bits will be familiar from their 16 or 18 Series equivalents, as seen for example in Figure 13.3. Those that are new are mentioned later in this chapter.
Figure 21.3
Programmer's model
The PIC24F instruction word is 24 bits, as Figure 21.2 shows. It operates a pipelined instruction flow similar to Figure 2.8. Only two clock oscillator cycles per instruction are required, however. This makes it twice as fast, for a given clock speed, when compared with any of the 8-bit devices. Instructions generally execute in a single cycle, except those which cause branching or certain Table and Move instructions.
Any advanced CPU needs to offer arithmetic operations beyond simple addition and subtraction. The 18 Series microcontrollers do this with an 8-bit × 8-bit multiplier. In the PIC24 we see a 17-bit × 17-bit hardware multiplier and support for divide operations, though not actually a hardware divider. Sixteen-bit numbers entering the multiplier may be in signed or unsigned format. They are extended to 17 bits, allowing a uniform multiplication process (the detail of which is beyond the scope of this book; Ref. 21.1 is useful in this area). Whatever the operand format, the result is always within 32 bits and is stored back in the register array.
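The benefit of extending each operand to 17 bits can be sketched in C. In the model below, a 16-bit operand is sign-extended if it is signed and zero-extended if it is unsigned; either way it becomes a valid 17-bit signed number, so a single signed multiply serves every combination of formats. The names extend17 and mul16_uniform are invented for this illustration, and the code models the principle only, not the internal circuit.

```c
#include <stdint.h>
#include <stdbool.h>

/* Extend a raw 16-bit operand to a wider signed value.
   Signed operands are sign-extended, unsigned ones zero-extended;
   in both cases the result fits in 17 bits as a signed number. */
static int32_t extend17(uint16_t raw, bool is_signed)
{
    if (is_signed && raw >= 0x8000u) {
        return (int32_t)raw - 0x10000;   /* negative value: sign-extend */
    }
    return (int32_t)raw;                 /* zero-extend */
}

/* One signed multiply handles signed x signed, unsigned x unsigned
   and the mixed cases alike. The low 32 bits of the return value
   are the bits that a 32-bit result register pair would hold. */
int64_t mul16_uniform(uint16_t a, bool a_signed, uint16_t b, bool b_signed)
{
    return (int64_t)extend17(a, a_signed) * extend17(b, b_signed);
}
```

For example, the bit pattern FFFFH multiplied by itself gives 1 when both operands are treated as signed (−1 × −1), but 4294836225 when both are treated as unsigned (65535 × 65535).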
Turning to division, in software this is done with one of several possible looping algorithms, variants of which are nicely described in Ref. 21.1. Division of 32-bit or 16-bit numbers is possible, in either case divided by a 16-bit number (the divisor). The PIC24 applies an algorithm which requires one cycle per bit of divisor. The CPU provides support for implementing this algorithm, by providing the RCOUNT register seen in Figure 21.3. This works in conjunction with the RA bit in the Status register. Divide instructions are provided in the instruction set. They must be placed in a loop, with the RCOUNT register acting as the loop counter. Effectively the instruction replicates one loop of the divide algorithm. Interestingly, an attempt to divide by zero, which would cause a theoretical result of infinity, causes an arithmetic Trap to occur, as described in Section 21.3.3.
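The flavour of a looping divide can be seen in the C sketch below. It uses the textbook 'restoring' algorithm for an unsigned 16-bit by 16-bit division; we are assuming this as a representative example, and it is not claimed to be the exact algorithm implemented in the PIC24. The function div_step plays the part of the single divide instruction, while the loop in udiv16 stands in for the hardware repeat mechanism, with its counter acting like RCOUNT. All names are invented for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* One iteration of restoring division: produces one quotient bit. */
static void div_step(uint16_t *rem, uint16_t *quo, uint16_t divisor)
{
    /* Shift the next dividend bit (top bit of *quo) into the remainder. */
    uint32_t r = ((uint32_t)*rem << 1) | (uint32_t)(*quo >> 15);
    *quo = (uint16_t)(*quo << 1);

    if (r >= divisor) {            /* trial subtraction succeeds */
        r -= divisor;
        *quo = (uint16_t)(*quo | 1u);  /* record a 1 in the quotient */
    }
    *rem = (uint16_t)r;
}

/* Unsigned 16/16 divide. Returns false for a zero divisor, the case
   in which the real hardware raises an arithmetic Trap. */
bool udiv16(uint16_t dividend, uint16_t divisor,
            uint16_t *quotient, uint16_t *remainder)
{
    if (divisor == 0) {
        return false;
    }
    uint16_t rem = 0;
    uint16_t quo = dividend;       /* dividend shifts out, quotient shifts in */

    for (int count = 16; count > 0; count--) {   /* loop counter, cf. RCOUNT */
        div_step(&rem, &quo, divisor);
    }
    *quotient = quo;
    *remainder = rem;
    return true;
}
```

Dividing 1000 by 7 with this routine gives a quotient of 142 and a remainder of 6, after exactly sixteen passes through the loop.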

21.3.2. Memory

The program memory map of the PIC24FJ64GA004 is shown in Figure 21.4 (a). Each memory location is 16 bits, yet instruction words are 24 bits. Each 24-bit instruction is therefore placed across two memory locations, with the upper byte of the upper word being unimplemented. The Program Counter therefore increments by two as every instruction executes. The lower word of each instruction has an even address, while the upper has an odd. This 16-bit organisation helps to keep program memory compatible with data memory, for example when accessing data from program memory.
Figure 21.4
(a) Program memory, PIC24FJ64GA004. (b) Interrupt vector table
As with earlier PIC microcontrollers, the reset vector is placed at location 00. Here, because the memory immediately following is made up of interrupt vectors, the user must program a goto instruction, directing program execution to the start of the program. However, goto is a two-word instruction, with the first word being the instruction code and the second the target address. Hence it occupies four memory locations. The first two memory locations are used for the instruction itself and the actual reset address is placed at location 02, as shown. The reset vector is followed by a whole table of interrupt vectors, as seen in Figure 21.4 (b). These are described in the section which follows.
Finally, how do the 64 Kbytes of program memory embedded in the microcontroller number (Figure 21.1) relate to the addresses shown in the memory map? The address range indicated, up to 0ABFEH, translates to 44030D words, or 88060D bytes. Bearing in mind that one byte of every alternate word is not implemented, this translates approximately to the figure of 64 Kbytes that we are expecting.
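The arithmetic above is easily checked. The short C sketch below repeats it, taking the top address as a parameter; the function names are invented for illustration.

```c
#include <stdint.h>

/* Number of 16-bit locations implied by the top address of the map. */
uint32_t map_words(uint32_t top_address)
{
    return top_address;
}

/* The same range expressed in bytes (two bytes per location). */
uint32_t map_bytes(uint32_t top_address)
{
    return 2u * map_words(top_address);
}

/* Only three of every four bytes are implemented, since each
   24-bit instruction occupies a pair of 16-bit locations. */
uint32_t implemented_bytes(uint32_t top_address)
{
    return map_bytes(top_address) * 3u / 4u;
}
```

With a top address of 0ABFEH, these give 44030 words, 88060 bytes and 66045 implemented bytes. Taking a Kbyte as 1024 bytes, the last figure is about 64.5 Kbytes.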
The data memory map is shown in Figure 21.5. It is made up of 16-bit words, but each byte has its own address. This allows some compatibility with 8-bit devices. The data memory address is 16 bits wide, meaning 2^16 (64K) bytes can be addressed, or 2^15 (32K) words. The diagram shows that the less significant byte of any word has an even address, while the more significant has an odd. Address information can come from several sources, including from within an instruction or from a W register. The address is finalised in Address Generation Units (AGUs), seen in Figure 21.2, with one for Read and one for Write.
Figure 21.5
Data memory (supplementary labels in shaded boxes added by the author)
Looking at the general layout of the memory map, we can see that Special Function Registers are placed in the lowest two Kbytes. General purpose memory is placed above this, up to memory location 27FFH. This gives eight Kbytes of data memory.
Unlike the 8-bit PIC microcontrollers, there is no hardware stack. Instead, a software stack is implemented. This is placed in data memory, starting at 0800H (i.e. just above SFR space) and in address value growing upwards. Figure 21.3 shows that W15 is the Stack Pointer, in addition to its possible use as a W register. The upper stack limit is user-defined and is held in the SPLIM register, also seen in Figure 21.3. Stack errors occur if a push or pop instruction forces the stack to go beyond the limits defined by 0800H and SPLIM. The following section explains how this is flagged.
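The behaviour of an upward-growing software stack with a programmable limit can be modelled in a few lines of C. This is a simplified sketch: the structure soft_stack and its functions are invented for illustration, and we assume that the Stack Pointer must always stay between 0800H and the limit value. The precise comparison made by the hardware is not given in the text and may differ in detail.

```c
#include <stdint.h>
#include <stdbool.h>

#define STACK_BASE  0x0800u   /* just above SFR space (from the text) */
#define STACK_WORDS 4096u     /* 8 Kbytes of general purpose RAM */

typedef struct {
    uint16_t w15;                 /* Stack Pointer: address of next free word */
    uint16_t splim;               /* user-defined upper limit */
    uint16_t mem[STACK_WORDS];    /* model of RAM from STACK_BASE upwards */
} soft_stack;

void stack_init(soft_stack *s, uint16_t splim)
{
    const uint16_t top = (uint16_t)(STACK_BASE + 2u * STACK_WORDS);
    s->w15 = STACK_BASE;
    s->splim = (splim > top) ? top : splim;   /* keep limit inside the model */
}

/* Push one word; returns false where the hardware would flag a stack error. */
bool stack_push(soft_stack *s, uint16_t value)
{
    if ((uint32_t)s->w15 + 2u > s->splim) {
        return false;                         /* would pass the upper limit */
    }
    s->mem[(s->w15 - STACK_BASE) / 2u] = value;
    s->w15 = (uint16_t)(s->w15 + 2u);         /* stack grows upwards */
    return true;
}

/* Pop one word; returns false if the stack is already empty. */
bool stack_pop(soft_stack *s, uint16_t *value)
{
    if (s->w15 < STACK_BASE + 2u) {
        return false;                         /* would fall below 0800H */
    }
    s->w15 = (uint16_t)(s->w15 - 2u);
    *value = s->mem[(s->w15 - STACK_BASE) / 2u];
    return true;
}
```

With the limit set to 0804H, for example, two pushes succeed and a third is refused; two pops then return the values in reverse order, and a third pop is refused in turn.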
It can be seen from the data memory map that the whole upper half of the memory address range is reserved for a feature called ‘Program Space Visibility’ (PSV). In this interesting option, the programmer may select any 16 Kword block of program memory to be read through the PSV area. This allows a useful mechanism for transferring data from program memory to data memory, always a challenge in the Harvard structure. It is only the lower 16 bits of any instruction which are mapped across. This is enabled by setting the PSV bit in the CORCON register, seen at the bottom of Figure 21.3. The block of program memory is selected through the PSVPAG register (Program Space Visibility Page Address), again seen in Figure 21.3. This effectively acts as the upper eight bits of the program memory address.
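The address translation implied by PSV can be sketched as follows. We assume here that the lower 15 bits of the data memory address pass straight through to become the lower bits of the program memory address, with PSVPAG supplying the eight bits above them; the text states only the PSVPAG part explicitly. The function names are invented for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if a data address lies in the PSV window, that is, in the
   upper half of the 64 Kbyte data address range. */
bool in_psv_window(uint16_t data_addr)
{
    return (data_addr & 0x8000u) != 0u;
}

/* Program memory address reached through the PSV window.
   PSVPAG gives the upper 8 bits; the data address gives the
   lower 15 (an assumption of this sketch). Result is 23 bits. */
uint32_t psv_program_address(uint8_t psvpag, uint16_t data_addr)
{
    return ((uint32_t)psvpag << 15) | (uint32_t)(data_addr & 0x7FFFu);
}
```

Under these assumptions, with PSVPAG set to 1, a read from data address 8000H fetches from program address 008000H. The largest address that can be formed is 7FFFFFH, which is indeed a 23-bit value.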

21.3.3. Traps and interrupts

The interrupt structure is a sophisticated one, with numerous interrupts and priority levels. It comprehensively addresses a potential weak point in the 8-bit PIC microcontrollers, which is that the 16 Series only has one interrupt vector and the 18 Series only two. We have already seen the Interrupt Vector Table (IVT) in Figure 21.4 (b), with its place in program memory shown in the neighbouring diagram. The Table contains 118 possible vectors for interrupts and 8 for Traps. To explain this new terminology: a ‘Trap’ is a non-maskable interrupt source designed to detect hardware or software problems, for example oscillator failure or stack error. Figure 21.4 (b) shows the four Traps implemented in this processor. Of these, the possibilities of stack error and arithmetic error (in division) have already been mentioned. Trap Service Routines (TSRs) are written in a similar way to ISRs.
Every interrupt source is allocated a unique and fixed vector in the Table. Our example processor has 39 interrupt sources, so most of the 118 possible vectors are not used. Figure 21.4 (b) shows vectors for External Interrupt 0 and Input Capture 1 only. The other vectors appearing in the figure are actually unallocated for this device. Any vector that is to be used must be programmed with the 24-bit start address of the relevant ISR or TSR.
Interrupt prioritisation for each source is applied by setting a 3-bit value within the relevant SFR. There are seven priority levels; level seven is the highest and one the lowest. More than one interrupt can be assigned to the same priority level. If two interrupts of the same assigned priority occur simultaneously, it is the position of the vector in the vector table which determines priority. In this case lower-value vector addresses claim higher priority. By this reckoning, External Interrupt 0 has the highest priority. This ranking cannot be changed, but is only applied to arbitrate between two simultaneously occurring interrupts whose priority has been set the same. Priority levels for Traps are determined in hardware and fixed; there is only one Trap per level. While interrupts occupy priority levels one to seven, Traps occupy levels eight to 15. For example, the Stack Error Trap is level 12 and the Oscillator Failure Trap is level 14. Thus all Traps have priority over all interrupts.
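The arbitration rule just described, user-assigned priority first and vector position as the tie-breaker, can be expressed in a short C sketch. The structure irq_source and the function arbitrate are invented for illustration; the hardware of course resolves this in logic, not in software.

```c
#include <stddef.h>

typedef struct {
    unsigned vector;    /* position in the vector table; lower wins a tie */
    unsigned priority;  /* user-assigned level, 1 (lowest) to 7 (highest) */
} irq_source;

/* Choose which of several simultaneously pending interrupts is taken.
   Returns the index of the winner, or -1 if nothing can be serviced.
   A source set to priority 0 is treated as disabled. */
int arbitrate(const irq_source *pending, size_t count)
{
    int winner = -1;

    for (size_t i = 0; i < count; i++) {
        if (pending[i].priority == 0u) {
            continue;
        }
        if (winner < 0 ||
            pending[i].priority > pending[winner].priority ||
            (pending[i].priority == pending[winner].priority &&
             pending[i].vector < pending[winner].vector)) {
            winner = (int)i;
        }
    }
    return winner;
}
```

If two sources at level 4 are pending together, the one with the lower vector position wins; add a third source at level 6 and it beats both, whatever its position in the table.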
Nested interrupts are by default allowed, although this can be disabled. If enabled, any interrupt can be interrupted by another with higher user-assigned priority. When an ISR or TSR is being executed, its priority level is indicated by bits IPL<2:0> in the Status register and bit IPL3 in the CORCON register, as seen in Figure 21.3. With priority levels of eight and higher, we can deduce that when IPL bit 3 is set, a TSR is in progress. If an interrupt occurs whose assigned priority level is higher than that currently displayed in the IPL bits, then it will be actioned and a nested interrupt will occur. The user can also write to these IPL bits, setting a level below which interrupts cannot occur. Hence all user interrupts are disabled if IPL<2:0> are set to 111. On the other hand, an interrupt source prioritised to level zero is effectively permanently disabled.
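The role of the IPL bits can be summarised in a small C model. The function names are invented for illustration; the rule modelled is simply that a request is actioned only if its level is higher than the level currently shown in the IPL bits.

```c
#include <stdbool.h>

/* Combine IPL3 (in CORCON) with IPL<2:0> (in the Status register)
   to give the CPU's current 4-bit priority level, 0 to 15. */
unsigned cpu_level(bool ipl3, unsigned ipl_2_0)
{
    return (ipl3 ? 8u : 0u) | (ipl_2_0 & 7u);
}

/* A request is serviced only if its level exceeds the current level. */
bool is_serviced(unsigned request_level, unsigned current_level)
{
    return request_level > current_level;
}

/* Levels 8 and above belong to Traps, so IPL3 set means a TSR is running. */
bool tsr_in_progress(unsigned current_level)
{
    return current_level >= 8u;
}
```

This shows why writing 111 to IPL<2:0> shuts out every user interrupt, since none can exceed level seven, while a Trap at level 12 still gets through. It also shows why a source set to level zero can never be serviced: zero is never higher than the current level.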
There is also an Alternate Vector Table, which can be seen in Figure 21.4 (a). This can be enabled in place of the IVT, generally to support emulation or debug, allowing alternative interrupt prioritisations or strategies to be explored.

21.3.4. Clock sources

Figure 21.6 shows the PIC24FJ64GA004 clock source structure. This demonstrates another evolutionary step in the development already seen in Figure 12.6 and Figure 13.15. Four possible clock sources are evident, two external and two internal, lying above each other to the left of the diagram. The external sources are the primary oscillator connected to pins OSCO/OSCI and the secondary oscillator on pins SOSCO/SOSCI. The secondary oscillator also connects to Timer 1 and the Real Time Clock and Calendar, as we shall see. Internal clock sources are provided by the FRC and LPRC oscillators. The FRC oscillator frequency can be divided down by binary values, up to 256. It and the primary oscillator can also be multiplied by four, by the Phase Locked Loop. An interesting development is that, after the clock multiplexer, the clock to the CPU can be optionally divided down, by binary values up to 128, leaving a faster clock running to the peripherals. This postscaler is used by entering the ‘Doze’ mode. Finally, there is a Fail Safe Clock Monitor. This important feature was described in Section 12.5.2.
Figure 21.6
The clock sources
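A little arithmetic shows what these options mean for instruction rate. The C sketch below is a much-simplified model of the clock path, with invented names throughout. It assumes that the peripherals run at the undivided instruction clock, and it applies the two-oscillator-cycles-per-instruction figure from Section 21.3.1. Any division of the FRC oscillator is taken to have been applied already in the source frequency supplied.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t source_hz;   /* selected oscillator frequency, in Hz */
    bool     pll_x4;      /* true if the 4x Phase Locked Loop is in use */
    unsigned doze_shift;  /* CPU clock divided by 2^doze_shift, 0 to 7 */
} clock_cfg;

/* Clock frequency after the optional PLL. */
uint32_t system_clock_hz(const clock_cfg *c)
{
    return c->pll_x4 ? c->source_hz * 4u : c->source_hz;
}

/* Instruction-cycle rate seen by the peripherals:
   two oscillator cycles per instruction cycle. */
uint32_t peripheral_rate_hz(const clock_cfg *c)
{
    return system_clock_hz(c) / 2u;
}

/* Instruction rate of the CPU itself, after the Doze postscaler. */
uint32_t cpu_rate_hz(const clock_cfg *c)
{
    unsigned shift = (c->doze_shift > 7u) ? 7u : c->doze_shift;
    return peripheral_rate_hz(c) >> shift;
}
```

Taking an assumed 8 MHz source with the PLL enabled gives a 32 MHz system clock and hence 16 million instruction cycles per second, matching the 16 MIPS quoted for the PIC24F in Table 21.1. With the Doze divider at its maximum of 128, the CPU drops to 125 000 instructions per second while the peripherals carry on at the full rate.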

21.3.5. Power supply

A close look at Figure 21.2 shows that there are two supply voltage inputs, VDDCORE and VDD. Broadly, these supply respectively the core and the peripherals. The voltage requirements for each are summarised in Table 21.2. This shows a much lower operating voltage than found in the PIC 8-bit microcontrollers. As we see, the microcontroller core can operate down to 2.0 V, though it needs at least 2.35 V to run at full clock frequency. Within the limits specified, VDD and VDDCORE may be supplied at the same voltage or VDD may be greater than VDDCORE; it must never be less.
TABLE 21.2 Power supply voltages
| Supply voltage | Minimum (operating frequency restricted) | Maximum | Absolute maximum |
| --- | --- | --- | --- |
| VDDCORE with respect to VSS | 2.0 V | 2.75 V | 3.0 V |
| VDD with respect to VSS | The higher of 2.0 V or VDDCORE | 3.6 V | 4.0 V |
An internal regulator is available, of nominal output 2.5 V. It draws its input from VDD and can be used to power VDDCORE. The regulator is enabled or disabled by the DISVREG pin, also seen in Figure 21.2.
The three possible power supply operating modes are shown in Figure 21.7. In the first, the voltage regulator is enabled by tying DISVREG low. The regulator powers the core, drawing its supply from the main VDD input. An external capacitor must then be connected to the VDDCORE/VCAP pin. In the second, the regulator is disabled by tying DISVREG high; VDD and VDDCORE are then independently powered from external supplies. In this configuration, it is important that VDD remains greater than or equal to VDDCORE. In the third the regulator is again disabled, and VDD and VDDCORE are both supplied from a single 2.5 V supply.
Figure 21.7
Power supply modes
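The supply rules of this section can be gathered into a simple design check, sketched below in C. Voltages are expressed in millivolts, the limits are those of Table 21.2 and the surrounding text, and the function names are invented for illustration.

```c
#include <stdbool.h>

/* Operating limits in millivolts, taken from Table 21.2 and the text. */
enum {
    VDDCORE_MIN_MV        = 2000,
    VDDCORE_FULL_SPEED_MV = 2350,
    VDDCORE_MAX_MV        = 2750,
    VDD_MIN_MV            = 2000,
    VDD_MAX_MV            = 3600
};

/* True if the pair of supply voltages lies within operating limits. */
bool supply_ok(int vdd_mv, int vddcore_mv)
{
    return vddcore_mv >= VDDCORE_MIN_MV &&
           vddcore_mv <= VDDCORE_MAX_MV &&
           vdd_mv >= VDD_MIN_MV &&
           vdd_mv <= VDD_MAX_MV &&
           vdd_mv >= vddcore_mv;      /* VDD must never be below VDDCORE */
}

/* True if, in addition, the core may run at full clock frequency. */
bool full_speed_ok(int vdd_mv, int vddcore_mv)
{
    return supply_ok(vdd_mv, vddcore_mv) &&
           vddcore_mv >= VDDCORE_FULL_SPEED_MV;
}
```

A 3.3 V VDD with a 2.5 V core passes both checks, as does the single 2.5 V supply of the third mode. A core at 2.2 V is acceptable only at restricted frequency, and any arrangement in which VDD falls below VDDCORE is rejected.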

21.3.6. The pins and ports

Figure 21.2 shows Port A having 10 bits, Port B with 16 and Port C with 10. Like the 18 Series, each port and hence each port pin driver circuit has three SFRs relating to it, TRISx, PORTx and LATx. The general structure of a port pin driver circuit is shown in Figure 21.8. The data direction is set by TRISx, with a Logic 1 on this causing the port to act as an input. A write to either PORTx or LATx determines the state of the data latch. However, a read from PORTx reads the actual port bit value, while a read from LATx reads the data latch value. All ports share pins with peripherals, and these can claim precedence over port settings. Therefore if the port function is to be used it is important to ensure that the pin has not already been allocated to a peripheral role.
Figure 21.8
A typical shared port structure
The PIC24 series gives an important step forward by allowing the user to select which pin connects to certain input/output functions. Thus one is no longer constrained to the pin-out predetermined by the manufacturer. This is called the ‘peripheral pin select’ feature. It is available on up to 26 pins, seen as RP0:RP25 in Figure 21.2. The actual number available on any given microcontroller depends on its pin count. Peripheral pin select is not available on analog input/output, but does include comparator outputs, which are of course digital. The selection of pin connections to peripherals is controlled by SFRs, mapping inputs and outputs independently.
A concern when designing with this family of microcontrollers is that the lower supply voltage may make it difficult to interface with external devices supplied from 5 V. It is interesting to note therefore that digital-only pins accept up to 5.5 V as input. Their lower output voltage can still cause an interfacing problem, however. A useful solution proposed by Microchip is to tie the output to 5 V with an external pull-up resistor and set the data latch (of Figure 21.8) to Logic 0. If the bit is then set to input by writing a Logic 1 to the TRISx register, the pull-up resistor will raise the output to Logic 1. If the bit is set to output, it will assert the Logic 0 already placed in the data latch. The bit value held in the TRISx register effectively becomes the bit output value. Of course this doesn't lead to ideal operating characteristics. Without an active pull-up, the rise-time of the 0 to 1 transition will depend on the time constant formed by the pull-up resistor and the line capacitance (as we saw with I2C lines in Section 10.6); when the line is at Logic 0 the pull-up resistor will cause an unfortunate current drain.
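The logic of this pull-up technique can be followed in the C model below. It represents a single port pin by its TRISx and LATx bits and works out the resulting level on the external line, assuming a pull-up resistor to 5 V is fitted. The structure and function names are invented for illustration; on a real device the same steps would be carried out by writing to the TRISx and LATx registers themselves.

```c
#include <stdbool.h>

typedef struct {
    bool tris;  /* 1 = input (output driver off), 0 = output (driver on) */
    bool lat;   /* data latch value, driven when the pin is an output */
} port_pin;

/* Prepare the pin: the latch holds Logic 0 from now on, and the
   pin starts as an input so that the line is released. */
void od_init(port_pin *p)
{
    p->lat = false;
    p->tris = true;
}

/* Set the line level by writing the wanted value to the TRIS bit. */
void od_write(port_pin *p, bool level)
{
    p->tris = level;
}

/* Level on the external line, with a pull-up to 5 V fitted:
   released (input) reads high; driven (output) follows the latch. */
bool line_level(const port_pin *p)
{
    return p->tris ? true : p->lat;
}
```

Since the latch is fixed at Logic 0, the line level always equals the TRIS bit, which is exactly the observation made above.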

21.3.7. Peripherals and the Real-Time Clock and Calendar

The block diagram of Figure 21.2 shows a wide range of peripherals. We are familiar, from earlier chapters, with the concepts of almost all of these, so we do not go into detail here. A survey of Ref. 21.2 shows that most have grown in complexity, while continuing to apply the principles already covered. The quantity of peripherals has increased, with five Timers, two SPI ports, two I2C ports, and so on. The peripherals are summarised in Table 21.3.
TABLE 21.3 16-bit peripherals summary
| Peripheral | Brief description |
| --- | --- |
| Timer 1 | 16-bit timer with connections for external crystal (SOSC0/1); built-in digital comparator and period register allow easy set up of periodic interrupts. |
| Timer 2/3 | Can be configured as a single 32-bit timer, or as two 16-bit timers, both options with built-in comparator and period register. |
| Timer 4/5 | As Timer 2/3. |
| Input Capture | Captures a Timer value (Timer 2 or 3) at instant of selected edge applied to input pin; input can be optionally divided by 4 or 16; captured values can be held in a four-level FIFO buffer. |
| Output Compare | Generates an output pulse when the value of a selected Timer is equal to a compare register, with the added option of interrupt on compare match. |
| Serial Peripheral Interface (SPI) | Operates in 8-bit or 16-bit mode (both receive and transmit), with eight-level receive and transmit buffers. |
| I2C | I2C module with independent Master and Slave logic, supporting 100 kHz and 400 kHz bit rates. |
| UART | 8- or 9-bit UART, with four-level receive and transmit buffers and support for LIN and IrDA. |
| Parallel Master Port/Parallel Slave Port | Highly configurable 8-bit I/O parallel port with up to 16 bits of address, suitable for interfacing with conventional addressed parallel buses, configurable also as slave port. |
| Real-Time Clock and Calendar | Calendar and clock with one second resolution, suitable for timing over long durations, optimised for low-power applications. |
| ADC | 10-bit ADC with up to 16 analog inputs and up to 500K samples per second, 16 word results buffer. |
| Comparator | Highly configurable analog comparators with a scalable voltage reference. |
An interesting and important peripheral not yet seen in this book is the 100-year RTCC (Real-Time Clock and Calendar). It is designed to show times and dates over 100 years, storing seconds, minutes, hours, day of week, date, month and year, running from midnight on 1 January 2000 to midnight on 31 December 2099, with leap-year correction. Alarms can be set, to initiate actions at certain moments. It is designed to run from a 32.768 kHz oscillator connected to the secondary oscillator and is optimised for low power use.
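Restricting the calendar to the years 2000 to 2099 makes leap-year correction particularly simple, as the C sketch below shows. Within this range every year divisible by four is a leap year, since 2000 is itself one (being divisible by 400) and the awkward exception of 2100 lies just outside. The function names are invented, and the code illustrates the calendar rule only; it does not describe how the RTCC hardware is built.

```c
#include <stdbool.h>

/* Leap-year test valid for the RTCC range 2000 to 2099 only. */
bool rtcc_is_leap(unsigned year)
{
    return (year % 4u) == 0u;
}

/* Days in a given month (1 to 12); returns 0 for an invalid month. */
unsigned rtcc_days_in_month(unsigned year, unsigned month)
{
    static const unsigned char days[12] = {
        31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
    };

    if (month < 1u || month > 12u) {
        return 0u;
    }
    if (month == 2u && rtcc_is_leap(year)) {
        return 29u;
    }
    return days[month - 1u];
}
```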

21.3.8. Conclusion on the PIC24F family

Although this has been a quick overview, it is still possible to draw some conclusions. The PIC24F family represents a fairly direct, though substantial, upgrade from the 18 Series. Everything has got bigger, everything has got better. The overall architecture is recognisably ‘PIC’, with a Harvard structure and familiar patterns of memory structure, and data and information flow. Significant developments have of course been made in the memory structure, for example the Program Space Visibility feature. The CPU is a recognisable PIC descendant, though now 16-bit and with the 16 W registers. The interrupt strategy has taken a leap forward and abandoned the rather modest structures used in the 8-bit PIC world. All peripherals have adapted to the 16-bit environment, but all carry and apply the concepts we have found in earlier chapters. Importantly, we have emulated the world of programmable logic devices by introducing remappable pins. Finally, there has been a further refinement of ancillary features, like the power supply and clock. With these it is possible to design a system which makes truly optimal use of power supply. Do we need anything more? Well, let's now envisage a situation in which we have a very demanding signal-processing application, or where 16-bit data representation is not enough.

21.4. The dsPIC digital signal controller

21.4.1. What is Digital Signal Processing?

Way back in Section 1.4 we talked about the microcontroller as a general-purpose single-chip computer. The Section went on to describe how the microcontroller formed an evolutionary branch from the microprocessor, responding to the need for small-scale intelligent control. Of course there were other requirements placed upon the microprocessor. It was already recognised that a signal could be repeatedly digitised, and that the stream of samples so produced could be processed digitally. The resulting digital signal could then if needed be converted back to analog form, with the sequence in simplest form shown in Figure 21.9. This led to a huge range of possibilities, among other things replicating and improving on a number of processes traditionally done in analog form, such as filtering. This body of technology and expertise came to be called Digital Signal Processing (DSP), and became an important specialism in its own right. Some microprocessors became customised to the very particular demands of DSP. Major textbooks have been published on DSP, for example Ref. 21.4.
Figure 21.9
Digital signal processing – simplest form!
Digital Signal Processing is used for a range of activities. Many of these replace functions already available in analog form, like amplifying, filtering or waveform generation. In general, DSP undertakes these functions better, for example with more precision in frequency and voltage. As a very simple example, a gain control can be implemented by multiplying each incoming sample by a fixed value before outputting the result. DSP also delivers performance beyond the capability of an analog system, for example by producing a filter that is of an order higher than an analog system allows. DSP is also widely used for transformation from the time to the frequency domain, for example to analyse a vibration signal to pinpoint the component vibration frequencies, and more advanced applications like convolution and correlation. For more information on these, read Ref. 21.4 or its equivalents.
Generally a DSP process depends on a continuous stream of samples; the algorithm in use operates on the most recent sample, along with a set of the previous samples stored in memory. The data sequence is often held in a circular buffer; every time a new sample is stored the oldest is discarded. Each sample needs to be multiplied by a coefficient determined by the algorithm, with the final result deduced by making additions or subtractions with the results of the multiplications. The algorithm is run every time a new sample is taken. Taking audio sampling as an example, a common frequency is 44.1 kHz. This allows a mere 22.7 μs between samples, in which to take and save the new sample, run the algorithm and output the result before the next sample must be taken. This implies a requirement for high-speed data conversion and high-speed memory access and processing.
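It is instructive to turn this time budget into a count of instruction cycles. The C fragment below does so; the function name is invented for illustration.

```c
#include <stdint.h>

/* Whole instruction cycles available between successive samples. */
uint32_t cycles_per_sample(uint32_t instr_rate_hz, uint32_t sample_rate_hz)
{
    return instr_rate_hz / sample_rate_hz;
}
```

At the 40 MIPS of the faster families in Table 21.1, a 44.1 kHz sample rate leaves 907 instruction cycles per sample. At 16 MIPS the figure falls to 362. Everything the algorithm does for each sample has to fit inside this count.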
So what do we expect to find in a digital signal processor? A hardware multiplier is essential; a software multiply solution will be just too slow. Recall here that an m-bit number multiplied by an n-bit number produces an (n + m) bit result. In many cases the multiply result needs to be added into an accumulation of results. The process of making a series of multiplications, and accumulating the result through addition or subtraction, is a classic DSP process, often abbreviated to MAC (Multiply Accumulate). As they may add a long sequence of numbers together, accumulate stores may be far larger than the size of the multiply result, which itself is probably twice the size of the ALU. With all these complex numerical processes going on, there is a risk of signal overload, which can be a source of error. Therefore some means of sensing and correcting for this can also be expected.
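The circular buffer and the multiply-accumulate process come together in the C sketch below, which computes one output of a simple four-coefficient filter each time a new sample arrives. Several things here are assumptions made for illustration: the names, the choice of four taps, the use of a 64-bit integer to stand in for a wide accumulator, and the representation of coefficients as fractions scaled by 32768. The final clamp is one simple way of correcting for overload.

```c
#include <stdint.h>

#define TAPS 4   /* number of stored samples and coefficients (assumed) */

typedef struct {
    int16_t  samples[TAPS];  /* circular buffer of the most recent samples */
    unsigned newest;         /* index of the most recent sample */
} fir_state;

/* Store the new sample over the oldest, then multiply-accumulate
   across the whole buffer, newest sample first. */
int16_t fir_step(fir_state *s, const int16_t coeff[TAPS], int16_t input)
{
    s->newest = (s->newest + 1u) % TAPS;     /* oldest slot is overwritten */
    s->samples[s->newest] = input;

    int64_t acc = 0;                         /* accumulator wider than product */
    unsigned idx = s->newest;
    for (int k = 0; k < TAPS; k++) {
        acc += (int32_t)coeff[k] * s->samples[idx];  /* 16 x 16 -> 32 bits */
        idx = (idx + TAPS - 1u) % TAPS;              /* step back one sample */
    }

    acc /= 32768;                            /* undo the coefficient scaling */
    if (acc > INT16_MAX) acc = INT16_MAX;    /* clamp on overload */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

With the state initialised to zero and coefficients of 16384, 16384, 0 and 0 (that is, one half, one half, zero, zero), the filter outputs the average of the two latest samples: an input of 1000 followed by 3000 produces 500 and then 2000.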
For many years DSP and embedded systems evolved separately and seemed two rather independent activities. Embedded systems were input/output intensive and with modest processing requirements, while DSP devices focused on the high-speed processing of a small number of signals. But now the two fields have converged. Many embedded systems need to process signals with some sophistication. On the other hand, what are primarily DSP systems may need to deal with other inputs and outputs and generate control signals.
In their Digital Signal Controllers, the dsPIC family, Microchip have recognised and responded to this convergence of DSP and embedded system technology. Put another way, they recognise the increasing need in embedded systems for DSP capability. The Harvard architecture, as used by Microchip in all its products to date, is a natural for DSP, in that it allows simultaneous access to program and data memory. A RISC-based instruction set is also preferred, due to the high-speed operation that this allows. Therefore Microchip architectures seem well-placed to incorporate DSP capability.

21.4.2. The dsPIC30F and dsPIC33F

The dsPIC30F and dsPIC33F are two closely related families of PIC microcontrollers which have a powerful DSP capability. Table 21.1 has already summarised their main characteristics. The block diagram of a representative device, for example the dsPIC30F3010, is almost the same as that of the PIC24F family, seen in Figure 21.2. It is therefore not reproduced in this introductory overview. There are, however, several important differences. One is that data memory is split into two parts, called X and Y. Each has its own address generator and data bus. Most operations act only on the X memory, and it remains the main microcontroller data memory. Certain DSP instructions, however, access both X and Y buses. The second difference is the presence of a ‘DSP engine’ alongside the main CPU. The DSP engine is an extra processing unit designed to provide DSP capability; it links to the W register block and responds to the specialist DSP instructions.

21.4.3. The Digital Signal Processing engine

Let's trace through the structure of the DSP engine, and see how it relates to some DSP issues already discussed. Its block diagram is shown in Figure 21.10. In the DSP engine we see the main elements of a digital processor, in its multiplier, adder (and, along with the ‘Negate’ unit, a subtracter) and its accumulators. The DSP engine links to the main CPU through the W registers, seen at the bottom of Figure 21.10. A link is also made here to the X and Y data buses mentioned above. Data can be pre-fetched from memory through these and placed in the W registers. Some of the W registers are allocated specific roles; for example registers W4 to W7 are used as operands and W8 to W11 as addresses. The multiplier is shared with the main CPU and has already been mentioned in that context. It can interpret numbers as integer or fractional. To facilitate this process, the multiplier is able to extend the 16-bit number representation to 17 bits. The 33-bit result of the multiply action is usually added into one of the 40-bit accumulators. The multiply result is therefore extended to this size, using the Sign Extend unit. This ensures that the data representation and value of the number are correctly retained while it is expanded to the larger representation.
Figure 21.10
The Digital Signal Processing engine block diagram (supplementary labels in shaded boxes added by the author)
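The two multiplier modes can be modelled in a few lines of C. The sketch below is illustrative only and is not dsPIC code: the type name acc40_t and the function names are invented for this example, the 40-bit accumulator is simply held in a 64-bit integer, and fractional mode is assumed to follow the usual 1.15 convention, in which the product is doubled so that its binary point lines up with a 1.31 result.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 40-bit accumulator value, held in a 64-bit signed integer
   (hypothetical type name, for illustration only). */
typedef int64_t acc40_t;

/* Returns 1 if x lies within the signed 40-bit range. */
static int fits_acc40(int64_t x)
{
    return x >= -((int64_t)1 << 39) && x < ((int64_t)1 << 39);
}

/* Integer mode: the signed 16 x 16 product, sign-extended to 40 bits.
   The widening cast performs the sign extension. */
acc40_t mul_integer(int16_t a, int16_t b)
{
    acc40_t p = (acc40_t)a * (acc40_t)b;
    assert(fits_acc40(p));
    return p;
}

/* Fractional mode (assumed 1.15 operands): the product is doubled so
   that the result is aligned as a 1.31 number. */
acc40_t mul_fractional(int16_t a, int16_t b)
{
    acc40_t p = (acc40_t)a * (acc40_t)b * 2;
    assert(fits_acc40(p));
    return p;
}
```

With these definitions, mul_fractional(0x4000, 0x4000) models 0.5 × 0.5 and returns 0x20000000, which is 0.25 in 1.31 format. The awkward case of (-1.0) × (-1.0) returns +1.0, a value that cannot be held in 32 bits; this is one reason why a result wider than 32 bits is useful.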
We can follow now how the multiplier output makes its way to one of the accumulators. A multiplexer selects the input to the barrel shifter from either one of the accumulators, or from the extended multiplier output. The barrel shifter allows shifts of up to 16 bits left or right in a single instruction. One use of this is to scale a number, by multiplying or dividing by powers of two. Following the 40-bit route out of the barrel shifter, data can be transferred via another multiplexer to the adder, going via the optional Negate unit. Here it can be added to (or subtracted from) the contents of one of the accumulators. Data paths for the addition of the two accumulators can also be seen. Accumulator results can be transferred to the X data bus, and hence back to data memory and the 16-bit environment, through a path shown to the right of the diagram.
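Scaling by powers of two, as performed by the barrel shifter, can be sketched in C as follows. This is a model written for illustration, not a description of the sftac instruction: the function name and the sign convention for the shift count (positive for left, negative for right) are assumptions, and the accumulator value is assumed to lie within the 40-bit range so that the 64-bit arithmetic cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Shift a model accumulator value by n places.
   n > 0: shift left, i.e. multiply by 2^n.
   n < 0: arithmetic shift right, i.e. divide by 2^(-n), rounding
          towards minus infinity.
   The result is not wrapped or saturated to 40 bits in this model. */
int64_t barrel_shift(int64_t acc, int n)
{
    assert(n >= -16 && n <= 16);
    assert(acc >= -((int64_t)1 << 39) && acc < ((int64_t)1 << 39));

    if (n >= 0) {
        /* Multiplication avoids left-shifting a negative value. */
        return acc * ((int64_t)1 << n);
    }

    int k = -n;
    /* Arithmetic right shift written portably: for a negative value,
       shift the (non-negative) complement and complement again. */
    return (acc >= 0) ? (acc >> k) : ~((~acc) >> k);
}
```

For example, barrel_shift(3, 4) returns 48 (3 × 16), and barrel_shift(-48, -4) returns -3.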
The accumulators themselves can be viewed as 32-bit numbers, with eight guard bits added to avoid overflow. When a number is sign-extended, these upper eight bits simply hold eight sign bits, one for negative, zero for positive. The presence of these extra bits allows the results of additions or subtractions to overflow into the guard bits. This overflow can be detected and corrected, for example by right-shifting.
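The role of the guard bits can be illustrated with a small C model. As before, this is a sketch under stated assumptions rather than device code: the 40-bit accumulator is held in a 64-bit integer, and both function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the eight guard bits (bits 39 to 32) of a 40-bit accumulator
   value held in a 64-bit integer. */
uint8_t guard_bits(int64_t acc)
{
    assert(acc >= -((int64_t)1 << 39) && acc < ((int64_t)1 << 39));
    return (uint8_t)(((uint64_t)acc >> 32) & 0xFFu);
}

/* Returns 1 if the value no longer fits in 32 bits. Equivalently,
   bits 39 to 31 are no longer all equal, so the guard bits have
   stopped being plain copies of the sign bit. */
int overflowed_into_guard(int64_t acc)
{
    return acc > INT32_MAX || acc < INT32_MIN;
}
```

A sign-extended negative number such as -1 gives guard bits of 0xFF, and a small positive number gives 0x00. Adding 0x7FFFFFFF to itself gives 0xFFFFFFFE, which overflowed_into_guard() reports as outside the 32-bit range; halving it (a right shift by one place) brings it back in range, at the cost of a known scale factor.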
At a number of stages in the DSP engine data flow, we see implementation of Saturation or Rounding. These are standard numerical processes, which become particularly important in applications in which a large amount of data processing takes place. Both of these processes can be enabled, or disabled, through bits in the CORCON (Core Control) register. This is seen in Figure 21.3 for the PIC24F microcontroller; more bits are added for the dsPIC family. Saturation applies if a calculation produces a result that is beyond the range defined by the word size and the data representation used. If such an over-range is detected, then the maximum or minimum value, as appropriate, is substituted in its place. Rounding is applied when a word size is being reduced, for example from 32-bit to 16-bit.
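Saturation and rounding are easy to express in C, and the sketch below may help to fix the ideas. It is not a model of the CORCON options: the function names are invented, saturation is applied at the 32-bit (1.31) range, and a simple round-half-up rule is assumed when reducing from 32 bits to 16. The rounding modes actually offered by the device should be taken from the data sheet.

```c
#include <assert.h>
#include <stdint.h>

/* Saturation: substitute the maximum or minimum 32-bit value if the
   accumulator value is out of range. */
int64_t saturate_q31(int64_t acc)
{
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return acc;
}

/* Rounding: reduce a 1.31 value to 1.15 by adding half of the weight
   of the 16 bits about to be dropped, then discarding those bits
   (assumed round-half-up rule). */
int16_t round_q31_to_q15(int64_t acc)
{
    int64_t r = saturate_q31(acc) + 0x8000;
    r = saturate_q31(r);              /* the rounding add may overflow */

    /* Portable arithmetic shift right by 16 places. */
    r = (r >= 0) ? (r >> 16) : ~((~r) >> 16);

    assert(r >= INT16_MIN && r <= INT16_MAX);
    return (int16_t)r;
}
```

For instance, round_q31_to_q15(0x00018000) returns 2, round_q31_to_q15(0x00017FFF) returns 1, and the largest positive input, 0x7FFFFFFF, saturates to 0x7FFF rather than wrapping round to a negative number.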
The dsPIC devices use the instruction set of the PIC 16-bit microcontrollers, expanded by the addition of a set of DSP instructions. Examples of these instructions are given in Table 21.4. Notice first the simple data transfer instructions, like lac and movsac. These allow direct data transfer between the 16-bit registers and the 40-bit accumulators, making flexible use of techniques already mentioned to change the word size. Simple multiply instructions are represented by mpy and mpy.n, with settings in the CORCON register determining exact implementation of the instruction. Classic DSP instructions are represented by mac and msc. In both of these we see most elements of the DSP engine put to use, in the single instruction. Note also that these and other instructions allow operands to be pre-fetched into the W registers, simultaneously with the execution of the main instruction. This important capability allows greatly increased execution speed.
TABLE 21.4 Some example Digital Signal Processing instructions
| Instruction mnemonic | Summary | Note |
| add | Add accumulators, OR sign extend and zero backfill a 16-bit number, optionally shift it, and add to specified accumulator; i.e. the original 16-bit number is added to the more significant word of the accumulator. | |
| lac | Optionally shift the contents of a W register, zero backfill and sign extend, and store in specified accumulator. | The data is assumed to be in fractional format. |
| mac | Multiply the contents of one W register by another, sign extend to 40 bits, and add to specified accumulator. | The IF bit of CORCON determines fractional or integer multiply. |
| movsac | Round specified (40-bit) accumulator, and store in specified (16-bit) W register. | |
| mpy | Multiply the contents of one W register by another, sign extend to 40 bits, and store in specified accumulator. | The IF bit of CORCON determines fractional or integer multiply. |
| mpy.n | Multiply the contents of one W register by the negative of another, sign extend to 40 bits, and store in specified accumulator. | The IF bit of CORCON determines fractional or integer multiply. |
| msc | Multiply the contents of one W register by another, sign extend to 40 bits, and subtract from specified accumulator. | The IF bit of CORCON determines fractional or integer multiply. |
| sac.r | Optionally shift specified accumulator, and store rounded version of higher word to W register. | Accumulator contents are unchanged after this operation. |
| sftac | Shift the 40-bit accumulator contents by up to 16 bits left or right, result stored back in accumulator. | Saturation is implemented, if enabled in the CORCON register. |
Note: several of these instructions can optionally pre-fetch operands for a subsequent operation.
It is important to stress that the instructions shown are complex, each having a number of different options. Only summaries of their actions are given here. When programming, at least in Assembler, it is essential to refer to the Programmer's Manual, Ref. 21.6.
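To show what the classic mac and msc operations achieve, their arithmetic can be modelled in C. The sketch below is a behavioural illustration only, with invented function names and example data; it assumes fractional (1.15) operands, holds the 40-bit accumulator in a 64-bit integer, and models neither the operand pre-fetch nor the saturation options of the real instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if x lies within the signed 40-bit accumulator range. */
static int fits_acc40(int64_t x)
{
    return x >= -((int64_t)1 << 39) && x < ((int64_t)1 << 39);
}

/* mac-style step: multiply two 1.15 operands and add the 1.31-aligned
   product to the accumulator. */
int64_t mac_step(int64_t acc, int16_t a, int16_t b)
{
    acc += (int64_t)a * (int64_t)b * 2;
    assert(fits_acc40(acc));
    return acc;
}

/* msc-style step: as above, but the product is subtracted. */
int64_t msc_step(int64_t acc, int16_t a, int16_t b)
{
    acc -= (int64_t)a * (int64_t)b * 2;
    assert(fits_acc40(acc));
    return acc;
}

/* Sum of products of two arrays, the core of a digital filter. */
int64_t dot_q15(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc = mac_step(acc, x[i], h[i]);
    }
    return acc;
}

/* Example data: samples 0.5, 0.5 and coefficients 0.5, 0.25. */
const int16_t demo_x[2] = { 0x4000, 0x4000 };
const int16_t demo_h[2] = { 0x4000, 0x2000 };
```

Calling dot_q15(demo_x, demo_h, 2) returns 0x30000000, which is 0.375 in 1.31 format (0.5 × 0.5 + 0.5 × 0.25). On the dsPIC each pass of the loop body corresponds, broadly, to a single mac instruction.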

21.4.4. Conclusion on the dsPIC family

So now we have a powerful and fully featured microcontroller, with an embedded DSP capability thrown in. DSP experts will be quick to point out that with the dsPIC we do not get the full power of a dedicated DSP device and that some of its features are limited, but this is hardly the point. What we are getting is the added value of DSP capability in the embedded application. Microchip's special calling, of making digital processing capability available in smaller scale and at lower cost than was previously thought possible, has been delivered again.

21.5. The PIC32 32-bit microcontroller

The PIC32 32-bit microcontrollers are by far the most sophisticated of the Microchip range. They incorporate some features of smaller PIC microcontrollers, for example the peripherals, oscillator or power management design. Yet they also include features of large-scale digital systems, like JTAG (a digital system test protocol, of which more below) and DMA (Direct Memory Access). Notably, they use a CPU licensed from MIPS Technologies, the MIPS32 M4K core, which of course is completely different from all earlier Microchip offerings.
At the time of writing there is a limited number of microcontrollers in this range, with all PIC32 microcontrollers being described within a single data sheet, Ref. 21.7. There is also a family reference manual, Ref. 21.8. Coding for PIC32 microcontrollers is shown in Figure 21.11, with features of the largest and smallest in the range seen in Table 21.5.
Figure 21.11
Current coding of 32-bit PIC microcontrollers
TABLE 21.5 Example 32-bit PIC microcontrollers
Shared features (both devices): 32-bit MIPS M4K core, 32-bit data paths, 16- or 32-bit instructions, 2.3 to 3.6 V supply, single-cycle hardware multiply, wide range of peripherals, pin and peripheral compatible with PIC24 devices, MPLAB available as development environment, with C32 C compiler and other third-party offerings.
| PIC32 device | Unique features |
| PIC32MX320F032H (smallest in range) | 32 Kbytes of program memory, 8 Kbytes of data memory, DC to 40 MHz clock, 64-pin. |
| PIC32MX460F512L (largest in range) | 512 Kbytes of program memory, 32 Kbytes of data memory, USB On-the-Go, DC to 80 MHz clock, 100-pin. |

21.5.1. Overview of PIC32 architecture

The block diagram of the PIC32MX4XX microcontroller group is shown in Figure 21.12. Microcontrollers within the group are available with different data and program memory sizes, and differ from the PIC32MX3XX only in that the ’4XX has a USB port.
Figure 21.12
Block diagram of the PIC32MX4XX
The blocks at the top of the diagram, dealing with clock and power management, should be familiar to anyone progressing through this chapter. The oscillator block is similar in concept to Figure 21.6 but includes provision for USBCLK. The main system clock (SYSCLK) and the peripheral bus clock (PBCLK) appear at this point. Like the 16-bit microcontroller, there are separate supplies for VDD and VDDCORE, and an internal regulator controlled by the ENVREG pin. This acts with inverse logic to the DISVREG pin seen in Figure 21.7. Aside from this, the principles of the first two circuits of that diagram apply. The nominal value for VDDCORE is 1.8 V (with absolute maximum of 2.0 V), and 2.3 V to 3.6 V for VDD.
Looking round the sides of the diagram, the names of all peripherals should be familiar. It is gratifying to recognise that, in all the complexity of the 32-bit PIC microcontroller, the peripherals are compatible with, and very similar to, the 16-bit versions. Therefore Table 21.3 applies. A USB On-the-Go port, with capability as outlined in Section 20.5.2, is included in the larger PIC32 microcontrollers.
It is the central features of this microcontroller which appear entirely new and different. Gone are the usual familiar Microchip features. To make sense of the structure, try starting with the ‘Bus Matrix’. This is the meeting point of all major system components and data paths. In one way the Bus Matrix is like a very sophisticated data bus, in that all data transfers pass through it. However, in reality it is a switching system which can establish one or more point-to-point contacts between different system elements at any one time; through these contacts data transfer can take place.
The CPU has two buses linking to the Bus Matrix. One is for instructions (IS) and the other for data (DS). The SYSCLK Peripheral Bus connects directly to the Bus Matrix, and in turn connects the DMA controller, USB, ICD and the parallel ports. This bus runs at the same speed as the CPU, and is used for peripherals which have high data throughput or which must run fast. A data transfer in the Bus Matrix can be initiated by either of the two CPU buses, the DMA controller, the USB or the ICD; these are called ‘Bus Masters’. Another bus, clocked by PBCLK, connects all other peripherals. This is potentially the slower bus, and connects to the Bus Matrix through the Peripheral Bridge; PBCLK can be run at the same speed as SYSCLK, or at the SYSCLK frequency divided by two, four or eight.
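The relationship between SYSCLK and PBCLK described above amounts to a simple calculation, sketched here in C. The function names are invented for this example, and the code only does the arithmetic; it does not show how the divider is actually selected in the device registers.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the divisor is one of the permitted values:
   PBCLK = SYSCLK divided by 1, 2, 4 or 8. */
int pbclk_divisor_valid(unsigned divisor)
{
    return divisor == 1 || divisor == 2 || divisor == 4 || divisor == 8;
}

/* Returns the peripheral bus clock frequency in Hz for a given
   system clock frequency and divisor. */
uint32_t pbclk_hz(uint32_t sysclk_hz, unsigned divisor)
{
    assert(pbclk_divisor_valid(divisor));
    return sysclk_hz / divisor;
}
```

With an 80 MHz SYSCLK, for example, pbclk_hz(80000000u, 2) returns 40000000 and pbclk_hz(80000000u, 8) returns 10000000, the slowest setting.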

21.5.2. The Central Processing Unit

At the heart of the microcontroller lies the MIPS Technologies 32-bit core. It is interesting to get a picture of the many other applications this core is used for by checking the MIPS Technologies website, Ref. 21.9. The architecture was developed by John Hennessy of Stanford University. Interestingly, Hennessy is co-author of Ref. 21.1, which uses the MIPS core as an example. That book is therefore particularly appropriate as background reading for this device.
The MIPS CPU is a complex thing. It contains an ‘execution unit’ for mainstream CPU operations, a ‘multiply/divide unit’, doing just what its name suggests, and a ‘system control coprocessor,’ which handles some of the operational features like interrupts, address translation and debug. The execution unit has 32-bit registers, holding data and address information. There is also a ‘shadow set’ of register files, to ease context saving during interrupts.
The multiply/divide unit can execute 16-bit × 16-bit or 16-bit × 32-bit multiplications in one clock cycle, or 32-bit × 32-bit in two. Divide operations replicate the looping algorithm mentioned in Section 21.3.1. In addition to regular multiply instructions, the instruction set contains two instructions, madd (multiply and add) and msub (multiply and subtract), intended for DSP applications.
The CPU has a five-stage pipeline, illustrated in Figure 21.13. Most instructions execute in these five stages, each stage taking one instruction cycle. Notice the difference between this figure and the simple two-stage pipeline of Figure 2.8. There, fetch and execute were the only two pipeline stages. Here, fetch and execute remain broadly speaking the first two stages, but other useful housekeeping and data transfers (including load and store transfers) take place in the later stages, in parallel with other activities from other instructions.
Figure 21.13
The five-stage PIC32 pipeline
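The benefit of pipelining can be put into numbers with a small calculation. The function below is an idealised sketch, not a model of the actual MIPS core: it assumes one clock cycle per stage and no stalls, and its name is invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Clock cycles needed to complete n instructions in an ideal pipeline:
   the first instruction takes 'stages' cycles to pass through, and
   each following instruction completes one cycle later. */
uint32_t pipeline_cycles(uint32_t n_instructions, uint32_t stages)
{
    assert(stages >= 1);
    if (n_instructions == 0) {
        return 0;
    }
    return stages + n_instructions - 1;
}
```

On these assumptions, 100 instructions take 104 cycles in a five-stage pipeline, so throughput approaches one instruction per cycle, even though each individual instruction takes five cycles to complete.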
The CPU has three modes of operation, ‘kernel’, ‘user’ and ‘debug’. On reset, the CPU is in kernel mode, which is the most general-purpose and powerful. This gives access to the whole memory space and all peripherals. User mode restricts access to a range of resources. It does not have to be used at all, but it can be viewed as a safer operational mode for some activities, with transfer to kernel mode where needed. Debug mode is of course used by debuggers; it allows access to all kernel mode resources, including those specifically for debug.
An important feature of the microcontroller is its JTAG capability. JTAG, the Joint Test Action Group, was formed in the mid 1980s. At this time digital systems were becoming increasingly complex and it was no longer possible to access test points within a system. Therefore it became necessary to design test points and test facilities into the hardware itself. JTAG wished to develop an approach which was compatible across all manufacturers. Their proposal was adopted as IEEE Standard 1149.1, although the terminology JTAG is still commonly used. Integral to the approach is a ‘boundary scan’ mechanism, whereby signals at component boundaries are monitored or controlled. At chip level, the JTAG interface is implemented with a 4- or 5-wire interface. An enhanced JTAG standard is applied here.

21.5.3. The memory

Figure 21.12 shows separate blocks for program and data memory. All memory in the PIC32 microcontroller, including data, program, configuration and SFRs, is, however, mapped into a single memory space, with a 32-bit data width and 32-bit address bus. Thus, in theory, a total address space of four Gbytes is available. One outcome of this is that program execution can take place from data memory. While all locations have physical addresses, a virtual address space is also set up, with a mapping translation unit to map between the two.
A potential bottleneck in program execution is program memory access time. Therefore a ‘prefetch cache’ module is introduced between the program memory and the Bus Matrix. This is a memory type which allows faster access than the program memory. Instructions can be fetched from memory and held in readiness for use. Speed is increased because the prefetch cache has a 128-bit data path from program memory, allowing four 32-bit words to be transferred simultaneously.

21.5.4. Conclusion on the PIC32 family

This is the very last microcontroller to be described in this book, and it is so complex that we can do it little justice in these few pages. What we have seen is Microchip leaping into this demanding field by buying the licence to a highly sophisticated 32-bit processor and placing it within a peripheral, clock and power supply context, which is a distinct evolutionary step from earlier Microchip designs.
Way back in Figure 1.8 and Figure 1.9 and their accompanying text, we defined the microcontroller and microcontroller families. It is pleasing to see that, through all the advancing levels of complexity that we have been through, we have remained more or less true to those ideas and diagrams.

21.6. A last and final conclusion

We have covered a remarkable range of ground over the past 20 chapters. Starting from almost nowhere, we have gradually developed a sophisticated picture of the structure of a microcontroller. We have programmed it in both Assembler and C, have interfaced it to a range of sensors and actuators, and linked it with a second microcontroller, itself linked to another microcontroller, thus creating a tiny network. We have gone on to place our programs under the discipline of a real-time operating system. We have successfully powered all of this from a modest battery supply. All this represents a tremendous achievement and, if you have followed it all, you have done well. The final two chapters have given some possible directions for future activity.
I hope you have enjoyed this voyage of exploration in the world of embedded systems, and wish you much enjoyment as you go on to design, build and program many more ‘thinking things’ of your own.


Summary

• There are many situations in the embedded world where 8-bit microcontrollers do not offer adequate performance.
• The PIC24F and 24H series represent an evolutionary progression route from the 18 Series into the 16-bit domain.
• The dsPIC30F and dsPIC33F families neatly add DSP capability to the 16-bit microcontrollers.
• The PIC32 microcontrollers represent a step function change from the 16-bit PIC microcontrollers; with their licensed 32-bit core and advanced features, they take working with PIC microcontrollers into a new domain of advanced digital systems.
• The peripherals, clock source and power supply management of the PIC32 microcontrollers are similar to those of the 16-bit PIC microcontrollers, so represent evolutionary development in that area.

References

21.1. Patterson, D.; Hennessy, J., Computer Organization and Design. 4th edn (2008) Morgan Kaufmann; ISBN 978-0-123-74493-7.
21.2. PIC24FJ64GA004 Family Data Sheet (2008). Microchip Technology Inc., Document No. DS39881C.
21.3. PIC18F to PIC24F Migration (2006). Microchip Technology Inc., Document No. DS39764A.
21.4. Bateman, A.; Paterson-Stephens, I., The DSP Handbook. (2002) Pearson Education; ISBN 978-0-201-39851-9.
21.5. dsPIC30F3010/3011 Data Sheet (2008). Microchip Technology Inc., Document No. DS70141E.
21.6. dsPIC Programmer's Reference Manual (2005). Microchip Technology Inc., Document No. DS70030F.
21.7. PIC32MX3XX/4XX Family Data Sheet (2008). Microchip Technology Inc., Document No. DS61143E.
21.8. PIC32MX Family Reference Manual (2008). Microchip Technology Inc., Document No. DS61132B.
21.9. The MIPS Technologies website:

Questions

Note: All information to answer these questions should be available from this chapter; the data sheets make useful background reading, however.
1. Make a list of some embedded products available today, trying to identify some very simple ones, others of medium complexity, and others of high complexity (where complexity relates to processing and interface needs). Speculate on and explain which PIC microcontroller would be appropriate to power each device. Include for consideration all devices covered in the book.
2. At one moment in program execution the IPL<3:0> bits in a PIC24FJ64GA004 read 1100. What recent event is likely to have happened?
3. (a) What is the lowest frequency that can be supplied to the PIC24FJ64GA004 CPU:
(i) If the FRC oscillator is used?
(ii) If the LPRC oscillator is used?
(b) What is the highest frequency that can be supplied to a Timer through internal clock distribution if a 16 MHz clock is used as the primary oscillator?
4. (a) Sketch a circuit diagram of a PIC24FJ64GA004 port output interfaced to a 5 V system, as described in Section 21.3.6.
(b) If eight such lines are interfaced, and pull-up resistors of 24 kΩ are used, what is the worst-case extra power drain that they add?
5. In the DSP engine of Figure 21.10, trace the paths that data is likely to take and the system elements used for each of the DSP instructions lac, sac, msc and movsac.
6. (a) How many data and program memory locations would you expect to find in a PIC32MX460F512L?
(b) If SYSCLK is running at 24 MHz, what is the slowest that PBCLK can run at?
7. (a) Why can the third circuit of Figure 21.7 not be applied to a PIC32 microcontroller?
(b) A PIC32 microcontroller is to be supplied from a single voltage only. Sketch a circuit showing all necessary connections.