17 Beyond Pentium–More Advanced Processors – The x86 Microprocessors: 8086 to Pentium, Multicores, Atom and the 8051 Microcontroller, 2nd Edition


Beyond Pentium—More Advanced Processors


  • The Microarchitecture of IA processors that came after Pentium.
  • Processors with P6 microarchitecture.
  • How MMX and SSE instructions work.
  • Features of the Netburst microarchitecture.
  • New features added in Nehalem.
  • IA advancements like HT, VT and Turbo Boost.
  • What is meant by the term ‘nanometer technology’.

The x86 ISA is implemented by the world’s leading semiconductor company Intel as well as a few other small manufacturers. AMD is the most prominent name in the list of Intel’s competitors. In the following sections, the discussion of the evolution of the x86 architecture will be based on the Intel series of processors. In a later chapter, we will digress briefly in certain sections to focus on special architectural features of some AMD processors as well. However, it has to be kept in mind that though both Intel and AMD present the same ISA, they are different in the microarchitectural and chip level design and also have different internal bus structures.

In Chapter 15, the important new features of Pentium (compared with earlier processors) were covered. The x86 architecture that started with the 8086 is still advancing at full speed, with new processors continually being launched. Every new processor boasts new features while ensuring total backward compatibility. However, the marketplace for processor-based products is changing drastically. The PC revolution is giving way to the new trend of handheld devices, mobile phones and tablets. Before we discuss that (Chapter 18), let us complete our discussion on the evolution of the x86-based desktop processor series up to the present time. We will see the powerful features of the latest processors and how high speed and high performance computing are possible with these processors.

In the forthcoming sections, we will use two terms related to processors. One is the name of the processor and the other is the name of its ‘microarchitecture’. The distinction will become clear as we proceed.

17.1 | Processors Based on the P6 Microarchitecture

In 1995, Intel released its sixth generation processor and called it Pentium Pro. (The microarchitecture of Pentium Pro is also referred to as P6). Following this, two new processors with names Pentium II and Pentium III were released. Both of them were based on the P6 microarchitecture, with minor enhancements/differences for each of them, the details of which are as listed in Table 17.1. Intel’s P6 microarchitecture, first instantiated in the Pentium Pro was, by any calculation, a super success story. Its performance was significantly better than that of the Pentium.

Table 17.1 Comparison of Three P6 Processors

We will now make a brief review of the enhancements of these processors over their predecessor, the first Pentium. The P6 microarchitecture introduced several unique architectural features that had never been seen in a PC processor before. All these features have been discussed thoroughly in Section 15.3. How the Pentium Pro instantiated these architectural advancements is explained there. As such, more explanation on the P6 microarchitecture is unnecessary. However, comparing the Pentium Pro with its predecessor Pentium will not seem out of place.

17.2 | Features of Pentium Pro

The Pentium Pro is the first mainstream CPU to radically change instruction execution. All instructions are translated into RISC-like microinstructions before executing them on a highly advanced internal core. The Pentium Pro achieved a performance approximately 50% higher than a Pentium of the same clock speed. In addition to its new way of processing instructions, the Pentium Pro incorporated several other technical advancements that contributed to its increased performance.

  1. Superpipelining (Section 15.1.11): The Pentium Pro dramatically increased the number of pipeline stages to 14, from the five of the Pentium.
  2. Integrated Level 2 Cache: The Pentium Pro features a dramatically higher performance secondary cache compared to all earlier processors. Instead of using motherboard-based cache running at the speed of the memory bus, it uses an integrated level 2 cache with its own bus, running at full processor speed, typically three times the speed that the cache runs on the Pentium. The Pentium Pro’s cache is also non-blocking, which allows the processor to continue without waiting on a cache miss.
  3. 32-Bit Optimization: The Pentium Pro is optimized for running 32-bit code (which most modern operating systems and applications use) and so gives a greater performance improvement over the Pentium when using the latest software.
  4. Wider Address Bus: The address bus on the Pentium Pro is widened to 36 bits, giving it a maximum addressability of 64 GB of memory.
  5. Greater Multiprocessing: Quad ‘multiprocessor’ configurations are supported with the Pentium Pro compared to only dual with the Pentium. This means that four Pentium Pro processors can be interconnected on the Pentium Pro processor bus.
  6. Out of Order Completion (Section 15.2.6): Instructions flowing down the execution pipelines can be executed ‘out of order’.
  7. Superior Branch Prediction Unit (Section 15.2.7): The branch target buffer is double the size of that of Pentium and its accuracy is higher, as much as 94%.
  8. Register Renaming (Section 15.2.8): This is one of the solutions for the pipeline stalling that might occur when storage conflicts prevent two sequential instructions from being executed in parallel. This feature improves parallel performance of the pipelines.
  9. Speculative Execution (Section 15.3): The processor uses speculative execution to reduce pipeline stall time in its RISC core.
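The addressability figure quoted in item 4 above follows directly from the width of the address bus. A minimal sketch of the arithmetic is given below. It assumes byte-addressable memory and takes the 32-bit address bus of the Pentium as the point of comparison (a figure not quoted in the list above); the function name is purely illustrative.

```python
def max_addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given bus width."""
    return 2 ** address_bits

GIB = 2 ** 30  # one gigabyte in the binary sense used in the text

# Assumption: 32 address lines for the Pentium, 36 for the Pentium Pro
pentium_gb = max_addressable_bytes(32) // GIB
pentium_pro_gb = max_addressable_bytes(36) // GIB
print(pentium_gb, pentium_pro_gb)  # 4 64
```

Each extra address line doubles the addressable memory, so four extra lines give a sixteen-fold increase.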

17.3 | Pentium-II and Pentium-III

These two processors have P6 as their microarchitectural core, but provide a few new instructions and enhancements over and above those of the Pentium Pro processor.

17.3.1 | MMX

In 1997, a new set of instructions catering to multimedia, called Multimedia Extension (MMX), was introduced in Pentium processors. Such P5 processors were designated as Pentium MMX. MMX is also a feature in P-II and P-III as shown in Table 17.1.

  1. The MMX architecture adds eight 64-bit registers to Pentium. The MMX instructions refer to these registers as MM0, MM1, MM2, MM3, MM4, MM5, MM6 and MM7.
  2. These are strictly data registers. They cannot be used to hold addresses, nor are they suitable for calculations involving addresses.
  3. They are not a set of new registers, actually. They are the same registers used by the Floating Point Unit (FPU) of any Pentium. In the FPU, there are eight 80-bit registers (ST(0) to ST(7)) that operate in a stack-like fashion. The MMX registers use the lower 64 bits of the 80-bit FPU registers. (Because the registers are shared, FPU and MMX instructions cannot be freely interleaved.)

What does MMX do?

It performs SWAR (SIMD within a register), where SIMD stands for ‘Single Instruction Multiple Data’. Let us examine this concept in more detail.

For example, consider monochrome image data. Here, every pixel is represented by a single byte. There are various computations that are to be done with pixel data.

In a 64-bit register, there can be 8 bytes representing 8 pixels. Thus, a single register holds multiple data. The content of this register can perform an operation with 8 bytes of another 64-bit register. In effect, given the data types that the MMX instruction set supports, it is possible to process up to the following:

  1. eight ‘byte’ objects in parallel
  2. four ‘word’ objects in parallel
  3. two ‘double words’ in parallel

Figure 17.1 shows a 64-bit register ‘packing’ eight bytes, four words or two double words in it. Figure 17.2 shows operations being done in parallel between two MMX registers. A single instruction operates on four words. Thus, four operations get executed in parallel. This amounts to SIMD. The same holds for packed bytes and packed double words as well.

Figure 17.1 | Data types introduced with MMX

Figure 17.2 | SIMD Execution model
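The SIMD-within-a-register idea can be imitated in software. The following is a minimal sketch in Python, not actual MMX code: a 64-bit integer stands in for an MMX register, and the helper names (`unpack`, `pack`, `packed_add`) are hypothetical. It models a packed add with wraparound, in which each lane is added independently and a carry never crosses into the neighbouring lane.

```python
def unpack(reg, lane_bits):
    """Split a 64-bit value into lanes of lane_bits each, lowest lane first."""
    lane_mask = (1 << lane_bits) - 1
    return [(reg >> shift) & lane_mask for shift in range(0, 64, lane_bits)]

def pack(lanes, lane_bits):
    """Reassemble lanes (lowest first) into a single 64-bit value."""
    lane_mask = (1 << lane_bits) - 1
    reg = 0
    for i, value in enumerate(lanes):
        reg |= (value & lane_mask) << (i * lane_bits)
    return reg

def packed_add(a, b, lane_bits):
    """Lane-wise add with wraparound; carries stay inside their own lane."""
    lane_mask = (1 << lane_bits) - 1
    sums = [(x + y) & lane_mask
            for x, y in zip(unpack(a, lane_bits), unpack(b, lane_bits))]
    return pack(sums, lane_bits)

a = 0x0001_0002_0003_FFFF  # four packed 16-bit words
b = 0x0010_0020_0030_0001

print(hex(packed_add(a, b, 16)))        # 0x11002200330000
print(hex((a + b) & (2 ** 64 - 1)))     # 0x11002200340000
```

The lowest word overflows (FFFF + 0001) and wraps to 0000. In the packed add, the neighbouring word is unaffected (0033), whereas an ordinary 64-bit addition lets the carry ripple into it (0034). Calling `packed_add` with `lane_bits` of 8 or 32 gives the packed byte and packed double word cases.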

It is seen that many kinds of data benefit from the use of MMX instructions. For example, most programs use a stream of bytes or words to represent audio and video data. For such data, MMX instructions can accelerate the program by almost a factor of four to eight.

17.3.2 | Saturation Arithmetic

When packed bytes/words/double words are added/subtracted, there is the possibility of ‘overflow’. How is this handled?

The answer is the use of ‘saturation arithmetic’. In saturation mode, the results of an operation that overflow or underflow are clipped (saturated) to some maximum or minimum value depending on the size of the object and whether it is signed or unsigned.

  1. The result of an operation that exceeds the range of a data-type saturates to the maximum value of the range.
  2. A result that is less than the range of a data-type saturates to the minimum value of the range.

Table 17.2 shows these saturation values.

Table 17.2 | Saturation values for packed data arithmetic
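The clipping rule can be sketched in a few lines of Python. This is an illustration only: the limits used are the standard ranges of 8-bit and 16-bit integers, stated here as an assumption since the entries of Table 17.2 are not reproduced in this text, and the function names are hypothetical.

```python
# (minimum, maximum) for each packed data type; standard integer ranges,
# assumed here rather than copied from Table 17.2
SATURATION_LIMITS = {
    ("byte", "signed"): (-128, 127),
    ("byte", "unsigned"): (0, 255),
    ("word", "signed"): (-32768, 32767),
    ("word", "unsigned"): (0, 65535),
}

def saturating_add(x, y, size="byte", signedness="unsigned"):
    """Add two values and clip the result to the range of the data type."""
    low, high = SATURATION_LIMITS[(size, signedness)]
    return max(low, min(high, x + y))

def wraparound_add_unsigned_byte(x, y):
    """Ordinary modulo-256 addition, shown for comparison."""
    return (x + y) & 0xFF

print(saturating_add(200, 100))                        # 255
print(wraparound_add_unsigned_byte(200, 100))          # 44
print(saturating_add(100, 100, signedness="signed"))   # 127
print(saturating_add(-100, -100, signedness="signed")) # -128
```

For pixel data, the saturated result is usually the desired one: adding brightness to an already bright pixel leaves it at full white (255) instead of wrapping around to a dark value (44).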

Note 1: The fact that MMX instructions use 64-bit registers does not make the Pentium a 64-bit processor. It is still an IA-32 processor with all its general purpose registers being 32 bits long. Figure 17.3 shows the programming environment of IA-32 processors with MMX added.

Figure 17.3 | MMX programming environment

17.4 | Streaming SIMD Extensions (SSE)

The streaming SIMD extensions (SSE) were introduced into the IA-32 architecture in the Pentium-III processor family. These extensions enhance the performance of IA-32 processors for advanced 2-D and 3-D graphics, motion video, image processing, speech recognition, audio synthesis, telephony and video conferencing. SSE is a newer SIMD extension to the Intel Pentium-III and AMD AthlonXP microprocessors.

Unlike MMX extensions, which occupy the same register space as the normal FPU registers, SSE adds a separate register space to the processor. Figure 17.4 shows the execution environment for the SSE extensions. All SSE instructions operate on the XMM registers, MMX registers and/or memory. MXCSR is a control/status register. SSE includes 50 new instructions, which enable simultaneous calculations on several floating-point numbers with a single instruction. It accelerates performance on a wide variety of applications like video, audio, 3D graphics and image processing. Later processors added more capabilities to the SSE set under the titles of SSE2, SSE3 and SSE4.

Figure 17.4 | SSE Execution Environment
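A packed floating-point operation can be imitated in the same spirit as the MMX integer case. The sketch below is a software model only. It assumes that an XMM register is 128 bits wide and holds four single-precision values (a width not stated in the text above), and the function names are hypothetical.

```python
import struct

def pack_xmm(values):
    """Pack four single-precision floats into 16 bytes (128 bits)."""
    return struct.pack("<4f", *values)

def packed_add_ps(a, b):
    """Add the four float lanes of a and b pairwise, as one packed add would."""
    lanes_a = struct.unpack("<4f", a)
    lanes_b = struct.unpack("<4f", b)
    return struct.pack("<4f", *(x + y for x, y in zip(lanes_a, lanes_b)))

a = pack_xmm([1.5, 2.0, -0.25, 8.0])
b = pack_xmm([0.5, 4.0, 0.25, 1.0])
result = struct.unpack("<4f", packed_add_ps(a, b))

print(len(a) * 8, result)  # 128 (2.0, 6.0, 0.0, 9.0)
```

The four additions that the loop performs one after another in this model are what a single packed instruction carries out at once in hardware.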

17.5 | Pentium-IV

In November 2000, Intel introduced its seventh generation processor Pentium-IV, which had high clock speeds (starting from 1.5 GHz) as an important feature, and a totally new microarchitecture. This was designated as ‘Netburst’ and the following new features define this microarchitecture.

  1. Hyperpipelined Technology
  2. 400 MHz System Bus
  3. Execution Trace Cache
  4. Rapid Execution Engine

Some features of the Netburst microarchitecture need explanation, because of the ‘novelty’ of the concepts involved. Let us focus on these novel features alone.

  1. Hyperpipelined Technology: The number of stages of the pipeline is ‘20’ for this microarchitecture.
  2. 400 MHz System Bus: The bus that communicates between the processor and memory is the system bus. For Netburst, the system bus is timed by a clock of 100 MHz, but it is ‘quad pumped’ which means that data transfer occurs four times in a clock cycle at the Low to High transition, the High level, the High to Low transition and the Low level. This makes it equivalent to a 400 MHz clock.
  3. Execution Trace Cache: In Netburst, there is no L1 instruction cache. Instead a ‘trace cache’ is used. This cache does not store instructions as such; instead, it has decoded micro-ops stored as program-ordered sequences called ‘traces’. This means that this cache is placed after the instruction decoder. What advantage does this cache give? The trace cache is a complex piece of hardware, but a simple explanation of its operational advantage can be given as follows.

    Let us consider a case in which looping occurs such that the same set of instructions is to be executed repeatedly. Normally, instructions are taken from the L1 cache; they are decoded and then executed. Thus, the same set of instructions is repeatedly fetched and decoded. However, here, since the micro-ops are available in the trace cache, no re-decoding is necessary. Further, in normal cases if a branch misprediction occurs, the execution must go back all the way to the point of the wrongly predicted branch and this includes decoding again. Here, since the micro-ops of the sequence are available in the trace cache, a considerable amount of delay is avoided. Since the micro-ops are cached in the trace cache (and reused when necessary), the decode bandwidth is adequate to match the instruction issue rate (three micro-ops/cycle). The trace cache of Pentium-IV can store up to 12 K micro-ops.

    There is a dedicated Branch Target Buffer (BTB) associated with the trace cache to handle the activities for which it has been designed. This branch predictor gives directions as to where instruction fetching needs to go next in the Trace Cache. This Trace Cache predictor (labeled Trace BTB in Figure 17.5) is smaller than the front-end predictor, since its main purpose is to predict the branches in the subset of the program that is currently in the Trace Cache. The Trace-Cache BTB, together with the front-end BTB, uses a highly advanced branch prediction algorithm that reduces the branch misprediction rate by about 1/3 compared to the predictor in the P6 microarchitecture.

    Figure 17.5 | Basic block diagram depicting the Netburst microarchitecture

  4. Rapid Execution Engine: The Arithmetic Logic Units (ALUs) run at twice the processor frequency; this means that the integer units operate at 3 GHz in a processor with a core clock speed of 1.5 GHz.
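The clock figures quoted for the quad-pumped system bus and the double-speed ALUs in the list above come from the same simple multiplication. A small sketch follows. The 8-byte (64-bit) data bus width used for the bandwidth estimate is an assumption not stated in the text, and the function name is illustrative.

```python
def effective_rate_mhz(base_clock_mhz, transfers_per_cycle):
    """Effective rate when several transfers or operations occur per clock cycle."""
    return base_clock_mhz * transfers_per_cycle

bus_rate = effective_rate_mhz(100, 4)    # quad-pumped 100 MHz system bus
alu_rate = effective_rate_mhz(1500, 2)   # ALUs at twice a 1.5 GHz core clock

# Assumption: the system bus carries 8 bytes (64 bits) per transfer
BUS_WIDTH_BYTES = 8
peak_bandwidth_mb_s = bus_rate * BUS_WIDTH_BYTES

print(bus_rate, alu_rate, peak_bandwidth_mb_s)  # 400 3000 3200
```

Under that assumption the peak transfer rate of the bus works out to 3200 MB/s, i.e., 3.2 GB/s, even though the bus clock itself is only 100 MHz.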

17.5.1 | Microcode ROM

While you examine block diagrams of various Intel processors, it is likely that you will notice a block named Microcode ROM. What exactly is this and why is it used? x86 instructions have different levels of complexity and different lengths as well. After decoding, each instruction is converted to a different number of micro-ops. Most instructions are converted to 3 micro-ops or less, but there are a few which are converted to more than 3 micro-ops.

In Netburst, the micro-ops of instructions that are decoded to 3 micro-ops or less are stored in the trace cache. However, the micro-ops of complex instructions that are decoded to 4 or more micro-ops are stored in a ‘Microcode ROM’. It is highly likely that such instructions are the infrequently used complex instructions. For example, a string instruction is one that is used infrequently. Its code gets decoded to more than 3 micro-ops. When such a complex instruction is encountered, the required micro-ops are taken from the Microcode ROM, rather than from the Trace Cache.

In effect, the micro-ops from the trace cache and the Microcode ROM are buffered in a simple, ‘in-order’ micro-op queue that helps smooth the flow of micro-ops going to the ‘out-of-order’ execution engine. This arrangement is made so that the ‘regular’ instruction micro-ops are packed uniformly in the trace cache and the infrequent ones are not allowed to disrupt the regularity of the trace cache. Figure 17.6 shows the trace cache and the Microcode ROM for the Netburst microarchitecture.

Figure 17.6 | Micro-op sequencing in Netburst
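The selection rule described in this section can be expressed as a toy model. The sketch below is only an illustration of that rule, not a description of the actual hardware: the micro-op counts in `UOP_COUNTS` are hypothetical values chosen for the example, and the function names are invented.

```python
TRACE_CACHE_MAX_UOPS = 3  # threshold given in the text

# Hypothetical micro-op counts, chosen only for illustration
UOP_COUNTS = {"add": 1, "mov": 1, "push": 2, "rep movsb": 12}

def uop_source(instruction):
    """Return where the micro-ops of an instruction are supplied from."""
    if UOP_COUNTS[instruction] <= TRACE_CACHE_MAX_UOPS:
        return "trace cache"
    return "microcode ROM"

def sequence(program):
    """Build the in-order micro-op queue as (instruction, source, index) entries."""
    queue = []
    for instruction in program:
        source = uop_source(instruction)
        for index in range(UOP_COUNTS[instruction]):
            queue.append((instruction, source, index))
    return queue

queue = sequence(["mov", "rep movsb", "add"])
print(len(queue))            # 14
print(queue[0], queue[-1])
```

Whatever the source of a micro-op, the queue preserves program order: the single micro-op of `mov` comes first, then the twelve micro-ops of the string instruction from the Microcode ROM, and finally the micro-op of `add`.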

17.5.2 | Hyperthreading

It was in Pentium-IV CPUs that SSE2 and SSE3 instruction sets were introduced to accelerate media processing, 3D graphics, floating point operations, etc. Later versions of Pentium-IV featured Hyperthreading Technology (HTT), a feature that makes one physical CPU work as two logical CPUs.

What is hyperthreading?

It is Intel’s term for ‘Simultaneous Multithreading (SMT)’ widely discussed in computer architecture literature.

In this context, a thread is understood to be the same as a task/process; this means that it is a ‘program which is running’. A single core can run only one thread, but when execution resources are duplicated, two threads can run simultaneously. In Intel’s Hyperthreading (HT) technology, not all the execution resources are replicated. Some resources like caches, execution units and buses are shared, but each logical core has its own architectural state and its own general purpose and control registers. HT has to be enabled using BIOS, and needs OS support. An OS that supports multithreading can see two logical cores when two threads are running simultaneously. Hyperthreading has the advantage that if one thread stalls, another thread can take up the execution resources completely.
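Software can ask the operating system how many logical processors it sees. A minimal sketch using Python's standard library is shown below. It reports logical CPUs only; the standard library offers no portable way to obtain the number of physical cores, so the comparison between the two has to be made with other tools.

```python
import os

# os.cpu_count() returns the number of logical CPUs visible to the OS,
# or None if the number cannot be determined
logical_cpus = os.cpu_count()

if logical_cpus is None:
    print("The logical CPU count could not be determined")
else:
    print(f"The OS reports {logical_cpus} logical CPU(s)")
```

On a single-core processor with hyperthreading enabled, a call of this kind would be expected to report two logical CPUs, since each logical core presents its own architectural state to the OS.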

17.5.3 | Problems Encountered by Netburst

Pentium-IV was the first processor with Netburst. Many other Netburst-based processors followed soon after.

However, this microarchitecture did not turn out to be the huge success it was envisaged to be. Netburst was targeted at high clock frequency operation and it could operate at these frequencies, but a considerable amount of heat was generated and this turned out to be the undoing of the processor. Intel stopped further development of Netburst by around 2004, and diverted its design efforts in more promising directions.

17.6 | The Continued Dominance of x86

Around the turn of the century, when the x86-32 ISA was in dominance, Intel started a review of its x86 philosophy and considered the idea of abandoning it for an entirely new type of ISA. With this plan in mind, it teamed up with HP and started the design of a processor based on the Very Long Instruction Word (VLIW) philosophy and the EPIC (Section 15.4) microarchitecture. The outcome was a 64-bit processor named ‘Itanium’ released in 2001. However, it turned out to be a disappointment in terms of performance. It did not perform as well as the Pentium-IV, which had been released a year earlier. Intel decided not to position it in the desktop segment and instead found a place for it in the server market.

Another development that occurred around the same time was the release of an AMD processor with x86-64 architecture. The processor was named K8. The release of this processor made Intel give up its plan of abandoning x86. Thus, to live up to the market competition, Intel was forced to develop its own x86-64 version. The first 64-bit x86 processor of Intel was Xeon that was based on Netburst. This was launched in 2004. With this, the competition in the market for x86-64 architectures scaled up and Pentium D and Pentium Extreme edition based on Netburst followed soon after. Meanwhile AMD and VIA also released newer versions of x86-64 and this kept the x86 architecture running. In short, we can safely conclude that the x86 ISA is here to stay for many more years. Intel designates its x86-32 architecture as IA-32 and its x86-64 architecture as Intel 64.

17.7 | ‘Core’ Microarchitecture

Intel was forced to withdraw its Netburst microarchitecture because of heating problems. The next version of its microarchitecture was named ‘Core’ that was a re-architected version of P6. The 32-bit version of this was named ‘Core’ and 64-bit versions were called ‘Core2’. Please note that it is the name of the microarchitecture that is called ‘Core’. There is bound to be a bit of confusion regarding the use of the word ‘core’ as we will see in ensuing sections.

17.8 | Multicore Processors

Around this time, multicore processors (Section 16.1) made their foray into the field. Intel’s first multicore processor was the dual-core Pentium D. The company’s Pentium D processors consisted of two Pentium-IV cores placed closely together on a single silicon die. Later, processors were mostly multicore versions such as Core duo, Core2 duo and Core2 quad. At present, the microprocessor market is flooded with multicore processors most of which are either dual core (duo) or four core (quad). Manufacturers can offer a specific product with either dual or quad-core versions. Thus, we may opt for a Core2 duo or Core2 quad, based on the quantum of performance we want. Both of them have the same ‘Core2’ microarchitecture, but the quad-core version delivers higher performance, though with a higher power dissipation factor.

17.9 | Nehalem Microarchitecture

The next major architectural change came with the release of the ‘Nehalem’ microarchitecture in 2008. Nehalem chips are multicore ‘chip multiprocessors’ with some major architectural enhancements, which call for a more detailed discussion.

The first and foremost change is that Nehalem incorporated an integrated memory controller, i.e., the memory controller is brought inside the chip, rather than having it in another chip on the motherboard. This allowed memory to be accessed directly from the CPU and that made memory accesses very fast. In previous processors, memory accesses were subject to the speed limitations of a ‘Front Side Bus’. Nehalem also paved the way for the use of a ‘Quick Path Interconnect’ instead of the FSB. These ideas are made clearer in Section

The second change brought about in Nehalem was to integrate a ‘Level 3 cache’ into the chip. Previous processors had to use an L3 cache on the motherboard. This was also a welcome change.

Figure 17.7 shows a four-core processor based on the Nehalem microarchitecture. Note the Integrated Memory Controller (IMC) interfaced to a dual channel (Section 18.) DDR3. Further, note the presence of two QPI links.

Figure 17.7 | A quad-core Nehalem processor

  1. A Nehalem chip is divided into two broad domains, namely, the ‘core’ and the ‘uncore’.
  2. Components in the core domain operate with the same clock frequency as that of the actual computation core.
  3. The uncore domain operates with a different clock frequency.

With the release of the Nehalem microarchitecture, a new series of processor names also started to be used. They are Core i3, Core i5 and Core i7. Core i3 is meant for relatively low end applications, while Core i7 is for high end applications.

The Nehalem microarchitecture is designated as the ‘first generation’ of the Core processor family.

17.10 | Sandy Bridge and IvyBridge

The second generation of the Core (i3, i5 and i7) processors is based on the Sandy Bridge microarchitecture, which was released in 2011. What Sandy Bridge has to offer as enhancements are the integration into the chip of the ‘graphics controller’ and PCIe (PCI Express) controller. In 2012, a smaller die version of the Sandy Bridge was released and it was named ‘IvyBridge’. This is the ‘third generation’ Core processor.

17.11 | Fourth-Generation Core Processor Family

Haswell, released in 2013, is the ‘fourth’ generation, while ‘Broadwell’, scheduled for release in 2014, is the fifth generation. What do these microarchitectures have to offer in terms of improved performance? Improved graphics performance is the first feature advertised, followed by a second, more important aspect: the TDP.

TDP stands for ‘Thermal Design Power’. It is expressed in watts and represents the maximum amount of power the cooling system in a computer must dissipate. TDP does not represent the maximum wattage the system can withstand, but rather the maximum power it would draw when running the applications for which it is designed.

Over the years, this factor has become more and more important as computation has shifted from desktop systems to mobile systems, i.e., laptops. To ensure that battery power is conserved as much as possible, it is necessary that all parts of the system work at low power levels. In this context, manufacturers compete to design processors with lower and lower TDPs. Haswell is available in different versions, i.e., for desktops, laptops, tablets, etc. More details of power management are discussed in Section 18.4.4.

One arena of application envisaged for Haswell is Intel’s ‘UltraBook’, which is an Intel notebook meant for high performance; however, it is made to be extremely portable by virtue of its form factors. Intel specifies that UltraBooks are to be thin – 0.9 inches at the maximum. Given below are the lowest TDP classes of Haswell and the application scenarios envisaged for each of them.

  1. 13.5 W and 15 W TDP classes: Haswell-ULT (For Intel's UltraBook platform.)
  2. 10 W TDP class: Haswell-ULX (For tablets and certain UltraBook-class implementations.)


  1. ULT = Ultra Low TDP
  2. ULX = Ultra Low eXtreme TDP.
  3. Haswell-ULT and Haswell-ULX are to be available in dual-core only (for low power requirements)
  4. All other versions will be available in dual-core or quad-core variants.

17.12 | Important Technological Features in IA Processors

Now let us take a look at some of the terms associated with Intel architecture that are frequently encountered in system descriptions. We choose only a few of them.

  1. Virtualization: Intel Virtualization Technology (VT) allows systems to work with more than one operating system. It makes a system appear as multiple systems to software. Intel-VT provides hardware-assisted support for this feature.
  2. Turbo Boost: Intel Turbo Boost technology allows the processor core to run ‘automatically and opportunistically’ at a higher clock frequency if it is operating within its temperature, power and current limits. Increasing the clock frequency can raise performance and it is done as and when needed. It adds a factor of ingenuity to performance improvement (but at the cost of higher power dissipation).
  3. Hyperthreading: Intel Hyperthreading Technology has already been covered in Section 17.5.2.
  4. Advanced vector extensions: This is the latest extension of the IA instruction set. It uses 256-bit vector data for integer, floating point and fused multiply-add operations, among others. This is useful for math and DSP operations.

17.13 | Nanometer Technology

These days, it is not uncommon for processor manufacturers to refer to their products in terms of ‘nanometer technology’. We hear of processors and other products belonging to say 45 nm, with the next version being of 32 nm and then of 22 nm, 14 nm etc. What exactly does this term mean, and what are the implications of the lower numbers quoted? Let us find out.

A trend that has kept the semiconductor technology going is miniaturization. Producing more powerful processors that are smaller in physical size is continually aimed for. Besides having smaller processor dies, there is the advantage of being able to operate at lower voltages and consequently obtain lower power dissipation.

Examples of technology generations are 180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm, 22 nm, etc. A technology node name is roughly related to the minimum feature size possible. It refers to the average half-pitch (i.e., half the distance between identical features) of a memory cell at the specified technology level. Besides line width, some other parameters such as the MOSFET gate oxide thickness, gate length and power supply voltage are also reduced with scaling (reduction in dimensions). The reductions are chosen such that the transistor current density increases with each new node. Further, the smaller transistors and shorter interconnects lead to smaller capacitances, which lead to a drop in circuit delays. Historically, integrated circuit speed has increased roughly 30% at each new technology node.
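The sequence of node names quoted above follows a regular pattern, which a short calculation brings out. The sketch below computes the ratio between successive nodes and, as an idealized estimate, the corresponding area ratio. It assumes that the area of a feature scales with the square of the node name, which is only roughly true since, as noted, the node name is only roughly related to the feature size.

```python
NODES_NM = [180, 130, 90, 65, 45, 32, 22]  # node names listed in the text

def scaling_steps(nodes):
    """Return (old, new, linear ratio, idealized area ratio) for each step."""
    steps = []
    for old, new in zip(nodes, nodes[1:]):
        linear = new / old
        steps.append((old, new, round(linear, 2), round(linear ** 2, 2)))
    return steps

for old, new, linear, area in scaling_steps(NODES_NM):
    print(f"{old} nm -> {new} nm: linear x{linear}, area x{area}")
```

Every step shrinks linear dimensions to about 0.7 times the previous value, so the same circuit would occupy roughly half the area at each new node. This is how the transistor count of a chip can double while the die stays about the same size.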

In short, the specific ‘nanometer technology’ is roughly related to the size of the MOS transistors, which are the basic units used in any chip. We say it is the technology node name. For example, we use the words 22 nm technology, 45 nm technology, etc. Over the years, as hardware complexity increased, the number of transistors comprising any processor chip has also increased. This is compensated by a reduction in the size of transistors (scaling). This has made chips smaller even as the transistor count has gone up.

As an example, let us examine the different versions of Pentium-IV. The version released in Nov 2000 was of 180 nm technology. This was followed by versions with 130 nm, 90 nm and 65 nm released in the years 2002, 2004 and 2006, respectively.

Table 17.3 shows this trend from the Nehalem microarchitecture onwards.

Table 17.3 Intel’s Tick-Tock

Intel calls this sequence ‘tick-tock’. The same microarchitecture in a smaller die version is a tick, while a new microarchitecture is called a tock. Thus, Sandy Bridge is a tock because it delivers a new microarchitecture, while IvyBridge is a tick because it advances the manufacturing technology. Haswell is a tock, while Broadwell is to be a tick.

17.14 | Difference Between Core i3, i5 and i7 Processors

In a nutshell, the application fields for the three types of processors are as follows:

Intel Core i7: High-end

Intel Core i5: Mainstream

Intel Core i3: Entry-level

However, in what way are the three different? For the same microarchitecture, they differ by the features included and the performance envisaged. There are no hard and fast rules regarding the differences, but the following points offer a guideline.

  1. Core i5 and i7 CPUs usually have 4 cores, while i3 CPUs only have 2.
  2. Core i5 and i7 usually operate at higher frequencies than i3.
  3. Both Core i5 and i7 have Turbo Boost, while Core i3 does not.
  4. All CPUs have the same graphics usually, although the speed of that graphics will depend on the individual CPU.
  5. Core i3 has smaller L3 caches compared to the other two.
  6. Core i5 does not support hyperthreading.

Conclusion: With this, we conclude our discussions on the x86 processors of Intel, which are used in PCs and servers. We started with the first x86 processor, i.e., 8086; and we have concluded with the most recently released one, which is Haswell. Looking ahead, many changes are expected in the field of computing. The most important one is the shift from PCs to tablets, i.e., from personal computers to very personal computers. The next is that embedded devices for every need and necessity are being designed. Intel has brought out its offering in the field of embedded systems and tablets; that is, the Atom Processor, which will be the topic of the next chapter.


  • The Pentium Pro was the sixth generation x86 processor and it was a very great success compared to Pentium.
  • Some processors based on P6 microarchitecture had MMX and SSE instructions.
  • MMX and SSE are meant for faster DSP operations.
  • MMX does not add new registers but SSE adds a new set of registers.
  • The Netburst microarchitecture followed P6.
  • Many processors were released with Netburst, but because of heating issues, it turned out to be an unsuccessful microarchitecture.
  • The Core i7, i5 and i3 processor names started with the Nehalem microarchitecture.
  • Sandy Bridge and IvyBridge are the second and third generations.
  • Haswell is the fourth generation and Broadwell is to be the fifth generation.
  • Intel follows its Tick-Tock model of design.


  1. Which was the x86 microarchitecture that followed Pentium?
  2. Name three processors that were based on Netburst.
  3. Name four features of Pentium Pro that were the reasons for its dramatic performance improvement over Pentium.
  4. Which was the first processor to have MMX?
  5. How does MMX make computation faster for media data?
  6. In what way is the SSE programming environment different from that of MMX?
  7. What is meant by a quad-pumped system bus?
  8. How does the content of Microcode ROM differ from that of the trace cache?
  9. Which were the first x86-64 processors of Intel and AMD?
  10. How does TDP matter for a computer system?
  11. What is Itanium and where is it used now?
  12. Name two important enhancements introduced in Nehalem.


    1. What is the sum obtained by adding the two sets of eight packed bytes (unsigned) given below?

      SET 1: FE 34 56 78 ED C4 09 65

      SET 2: 34 E5 C7 05 56 23 56 AE

      Hint: Recollect the principle of saturation arithmetic

    2. What will be the result if the bytes are ‘signed’?
  1. Make a study of Intel’s UltraBook in terms of concept, implementation, market penetration and the names of the vendors offering this product. Name a few notebooks that offer stiff competition to UltraBook.
  2. Find out the type of transistors used in Haswell and the plus point of these kinds of transistors.