       Intel Pentium Processor Technical Backgrounder
                         March 1994
                     Intel Corporation
-----------------------------------------------------------------
Intel's Pentium Processor: World's Best Performance for all PC Software

The Pentium processor family includes the highest performance members of
Intel's family of microprocessors. The Pentium processor family includes
the Pentium processor at iCOMP index 510/60 MHz, Pentium processor at
iCOMP index 567/66 MHz, Pentium processor at iCOMP index 735/90 MHz, and
Pentium processor at iCOMP index 815/100 MHz. While incorporating new
features and improvements made possible by advances in semiconductor
technology, the Pentium processor is fully software compatible with
previous members of the Intel microprocessor family, preserving the value
of users' software investments.

The Pentium processor incorporates a superscalar architecture, improved
floating point unit, separate on-chip code and write-back data caches,
64-bit external data bus, and other features designed to provide a
platform for high-performance computing. The newest members of the Pentium
processor family, the Pentium processor at iCOMP index 735/90 MHz and the
Pentium processor at iCOMP index 815/100 MHz, include additional features
such as SL technology for power management, an on-chip multiprocessor
interrupt controller and dual processor mode.

The State of Processor Design Art

In recent years, developments in the art of semiconductor design and
manufacturing have made it possible to produce increasingly powerful
microprocessors in smaller and smaller packages. Chief among these
developments has been the decreasing size of transistors. Microprocessor
designers are now working with BiCMOS (bipolar complementary metal-oxide
semiconductor) process technology with features of less than a micron
(one-millionth of a meter) in size.

The use of sub-micron devices allows designers to fit more of them on a
chip. The number of transistors in each member of the Intel microprocessor
family has continued to grow, culminating in the Pentium processor
(510/60, 567/66) implemented in 5V, 0.8 micron BiCMOS technology with 3.1
million transistors and the Pentium processor (735/90, 815/100)
implemented in 3.3V, 0.6 micron BiCMOS technology with 3.3 million
transistors.

The increase in transistors has made it possible to integrate components
that were previously external to the processor (such as math coprocessors,
caches and multiprocessor interrupt controllers) and place them on the
chip itself. Placing components on-chip decreases the time required to
access them and increases performance dramatically. Another way to
decrease the distance between components (and therefore increase the speed
of communication between them) is to provide multiple levels of metal for
interconnection. Intel's current 0.6 micron BiCMOS microprocessor
technology uses four layers of metal, the layout of which requires special
computer-aided design tools.

The Pentium processor utilizes the latest in microprocessor design
technology to provide performance comparable to that of alternative
architectures used in scientific and engineering workstations, while
maintaining compatibility with the immense installed base of software now
available for the Intel family of microprocessors.

Intel's Microprocessor Family

The history of the personal computer industry is intimately associated with
the history of Intel's microprocessor family. In 1985, Intel introduced
the ground-breaking Intel386 DX processor, a 32-bit microprocessor that
executed 3 to 4 million instructions per second (MIPS). Available in
speeds ranging from 16 MHz up to 33 MHz, the 80386 addresses up to 4
gigabytes of physical memory, and up to 64 terabytes of "virtual memory"
(a technology borrowed from mainframe computers that allows systems to
work with programs and data larger than their actual physical memory).

The 80386 provided for true, robust multitasking and the ability to create
"virtual 8086" systems, each running securely in its own 1-megabyte
address space. Like its predecessors, the i386 DX microprocessor spawned a
new generation of personal computers, which had the ability to run 32-bit
operating systems and ever-more complicated applications, all the while
maintaining compatibility with previous members of the Intel family.

In 1989, Intel shipped the Intel486 DX microprocessor, which incorporated
an enhanced 386-compatible core, math coprocessor, cache memory, and cache
controller--a total of 1.2 million transistors--all on a single chip.
Operating at an initial speed of 25 MHz, the Intel486 DX processor
processed up to 20 MIPS. At its current peak speed of 50 MHz, the Intel486
DX processor processes up to 41 MIPS. By incorporating RISC principles in
its processor core (specifically, instruction pipelining), the Intel486 DX
processor is able to execute most instructions in a single clock cycle. In
spite of these powerful new features, the Intel486 DX microprocessor
maintains full software compatibility with previous members of the Intel
family, thereby preserving customers' investment in software.

With the 1992 introduction of the Intel486 DX2 microprocessor, Intel
increased the speed of the 486 family by as much as 70 percent. The DX2
family features a technology called "speed doubling," which allows the
processor to operate twice as fast internally as externally. The Intel486
DX2 processor is also pin-compatible with the Intel486 DX processor. At
its current peak speed of 66 MHz, the Intel486 DX2 processor executes up
to 54 MIPS.

The Pentium processor is the next step in Intel's commitment to provide the
highest possible performance at the best price, while maintaining software
compatibility with previous Intel processors.

First Superscalar Compatible Processor

The heart of the Pentium processor is its superscalar design, built around
two instruction pipelines, each capable of operating independently. These
pipelines (named the u and v pipes) allow the Pentium processor to execute
two integer instructions in a single clock cycle, nearly doubling the
chip's performance relative to an Intel486 chip at equal frequency.

The Pentium processor's pipelines are similar to the single pipeline of the
Intel486 processor, but they have been optimized to provide increased
performance. Like the Intel486 processor's pipeline, the pipelines in the
Pentium processor execute integer instructions in 5 stages: Prefetch,
Instruction Decode, Address Generate, Execute, and Write Back. When an
instruction passes from Prefetch to Instruction Decode, the Prefetch stage
is then free to begin fetching the next instruction.
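The benefit of overlapping the five stages can be sketched with an idealized, stall-free model. The stage names come from the text; the assumption that every instruction spends exactly one clock in each stage is a simplification for illustration. Under it, n instructions finish in n + 4 clocks instead of 5n.

```python
STAGES = ["Prefetch", "Instruction Decode", "Address Generate", "Execute", "Write Back"]

def pipeline_schedule(n_instructions, stages=STAGES):
    """Return {clock: [(instruction, stage), ...]} for an ideal, stall-free pipeline.

    Instruction i enters Prefetch on clock i and advances one stage per clock.
    """
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(stages):
            schedule.setdefault(i + s, []).append((i, stage))
    return schedule

def total_clocks(n_instructions, n_stages=len(STAGES)):
    """Clocks to finish n instructions: fill the pipe once, then one per clock."""
    return 0 if n_instructions == 0 else n_stages + n_instructions - 1

for clock, work in sorted(pipeline_schedule(3).items()):
    print(clock, work)  # on clock 2, three instructions occupy three different stages
print(total_clocks(10), "clocks pipelined vs", 10 * len(STAGES), "unpipelined")  # 14 vs 50
```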

In many instances, the Pentium processor can issue two instructions at
once, one to each of the pipelines, in a process known as "instruction
pairing." In this case, the instructions must both be "simple", and the
v-pipe always receives the next sequential instruction after the one
issued to the u-pipe. Each pipeline has its own ALU (arithmetic logic
unit), address generation circuitry, and interface to the data cache.
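The pairing decision can be sketched as a check on two consecutive instructions. The text states only that both must be "simple" and that the v-pipe takes the next sequential instruction. The `SIMPLE_OPS` set below is a hypothetical subset chosen for illustration, and the register-dependency rule (the second instruction may not read or write a register the first one writes) is an added assumption, not something this document specifies.

```python
from dataclasses import dataclass, field

# Hypothetical subset of "simple" instructions, for illustration only.
SIMPLE_OPS = {"MOV", "ADD", "SUB", "AND", "OR", "XOR", "INC", "DEC", "CMP"}

@dataclass
class Instr:
    op: str
    reads: set = field(default_factory=set)   # registers read
    writes: set = field(default_factory=set)  # registers written

def can_pair(u: Instr, v: Instr) -> bool:
    """True if v (the next sequential instruction) may issue to the v-pipe with u."""
    if u.op not in SIMPLE_OPS or v.op not in SIMPLE_OPS:
        return False
    # Assumed dependency rule: v must not read or write a register u writes.
    return not (u.writes & (v.reads | v.writes))

def issue(program):
    """Group instructions into issue slots of one (u-pipe) or two (u and v)."""
    slots, i = [], 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            slots.append((program[i], program[i + 1]))
            i += 2
        else:
            slots.append((program[i],))
            i += 1
    return slots

prog = [
    Instr("MOV", reads={"ebx"}, writes={"eax"}),
    Instr("ADD", reads={"ecx", "edx"}, writes={"ecx"}),
    Instr("SUB", reads={"eax"}, writes={"eax"}),
]
print(len(issue(prog)))  # 2: MOV/ADD pair, SUB issues alone
```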

While the Intel486 microprocessor incorporated a single 8 Kbyte cache, the
Pentium processor features two 8 Kbyte caches, one for instructions and one
for data. These caches act as temporary storage for instructions and data
obtained from slower main memory; data that has just been used is likely to
be used again, and fetching it from an on-chip cache is much faster than
fetching it from main memory.

The Pentium processor's caches are 2-way set-associative, an improvement
over simpler direct-mapped designs. Each cache is organized into 32-byte
lines, and any given address can reside in only one set of two lines, so
the cache circuitry need search only those two 32-byte lines rather than
the entire cache. The use of 32-byte lines (up from 16-byte lines on the
Intel486 DX) is a good match for the Pentium processor's 64-bit bus and
4-transfer burst: four 8-byte transfers fill exactly one line.
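The arithmetic behind this organization can be checked with a short sketch. Assuming a 32-bit physical address (an assumption; the text does not give the address width), an 8 Kbyte, 2-way cache with 32-byte lines holds 8192 / 32 = 256 lines grouped into 128 sets, so an address splits into a 5-bit byte offset, a 7-bit set index, and a 20-bit tag. Only the two lines in the indexed set are compared against the tag. The function name is hypothetical.

```python
CACHE_BYTES = 8 * 1024
LINE_BYTES = 32
WAYS = 2
ADDRESS_BITS = 32  # assumption for illustration

NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)       # 128 sets
OFFSET_BITS = (LINE_BYTES - 1).bit_length()         # 5 bits select a byte in the line
INDEX_BITS = (NUM_SETS - 1).bit_length()            # 7 bits select the set
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS  # 20 bits compared against each way

def split_address(addr: int):
    """Return (tag, set_index, byte_offset) for a physical address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# Burst arithmetic from the text: 4 transfers x 64 bits = 256 bits = one 32-byte line.
assert 4 * 64 // 8 == LINE_BYTES

print(split_address(0x12345678))  # (74565, 51, 24), i.e. tag 0x12345, set 51, byte 24
```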

When the circuitry needs to store instructions or data in a set whose two
lines are already filled, it discards the less recently used line
(according to an "LRU," or least recently used, algorithm) and replaces it
with the information at hand.
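For a 2-way set, least-recently-used replacement needs only one bit per set recording which way to evict next. The sketch below models a single set that way; the class and method names are hypothetical, and it tracks tags only, not the cached data.

```python
class TwoWaySet:
    """One set of a 2-way set-associative cache with a single LRU bit."""

    def __init__(self):
        self.tags = [None, None]  # tag held in way 0 and way 1
        self.lru = 0              # index of the least recently used way

    def access(self, tag) -> bool:
        """Look up tag; on a miss, replace the LRU way. Returns True on a hit."""
        for way in (0, 1):
            if self.tags[way] == tag:
                self.lru = 1 - way  # the other way is now least recently used
                return True
        victim = self.lru
        self.tags[victim] = tag
        self.lru = 1 - victim
        return False

s = TwoWaySet()
hits = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(hits)  # [False, False, True, False, False]: C evicts B, then B evicts A
```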

The data cache has two interfaces, one to each of the pipelines, which
allows it to provide data for two separate operations in a single clock
cycle. Modified data is written to main memory only when its line is
removed from the data cache, a technique known as write-back caching.
Write-back caching provides better performance than simpler write-through
caching, in which the processor writes data to external memory each time
it writes data to its internal cache (though the Pentium processor can be
dynamically configured to support write-through caching). To ensure that
the data in the cache and in main memory are consistent with one another
(especially a concern with multiprocessor systems), the data cache
implements a cache consistency protocol known as MESI. This protocol
defines four states (Modified, Exclusive, Shared, Invalid), which are
assigned to each line of the cache based on actions performed on that line
by a microprocessor. By obeying the rules of the protocol during memory
reads and writes, the Pentium processor maintains cache consistency and
circumvents problems that might be caused by multiple processors using the
same data.
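The four MESI states can be illustrated with a simplified state-transition function for one cache line. This is a sketch of textbook MESI, assuming write-allocate on a write miss and a single flag telling whether another cache already holds the line; the Pentium processor's actual bus-level rules are more detailed. The event names are hypothetical.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

def next_state(state: State, event: str, others_have_copy: bool = False):
    """Return (new_state, write_back) for one cache line after an event.

    Events: "read" and "write" by this processor; "snoop_read" and
    "snoop_write" when another processor accesses the same line on the bus.
    write_back is True when this cache must first copy its Modified data
    to main memory.
    """
    if event == "read":
        if state is State.INVALID:  # read miss: fill the line
            return (State.SHARED if others_have_copy else State.EXCLUSIVE), False
        return state, False         # read hit: no change
    if event == "write":
        # E -> M silently; S or I -> M after invalidating other copies (write-allocate assumed)
        return State.MODIFIED, False
    if event == "snoop_read":
        if state is State.INVALID:
            return state, False
        return State.SHARED, state is State.MODIFIED
    if event == "snoop_write":
        return State.INVALID, state is State.MODIFIED
    raise ValueError(f"unknown event: {event}")
```

For example, a line that one processor holds as Modified drops to Shared, with a write-back, when a second processor reads it, so both processors then see the same, up-to-date value.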

The use of separate caches for instructions and data works in conjunction
with other elements of the Pentium processor's design to provide increased
performance and faster throughput compared to the Intel486 microprocessor.
For example, the first stage of the pipeline is Prefetch, during which
instructions are obtained from the instruction cache. With a single cache,
conflicts might occur between instruction prefetches and data accesses.
Providing separate caches for instructions and data precludes such
conflicts and allows both operations to take place simultaneously.

The Pentium processor also increases performance by using a small cache
known as the Branch Target Buffer (BTB) to provide dynamic branch
prediction. When an instruction leads to a branch, the BTB "remembers" the
instruction and the address of the branch target. The BTB uses this
information to predict which way the instruction will branch the next time
it is used, thereby saving time that would otherwise be required to
retrieve the desired branch target. When the BTB makes a correct
prediction, the branch is executed without delay, which enhances
performance.
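A branch target buffer can be sketched as a table keyed by the branch instruction's address that stores the last target together with a small history counter. The 2-bit saturating counter used here is a common textbook scheme and an assumption; the text says only that the BTB remembers the branch and its target. The unbounded dictionary also ignores the BTB's finite size, and all names are hypothetical.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch's address to (target address, 2-bit counter)."""

    def __init__(self):
        self.entries = {}

    def predict(self, branch_addr):
        """Return the predicted target, or None to predict "not taken"."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            return None  # no history for this branch: fall through
        target, counter = entry
        return target if counter >= 2 else None

    def update(self, branch_addr, taken, target):
        """Record the real outcome after the branch executes."""
        if branch_addr not in self.entries:
            if taken:
                self.entries[branch_addr] = (target, 3)  # allocate on first taken branch
            return
        _, counter = self.entries[branch_addr]
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.entries[branch_addr] = (target, counter)

# A loop-closing branch at a hypothetical address 0x40, taken 9 times then not taken.
btb = BranchTargetBuffer()
mispredicts = 0
for taken in [True] * 9 + [False]:
    predicted_taken = btb.predict(0x40) is not None
    mispredicts += predicted_taken != taken
    btb.update(0x40, taken, 0x10)
print(mispredicts)  # 2: the first encounter and the loop exit
```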

The combination of instruction pairing and dynamic branch prediction can
speed operations considerably. For example, a single iteration of the
classic Sieve of Eratosthenes benchmark requires 6 clock cycles to execute
on the Intel486 microprocessor. The same code executes in only 2 clock
cycles on the Pentium processor.

Improved Floating Point Unit

The floating point unit in the Pentium processor has been completely
redesigned over that in the Intel486 microprocessor. It incorporates an
8-stage pipeline, which can execute one floating point operation every
clock cycle. (In some instances, it can execute two floating point
operations per clock--when the second instruction is an Exchange.)

The first four stages of the FPU pipeline are the same as those of the
integer pipelines. The final four stages consist of a two-stage Floating
Point Execute, rounding and writing of the result to the register file,
and Error Reporting. The FPU incorporates new algorithms that increase the
speed of common operations (such as ADD, MUL, and LOAD) by a factor of
three.

Performance Improvements

The Pentium processor's new architectural features--its superscalar design,
separate instruction and data caches, write-back data caching, branch
prediction, and redesigned FPU--will enable the development of new
applications software, in addition to improving the performance of current
applications in a manner that is completely transparent to the end user.

The external data bus to memory is 64 bits wide, doubling the amount of
data that may be transferred in a single bus cycle compared with an Intel486
processor. The Pentium processor supports several types of bus cycles,
including burst mode, which loads large (256-bit) portions of data into
the data cache in a single bus cycle. The 64-bit data bus allows the
Pentium processor to transfer data to and from memory at rates up to 528
Mbyte/second, a more than 5-fold increase over the peak transfer rate of
the 66 MHz Intel486 DX2 processor (105 Mbyte/second).
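The peak figures follow from bytes per burst, clocks per burst, and bus frequency. The sketch below reproduces them under two assumptions not stated in the text: the Pentium processor's bus runs at 66 MHz with one 8-byte transfer per clock during a burst, and the Intel486 DX2-66 has a 33 MHz, 32-bit bus whose best-case burst moves 16 bytes in 5 clocks (a 2-1-1-1 burst). That yields 528 and about 105.6 Mbyte/second, a ratio of about five.

```python
def peak_mbytes_per_sec(bytes_per_burst: int, clocks_per_burst: int, bus_mhz: float) -> float:
    """Peak bus bandwidth in Mbyte/second (1 Mbyte = 10**6 bytes here)."""
    return bytes_per_burst / clocks_per_burst * bus_mhz

# Pentium processor: 4 x 8-byte transfers, one per clock, 66 MHz bus (assumed)
pentium = peak_mbytes_per_sec(32, 4, 66)
# Intel486 DX2-66: 4 x 4-byte transfers in a 2-1-1-1 burst, 33 MHz bus (assumed)
i486_dx2 = peak_mbytes_per_sec(16, 5, 33)

print(f"Pentium: {pentium:.0f} Mbyte/s, Intel486 DX2: {i486_dx2:.1f} Mbyte/s, "
      f"ratio {pentium / i486_dx2:.2f}")
```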

Several instructions (such as MOV and ALU operations) have been hardwired
into the Pentium processor, which allows them to operate more quickly. In
addition, numerous microcoded instructions execute more quickly due to the
Pentium processor's dual pipelines. Finally, the Pentium processor features
an increased page size, which results in less page swapping in larger
applications.
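The effect of a larger page size shows up in how many page mappings a program needs. Assuming the standard 4 Kbyte page and a 4 Mbyte large page (sizes the text does not state), mapping a 16 Mbyte region takes 4096 mappings with small pages but only 4 with large ones. The region size and names are illustrative.

```python
KBYTE = 1024
MBYTE = 1024 * KBYTE

def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of page mappings needed to cover a region (rounded up)."""
    return -(-region_bytes // page_bytes)

region = 16 * MBYTE  # illustrative working-set size
small = pages_needed(region, 4 * KBYTE)
large = pages_needed(region, 4 * MBYTE)
print(small, large)  # 4096 4
```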

The result of the Pentium processor's new architectural features and
enhancements to the Intel486 architecture is a performance improvement
of 3 to 5 times (5 to 10 times for floating point intensive applications)
compared with a 33 MHz Intel486 DX processor, and of 2.5 times compared
with the 66 MHz Intel486 DX2 processor.

Such dramatic performance improvements will meet the demands of computing
in a number of areas: advanced multitasking operating systems that support
graphical user interfaces, such as Windows NT*, OS/2*, and new Unix
implementations; compute-intensive applications such as 3-D graphics
modeling, computer-aided design/engineering (CAD/CAE), large-scale
financial analysis, and high-throughput client/server computing;
handwriting and voice recognition; network applications; virtual reality;
electronic mail that combines many of the above areas; and new
applications yet to be developed.

The Pentium processor also provides performance monitoring features that
will make it easier for developers to take full advantage of the
Pentium processor's superscalar architecture. System developers will be
able to monitor the "hit rates" of the instruction and data caches, as
well as the length of time the Pentium processor spends waiting for the
external bus, which will help in the optimal design of external memory.
The ability to measure address generation interlocks and parallelism will
help compiler authors develop the most effective methods for instruction
scheduling.
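Turning raw event counts into the cache "hit rates" mentioned above is simple arithmetic. The sketch assumes the monitoring hardware reports an access count and a miss count per cache; the field names and the sample numbers are hypothetical, not actual Pentium event names or measured results.

```python
from dataclasses import dataclass

@dataclass
class CacheCounters:
    accesses: int  # hypothetical: total lookups in this cache
    misses: int    # hypothetical: lookups not satisfied by this cache

    def hit_rate(self) -> float:
        """Fraction of accesses served from the cache (0.0 if there were none)."""
        if self.accesses == 0:
            return 0.0
        return 1.0 - self.misses / self.accesses

code_cache = CacheCounters(accesses=1_000_000, misses=30_000)  # illustrative counts
print(f"code cache hit rate: {code_cache.hit_rate():.1%}")      # 97.0%
```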

Data Integrity

The Pentium processor employs a number of techniques to maintain the
integrity of the data with which it is working. Error detection is
performed on two levels: via parity checking at the external pins; and
internally, on the on-chip memory structures (cache, buffers, and
microcode ROM).
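Parity checking works by carrying one extra bit per group of data bits so that any single flipped bit is detectable. The sketch assumes even parity with one parity bit per byte of a 64-bit data transfer (eight bits in all); treat those details as assumptions rather than a statement of the pin-level specification. Function names are hypothetical. The last line shows the method's limit: two flipped bits in the same byte go unnoticed.

```python
def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the count of 1s in (byte + parity bit) even."""
    return bin(byte & 0xFF).count("1") % 2

def parity_bits(qword: int) -> list:
    """One even-parity bit per byte of a 64-bit transfer, byte 0 first."""
    return [even_parity_bit((qword >> (8 * i)) & 0xFF) for i in range(8)]

def bad_bytes(qword: int, received: list) -> list:
    """Indices of bytes whose recomputed parity disagrees with the received bits."""
    return [i for i, (p, r) in enumerate(zip(parity_bits(qword), received)) if p != r]

sent = 0x0000_0000_0000_0301
pbits = parity_bits(sent)             # [1, 0, 0, 0, 0, 0, 0, 0]
print(bad_bytes(sent ^ 0x10, pbits))  # single-bit error in byte 0: [0]
print(bad_bytes(sent ^ 0x03, pbits))  # two-bit error in byte 0 goes undetected: []
```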

For situations where data integrity is especially crucial, the Pentium
processor supports Functional Redundancy Checking (FRC). FRC requires the
use of two Pentium chips, one acting as the master and the other as the
"checker". The two chips run in tandem, and the checker compares its
output with that of the master Pentium processor to ensure that no errors
have occurred. The use of FRC results in an error detection rate greater
than 99 percent.
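The master/checker arrangement can be modeled conceptually as two copies of the same computation run in lockstep, with their outputs compared every cycle. This sketch is purely illustrative: the `step` function standing in for a processor and the injected fault are hypothetical, and real FRC compares pin-level outputs in hardware rather than program values.

```python
def run_frc(step, inputs, checker_fault_at=None):
    """Run master and checker in lockstep; return the first cycle whose outputs differ.

    step: function (state, input) -> (new_state, output) for one cycle
    checker_fault_at: cycle at which to corrupt the checker's output (demonstration only)
    Returns None if no mismatch was detected.
    """
    master_state = checker_state = 0
    for cycle, x in enumerate(inputs):
        master_state, m_out = step(master_state, x)
        checker_state, c_out = step(checker_state, x)
        if cycle == checker_fault_at:
            c_out ^= 1  # simulate a single-bit error in the checker
        if m_out != c_out:
            return cycle  # the checker would signal an error here
    return None

def accumulate(state, x):
    """Toy 'processor': keeps a running 32-bit sum and outputs it each cycle."""
    state = (state + x) & 0xFFFFFFFF
    return state, state

print(run_frc(accumulate, range(10)))                      # None: outputs always agree
print(run_frc(accumulate, range(10), checker_fault_at=4))  # 4: mismatch caught that cycle
```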

The Pentium processor includes a number of built-in features for testing
the reliability of the chip. These include: a Built-In Self Test that
tests 70 percent of the Pentium processor's components upon resetting the
chip; an implementation of the IEEE 1149.1 standard (Test Access Port and
Boundary Scan Architecture), which provides a standard interface for
manufacturers to test the external connections to the Pentium processor;
and Probe Mode, which provides access to the software visible Pentium
processor registers for the purpose of determining the current state of
the processor.

SL Enhanced Power Management Features

The Pentium processors (735/90, 815/100) incorporate new SL technology
features for superior power-management capabilities. These features
operate at two levels: at the system level, controlling the way power is
used by the entire system (including peripherals); and at the
microprocessor level. Power management at the processor level involves
putting the processor into a low-power state during tasks that are not
processor-intensive (such as word processing), or into a very low-power
state when the computer is not in use ("sleep" mode).

Intel's SL technology centers around SMM (system management mode) to
control power at the system level. This mode provides intelligent system
management that allows the microprocessor to slow down, suspend, or
completely shut down various system components so as to maximize energy
savings. All members of the Pentium processor family include SMM.

The Pentium processors (735/90, 815/100) are implemented with fully static,
0.6 micron, 3.3V, BiCMOS process technology. The static design allows the
processor frequency to be reduced to 0 MHz, where the processor uses very
little power. 3.3V technology and design enhancements result in typical
power consumption of four watts or less for the Pentium processors
(735/90, 815/100).

Other SL technology features on the Pentium processors (735/90, 815/100)
are Stop Clock and Auto Halt. Stop Clock is a microprocessor input that
provides fine-tuned control over the processor's clock frequency, enabling
a variety of energy-conservation techniques. When Stop Clock is enabled,
the internal frequency of the processor can be lowered to 0 MHz. With Auto
Halt, executing a HALT instruction automatically places the processor in
its low-power sleep mode.

SL technology allows computer manufacturers to design intelligent
power-management features in hardware, making the features independent of
software. Power management becomes an integral part of the system,
regardless of what operating system or application is being used. Power
management works better because SL technology keeps the power-management
features from conflicting with software.

Multiprocessor Support

All members of the Pentium processor family include write-back data caches
and the MESI protocol to support multiprocessor systems. Write-back caches
reduce processor writes to memory, resulting in less bus contention between multiple
processors. The MESI protocol is used to maintain cache consistency among
several processors. The Pentium processors (735/90, 815/100) include two
new multiprocessor features, a multiprocessor interrupt controller on-chip
and the dual processor mode.

The Pentium processor's (735/90, 815/100) on-chip MP interrupt controller
is called the Advanced Programmable Interrupt Controller (APIC) and can
support up to 60 processors. The APIC supports symmetric multiprocessing,
meaning that all processors look equal to the operating system.
Multiprocessor operating systems such as Windows NT, OS/2, and new Unix
implementations support the symmetric multiprocessor model. Intel's APIC
architecture requires that a system include a single I/O APIC and a local
APIC for each processor. The local APIC is integrated onto the Pentium
processors (735/90, 815/100).

The Pentium processors (735/90, 815/100) also include the dual processor
mode. Dual processor mode enables two processors to share a single
second-level cache for a low-cost multiprocessor system. Pentium
processors (735/90, 815/100) include on-chip logic to maintain cache
consistency between the two processors and to arbitrate for the common bus
to the second-level cache. The on-chip APICs handle interrupts. A single
processor system design can be made multiprocessor-ready by adding a
second socket, an I/O APIC to the chipset, and a few simple BIOS changes.

Dual processor mode enables low-cost, shared-cache multiprocessor systems
for workstations and low-end servers. A dedicated cache for each processor
is required for maximum performance in high-end application servers.

High Performance While Maintaining Compatibility

The Pentium processor family is a high-performance microprocessor family
that incorporates the latest state-of-the-art design principles to meet
the needs of today's applications and newly developing areas of
applications software, while maintaining complete
compatibility with the $50 billion installed base of software currently
running on members of the Intel family.

Users will experience dramatic performance improvements while running their
current software, and can anticipate new applications that take advantage
of the Pentium processor's high-performance features.

 ============================================================
 From the  'New Product Information'  Electronic News Service
 ============================================================
 This information was processed from data provided by the
 above mentioned company. For additional details, contact 
 the company at the address or telephone number indicated.
 ============================================================
