# SR1-GX 64-bit Superscalar RISC CPU Core with Vector3D™ Media Extensions

#### **OVERVIEW**

SandCraft's SR1-GX is a high-performance, low-power processor core for consumer system-on-chip designs targeted at advanced digital set-top boxes, digital TV, 3D game consoles, thin-client, and digital information appliance applications. It is based on the MIPS IV™ ISA with additional extensions, such as integer multiply and multiply-accumulate instructions for DSP functions and new Vector3D™ media instructions for high floating-point performance.

The SR1-GX core architecture is designed to be extendable and portable across different process technologies and foundries. When implemented in a 0.18 µm technology, it enables up to 800 Dhrystone 2.1 MIPS of integer performance and 1.6 GFLOPS of floating-point performance in designs running at 400 MHz. Depending on the specific process technology, a total core die area of between 21 mm² and 29 mm² (including 32 Kbytes of cache and the floating-point unit) is achievable, with power dissipation of less than 1.4 W at 400 MHz and a supply voltage of 1.8 V.

#### **KEY FEATURES / BENEFITS**

#### **Dual-issue Superscalar Pipelines**

- 2 instructions fetched per cycle
- Out-of-order instruction dispatch (peak: 4 integer and 2 floating point instructions per cycle)
- Multiple buffers and queues for de-coupling instruction fetch, dispatch, execution, and commit stages
- Improves instruction execution parallelism and efficiency for high throughput

#### **Highly Efficient Dynamic Branch Prediction**

- 3K-entry branch history table with 2-bit saturating counters
- Keeps the pipeline full by minimizing branch mispredictions
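With a 2-bit saturating counter per entry, a branch has to be mispredicted twice in a row before its prediction flips, so a single anomalous outcome (such as a loop exit) does not disturb a well-established prediction. The C sketch below models a 3K-entry (3072-entry) table. The table size comes from the datasheet; the indexing function (word-aligned PC modulo 3072), the zero initial state, and the names `bht_t`, `bht_predict`, and `bht_update` are hypothetical, since the datasheet does not describe them.

```c
#include <stdbool.h>
#include <stdint.h>

/* 3K entries per the datasheet; everything else here is an assumption. */
#define BHT_ENTRIES 3072u

typedef struct {
    uint8_t counter[BHT_ENTRIES]; /* each entry holds a 2-bit state, 0..3 */
} bht_t;

/* Hypothetical indexing: drop the 2 low PC bits (MIPS instructions are 4 bytes). */
static unsigned bht_index(uint64_t pc) {
    return (unsigned)((pc >> 2) % BHT_ENTRIES);
}

/* States 0 and 1 predict not-taken; states 2 and 3 predict taken. */
static bool bht_predict(const bht_t *t, uint64_t pc) {
    return t->counter[bht_index(pc)] >= 2;
}

/* Saturating update: count up toward 3 on taken, down toward 0 on not-taken. */
static void bht_update(bht_t *t, uint64_t pc, bool taken) {
    uint8_t *c = &t->counter[bht_index(pc)];
    if (taken) {
        if (*c < 3) (*c)++;
    } else {
        if (*c > 0) (*c)--;
    }
}
```

Starting from a zeroed table, a branch needs two taken outcomes before it is predicted taken, and once saturated at state 3 it survives one not-taken outcome without changing its prediction.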

#### **Primary Features**

#### Vector3D™ Multimedia Instructions

- Enhanced floating point architecture
- SIMD instructions (e.g., 2 single-precision multiply-adds per cycle)
- → 1.6 GFLOPS performance at 400 MHz
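The SIMD operation described above performs the same multiply-add on two single-precision values packed into one 64-bit register. The C sketch below models that lane-wise behavior and reproduces the peak-rate arithmetic. The names `ps_pair`, `ps_madd`, and `peak_flops` are hypothetical (the datasheet does not give Vector3D mnemonics), and the real hardware's rounding behavior is not modeled.

```c
/* Hypothetical model of one 64-bit FP register holding two single-precision lanes. */
typedef struct {
    float lo;
    float hi;
} ps_pair;

/* Lane-wise d = a * b + c: one SIMD instruction does both lanes at once.
   Hardware rounding details are not modeled. */
static ps_pair ps_madd(ps_pair a, ps_pair b, ps_pair c) {
    ps_pair d = { a.lo * b.lo + c.lo, a.hi * b.hi + c.hi };
    return d;
}

/* Peak FLOPS, counting each multiply-add as two floating-point operations. */
static double peak_flops(double clock_hz, double madds_per_cycle) {
    return clock_hz * madds_per_cycle * 2.0;
}
```

With the datasheet's figures (400 MHz, 2 multiply-adds per cycle), `peak_flops(400e6, 2.0)` evaluates to 1.6e9, matching the quoted 1.6 GFLOPS. Sustained throughput is lower whenever the pipeline cannot be kept full.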

#### Media Link™ Bus Interface

- 64-bit, bi-directional link to independent co-processors: FPU, customer-defined vector and media units
  - De-coupled for performance

#### Configurable Caches

- 8K/16K/32K/64K bytes
- Direct-mapped/2-way/4-way/8-way set-associative
- **→** Flexible options
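For a fixed capacity, cache size and associativity trade off through the number of sets: sets = size / (ways × line size). The C sketch below computes the set count and the offset and index field widths for any of the listed configurations. The datasheet does not state the line size, so the 32-byte `LINE_BYTES` value and the helper names are assumptions for illustration only.

```c
#include <stdint.h>

/* ASSUMPTION: 32-byte cache lines; the datasheet does not specify the line size. */
#define LINE_BYTES 32u

typedef struct {
    uint32_t sets;        /* number of sets = size / (ways * line size) */
    uint32_t offset_bits; /* address bits selecting a byte within a line */
    uint32_t index_bits;  /* address bits selecting a set */
} cache_geometry_t;

/* log2 for powers of two (all listed sizes and way counts are powers of two). */
static uint32_t log2_u32(uint32_t x) {
    uint32_t n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

static cache_geometry_t cache_geometry(uint32_t size_bytes, uint32_t ways) {
    cache_geometry_t g;
    g.sets = size_bytes / (ways * LINE_BYTES);
    g.offset_bits = log2_u32(LINE_BYTES);
    g.index_bits = log2_u32(g.sets);
    return g;
}
```

Under this assumption, a 16-Kbyte 2-way cache has 256 sets (8 index bits), while a 32-Kbyte direct-mapped cache has 1024 sets (10 index bits); doubling the ways at the same size halves the set count.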

#### SR1-GX Block Diagram





#### **Enhanced Floating-Point Architecture**

- Fully pipelined for high performance
- Meets the IEEE 754 floating-point specification
- Single- and double-precision support in hardware
- 4-clock latencies and 1-cycle repeat rates for add.s, mul.s, and madd.s operations
- Industry-leading floating-point performance

#### **Vector3D™ Media Extensions**

- Media extensions to the MIPS IV™ floating-point instructions
- Enables SIMD-type operations on two single-precision floating-point values simultaneously (e.g., 2 multiply-adds per cycle)
- Enables 1.6 GFLOPS at 400 MHz
- Allows efficient mixing of vector and scalar computations
- 64 single-precision registers available
- Efficient data streaming supported
- Accelerates most floating-point-intensive media applications, such as graphics geometry processing and audio processing

#### **Media Link™ Interface**

- On-chip 64-bit interface to the floating-point unit or to custom vector or DSP units
- Permits specialized instructions and commands to be defined and transmitted across the bus
- Operates at the processor pipeline frequency (400 MHz)
- Decoupled from the integer core for better performance
- Separate sets of instruction and data buffers
- Extendable interface enables customization for a specific application

#### **Non-blocking Load/Store Unit**

- Up to 4 non-blocking loads and stores, or up to 4 cache data prefetches
- Minimizes the effects of cache misses and improves performance
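Latency and repeat rate describe different things: latency is how long one result takes, while the repeat rate is how often a new operation of the same kind can start. The C sketch below shows what the quoted 4-clock latency and 1-cycle repeat rate for add.s, mul.s, and madd.s imply for throughput. It is a simplified model that assumes operations issue back-to-back with no other stalls (no cache misses, no dispatch limits); the function names are hypothetical.

```c
#include <stdint.h>

/* Independent ops: a new op starts every `repeat` cycles, and the last one
   finishes `latency` cycles after it starts. Assumes no other stalls. */
static uint64_t cycles_independent(uint64_t n, uint64_t latency, uint64_t repeat) {
    return n == 0 ? 0 : latency + (n - 1) * repeat;
}

/* Dependent chain: each op must wait for the previous result, so every
   op costs the full latency. */
static uint64_t cycles_dependent(uint64_t n, uint64_t latency) {
    return n * latency;
}
```

For example, 100 independent add.s operations take about 4 + 99 × 1 = 103 cycles, while a chain of 100 dependent ones takes about 400. The same model applies to the integer multiply latencies and repeat rates in the DSP section.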

#### **Enhanced DSP Support**

- 16 high-performance integer multiply and multiply-accumulate instructions
  - 64-bit multiply: latency 6 cycles, repeat rate 4 cycles
  - 32-bit multiply: latency 4 cycles, repeat rate 2 cycles
  - 32-bit Macc: latency 4 cycles, repeat rate 2 cycles
  - 16-bit Macc: latency 3 cycles, repeat rate 1 cycle
- 32-bit and 64-bit rotate right instructions
- Count leading zeroes and ones for data normalization
- Support for saturating arithmetic
- Useful for image processing, DSP functions, audio and video processing
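The bit-manipulation and saturating operations listed above can be expressed in portable C, which is useful for checking results on a host. The sketch below is a reference model, not the SR1-GX instruction definitions. The zero-input behavior of `clz32` (returning 32) and the accumulator width and saturation bounds of `sat_macc16` are assumptions, and all function names are hypothetical.

```c
#include <stdint.h>

/* 32-bit rotate right; the shift amount is taken modulo 32. */
static uint32_t rotr32(uint32_t x, unsigned n) {
    n &= 31u;
    return n == 0 ? x : (x >> n) | (x << (32u - n));
}

/* Count leading zeros; this sketch returns 32 for x == 0. */
static unsigned clz32(uint32_t x) {
    unsigned n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Count leading ones = leading zeros of the bitwise complement. */
static unsigned clo32(uint32_t x) {
    return clz32(~x);
}

/* 16x16 multiply-accumulate into a 32-bit accumulator, saturating instead of
   wrapping on overflow (assumed semantics for illustration). */
static int32_t sat_macc16(int32_t acc, int16_t a, int16_t b) {
    int64_t r = (int64_t)acc + (int64_t)a * (int64_t)b;
    if (r > INT32_MAX) return INT32_MAX;
    if (r < INT32_MIN) return INT32_MIN;
    return (int32_t)r;
}
```

For data normalization, `clz32` gives the left-shift needed to bring the most significant set bit to bit 31; saturation keeps an audio or image accumulator pinned at its limit rather than wrapping to the opposite sign.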

#### **Test and Debug Features**

- Supports N-Wire/N-Trace debug features
  - Full external access to processor state and system memory
  - Can set multiple breakpoints on instruction address, data address, and data value
  - Single-step through code
- Instruction trace capabilities
- Debug instructions
- IEEE 1149.1 JTAG
- Built-in-Self-Test (BIST) for caches
- Automatic-Test-Pattern-Generation (ATPG)
- Speeds system debug and bring-up, improves test coverage

#### R5000™ Compatible SysAD System Bus

- 4-entry transaction buffer
  - 256 bits
  - Holds up to 4 outstanding read or uncached write operations
- Split response on read transactions
- Interleaved write operations between read request and response
- Out-of-order completion of outstanding read requests
- Improves system performance
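Split read responses and out-of-order completion mean the processor tags each outstanding request and matches responses to it by tag rather than by arrival order. The C sketch below models only the read side of the 4-entry transaction buffer (the datasheet also allows uncached writes to occupy entries). The slot-allocation policy, the use of slot numbers as transaction IDs, and all names are hypothetical; the SysAD protocol details are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define TXN_SLOTS 4 /* 4-entry transaction buffer, per the datasheet */

typedef struct {
    bool     busy[TXN_SLOTS];
    uint64_t addr[TXN_SLOTS];
} txn_buffer_t;

/* Allocate a slot for a new read request. Returns the slot id used as the
   transaction tag, or -1 if all 4 entries are outstanding (the core would stall). */
static int txn_issue_read(txn_buffer_t *tb, uint64_t addr) {
    for (int i = 0; i < TXN_SLOTS; i++) {
        if (!tb->busy[i]) {
            tb->busy[i] = true;
            tb->addr[i] = addr;
            return i;
        }
    }
    return -1;
}

/* A split response may arrive for any outstanding tag, in any order.
   Returns false if the tag is invalid or not outstanding. */
static bool txn_complete(txn_buffer_t *tb, int id, uint64_t *addr_out) {
    if (id < 0 || id >= TXN_SLOTS || !tb->busy[id]) return false;
    *addr_out = tb->addr[id];
    tb->busy[id] = false;
    return true;
}
```

Because a freed entry can be reused immediately, a slow read to one address does not block later reads whose data returns sooner, which is where the performance benefit of out-of-order completion comes from.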

© 1999, SandCraft, Inc. Rev. 01199

TM - Media Link, Vector3D, and "Engines for the Digital Age" are trademarks of SandCraft, Inc. MIPS IV and R5000 are trademarks of MIPS Technologies, Inc. N-Wire/N-Trace is a trademark of Hewlett-Packard Company.