NVIDIA TESLA PERSONAL SUPERCOMPUTER



The Tesla Personal Supercomputer is a desktop computer backed by Nvidia and built by Dell, Lenovo, and other companies.

It is meant to demonstrate the capabilities of Nvidia's Tesla GPGPU brand. It uses NVIDIA's CUDA parallel computing architecture and is powered by up to 960 parallel processing cores, which, according to Nvidia, allows it to achieve performance up to 250 times faster than a standard PC.

At the heart of the new Tesla Personal Supercomputer are three or four Nvidia Tesla C1060 computing processors, which look like high-performance Nvidia graphics cards but have no video output ports.

Each Tesla C1060 has 240 streaming processor cores running at 1.296 GHz, 4 GB of 800 MHz, 512-bit GDDR3 memory, and a PCI Express x16 system interface. While typically drawing only 160 watts, each card is capable of 933 GFlops of single-precision or 78 GFlops of double-precision floating-point performance.

ABOUT NVIDIA

NVIDIA (Nasdaq: NVDA) is the world leader in visual computing technologies and the inventor of the GPU, a high-performance processor which generates breathtaking, interactive graphics on workstations, personal computers, game consoles, and mobile devices. NVIDIA serves the entertainment and consumer market with its GeForce® products, the professional design and visualization market with its Quadro® products, and the high-performance computing market with its Tesla™ products. NVIDIA is headquartered in Santa Clara, California, and has offices throughout Asia, Europe, and the Americas.


Certain statements in this press release, including but not limited to statements as to the benefits, features, impact, and capabilities of the Tesla GPU computing processor and CUDA architecture, are forward-looking statements that are subject to risks and uncertainties that could cause results to be materially different from expectations.

Important factors that could cause actual results to differ materially include: development of more efficient or faster technology; adoption of the CPU for parallel processing; design, manufacturing or software defects; the impact of technological development and competition; changes in consumer preferences and demands; customer adoption of different standards or our competitors' products; changes in industry standards and interfaces; unexpected loss of performance of our products or technologies when integrated into systems; as well as other factors detailed from time to time in the reports NVIDIA files with the Securities and Exchange Commission, including its Form 10-K for the fiscal period ended January 25, 2009.

Copies of reports filed with the SEC are posted on our website and are available from NVIDIA without charge. These forward-looking statements are not guarantees of future performance and speak only as of the date hereof, and, except as required by law, NVIDIA disclaims any obligation to update these forward-looking statements to reflect future events or circumstances.                                    


GPU COMPUTING

GPU computing is the use of a GPU (graphics processing unit) to do general-purpose scientific and engineering computing.

                        
The model for GPU computing is to use a CPU and a GPU together in a heterogeneous computing model. The sequential part of the application runs on the CPU, and the computationally intensive part runs on the GPU. From the user's perspective, the application simply runs faster because it uses the high-performance GPU to boost performance.

The application developer modifies the application to take the compute-intensive kernels and map them to the GPU; the rest of the application remains on the CPU. Mapping a function to the GPU involves rewriting the function to expose its parallelism and adding keywords to move data to and from the GPU.
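As an illustrative sketch of this workflow (not code from NVIDIA's product documentation), a C for CUDA program marks the GPU function with the `__global__` keyword and uses `cudaMemcpy` to move data to and from the GPU:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Kernel: each GPU thread scales one element of the array.
// __global__ is the CUDA keyword that marks a function to run on the GPU.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1024;
    float host[1024];
    for (int i = 0; i < n; ++i)
        host[i] = (float)i;

    // Move the input data to the GPU ...
    float *dev;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // ... run the compute-intensive kernel on the GPU in parallel ...
    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);

    // ... and move the results back to the CPU.
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("%f\n", host[10]);   // element 10 has been scaled by 2 on the GPU
    return 0;
}
```

The rest of the application (here, the loop filling `host`) stays on the CPU; only the scaling function was rewritten for the GPU.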

GPU computing is enabled by the massively parallel architecture of NVIDIA's GPUs, called the CUDA architecture. The CUDA architecture consists of hundreds of processor cores that operate together to crunch through the application's data set.
                
The Tesla 10-series GPU is the second-generation CUDA architecture, with features optimized for scientific applications such as IEEE-standard double-precision floating-point hardware, local data caches in the form of shared memory dispersed throughout the GPU, and coalesced memory accesses.

HISTORY OF GPU COMPUTING
Graphics chips started as fixed-function graphics pipelines. Over the years, these chips became increasingly programmable, which led NVIDIA to introduce the first GPU, or Graphics Processing Unit. In the 1999-2000 timeframe, computer scientists, along with researchers in fields such as medical imaging and electromagnetics, started using GPUs to run general-purpose computational applications. They found that the excellent floating-point performance of GPUs yielded a huge performance boost for a range of scientific applications. This was the advent of the movement called GPGPU, or General-Purpose computing on GPUs.
                  
The problem was that GPGPU required using graphics programming languages like OpenGL and Cg to program the GPU. Developers had to make their scientific applications look like graphics applications, mapping them into problems that drew triangles and polygons. This limited the accessibility of the GPU's tremendous performance for science.

NVIDIA realized the potential of bringing this performance to the larger scientific community and invested in modifying the GPU to make it fully programmable for scientific applications, adding support for high-level languages like C and C++. This led to the CUDA architecture for the GPU.

CUDA PARALLEL ARCHITECTURE
AND PROGRAMMING MODEL

The CUDA parallel hardware architecture is accompanied by the CUDA parallel programming model, which provides a set of abstractions for expressing fine-grained and coarse-grained data and task parallelism. Programmers can express this parallelism in high-level languages such as C, C++, and Fortran, or through driver APIs such as OpenCL and DirectX 11 Compute.

The first language NVIDIA supported is C: the C for CUDA software development tools enable the GPU to be programmed in C with a minimal set of keywords or extensions. Support for Fortran, OpenCL, and other languages will follow.

The CUDA parallel programming model guides programmers to partition a problem into coarse sub-problems that can be solved independently in parallel. Fine-grained parallelism within each sub-problem is then expressed so that the sub-problem can be solved cooperatively in parallel.
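A hypothetical reduction kernel (a sketch for illustration, not vendor sample code) shows this two-level decomposition: each thread block independently sums one coarse chunk of the input, while the threads within a block cooperate on that chunk through shared memory:

```cuda
// Coarse parallelism: each block sums one 256-element chunk independently.
// Fine parallelism:   threads within a block cooperate via __shared__ memory.
__global__ void block_sum(const float *in, float *block_results, int n)
{
    __shared__ float cache[256];         // on-chip shared memory (the local data cache)

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                     // wait until every thread has loaded its element

    // Tree reduction within the block: threads cooperate on the shared array.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            cache[tid] += cache[tid + stride];
        __syncthreads();
    }

    if (tid == 0)                        // one partial result per independent sub-problem
        block_results[blockIdx.x] = cache[0];
}
```

Because the blocks never communicate with each other, the hardware is free to schedule them across however many processor cores the GPU provides.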

The CUDA GPU architecture and the corresponding CUDA parallel computing model are now widely deployed, with hundreds of applications and nearly a thousand published research papers. CUDA Zone lists many of these applications and papers.

PROCESSOR OF NVIDIA TESLA

Tesla C1060 Computing Processor


6.1 KEY FEATURES

GPU

• Number of processor cores: 240
• Processor core clock: 1.296 GHz
• Voltage: 1.1875 V
• Package size: 45.0 mm × 45.0 mm, 2236-pin flip-chip ball grid array (FCBGA)

Memory

• 800 MHz
• 512-bit memory interface
• 4 GB: thirty-two pieces of 32M × 32 GDDR3 SDRAM (136-pin BGA)


External Connectors

• None

Internal Connectors and Headers

The Tesla C1060 board supports the following internal connectors and headers:

• One 8-pin PCI Express power connector (can be used with a 6-pin power cable)
• One 6-pin PCI Express power connector
• One 4-pin fan connector

External PCI Express Power Connectors

The Tesla C1060 is a performance-optimized, high-end board; it draws power from the PCI Express connector as well as from external power connectors. The board can be powered in one of two ways:

• One 8-pin PCI Express power connector, or
• Two 6-pin PCI Express power connectors


4-Pin Fan Connector

The Tesla C1060 board uses a 4-pin fan connector to control the fan speed of the thermal solution. The details of the connector (P/N: PH-T-4) are given in Figure 6. This part is a 2.0 mm (0.079") pitch disconnectable connector.
  
6.4 THERMAL SPECIFICATIONS

Thermal Qualification Summary

The information in this summary is intended to provide users of the Tesla C1060 computing processor with the thermal information necessary to assist in thermal management efforts. It is not intended to provide a specific thermal management solution; however, it does show an approach that results in reliable operation of the Tesla C1060.

The product and cooling solution used are:

• Device: Tesla C1060 board
• Cooling solution: fan-sink solution, Cooler Master TM72, NV P/N: 580-10607-2000-000. The cooling solution assembly includes a heat sink, fan, backplate, thermal grease interface material, and screws.
• Result: under the operating conditions described in the following tables, the Tesla C1060 passed thermal qualification.

Cooling Solution

NVIDIA utilizes a Cooler Master TM72 active fan sink (Figure ) to cool the GPU, the memories, and the power supply components.

TECHNICAL SPECIFICATIONS

Tesla Architecture

  • Massively-parallel many-core architecture
  • 240 scalar processor cores per GPU
  • Integer, single-precision and double-precision floating point operations
  • Hardware Thread Execution Manager enables thousands of concurrent threads per GPU
  • Parallel shared memory enables processor cores to collaborate on shared information at local cache performance
  • Ultra-fast GPU memory access with 102 GB/s peak bandwidth per GPU
  • IEEE 754 single-precision and double-precision floating point
  • Each Tesla C1060 GPU delivers 933 GFlops Single Precision and 78 GFlops Double Precision performance

Software Development Tools

  • C language compiler, debugger, profiler, and emulation mode for debugging
  • Standard numerical libraries for FFT (Fast Fourier Transform), BLAS (Basic Linear Algebra Subroutines), and CuDPP (CUDA Data Parallel Primitives)
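
As a hedged sketch of the library workflow (using the CUFFT API as it existed in this CUDA generation; exact signatures may differ by version), a 1D complex-to-complex FFT on data already in GPU memory looks roughly like:

```cuda
#include <cufft.h>
#include <cuda_runtime.h>

// Illustrative sketch of the CUFFT workflow; consult the CUFFT manual
// for the exact API of your CUDA version.
int fft_forward(cufftComplex *dev_signal, int n)
{
    cufftHandle plan;

    // Plan a single 1D complex-to-complex transform of length n.
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS)
        return -1;

    // Execute the forward FFT in place on device memory.
    if (cufftExecC2C(plan, dev_signal, dev_signal, CUFFT_FORWARD) != CUFFT_SUCCESS) {
        cufftDestroy(plan);
        return -1;
    }

    cufftDestroy(plan);
    return 0;
}
```

The plan/execute/destroy pattern lets the library reuse the transform setup across many executions, which matters when the FFT sits inside a simulation loop.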
Product Details
  • 3 or 4 Tesla C1060 computing processors with 4 GB of dedicated memory per GPU
  • 2.33 GHz or faster quad-core AMD Phenom or Opteron, or quad-core Intel Core 2 or Xeon
  • Minimum system memory: 12 GB for 3 Tesla C1060s, 16 GB for 4 Tesla C1060s (at least 4 GB per Tesla C1060)
  • 1200-1350 W power supply
  • Acoustics: < 45 dBA
Supported Platforms
  • Microsoft® Windows® XP 64-bit and 32-bit (64-bit recommended)
  • Linux® 64-bit and 32-bit (64-bit recommended)
  • Red Hat Enterprise Linux 4 and 5
  • SUSE 10.1, 10.2 and 10.3      
FEATURES & BENEFITS

Your own Supercomputer
·         Dedicated computing resource for every computational researcher and technical professional.
·         Up to 250 times faster than the average PC.

Cluster Performance on your Desktop
·         The performance of a cluster in a desktop system. Four Tesla GPU computing processors deliver close to 4 Teraflops of single-precision performance (4 × 933 GFlops ≈ 3.7 Teraflops).
    
Designed for Office Use
·         Plugs into a standard office power socket and is quiet enough for use at your desk.

Massively Parallel Many Core GPU Architecture
·         240 parallel processor cores per GPU that can execute thousands of concurrent threads.

Solve Large-scale Problems using Multiple GPUs
·         Scale your application to multiple GPUs and harness the performance of thousands of processor cores to solve large-scale problems.

Widely accepted, easy to learn CUDA C Programming Environment
·         Easily express application parallelism to take advantage of the GPU's many-core architecture using the NVIDIA® CUDA C programming environment.

4 GB High-Speed Memory per GPU
·         Dedicated compute memory enables larger datasets to be stored locally for each processor to maximize benefit from the 102 GB/s memory transfer speeds and minimize data movement around the system.

IEEE 754 Floating Point Precision (single-precision and double-precision)
·         Provides results that are consistent across platforms and meet industry standards.

64-bit ALUs for Double-Precision Math
·         Meets the precision requirements of your most demanding applications with 64-bit ALUs.

CONCLUSION
The revolutionary Tesla Personal Supercomputer was launched in London. NVIDIA's Tesla computer could prove invaluable to medical researchers and accelerate the discovery of cancer treatments.
The technology represents a great leap forward in the history of computing. PhD students at Cambridge and Oxford Universities and at MIT in America are already using GPU-based personal supercomputers for research. Scientists believe the new systems could help find cures for diseases.
Although at £4,000 it is beyond the reach of most consumers, the high-performance processor could become invaluable to universities and medical institutions.
The Tesla Personal Supercomputer does not make supercomputing clusters obsolete, but it is a major breakthrough for the millions of researchers who can take advantage of the heterogeneous computing power of this system.
