Friday, August 31, 2012

The Rise of Extraordinary Computers


Today supercomputing is an important backbone of almost every scientific domain; without it, much of the world we see around us would still be a dream. Where did such exemplary power come from? To trace the history of supercomputing we have to go back to the 1960s and to a legendary engineer named Seymour Cray, who had an unquenchable thirst for designing extremely powerful computers. He is affectionately known as the ‘Father of Supercomputing’. One of his designs, the CDC (Control Data Corporation) 6600, released in 1964, is generally considered the first supercomputer.
The Beginning:
        As noted above, the supercomputing era began around 1960 and led to the release of the CDC 6600, one of the computing world's biggest dreams. In 1960 Cray completed the CDC 1604, one of the first solid-state computers and the fastest computer in the world at a time when vacuum tubes were still found in most large computers.
The term solid state means that the computer is built from semiconductor devices such as transistors. It marks the transition of computing systems from vacuum tubes to semiconductor components.
Around 1960 Cray decided to design a computer that would be the fastest in the world by a far greater margin than the 1604. After four years of experimentation together with Jim Thornton, Dean Roush and about 30 other engineers, Cray completed the CDC 6600 in 1964. Because the 6600 outran every other computer of its time by a factor of about 10, it was dubbed a supercomputer, and it defined the supercomputing market when one hundred machines were sold at $8 million each. The 6600 gained speed by "farming out" work to peripheral computing elements, freeing the CPU (Central Processing Unit) to process actual data. The Minnesota FORTRAN compiler for the machine was developed by Liddiard and Mundstock at the University of Minnesota, and with it the 6600 could sustain 500 kiloflops on standard mathematical operations.
In 1968 Cray completed the CDC 7600, again the fastest computer in the world. At 36 MHz, the 7600 had about three and a half times the clock speed of the 6600, and other technical innovations made it faster still than the clock ratio alone suggests. Cray left CDC in 1972 to form his own company. Two years after his departure CDC delivered the STAR-100, which at 100 megaflops was three times the speed of the 7600. Along with the Texas Instruments ASC, the STAR-100 was one of the first machines to use vector processing, an idea inspired around 1964 by the APL programming language.
The CRAY Era:
        In 1976 Cray delivered the 80 MHz Cray-1, which became one of the most successful supercomputers in history. The Cray-1 was a vector processor that introduced a number of innovations, among them chaining.
Chaining is a computer architecture technique in which the intermediate results held in scalar and vector registers can be used immediately by the next operation, without the additional memory references that would otherwise reduce computational speed.
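As a rough illustration of why chaining helps, the sketch below models a multiply followed by an add over a vector. The function names, the one-element-per-cycle timing model and the latency values are assumptions made for this example; they are not taken from any Cray documentation.

```python
def multiply_stream(a, b):
    """Yield each product as soon as it is available (hypothetical multiplier unit)."""
    for x, y in zip(a, b):
        yield x * y


def chained_multiply_add(a, b, c):
    """Compute a*b + c element by element.

    Each product is handed straight to the adder; no temporary
    product vector is stored and read back in between.
    """
    return [p + z for p, z in zip(multiply_stream(a, b), c)]


def unchained_cycles(n, mul_latency, add_latency):
    """Simplified cost model: the add starts only after all n products exist."""
    return (mul_latency + n) + (add_latency + n)


def chained_cycles(n, mul_latency, add_latency):
    """Simplified cost model: the adder starts as soon as the first product
    leaves the multiplier, so the two pipelines overlap."""
    return mul_latency + add_latency + n


if __name__ == "__main__":
    print(chained_multiply_add([1, 2, 3], [4, 5, 6], [7, 8, 9]))
    # Assumed values: 64-element vector, 7-cycle multiply, 6-cycle add.
    print(unchained_cycles(64, 7, 6), chained_cycles(64, 7, 6))
```

Under these assumed numbers the chained version needs roughly half the cycles, because the pass over the vector elements is paid once rather than twice.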
A vector processor, also known as an array processor, is a central processing unit (CPU) whose instruction set contains instructions that operate on one-dimensional arrays of data called vectors. This is in contrast to a scalar processor, whose instructions operate on single data items.
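The difference between the two styles can be sketched by counting how many instructions each would issue to add two arrays. This is only a toy model: the function names are hypothetical, the vector length of 64 is an assumed value chosen for illustration, and both inputs are assumed to have the same length.

```python
VECTOR_LENGTH = 64  # assumed number of elements handled by one vector instruction


def scalar_add(a, b):
    """Add two arrays with one (simulated) instruction per element.

    Returns the result and the number of instructions issued.
    """
    result = []
    instructions = 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions


def vector_add(a, b, vector_length=VECTOR_LENGTH):
    """Add two arrays with one (simulated) instruction per chunk of elements.

    Returns the result and the number of instructions issued.
    """
    result = []
    instructions = 0
    for start in range(0, len(a), vector_length):
        chunk_a = a[start:start + vector_length]
        chunk_b = b[start:start + vector_length]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return result, instructions


if __name__ == "__main__":
    a = list(range(200))
    b = list(range(200))
    scalar_result, scalar_count = scalar_add(a, b)
    vector_result, vector_count = vector_add(a, b)
    print(scalar_count, vector_count, scalar_result == vector_result)
```

Both functions produce the same sums; only the number of issued instructions differs, which is the point of a vector instruction set.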
In 1982 the Cray X-MP, a 105 MHz shared-memory parallel vector processor, was released with better chaining support and multiple memory pipelines (pipelining is the overlapping of the execution of successive instructions). All three floating point pipelines on the X-MP could operate simultaneously. The Cray-2, released in 1985, was a 4 processor liquid cooled computer; Fluorinert was pumped through it as it operated. It could reach 1.9 gigaflops and was the world's fastest machine until 1990, when the ETA-10G from CDC overtook it. The Cray-2 was a totally new design: it did not use chaining and had high memory latency, but it relied heavily on pipelining and was ideal for problems that required large amounts of memory.
The software costs of developing a supercomputer should not be underestimated: in the 1980s the cost of software development at Cray came to equal what was spent on hardware. That trend was partly responsible for the move away from the in-house Cray Operating System to the Unix-based UNICOS (UNIx based Cray Operating System).
The Cray Y-MP, designed by Steve Chen, was released in 1988 as an improvement on the X-MP and could have eight vector processors at 167 MHz with a peak performance of 333 megaflops per processor. In the late 1980s, Cray's experiment with gallium arsenide semiconductors in the Cray-3 did not succeed. Cray began to work on a massively parallel computer in the early 1990s, but he died in a car accident in 1996 before it could be completed.
The Massive Processing Era:
The Cray-2, which set the frontiers of supercomputing in the mid to late 1980s, had no more than 8 processors. In the 1990s, supercomputers with thousands of processors began to appear. Another development at the end of the 1980s was the arrival of Japanese supercomputers, some of which were modelled after the Cray-1. The SX-3/44R was announced by NEC Corporation in 1989 and a year later earned the title of fastest in the world with a 4 processor model. Fujitsu's Numerical Wind Tunnel supercomputer, however, used 166 vector processors to gain the top spot in 1994; it had a peak speed of 1.7 gigaflops per processor. The Hitachi SR2201, on the other hand, obtained a peak performance of 600 gigaflops in 1996 by using 2048 processors connected via a fast three dimensional crossbar network.
In the same timeframe the Intel Paragon could have 1000 to 4000 Intel i860 processors in various configurations, and it was ranked the fastest in the world in 1993. The Paragon was a MIMD (Multiple Instruction Multiple Data) machine that connected its processors via a high speed two dimensional mesh, allowing processes to execute on separate nodes and communicate via the Message Passing Interface (a technique used to pass data between processors). By 1995 Cray was also shipping massively parallel systems, e.g. the Cray T3E with over 2,000 processors, using a three dimensional torus interconnect.
An interconnect is the network that links the processors of a parallel machine so that they can communicate with one another; it can take various forms, such as a mesh or a torus.
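To make the difference between a mesh and a torus concrete, the following sketch lists the direct neighbours of a processor in a grid of any dimension. The `neighbors` function and its parameters are hypothetical names introduced for this example; the only assumption is that a torus is a mesh whose edges wrap around to the opposite side.

```python
def neighbors(node, shape, wraparound):
    """Return the sorted list of nodes directly linked to `node`.

    node       -- tuple of grid coordinates, e.g. (0, 0) or (1, 2, 3)
    shape      -- tuple giving the number of processors along each dimension
    wraparound -- False for a mesh, True for a torus (edges wrap around)
    """
    linked = set()
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            coord = node[axis] + step
            if wraparound:
                coord %= size  # step off one edge, re-enter on the opposite one
            elif not 0 <= coord < size:
                continue  # a mesh has no link beyond its border
            neighbor = node[:axis] + (coord,) + node[axis + 1:]
            if neighbor != node:
                linked.add(neighbor)
    return sorted(linked)


if __name__ == "__main__":
    # Corner processor of a 4 x 4 two-dimensional mesh: only two links.
    print(neighbors((0, 0), (4, 4), wraparound=False))
    # Same processor in a 4 x 4 x 4 three-dimensional torus: six links.
    print(neighbors((0, 0, 0), (4, 4, 4), wraparound=True))
```

In a mesh, processors on the border have fewer links than those in the middle, whereas in a torus every processor has the same number of neighbours.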
The Paragon architecture soon led to the Intel ASCI Red supercomputer, built as part of the Accelerated Strategic Computing Initiative (ASCI), which held the top supercomputing spot to the end of the 20th century. It was also a mesh-based MIMD massively parallel system, with over 9,000 compute nodes and well over 12 terabytes of disk storage, but it used off-the-shelf Pentium Pro processors of the kind found in everyday personal computers. In 1996 ASCI Red became the first system to break through the 1 teraflop barrier on the MP-Linpack benchmark, and it eventually reached 2 teraflops.
The PETAFLOP Computing Era:
        The 21st century saw significant progress. It was shown that the power of a large number of small processors can be harnessed to achieve high performance, as in System X, which used 1,100 Apple Power Mac G5 computers, quickly assembled in the summer of 2003, to reach 12.25 teraflops. The efficiency of supercomputers continued to increase, but not dramatically so. The Cray C90 used 500 kilowatts of power in 1991, while by 2003 the ASCI Q used 3,000 kW while being 2,000 times faster, an increase in performance per watt of roughly 300 fold. In 2004 the Earth Simulator supercomputer built by NEC at the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) reached 131 teraflops, using 640 nodes, each with eight proprietary vector processing chips.
The IBM Blue Gene supercomputer architecture found widespread use in the early part of the 21st century, and 27 of the computers on the TOP500 list used that architecture. The Blue Gene approach is somewhat different in that it trades processor speed for low power consumption, so that a larger number of processors can be used at air cooled temperatures. It can use over 60,000 processors, with 2048 processors "per rack", and connects them via a three-dimensional torus interconnect.
Progress in China has been rapid: China placed 51st on the TOP500 list in June 2003, 14th in November 2003, 10th in June 2004 and 5th during 2005, before gaining the top spot in 2010 with the 2.5 petaflop Tianhe-1A supercomputer. In July 2011 the 8.1 petaflop Japanese K computer became the fastest in the world, using over 60,000 commercial scalar SPARC64 VIIIfx processors housed in over 600 cabinets. The fact that the K computer is over 60 times faster than the Earth Simulator, and that the Earth Simulator ranks as the 68th system in the world 7 years after holding the top spot, demonstrates both the rapid increase in top performance and the widespread growth of supercomputing technology worldwide.
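Two of the ratios quoted above, the roughly 300 fold gain in performance per watt from the Cray C90 to the ASCI Q and the K computer being over 60 times faster than the Earth Simulator, can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the function names are made up for this example.

```python
def performance_per_watt_gain(speedup, old_power_kw, new_power_kw):
    """Gain in performance per watt when a machine is `speedup` times faster
    but draws `new_power_kw` instead of `old_power_kw`."""
    power_ratio = new_power_kw / old_power_kw
    return speedup / power_ratio


def speed_ratio(fast_pflops, slow_tflops):
    """How many times faster a machine rated in petaflops is than one
    rated in teraflops (1 petaflop = 1,000 teraflops)."""
    return fast_pflops * 1000.0 / slow_tflops


if __name__ == "__main__":
    # Cray C90 (500 kW) versus ASCI Q (3,000 kW, 2,000 times faster).
    print(round(performance_per_watt_gain(2000, 500, 3000)))
    # K computer (8.1 petaflops) versus Earth Simulator (131 teraflops).
    print(round(speed_ratio(8.1, 131), 1))
```

The ASCI Q draws six times the power of the C90, so the 2,000 fold speedup works out to about 333 times the performance per watt, in line with the "roughly 300 fold" figure; the second ratio comes to a little under 62.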

The following computers have appeared at the top of the TOP500 list since 1993.




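Year | Supercomputer                 | Peak speed    | Location
-----|-------------------------------|---------------|---------
1993 | Fujitsu Numerical Wind Tunnel | 124.50 GFLOPS | National Aerospace Laboratory, Tokyo, Japan
1993 | Intel Paragon XP/S 140        | 143.40 GFLOPS | DoE-Sandia National Laboratories, New Mexico, USA
1994 | Fujitsu Numerical Wind Tunnel | 170.40 GFLOPS | National Aerospace Laboratory, Tokyo, Japan
1996 | Hitachi SR2201/1024           | 220.4 GFLOPS  | University of Tokyo, Japan
1996 | Hitachi CP-PACS/2048          | 368.2 GFLOPS  | University of Tsukuba, Tsukuba, Japan
1997 | Intel ASCI Red/9152           | 1.338 TFLOPS  | DoE-Sandia National Laboratories, New Mexico, USA
1999 | Intel ASCI Red/9632           | 2.3796 TFLOPS | DoE-Sandia National Laboratories, New Mexico, USA
2000 | IBM ASCI White                | 7.226 TFLOPS  | DoE-Lawrence Livermore National Laboratory, California, USA
2002 | NEC Earth Simulator           | 35.86 TFLOPS  | Earth Simulator Center, Yokohama, Japan
2004 | IBM Blue Gene/L               | 70.72 TFLOPS  |
2005 | IBM Blue Gene/L               | 136.8 TFLOPS  | DoE/U.S. National Nuclear Security Administration, Lawrence Livermore National Laboratory, California, USA
2005 | IBM Blue Gene/L               | 280.6 TFLOPS  | DoE/U.S. National Nuclear Security Administration, Lawrence Livermore National Laboratory, California, USA
2007 | IBM Blue Gene/L               | 478.2 TFLOPS  | DoE/U.S. National Nuclear Security Administration, Lawrence Livermore National Laboratory, California, USA
2008 | IBM Roadrunner                | 1.026 PFLOPS  | DoE-Los Alamos National Laboratory, New Mexico, USA
2008 | IBM Roadrunner                | 1.105 PFLOPS  | DoE-Los Alamos National Laboratory, New Mexico, USA
2009 | Cray Jaguar                   | 1.759 PFLOPS  | DoE-Oak Ridge National Laboratory, Tennessee, USA
2010 | Tianhe-1A                     | 2.566 PFLOPS  | National Supercomputing Center, Tianjin, China
2011 | Fujitsu K computer            | 10.51 PFLOPS  | RIKEN, Kobe, Japan
2012 | IBM Sequoia                   | 16.32 PFLOPS  | Lawrence Livermore National Laboratory, California, USA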