Sunday, December 9, 2012


Processor technology has moved from single-processor to multiprocessor systems, which come in two variants: many individual processors connected together, or many processor cores integrated on a single chip, popularly known as Chip Multiprocessors (CMP). I am sure the system on which you are viewing this blog has a CMP architecture. The former variant, generally found in supercomputers, connects many individual processors through an interconnection network so that they can communicate while executing a task. These two classes of processors form the base of High Performance Computing. The two architectures differ greatly in how they perform and in the view they present to the programmer, giving rise to a form of computing known as parallel computing or parallel programming, which is the root power of High Performance Systems. The way you have programmed so far, in C, C++, Java and so on, follows the sequential programming model, in which you think of a single processing unit with memory around it, feeding data to the processor and receiving results from it. The shocking news is that the world is slowly moving to parallel programming models, in which the programmer must think about many processing units that either share one memory system or have the memory distributed among them, according to the two variants described above. So get ready to face this shift.

Parallel Programming:
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel") because many processing units are available to work on them. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing frequency scaling. As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors.

Types of Parallelism:

Bit-level parallelism:
Word size (the reason your processor is labelled 32-bit or 64-bit) is the amount of information the processor can manipulate per cycle, and it has a great influence on the speed of the processor. Increasing the word size reduces the number of instructions the processor must execute to operate on variables that are wider than a word. For example, when an 8-bit processor must add two 16-bit integers, it must first add the 8 lower-order bits from each integer using the standard addition instruction, then add the 8 higher-order bits using an add-with-carry instruction and the carry bit from the lower-order addition. Thus an 8-bit processor requires two instructions to complete a single operation, whereas a 16-bit processor can complete it with a single instruction.
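
The 8-bit example above can be sketched in a few lines of Python. This is only a model of the idea, not real processor code: the function names add8 and add16_on_8bit are made up for illustration, and the inputs are assumed to be unsigned 16-bit values (0 to 0xFFFF).

```python
def add8(x, y, carry_in=0):
    """Model one 8-bit add: return (8-bit result, carry-out bit)."""
    total = x + y + carry_in
    return total & 0xFF, total >> 8


def add16_on_8bit(a, b):
    """Add two unsigned 16-bit integers using only 8-bit additions.

    Assumes 0 <= a, b <= 0xFFFF. Returns (16-bit result, final carry).
    """
    # Step 1: plain ADD on the low-order bytes.
    lo, carry = add8(a & 0xFF, b & 0xFF)
    # Step 2: add-with-carry on the high-order bytes,
    # feeding in the carry produced by step 1.
    hi, carry = add8(a >> 8, b >> 8, carry)
    return (hi << 8) | lo, carry
```

Calling add16_on_8bit(0x12FF, 0x0001) takes two 8-bit additions where a 16-bit processor would need a single add instruction.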
Instruction-level parallelism:
A computer program is, in essence, a stream of instructions executed by a processor. These instructions can be re-ordered and combined into groups which are then executed in parallel without changing the result of the program. This is known as instruction-level parallelism. Advances in instruction-level parallelism dominated computer architecture from the mid-1980s until the mid-1990s. Instruction-level parallelism is realized by pipelines in the processor architecture. Think of the assembly line at an automobile manufacturing site: each station works on a different car at the same time. On a single-issue processor, which issues one instruction per cycle, the number of instructions in execution at any instant can be as high as the number of pipeline stages. On a dual-issue processor, this number is double the number of pipeline stages, and so on. Processors capable of issuing more than one instruction per clock cycle are known as superscalar processors.
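
The pipeline arithmetic can be written down as a small sketch. It assumes an ideal pipeline with no stalls, hazards or branch mispredictions, which real processors do not achieve; the function names are hypothetical.

```python
import math


def max_in_flight(stages, issue_width=1):
    """Upper bound on instructions in execution at one instant."""
    return stages * issue_width


def ideal_cycles(num_instructions, stages, issue_width=1):
    """Cycles to finish a run of instructions on an ideal pipeline.

    The first group needs `stages` cycles to drain through the pipe;
    every later group completes one cycle after the previous one.
    """
    if num_instructions == 0:
        return 0
    groups = math.ceil(num_instructions / issue_width)
    return stages + groups - 1
```

For 100 instructions on a 5-stage pipeline, ideal_cycles(100, 5) gives 104 cycles for a single-issue design and ideal_cycles(100, 5, 2) gives 54 for a dual-issue one, compared with 500 cycles if every instruction had to finish before the next one started.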

Data parallelism:
Data parallelism is parallelism inherent in program loops, which focuses on distributing the data across different computing nodes to be processed in parallel. Parallelizing loops often leads to similar (not necessarily identical) operation sequences or functions being performed on elements of a large data structure. Many scientific and engineering applications exhibit data parallelism.
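
As a rough sketch of data parallelism, the loop below is split into chunks and the same operation is applied to every chunk by a pool of workers. The names square_chunk and parallel_square are invented for this example. Note the assumption: Python threads are used here only to show the structure, since in standard CPython they do not speed up CPU-bound work; a real program would use processes, OpenMP, MPI or similar.

```python
from concurrent.futures import ThreadPoolExecutor


def square_chunk(chunk):
    """The same operation, applied to one slice of the data."""
    return [x * x for x in chunk]


def parallel_square(data, workers=4):
    """Distribute `data` across workers and square every element."""
    # Ceiling division so that at most `workers` chunks are created.
    size = max(1, -(-len(data) // workers))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in the same order as the chunks.
        parts = list(pool.map(square_chunk, chunks))
    return [y for part in parts for y in part]
```

Because each chunk is independent, adding more data simply means more (or larger) chunks, which is why data parallelism tends to scale with the size of the problem.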

Task parallelism:
Task parallelism is the characteristic of a parallel program that "entirely different calculations can be performed on either the same or different sets of data". This contrasts with data parallelism, where the same calculation is performed on the same or different sets of data. Task parallelism does not usually scale with the size of a problem.
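
A minimal sketch of task parallelism, for contrast: three entirely different calculations run concurrently on the same data set. The function name run_tasks and the choice of tasks are assumptions made for illustration, and as before the Python threads show the structure rather than a real speed-up.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor


def run_tasks(data):
    """Run different calculations on the same data at the same time."""
    tasks = {
        "total": sum,
        "largest": max,
        "median": statistics.median,
    }
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # One worker per task; each task is a different function.
        futures = {name: pool.submit(fn, data) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}
```

There are only three tasks no matter how large the data becomes, which illustrates why task parallelism does not usually scale with the size of the problem.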

Parallel Computer Classes:
Keep in mind that the classifications given below are not mutually exclusive.
Multicore computing:
A multicore processor is a processor that includes multiple execution units ("cores") on the same chip. These processors differ from superscalar processors, which can issue multiple instructions per cycle from one instruction stream (thread); in contrast, a multicore processor can issue multiple instructions per cycle from multiple instruction streams. Each core in a multicore processor can potentially be superscalar as well—that is, on every cycle, each core can issue multiple instructions from one instruction stream. Simultaneous multithreading (of which Intel's HyperThreading is the best known) was an early form of pseudo-multicoreism. A processor capable of simultaneous multithreading has only one execution unit ("core"), but when that execution unit is idling (such as during a cache miss), it uses that execution unit to process a second thread. IBM's Cell microprocessor, designed for use in the Sony PlayStation 3, is another prominent multicore processor.
Symmetric multiprocessing:
A symmetric multiprocessor (SMP) is a computer system with multiple identical processors that share memory and connect via a bus. Bus contention prevents bus architectures from scaling. As a result, SMPs generally do not comprise more than 32 processors. "Because of the small size of the processors and the significant reduction in the requirements for bus bandwidth achieved by large caches, such symmetric multiprocessors are extremely cost-effective, provided that a sufficient amount of memory bandwidth exists."
Distributed computing:
A distributed computer (also known as a distributed memory multiprocessor) is a distributed memory computer system in which the processing elements are connected by a network. Distributed computers are highly scalable.
Cluster computing:
A cluster is a group of loosely coupled computers that work together closely, so that in some respects they can be regarded as a single computer. Clusters are composed of multiple standalone machines connected by a network. While machines in a cluster do not have to be symmetric, load balancing is more difficult if they are not. The most common type of cluster is the Beowulf cluster, which is a cluster implemented on multiple identical commercial off-the-shelf computers connected with a TCP/IP Ethernet local area network. Beowulf technology was originally developed by Thomas Sterling and Donald Becker. The vast majority of the TOP500 supercomputers are clusters.
Massively parallel processing:
A massively parallel processor (MPP) is a single computer with many networked processors. MPPs have many of the same characteristics as clusters, but MPPs have specialized interconnect networks (whereas clusters use commodity hardware for networking). MPPs also tend to be larger than clusters, typically having "far more" than 100 processors. In an MPP, "each CPU contains its own memory and copy of the operating system and application. Each subsystem communicates with the others via a high-speed interconnect."

Blue Gene/L, the fifth fastest supercomputer in the world according to the June 2009 TOP500 ranking, is an MPP.
Grid computing:
Grid computing is the most distributed form of parallel computing. It makes use of computers communicating over the Internet to work on a given problem. Because of the low bandwidth and extremely high latency of the Internet, grid computing typically deals only with embarrassingly parallel problems, that is, problems that split into pieces needing little or no communication with each other. Many grid computing applications have been created, of which SETI@home and Folding@home are the best-known examples. Most grid computing applications use middleware, software that sits between the operating system and the application to manage network resources and standardize the software interface. The most common grid computing middleware is the Berkeley Open Infrastructure for Network Computing (BOINC). Often, grid computing software makes use of "spare cycles", performing computations at times when a computer is idling.