NAS Parallel Benchmarks


What are Benchmarks?

A benchmark is the act of running a computer program, a set of programs, or other operations in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. The term ‘benchmark’ is also commonly used to refer to the elaborately designed benchmarking programs themselves.

The NAS Parallel Benchmarks (NPB) are a small set of programs designed to help evaluate the performance of parallel supercomputers. The benchmarks, which are derived from computational fluid dynamics (CFD) applications, consist of five kernels and three pseudo-applications. The NPB come in several “flavors.” NAS solicits performance results for each from all sources.


China Wrests Supercomputer Title From U.S.


A Chinese scientific research center has built the fastest supercomputer ever made, replacing the United States as maker of the swiftest machine, and giving China bragging rights as a technology superpower.

The Tianhe-1A computer in Tianjin, China, links thousands upon thousands of chips.

The computer, known as Tianhe-1A, has 1.4 times the horsepower of the current top computer, which is at a national laboratory in Tennessee, as measured by the standard test used to gauge how well the systems handle mathematical calculations, said Jack Dongarra, a University of Tennessee computer scientist who maintains the official supercomputer rankings.

Although the official list of the top 500 fastest machines, which comes out every six months, is not due to be completed by Mr. Dongarra until next week, he said the Chinese computer “blows away the existing No. 1 machine.” He added, “We don’t close the books until Nov. 1, but I would say it is unlikely we will see a system that is faster.”


The Top 500 list can be seen here.

Networks for Multicomputers


An alternative to a shared-memory multiprocessor can be created by connecting complete computers through an interconnection network. Each computer consists of a processor and local memory, but this memory is not accessible to other processors.

The interconnection network allows processors to send messages to other processors. The messages carry data from one processor to another as dictated by the program. Such multiprocessor systems are usually called message-passing multiprocessors, or simply multicomputers, especially if they consist of self-contained computers that could operate separately.

Programming a message-passing multicomputer still involves dividing the problem into parts that are intended to be executed simultaneously to solve the problem. Programming could use a parallel or extended sequential language, but a common approach is to use message-passing library routines that are inserted into a conventional sequential program. Often, we talk in terms of processes. If a problem is divided into more concurrent processes than there are computers, more than one process is executed on a single computer, in a time-shared fashion.

Processes communicate by sending messages; this will be the only way to distribute data and results between processes.
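The idea that messages are the only way to distribute data and results can be sketched in ordinary Python, using the standard multiprocessing module as a stand-in for a real interconnection network. The function names here are illustrative, not from any particular message-passing library:

```python
from multiprocessing import Process, Pipe

def worker(conn):
    # The worker has no shared memory with its parent: it can only
    # receive a message, compute, and send a message back.
    data = conn.recv()
    conn.send([x * x for x in data])
    conn.close()

def square_via_message(data):
    # A pipe plays the role of the interconnection network between
    # two processes, each with its own private address space.
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(data)       # distribute data by message
    result = parent_conn.recv()  # collect results by message
    p.join()
    return result

if __name__ == "__main__":
    print(square_via_message([1, 2, 3]))  # -> [1, 4, 9]
```

The same send/receive pattern, scaled up to many processes across many machines, is what message-passing libraries such as MPI provide.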

The purpose of the interconnection network is to provide a physical path for messages sent from one computer to another.

Key issues in network design are the bandwidth, latency and cost.

The bandwidth is the number of bits that can be transmitted per unit time, given in bits/sec.

The network latency is the time to make a message transfer through the network.

The communication latency is the total time to send the message, including the software overhead and interface delays.

Message latency, or startup time, is the time to send a zero-length message, which is essentially the software and hardware overhead in sending a message (finding the route, packing, unpacking, etc.), onto which must be added the actual time to send the data along the interconnection path.
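These definitions give the usual linear cost model for a message transfer: total time equals the startup latency plus the data size divided by the bandwidth. A minimal sketch, with illustrative numbers (not measurements of any real network):

```python
def message_time(n_bits, startup_s, bandwidth_bps):
    """Linear cost model: t = t_startup + n / bandwidth.
    startup_s is the zero-length-message latency in seconds;
    bandwidth_bps is the link bandwidth in bits/sec."""
    return startup_s + n_bits / bandwidth_bps

# e.g. a 1 Mbit message over a link with 50 us startup time
# and 1 Gbit/s bandwidth: 50e-6 + 1e-3 = 1.05 ms total.
t = message_time(1_000_000, 50e-6, 1e9)
```

Note that for short messages the startup term dominates, which is why parallel programs often try to combine many small messages into fewer large ones.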

The number of physical links in a path between two nodes is an important consideration because it will be a major factor in determining the delay for a message. The diameter is the minimum number of links between the two farthest nodes (computers) in the network. Only the shortest routes are considered. How efficiently a parallel problem can be solved using a multicomputer with a specific network is extremely important. The diameter of the network gives the maximum distance that a single message must travel and can be used to find the communication lower bound of some parallel algorithms.
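The definition above, the longest of all shortest (minimum-link) paths, can be computed directly for any small network with a breadth-first search from each node. A sketch, with an illustrative 4-node ring as input:

```python
from collections import deque

def diameter(adj):
    """Diameter of a network given as an adjacency dict
    {node: [neighbors]}: the greatest shortest-path distance,
    in links, between any pair of nodes."""
    def eccentricity(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(eccentricity(n) for n in adj)

# A 4-node ring: the two farthest nodes are 2 links apart.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
d = diameter(ring)  # -> 2
```

For regular topologies the diameter has a closed form (for example, 2(p - 1) for a p x p mesh), but a search like this works for any interconnection graph.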

Mesh Network

A two-dimensional mesh can be created by having each node in a two-dimensional array connect to its four nearest neighbors.

The mesh and torus networks are popular because of their ease of layout and expandability. If necessary, the network can be folded; that is, rows and columns are interleaved so that the wraparound connections simply turn back through the network rather than stretch from one edge to the opposite edge.

BlueGene/L uses a three-dimensional (3D) torus network in which the nodes are connected to their six nearest-neighbor nodes in a 3D mesh. In the torus configuration, the ends of the mesh loop back, thereby eliminating the problem of programming for a mesh with edges. Without these loops, the end nodes would not have six near neighbors.
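The wraparound links are just modular arithmetic on the node coordinates. A minimal sketch (illustrative code, not BlueGene/L's actual routing logic) showing that in a 3D torus even a corner node has six neighbors:

```python
def torus3d_neighbors(node, dims):
    """The six nearest neighbors of `node` in a 3D torus of shape
    `dims`. The modulo operation implements the wraparound links
    that turn the mesh edges back on themselves."""
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x - 1) % nx, y, z), ((x + 1) % nx, y, z),
        (x, (y - 1) % ny, z), (x, (y + 1) % ny, z),
        (x, y, (z - 1) % nz), (x, y, (z + 1) % nz),
    ]

# A corner node of an 8x8x8 torus still has six distinct
# neighbors; (7, 0, 0) is reached via the wraparound link.
neighbors = torus3d_neighbors((0, 0, 0), (8, 8, 8))
```

In a plain mesh the corner node would have only three neighbors, which is exactly the edge case the torus eliminates.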


Cluster vs Grid


There are many differences between Clusters and Grids. The following table compares them.

Characteristic            Cluster                  Grid
Population                Commodity computers      Commodity and high-end computers
Ownership                 Single                   Multiple
Discovery                 Membership services      Centralized index and decentralized info
User management           Centralized              Decentralized
Resource management       Centralized              Distributed
Allocation/Scheduling     Centralized              Decentralized
Inter-operability         VIA and proprietary      No standards being developed
Single System Image       Yes                      No
Scalability               100s                     1000s
Capacity                  Guaranteed               Varies, but high
Throughput                Medium                   High
Speed (Lat., Bandwidth)   Low, high                High, low

A cluster is a group of computers organized together to perform the same set of functions. For instance, you may have a cluster running your database software. Or you can have a cluster running your corporate e-mail server software.

A grid is a collection of resources. A grid can be composed of multiple clusters. In Oracle’s grid world, you can have a cluster of servers running the database software and a cluster of servers running the Application Server software. Since they are all in the grid, you can move resources from one cluster to another should demands dictate.


Oracle RAC Cluster Tips by Burleson Consulting

Oracle Website

MPI – Get Started


To foster more widespread use and portability, a group of academics and industrial partners came together to develop what they hoped would be a “standard” for message-passing systems.

They called it MPI (Message-Passing Interface). MPI provides library routines for message-passing and associated operations.

A fundamental aspect of MPI is that it defines a standard but not the implementation, just as programming languages are defined but not how the compilers for the languages are implemented.

An important factor in developing MPI is the desire to make message-passing portable and easy to use.

Some changes were also made to correct technical deficiencies in earlier message-passing systems such as PVM.

Several free implementations of the MPI standard exist, including MPICH and Open MPI.

There are also numerous vendor implementations, from HP, IBM, and others.

Implementations for Windows clusters also exist.

A list of MPI implementations can be found here.

A key factor in choosing an implementation is continuing support, because a few early implementations are no longer supported at all.

Parallel Programming – Techniques and Applications, 2nd Edition, Barry Wilkinson

Parallel Virtual Machine


Now let us relate the basic message-passing ideas to a cluster of computers (cluster computing). There have been several software packages for cluster computing, originally described as being for networks of workstations. Perhaps the first widely adopted software for using a network of workstations as a multicomputer platform was PVM (Parallel Virtual Machine), developed at Oak Ridge National Laboratory in the late 1980s and used widely in the 1990s. PVM provides a software environment for message-passing between homogeneous or heterogeneous computers and has a collection of library routines that the user can employ with C or Fortran programs. PVM became widely used, partly because it was made readily available at no charge. PVM used dynamic process creation from the start.

Parallel Programming – Techniques and Applications, 2nd Edition, Barry Wilkinson