What is High-Performance Computing (HPC)

June 18, 2017

High-performance computing (HPC) is the use of parallel processing for running advanced application programs efficiently, reliably and quickly. The term applies especially to systems that function above a teraflop or 10¹² floating-point operations per second.

The term HPC is occasionally used as a synonym for supercomputing, although technically a supercomputer is a system that performs at or near the currently highest operational rate for computers. Some supercomputers work at more than a petaflop or 10¹⁵ floating-point operations per second.

The most common users of HPC systems are scientific researchers, engineers and academic institutions. Some government agencies, particularly the military, also rely on HPC for complex applications. High-performance systems often use custom-made components in addition to so-called commodity components.

As demand for processing power and speed grows, HPC will likely interest businesses of all sizes, particularly for transaction processing and data warehouses. An occasional techno-fiend might use an HPC system to satisfy an exceptional desire for advanced technology.

Definition: Transaction

In computer programming, a transaction usually means a sequence of information exchange and related work (such as database updating) that is treated as a unit for the purposes of satisfying a request and for ensuring database integrity. For a transaction to be completed and database changes to be made permanent, the transaction has to be completed in its entirety. A typical transaction is a catalog merchandise order phoned in by a customer and entered into a computer by a customer representative. The order transaction involves checking an inventory database, confirming that the item is available, placing the order, and confirming that the order has been placed and the expected time of shipment. If we view this as a single transaction, then all of the steps must be completed before the transaction is successful and the database is actually changed to reflect the new order. If something happens before the transaction is successfully completed, any changes to the database must be tracked so that they can be undone.

A program that manages or oversees the sequence of events that are part of a transaction is sometimes called a transaction monitor. Transactions are supported by Structured Query Language (SQL), the standard database user and programming interface. When a transaction completes successfully, database changes are said to be committed; when a transaction does not complete, changes are rolled back. In IBM’s Customer Information Control System (CICS) product, a transaction is a unit of application data processing that results from a particular type of transaction request. In CICS, an instance of a particular transaction request by a computer operator or user is called a task.
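The commit-or-roll-back behavior can be sketched with Python's built-in sqlite3 module. The table names, the item, and the `place_order` helper are hypothetical, chosen to mirror the catalog-order example; this is a minimal illustration under those assumptions, not a description of how any particular order system works.

```python
import sqlite3

def place_order(conn, item, quantity):
    """Check stock, decrement it, and record the order as one transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if the block raises an exception.
        with conn:
            conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE item = ?",
                (quantity, item),
            )
            (stock,) = conn.execute(
                "SELECT stock FROM inventory WHERE item = ?", (item,)
            ).fetchone()
            if stock < 0:
                raise ValueError("not enough stock")
            conn.execute(
                "INSERT INTO orders (item, quantity) VALUES (?, ?)",
                (item, quantity),
            )
        return True
    except ValueError:
        return False

# Hypothetical schema and data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('lamp', 5)")
conn.commit()

first = place_order(conn, "lamp", 3)   # succeeds and is committed
second = place_order(conn, "lamp", 4)  # fails; the stock update is undone
remaining = conn.execute(
    "SELECT stock FROM inventory WHERE item = 'lamp'"
).fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(first, second, remaining, order_count)  # True False 2 1
```

The second order fails partway through, after the stock has already been decremented, so the rollback is what restores the inventory to 2.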

Less frequently and in other computer contexts, a transaction may have a different meaning. For example, in IBM mainframe operating system batch processing, a transaction is a job or a job step.

Definition: Data warehouse

A data warehouse is a federated repository for all the data that an enterprise’s various business systems collect. The repository may be physical or logical.

Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access, but does not generally start from the point of view of the end user, who may need access to specialized, sometimes local databases. The latter idea is known as the data mart.

There are two approaches to data warehousing: top-down and bottom-up. The top-down approach spins off data marts for specific groups of users after the complete data warehouse has been created. The bottom-up approach builds the data marts first and then combines them into a single, all-encompassing data warehouse.

Typically, a data warehouse is housed on an enterprise mainframe server or, increasingly, in the cloud. Data from various online transaction processing (OLTP) applications and other sources is selectively extracted for use by analytical applications and user queries.
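As a rough sketch of selective extraction, the following uses two in-memory SQLite databases to stand in for an OLTP system and a warehouse. All table and column names are hypothetical, and a real warehouse would rely on dedicated extract, transform, and load tooling rather than a few lines of Python. The view at the end plays the role of a data mart spun off from the warehouse, as in the top-down approach.

```python
import sqlite3

# Hypothetical OLTP source: one row per sale, as the order system records it.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE sales (region TEXT, sold_on TEXT, amount REAL, clerk TEXT)"
)
oltp.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("east", "2017-06-01", 120.0, "ann"),
        ("east", "2017-06-02", 80.0, "bob"),
        ("west", "2017-06-01", 200.0, "cy"),
    ],
)

# Warehouse: keeps only the columns that analysts need.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (region TEXT, sold_on TEXT, amount REAL)"
)

# Selective extraction: the clerk column is deliberately left behind.
rows = oltp.execute("SELECT region, sold_on, amount FROM sales").fetchall()
warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
warehouse.commit()

# A data mart for one group of users, defined on top of the warehouse.
warehouse.execute(
    "CREATE VIEW east_mart AS "
    "SELECT sold_on, amount FROM sales_fact WHERE region = 'east'"
)

totals = dict(
    warehouse.execute(
        "SELECT region, SUM(amount) FROM sales_fact GROUP BY region"
    ).fetchall()
)
east_rows = warehouse.execute("SELECT COUNT(*) FROM east_mart").fetchone()[0]
print(totals["east"], totals["west"], east_rows)  # 200.0 200.0 2
```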

The term data warehouse was coined by William H. Inmon, who is known as the Father of Data Warehousing. Inmon described a data warehouse as being a subject-oriented, integrated, time-variant and nonvolatile collection of data that supports management’s decision-making process.

Definition: Parallel processing

In computers, parallel processing is the processing of program instructions by dividing them among multiple processors with the objective of running a program in less time. In the earliest computers, only one program ran at a time. A computation-intensive program that took one hour to run and a tape copying program that took one hour to run would take a total of two hours to run. An early form of parallel processing allowed the interleaved execution of both programs together. The computer would start an I/O operation, and while it was waiting for the operation to complete, it would execute the processor-intensive program. The total execution time for the two jobs would be a little over one hour.
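The overlap of an I/O wait with computation can be imitated on a modern machine with a thread. This is only a sketch: `copy_tape` and `crunch_numbers` are hypothetical stand-ins for the two jobs, the 0.2-second sleep is an arbitrary substitute for waiting on a device, and threads are a present-day convenience rather than the mechanism early systems used.

```python
import threading
import time

def copy_tape(done):
    """Stand-in for an I/O-bound job: the processor is idle while it waits."""
    time.sleep(0.2)  # simulated wait for the device
    done.set()

def crunch_numbers(n):
    """Stand-in for a computation-intensive job."""
    return sum(i * i for i in range(n))

done = threading.Event()
start = time.perf_counter()

io_job = threading.Thread(target=copy_tape, args=(done,))
io_job.start()                    # start the I/O operation...
total = crunch_numbers(100_000)   # ...and compute while it is pending
io_job.join()

elapsed = time.perf_counter() - start
print(f"both jobs finished in {elapsed:.2f} s")
```

Because the computation runs while the simulated device wait is in progress, the elapsed time is close to the longer of the two jobs rather than their sum.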

The next improvement was multiprogramming. In a multiprogramming system, multiple programs submitted by users were each allowed to use the processor for a short time. To users it appeared that all of the programs were executing at the same time. Problems of resource contention first arose in these systems. Explicit requests for resources led to the problem of deadlock. Competition for resources on machines with no tie-breaking instructions led to the critical section routine.
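A critical section can be illustrated with a lock that lets only one thread at a time update a shared value. The counter, the thread count, and the `add_many` helper are hypothetical; the sketch shows the idea, not the routines early multiprogramming systems actually used.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        # Critical section: only one thread at a time may read, modify,
        # and write the shared counter.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Deadlock becomes possible as soon as a program holds one such lock while requesting another; a common precaution is to have every program acquire locks in the same fixed order.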

Vector processing was another attempt to increase performance by doing more than one thing at a time. In this case, capabilities were added to machines to allow a single instruction to add (or subtract, or multiply, or otherwise manipulate) two arrays of numbers. This was valuable in certain engineering applications where data naturally occurred in the form of vectors or matrices. In applications with less well-formed data, vector processing was not so valuable.
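The difference in programming model can be sketched in plain Python. This shows only the shape of the idea: the functions `scalar_add` and `vector_op` are hypothetical, and Python itself does not issue hardware vector instructions here. Array libraries such as NumPy expose the whole-array style and hand the work to optimized native code.

```python
from operator import add, mul

def scalar_add(a, b):
    """Scalar style: one addition per loop iteration."""
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out

def vector_op(op, a, b):
    """Vector style: the operation is named once and applied to whole arrays."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return list(map(op, a, b))

pressures = [1.0, 2.0, 3.0]
offsets = [0.5, 0.5, 0.5]
print(scalar_add(pressures, offsets))      # [1.5, 2.5, 3.5]
print(vector_op(add, pressures, offsets))  # [1.5, 2.5, 3.5]
print(vector_op(mul, pressures, offsets))  # [0.5, 1.0, 1.5]
```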

The next step in parallel processing was the introduction of multiprocessing. In these systems, two or more processors shared the work to be done. The earliest versions had a master/slave configuration. One processor (the master) was programmed to be responsible for all of the work in the system; the other (the slave) performed only those tasks it was assigned by the master. This arrangement was necessary because it was not then understood how to program the machines so they could cooperate in managing the resources of the system.

Solving these problems led to the symmetric multiprocessing (SMP) system. In an SMP system, each processor is equally capable and responsible for managing the flow of work through the system. Initially, the goal was to make SMP systems appear to programmers to be exactly the same as single-processor, multiprogramming systems. (This standard of behavior is known as sequential consistency.) However, engineers found that system performance could be increased by somewhere in the range of 10-20% by executing some instructions out of order and requiring programmers to deal with the increased complexity. (The problem can become visible only when two or more programs simultaneously read and write the same operands; thus the burden of dealing with the increased complexity falls on only a very few programmers and then only in very specialized circumstances.) The question of how SMP machines should behave on shared data is not yet resolved.

As the number of processors in SMP systems increases, the time it takes for data to propagate from one part of the system to all other parts also grows. When the number of processors is somewhere in the range of several dozen, the performance benefit of adding more processors to the system is too small to justify the additional expense. To get around the problem of long propagation times, message passing systems were created. In these systems, programs that share data send messages to each other to announce that particular operands have been assigned a new value. Instead of a broadcast of an operand’s new value to all parts of a system, the new value is communicated only to those programs that need to know the new value. Instead of a shared memory, there is a network to support the transfer of messages between programs. This simplification allows hundreds, even thousands, of processors to work together efficiently in one system. (In the vernacular of systems architecture, these systems “scale well.”) Hence such systems have been given the name of massively parallel processing (MPP) systems.
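The message-passing style can be sketched with threads and queues. Here each worker keeps a private copy of an operand and learns of a new value only when it is sent a message. The threads and `queue.Queue` objects are stand-ins for separate processors and a network, and the message format is hypothetical; real MPP software typically uses a message-passing library such as MPI.

```python
import queue
import threading

def worker(inbox, outbox):
    """Keep a private copy of the operand; learn new values only by message."""
    value = None
    while True:
        message = inbox.get()
        if message == "stop":
            break
        kind, payload = message
        if kind == "update":
            value = payload               # the sender announced a new value
        elif kind == "report":
            outbox.put((payload, value))  # payload is this worker's id

inboxes = [queue.Queue() for _ in range(3)]
outbox = queue.Queue()
workers = [
    threading.Thread(target=worker, args=(box, outbox)) for box in inboxes
]
for w in workers:
    w.start()

# Only workers 0 and 2 need the new value, so only they are told.
for i in (0, 2):
    inboxes[i].put(("update", 42))
for i, box in enumerate(inboxes):
    box.put(("report", i))
    box.put("stop")
for w in workers:
    w.join()

reports = dict(outbox.get() for _ in range(3))
print(sorted(reports.items()))  # [(0, 42), (1, None), (2, 42)]
```

Worker 1 never receives the update and still reports no value, which is the point: nothing is broadcast, so traffic grows with the number of interested parties rather than with the size of the system.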

The most successful MPP applications have been for problems that can be broken down into many separate, independent operations on vast quantities of data. In data mining, there is a need to perform multiple searches of a static database. In artificial intelligence, there is the need to analyze multiple alternatives, as in a chess game. Often MPP systems are structured as clusters of processors. Within each cluster the processors interact as in an SMP system. It is only between the clusters that messages are passed. Because operands may be addressed either via messages or via memory addresses, some MPP systems are called NUMA machines, for Non-Uniform Memory Access.
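A search that breaks down into independent operations can be sketched as follows: the data is split into shards, each shard is searched on its own, and only the partial counts are combined. The function names and data are hypothetical, and a thread pool stands in for the separate processors; on a real MPP system each shard would live on a different node.

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(shard, target):
    """Search one partition independently of all the others."""
    return sum(1 for record in shard if record == target)

def parallel_count(records, target, workers=4):
    # Split the data into at most one contiguous shard per worker.
    size = max(1, -(-len(records) // workers))  # ceiling division
    shards = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_matches, shards, [target] * len(shards))
        # Combining the partial results is the only step that needs them all.
        return sum(partial_counts)

records = [n % 7 for n in range(1000)]
print(parallel_count(records, 3))  # 143
```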

SMP machines are relatively simple to program; MPP machines are not. SMP machines do well on all types of problems, provided the amount of data involved is not too large. For certain problems, such as data mining of vast databases, only MPP systems will serve.

Definition: Application program

An application program (sometimes shortened to application) is any program designed to perform a specific function directly for the user or, in some cases, for another application program. Examples of application programs include word processors; database programs; Web browsers; development tools; drawing, paint, and image editing programs; and communication programs. Application programs use the services of the computer’s operating system and other supporting programs. The formal requests for services and the means of communicating with other programs that a programmer uses in writing an application program are called the application program interface (API).

Definition: Teraflop

A teraflop is a measure of a computer’s speed and can be expressed as:

A trillion floating point operations per second
10 to the 12th power floating-point operations per second
Approximately 2 to the 40th power flops (2 to the 40th power is about 1.1 trillion)

Parallel computers first reached teraflop speeds in the late 1990s. The fastest systems now operate at petaflop speeds.
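The decimal and binary figures for a teraflop are close but not identical, which a few lines of arithmetic make concrete. The job size in the second half is a hypothetical number chosen so the result is easy to read, and the run-time formula is an idealization that ignores everything except raw arithmetic rate.

```python
TERA = 10 ** 12          # one trillion
BINARY_TERA = 2 ** 40    # 1,099,511,627,776

ratio = BINARY_TERA / TERA
print(f"2**40 / 10**12 = {ratio:.4f}")  # 2**40 / 10**12 = 1.0995

def seconds_to_run(operations, flops):
    """Idealized run time: total operations divided by the sustained rate."""
    return operations / flops

# A hypothetical job of 3.6e15 floating-point operations at one teraflop.
hours = seconds_to_run(3.6e15, TERA) / 3600
print(hours)  # 1.0
```

The two figures differ by about 10 percent, so the power-of-ten definition is the one to use when comparing published performance numbers.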

Definition: Supercomputer

A supercomputer is a computer that performs at or near the currently highest operational rate for computers. Traditionally, supercomputers have been used for scientific and engineering applications that must handle very large databases or do a great amount of computation (or both). Although advances like multi-core processors and GPGPUs (general-purpose graphics processing units) have enabled powerful machines for personal use (see: desktop supercomputer, GPU supercomputer), by definition, a supercomputer is exceptional in terms of performance.

At any given time, there are a few well-publicized supercomputers that operate at extremely high speeds relative to all other computers. The term is also sometimes applied to far slower (but still impressively fast) computers. The largest, most powerful supercomputers are really multiple computers that perform parallel processing. In general, there are two parallel processing approaches: symmetric multiprocessing (SMP) and massively parallel processing (MPP).

As of June 2016, the fastest supercomputer in the world was the Sunway TaihuLight, in the city of Wuxi in China. A few statistics on TaihuLight:

40,960 64-bit, RISC processors with 260 cores each.
Peak performance of 125 petaflops (quadrillion floating point operations per second).
32GB DDR3 memory per compute node, 1.3 PB memory in total.
Linux-based Sunway Raise operating system (OS).
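The statistics above can be cross-checked with simple arithmetic. This sketch assumes one processor per compute node, which is what makes the per-node and total memory figures agree, and it takes the measured figure of 93.01 petaflops from the table of top supercomputers further down. The variable names are illustrative only.

```python
processors = 40_960
cores_per_processor = 260
memory_per_node_gb = 32
peak_pflops = 125.0      # theoretical peak, from the list above
linpack_pflops = 93.01   # measured figure, from the table of top supercomputers

total_cores = processors * cores_per_processor
# Assumes one processor per node; converts decimal gigabytes to petabytes.
total_memory_pb = processors * memory_per_node_gb / 1_000_000
efficiency = linpack_pflops / peak_pflops

print(total_cores)                # 10649600
print(round(total_memory_pb, 2))  # 1.31
print(f"{efficiency:.1%}")        # 74.4%
```

The gap between the measured and peak figures is normal: the peak is a theoretical ceiling, while the measured number comes from running a benchmark.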

Notable supercomputers throughout history:

The first commercially successful supercomputer, the CDC (Control Data Corporation) 6600, was designed by Seymour Cray. Released in 1964, the CDC 6600 had a single CPU and cost $8 million, the equivalent of $60 million today. The CDC 6600 could handle three million floating point operations per second (flops).

Cray went on to found a supercomputer company under his name in 1972. Although the company has changed hands a number of times it is still in operation. In September 2008, Cray and Microsoft launched CX1, a $25,000 personal supercomputer aimed at markets such as aerospace, automotive, academic, financial services and life sciences.

IBM has been a keen competitor. The company’s Roadrunner, once the top-ranked supercomputer, was twice as fast as IBM’s Blue Gene and six times as fast as any other supercomputer at that time. IBM’s Watson is famous for having adopted cognitive computing to beat champion Ken Jennings on Jeopardy!, a popular quiz show.

Top supercomputers of recent years:

Year	Supercomputer	Linpack performance (Rmax)	Location
2016	Sunway TaihuLight	93.01 PFLOPS	Wuxi, China
2013	NUDT Tianhe-2	33.86 PFLOPS	Guangzhou, China
2012	Cray Titan	17.59 PFLOPS	Oak Ridge, U.S.
2012	IBM Sequoia	17.17 PFLOPS	Livermore, U.S.
2011	Fujitsu K computer	10.51 PFLOPS	Kobe, Japan
2010	Tianhe-1A	2.566 PFLOPS	Tianjin, China
2009	Cray Jaguar	1.759 PFLOPS	Oak Ridge, U.S.
2008	IBM Roadrunner	1.105 PFLOPS	Los Alamos, U.S.
2008	IBM Roadrunner	1.026 PFLOPS	Los Alamos, U.S.

In the United States, some supercomputer centers are interconnected on an Internet backbone known as vBNS or NSFNet. This network is the foundation for an evolving network infrastructure known as the National Technology Grid. Internet2 is a university-led project that is part of this initiative.

At the lower end of supercomputing, clustering takes more of a build-it-yourself approach to supercomputing. The Beowulf Project offers guidance on how to put together a number of off-the-shelf personal computer processors, using Linux operating systems, and interconnecting the processors with Fast Ethernet. Applications must be written to manage the parallel processing.

Definition: Petaflop

A petaflop is a measure of a computer’s processing speed and can be expressed as:

A quadrillion (thousand trillion) floating point operations per second (FLOPS)
A thousand teraflops
10 to the 15th power FLOPS
Approximately 2 to the 50th power FLOPS (2 to the 50th power is about 1.13 quadrillion)

In June 2008, IBM’s Roadrunner supercomputer was the first to break what has been called “the petaflop barrier.” By November 2008, when the twice-yearly Top 500 supercomputer rankings were next released, two computers had done so. At 1.105 petaflops, Roadrunner retained its top place from the previous list, ahead of Cray’s Jaguar, which ran at 1.059 petaflops.

Breaking the petaflop barrier is expected to have profound and far-reaching effects on the future of science. According to Thomas Zacharia, head of computer science at Oak Ridge National Laboratory in Tennessee, where Cray’s Jaguar is installed, “The new capability allows you to do fundamentally new physics and tackle new problems. And it will accelerate the transition from basic research to applied technology.”

Petaflop computing will enable much more accurate modeling of complex systems. Applications are expected to include real-time nuclear magnetic resonance imaging during surgery, computer-based drug design, astrophysical simulation, the modeling of environmental pollution, and the study of long-term climate changes.

Definition: Techno-fiend

In information technology, a techno-fiend is someone who is addicted to finding out and knowing how things work in one or more aspects of cyberspace. Techno-fiends frequently know about and consult the places where such information can be found. Some techno-fiends also frequent Usenet or other online discussions. Techno-fiends usually suspect that there’s some place or someone with information that they should know about but don’t.

Subjects that compel the attention of techno-fiends include: Web site design and browser behavior, Web server installation and management, any new emerging standard (a techno-fiend will read the main standard and even some of the ancillary standards), and any new technology, especially hardware technologies.

In general (with some exceptions), techno-fiends tend to be lay people rather than experts, whose interest in understanding how things work is professional and, to some degree, economic. A techno-fiend is less dedicated to a subject or a technology than a geek or a hacker, both of whom tend to be among the professionals. However, you can be an expert in one area and a techno-fiend in another.

Source: searchenterpriselinux.techtarget.com
