Project Description

Optimiza – Optimization of irregular applications on emerging high-performance CPU/GPU architectures

Period: 2010-01-01 – 2012-12-30 (36 months).

Funded by: Xunta de Galicia


The industry's transition to multicore processors is considered one of the most important milestones in the history of computing. However, more powerful processors have not translated into a significant increase in application performance. Moreover, current multicore systems are architecturally very diverse, so applications must be tuned with optimization techniques specific to the target architecture. This research project addresses both issues.

One of the main reasons for this architectural diversity is the need to balance memory and processor capabilities. Processor performance has doubled roughly every year and a half, a trend commonly associated with Moore's Law, while memory has needed about ten years to double its performance (a problem known as the memory wall). Memory bandwidth and latency have therefore been the major performance constraints in traditional architectures. In multicore architectures, for reasons of cost and efficiency, the hardware industry tends to increase the number of cores rather than the memory bandwidth. The memory hierarchy will thus remain the key issue for application performance on future multicore architectures [1].
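Taking the two doubling periods quoted above at face value, the growth of the gap can be worked out as follows. This is only a back-of-the-envelope sketch: the symbols P, M, P_0, M_0 and t are introduced here for illustration, and both trends are assumed to be perfectly exponential.

```latex
P(t) = P_0 \, 2^{t/1.5}, \qquad M(t) = M_0 \, 2^{t/10}
\quad\Longrightarrow\quad
\frac{P(t)}{M(t)} = \frac{P_0}{M_0} \, 2^{t\left(\frac{1}{1.5} - \frac{1}{10}\right)}
                  = \frac{P_0}{M_0} \, 2^{17t/30}
```

With t measured in years, the processor-to-memory performance ratio under these assumptions grows by a factor of 2^(17/3), roughly 51, over a decade.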

The case of irregular applications is particularly serious. These applications do not satisfy the principle of locality, on which the efficient operation of the memory hierarchy is based. As a result, their performance on multicore architectures is much lower than that of standard applications, typically reaching only around 10% of the machine's peak performance. Irregular applications are among the most demanding scientific applications and appear in, among other fields, electronic device simulation, fluid mechanics, and n-body problems (astrophysics, molecular dynamics, etc.). As an example, 3 of the 5 most used applications on the CESGA supercomputer Finisterrae in the last year were irregular. They are also very common in industries such as electronic design (e.g., transistor simulation), pharmaceuticals (e.g., analysis of proteins and biomolecules), media (e.g., 3D image rendering), and aeronautics, among many others.
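A typical source of this irregularity is indirect addressing, as in a sparse matrix-vector product. The following minimal sketch is not taken from the project's code: the function name, the choice of the CSR (compressed sparse row) storage format, and the sample matrix are assumptions made for illustration. It shows why the reads from the vector x cannot be predicted from the loop structure alone.

```python
# Sparse matrix-vector product y = A @ x with A stored in CSR format.
# Illustrative sketch only; all names here are hypothetical.

def csr_spmv(row_ptr, col_idx, values, x):
    """Return y = A @ x for a CSR matrix given as three flat arrays."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The nonzeros of row i are values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access: the element of x that is read depends on the
            # data in col_idx, so consecutive reads may be far apart in memory.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_idx, values, x)
```

The arrays values and col_idx are traversed sequentially, but x is read through col_idx, so its access pattern is determined by the sparsity structure of the matrix rather than by the loop indices.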

In this context, the objective of the project is to apply the research group's acquired knowledge of the parallelization and optimization of irregular applications to the new architectures that will dominate the high performance computing market in the coming years: hybrid multicore CPU/GPU systems. In addition to standard multicore processors (CPUs), such systems contain one or more GPUs. Current GPUs have a large number of simple processors (typically over 100), which allows them to be used for general-purpose parallel computing in addition to their role as specific units for graphics rendering. These systems pose a challenge for programmers because they add new levels to the memory hierarchy, which makes software development the key to obtaining high performance. The development of tools and libraries that simplify the use of these systems while achieving high performance is therefore of great importance to the scientific community and industry. Specifically, the project will develop techniques for the automatic migration of memory pages, guided by information from the processors' hardware counters, and a mathematical library for irregular matrix algebra adapted and optimized for GPUs.

[1] K. Asanovic et al. The Landscape of Parallel Computing Research: A View from Berkeley. Technical Report, University of California, Berkeley (USA), 2006.


This project contributes to the field of High Performance Computing, specifically to new processor architectures for supercomputers and their application to problems in industry and the scientific community that demand substantial computational resources.
The overall goal of the project is to apply the research group's experience in the parallelization and optimization of irregular applications to the new architectures that will dominate the high performance computing market in the coming years: hybrid multicore CPU/GPU systems. To this end, tools and mathematical libraries will be developed for industry and the scientific community that both simplify the use of these systems and achieve high performance.

More concretely, the project's objectives can be summarized as follows:

1. Evaluation of GPUs and their programming models as an appropriate platform for developing and optimizing irregular applications in the field of high performance computing.

2. Extension of the memory hierarchy models previously developed by the research team members to the new hybrid multicore CPU/GPU architectures.

3. Development of software tools that allow irregular applications to make effective use of the memory hierarchy and that facilitate the programmability of such systems:
– For multicore CPUs, the project will focus on developing techniques for the automated migration of memory pages. These techniques are based on the new models and can be implemented on multiprocessor NUMA (Non-Uniform Memory Access) systems. The migration is transparent to the user.
– For GPUs, the project will develop a mathematical library for sparse matrix algebra. Sparse matrix algebra codes are the basic computational kernels of most irregular applications found in industry and science. The library will provide mathematical functions optimized for GPUs that exploit the performance and parallelism these systems offer. Special attention will be devoted to optimizing memory accesses.

4. Finally, the project will make it possible to evaluate and improve knowledge of these new-generation hybrid architectures, facilitating decision-making on the future architecture of the CESGA Finisterrae2 supercomputer.
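The page-migration idea in objective 3 can be illustrated with a small decision policy. This is a hypothetical sketch, not the project's technique: the function name, the sample format (a page identifier paired with the NUMA node that issued the access, as might be derived from hardware counter samples), and both thresholds are assumptions chosen for illustration.

```python
# Hypothetical policy sketch: names, thresholds, and sample format are
# assumptions made for illustration, not the project's actual method.
from collections import Counter, defaultdict


def plan_migrations(samples, page_home, min_samples=4, min_share=0.6):
    """Decide which memory pages to move to which NUMA node.

    samples   -- iterable of (page, node) pairs: one sampled access to
                 `page` issued by a core on NUMA node `node`
    page_home -- dict mapping each page to the node currently holding it
    Returns a dict {page: target_node} for the pages worth migrating.
    """
    per_page = defaultdict(Counter)
    for page, node in samples:
        per_page[page][node] += 1

    plan = {}
    for page, counts in per_page.items():
        total = sum(counts.values())
        if total < min_samples:
            continue  # too few samples to justify a decision
        node, hits = counts.most_common(1)[0]
        # Migrate only if a node other than the current home dominates.
        if node != page_home[page] and hits / total >= min_share:
            plan[page] = node
    return plan


# Page 0 is mostly accessed from node 1, page 1 from its home node 0,
# and page 2 has too few samples.
samples = [(0, 1)] * 5 + [(0, 0)] + [(1, 0)] * 6 + [(2, 1)] * 2
page_home = {0: 0, 1: 0, 2: 0}
plan = plan_migrations(samples, page_home)
```

The sketch only computes a migration plan. On Linux, the moves themselves could be carried out with the move_pages system call, and a real implementation would also have to weigh the cost of each migration against the expected gain.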