File Size 1.34 MB
File Type pdf
Creation Date 20/9/2018
Last Updated 31/10/2023

One of the most powerful features of TensorFlow, Google's API for machine learning, is its support for distributed computation, which allows users to distribute the training process automatically across several computing machines. Although using this feature is relatively straightforward, deploying it on a typical high-performance computing infrastructure based on a queue management system presents several issues. This report describes a complete Python package that solves these issues on the Finis Terrae II supercomputer, which uses Slurm as its queue system. To show the performance of distributed TensorFlow on Finis Terrae II, an industrial case based on experiment 707 (CyPLAM) from FORTISSIMO II was trained using the developed Python package.
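To illustrate the kind of issue involved, the sketch below shows one possible way to turn a Slurm allocation into a TensorFlow cluster description. It is not the package described in the report. It assumes one task per node, that the node names have already been expanded (for example with `scontrol show hostnames`), and that the first nodes act as parameter servers. The function name `build_tf_config`, the port number, and the node names are hypothetical.

```python
import json
import os


def build_tf_config(hostnames, proc_id, num_ps=1, port=2222):
    """Build a TF_CONFIG-style dictionary for one task of a distributed job.

    hostnames: expanded list of node names allocated to the job
    proc_id:   rank of this task within the job (e.g. the value of SLURM_PROCID)
    num_ps:    number of leading nodes used as parameter servers (assumption)
    port:      TCP port every task listens on (assumption)
    """
    if not 0 < num_ps < len(hostnames):
        raise ValueError("need at least one parameter server and one worker")
    if not 0 <= proc_id < len(hostnames):
        raise ValueError("proc_id is outside the allocation")

    addresses = [f"{host}:{port}" for host in hostnames]
    cluster = {"ps": addresses[:num_ps], "worker": addresses[num_ps:]}

    # The first num_ps ranks become parameter servers, the rest workers.
    if proc_id < num_ps:
        task = {"type": "ps", "index": proc_id}
    else:
        task = {"type": "worker", "index": proc_id - num_ps}
    return {"cluster": cluster, "task": task}


if __name__ == "__main__":
    # Hypothetical node names; in a real job they would come from the allocation.
    nodes = ["node1", "node2", "node3"]
    rank = int(os.environ.get("SLURM_PROCID", "0"))
    # TensorFlow reads the cluster layout from the TF_CONFIG environment variable.
    os.environ["TF_CONFIG"] = json.dumps(build_tf_config(nodes, rank))
    print(os.environ["TF_CONFIG"])
```

Each task launched by the queue system would run the same script and derive its own role from its rank, so no host names need to be hard-coded in the training code.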