

Big Data Workshop: Introduction to CESGA’s Hadoop 3 platform


Wednesday 15/05/2019 10:42

This workshop is an introduction to CESGA's Big Data platform, which has been updated to Hadoop 3 and now includes a new version of Spark, 2.4.

It presents the tools available within the Hadoop 3 platform and will serve as the basis for subsequent workshops on specific tools included in the platform, such as Spark.


Date: 11 June, 10:00 a.m. to 1:00 p.m.

Place: CESGA, Avda. de Vigo s/n, Campus Vida, Santiago de Compostela

Intended audience: Current users of the Big Data platform and new users who need access to Big Data tools.

What will I learn during the workshop?

At the end of the workshop you will know:

  • How to connect to the Hadoop 3 platform
  • How to transfer data efficiently
  • What tools are available
  • How to launch these tools

What will NOT be taught during the workshop?

Because this is an introductory workshop and the range of available tools is wide, it will not explain how to use each tool. It will only show how to access the tools, what their main features are, and how to launch them.

Later workshops will each focus on an individual tool, such as Spark, and teach its use in detail.


1. Introduction to the Big Data service

1.1. Basic concepts

1.2. Hardware description

1.3. Software description

2. Connection to the Hadoop 3 service

2.1. VPN

2.2. Command line access: SSH

2.3. Access through the web interface: WebUI > HUE

2.4. Access through a remote desktop

3. Data transfer

3.1. File system quotas

3.2. Migration of data from the old platform

3.3. How to transfer data efficiently using the DTN service

3.4. How to transfer data using SCP

4. Basic elements

4.1. HDFS: Distributed storage

4.2. YARN: Execution and monitoring of jobs

5. Tools available

5.1. Spark

5.2. Jupyter

5.3. Hive

5.4. Impala

5.5. Sqoop

5.6. Modules

6. Where to get additional information

6.1. Tutorials

6.2. User's Guide

6.3. Official documentation