Big Data Workshop: Introduction to CESGA’s Hadoop 3 platform
This workshop will be an introduction to CESGA’s Big Data platform, which has been updated to Hadoop 3 and now includes Spark 2.4.
The workshop will introduce the tools available on the Hadoop 3 platform and lay the groundwork for subsequent workshops on specific tools included in the platform, such as Spark.
Date: 11 June, 10:00 to 13:00
Place: CESGA, Avda. de Vigo s/n, Campus Vida – Santiago de Compostela
Audience: The workshop is intended for both current users of the Big Data platform and new users who need access to Big Data tools.
What will I learn during the workshop?
At the end of the workshop you will know:
- How to connect to the Hadoop 3 platform
- How to transfer data efficiently
- What tools are available
- How to launch these tools
What will NOT be taught during the workshop?
Because this is an introductory workshop and a wide variety of tools is available, it will not explain how to use each tool. Instead, it will show how to access the tools, what their main features are and how to launch them.
Later workshops will focus on specific tools, such as Spark, and will teach their use in detail.
1. Introduction to the Big Data service
1.1. Basic concepts
1.2. Hardware description
1.3. Software description
2. Connection to the Hadoop 3 service
2.1. Command line access: SSH
2.2. Access through the web interface: WebUI and HUE
2.3. Access through a remote desktop
3. Data transfer
3.1. File system quotas
3.2. Migration of data from the old platform
3.3. How to transfer data efficiently using the DTN service
3.4. How to transfer data using SCP
4. Basic elements
4.1. HDFS: Distributed storage
4.2. YARN: Execution and monitoring of jobs
5. Available tools
6. Where to get additional information
6.1. User’s Guide
6.2. Official documentation