What you will learn
Python Programming
Apache Hadoop
MapReduce
Apache Spark
Course overview
This course is for novice programmers and business people who want to understand the core tools used to wrangle and analyze big data. No prior experience is required: you will walk through hands-on examples with the Hadoop and Spark frameworks, two of the most common in the industry. You will become comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. The assignments guide you through how data scientists apply important concepts and techniques, such as MapReduce, to solve fundamental problems in big data. By the end, you should feel prepared to have conversations about big data and the data analysis process.
Syllabus
Week 1 Hadoop Basics
Lesson 1: Big Data Hadoop Stack
Lesson 2: Hands-On Exploration of the Cloudera VM
Quiz: Basic Hadoop Stack
Week 2 Introduction to the Hadoop Stack
Lesson 1: Overview of the Hadoop Stack
Lesson 2: The Hadoop Execution Environment
Lesson 3: Overview of Hadoop-based Applications and Services
Quiz: Overview of Hadoop Stack
Quiz: Hadoop Execution Environment
Quiz: Hadoop Applications
Week 3 Introduction to Hadoop Distributed File System (HDFS)
Lesson 1: HDFS Architecture and Configuration
Lesson 2: HDFS Performance and Tuning
Lesson 3: HDFS Access, Commands, APIs, and Applications
Quiz: HDFS Architecture
Quiz: HDFS Performance, Tuning, and Robustness
Quiz: Accessing HDFS
Week 4 Introduction to Map/Reduce
Lesson 1: Introduction to Map/Reduce
Lesson 2: Map/Reduce Examples and Principles
Assignment: Running Wordcount with Hadoop Streaming, using Python code
Quiz: Lesson 1 Review
Assignment: Joining Data
Week 5 Spark
Lesson 1: Introduction to Apache Spark
Lesson 2: Resilient Distributed Datasets and Transformations
Lesson 3: Job Scheduling, Actions, Caching and Shared Variables
Quiz: Spark Lesson 1
Quiz: Spark Lesson 2
Assignment: Simple Join in Spark
Quiz: Spark Lesson 3
Assignment: Advanced Join in Spark