This online course introduces analytics professionals to Hadoop and provides an example workflow for using it. Participants will also be introduced to writing MapReduce jobs and to using Hadoop Streaming to complete the work in an analytics programming language such as Python.
In this course you will learn:
- What Hadoop is and how to leverage it to perform analytics
- The software components of the Hadoop Ecosystem
- How to manage data on a distributed file system
- How to write MapReduce jobs to perform computations with Hadoop
- How to use Hadoop Streaming to run jobs written in a language such as Python
- Week 1: A Distributed Computing Environment
- Week 2: Working with Hadoop
- Week 3: Computing with MapReduce
- Week 4: Towards Last Mile Computation
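To give a sense of what a MapReduce job written for Hadoop Streaming can look like in Python, here is a minimal word-count sketch. It is an illustration only, not material taken from the course: the function names `mapper`, `reducer`, and `run_locally` are hypothetical, and the sort step is simulated in a single process rather than performed by a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record per word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts for each word.

    Assumes the records arrive sorted by key, which is what Hadoop's
    shuffle-and-sort phase provides between the map and reduce steps.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process (no cluster needed)."""
    return list(reducer(sorted(mapper(lines))))


result = run_locally(["the cat", "The dog"])
```

In an actual Hadoop Streaming job, the mapper and reducer would typically be separate scripts that read records from standard input and write them to standard output, with Hadoop handling the sort between the two steps.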
October 30, 2015 to November 27, 2015
About 15 hours per week, at times of your choosing.
INR 32,940 (assuming $1 = INR 60)
Part Time/Full Time:
Data scientists and statisticians with programming experience who need to deal with large data sets and want to learn about Hadoop’s distributed computing capabilities should take Introduction to Analytics using Hadoop. This course is particularly suited to data scientists who need to access and analyze large amounts of unstructured or semi-structured data that does not fit well into traditional relational databases.
The following are listed so that you can determine for yourself whether you have the needed background, either from taking the listed courses or from other experience.
- Command line experience on Linux, to manage system processes, find appropriate files, and set permissions.
- Familiarity with Python or another programming language, to leverage Hadoop Streaming to perform computations.
- Apache Hadoop
- Benjamin Bengfort