Blogs / Tutorials

Introduction to Hadoop

Hadoop is a complete ecosystem of open source projects that provides a framework to deal with Big Data. Here’s a simple explanation of Hadoop using interesting examples.

Introduction to MapReduce

Continuing the introductory series, here’s a simple explanation of MapReduce, a programming model used for processing large data sets.
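
To make the programming model concrete, here is a minimal word-count sketch in plain Python (not tied to Hadoop’s actual API): the map step emits (word, 1) pairs, the shuffle groups them by word, and the reduce step sums the counts. The function names and the in-memory shuffle are illustrative assumptions only.

from collections import defaultdict

# Illustrative MapReduce-style word count; Hadoop would run these steps across a cluster.
def map_phase(document):
    # Map step: emit a (word, 1) pair for every word in the document.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle step: group values by key (handled by the framework in Hadoop).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce step: sum the counts for a single word.
    return key, sum(values)

documents = ["big data is big", "data beats opinions"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'beats': 1, 'opinions': 1}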

Hadoop beyond traditional MapReduce – Simplified

After acquiring basic knowledge of Hadoop and MapReduce, it’s time to move on to advanced concepts. This article covers topics such as the extended Hadoop ecosystem, Apache Pig, Hive, Impala, Sqoop, Flume and other related concepts.

Tricking your elephant to do data manipulations (using MapReduce)

This article highlights the applications of MapReduce with HDFS, along with various tips and tricks that help perform big data computations faster.

All out beginner’s guide to MongoDB

Here’s a complete beginner’s guide to learning MongoDB. The main intent of this article is to explain how MongoDB and its related components work in the simplest possible manner.

Getting Mongo-ed in NoSQL manager, R & Python

After you’ve read about the basics of MongoDB, this should be your immediate next step to learn about the use of MongoDB with R, Python and a NoSQL manager. Its ability to effortlessly integrate with third-party technologies makes MongoDB one of the hot choices in the Big Data industry.
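
As a quick taste of that integration from Python, here is a minimal sketch using the pymongo driver; it assumes a MongoDB server running locally on the default port, and the database and collection names are placeholders chosen for illustration.

from pymongo import MongoClient

# Connect to a locally running MongoDB instance (default port assumed).
client = MongoClient("mongodb://localhost:27017/")
db = client["analytics_demo"]      # illustrative database name
posts = db["blog_posts"]           # illustrative collection name

# Insert a schema-less, JSON-like document.
posts.insert_one({"title": "Intro to MongoDB", "tags": ["nosql", "big data"], "views": 120})

# Query it back with a simple filter on the tags array.
for doc in posts.find({"tags": "nosql"}):
    print(doc["title"], doc["views"])

Similar document-oriented calls are available from R (for example via the mongolite package), which is part of what makes the integration story so smooth.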

Learn Cloud Computing in R

This article explains the concept of cloud computing with R and RStudio using a stepwise methodology. Furthermore, you will also learn about the benefits of using R in the cloud as compared to the traditional desktop or local client/server architecture.

Awesome Big Data – GitHub Repository

Here’s a GitHub repository featuring all the resources necessary to master big data technologies. It is an exhaustive resource guide for big data; however, make sure you don’t get lost in this abundant list of resources and stay focused on what you wish to learn.

Learning Path: SparkR

Here is a resource to get you started with SparkR, an R library for leveraging Apache Spark.

Comprehensive Introduction to Apache Spark

Here’s a comprehensive article that takes you from the basics of distributed computing to using Apache Spark for massive gains in speed and scalability.
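
Before diving in, here is what the word-count idea from earlier looks like in PySpark (Spark’s Python API), where the work is distributed across partitions for you; the local master setting and the file path are assumptions for a quick local test.

from pyspark.sql import SparkSession

# Start a local Spark session; "local[*]" uses all cores on this machine.
spark = SparkSession.builder.master("local[*]").appName("WordCount").getOrCreate()

# Read a text file into an RDD; the path is a placeholder for your own data.
lines = spark.sparkContext.textFile("data/sample.txt")

# The familiar map/reduce steps, now executed in parallel by Spark.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word.lower(), 1))
               .reduceByKey(lambda a, b: a + b))

print(counts.take(10))  # first ten (word, count) pairs
spark.stop()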

Best of YouTube videos

Learn Big Data Analytics using Top YouTube Videos, TED Talks & other resources

Trainings & Certifications:

  • Big Data University: Big Data University is a cloud-based online education site that offers both free and paid courses taught by a group of professionals and educators who have extensive experience with Hadoop, Big Data and DB2. It aims to make Big Data education freely available to everyone so that it can lead to insights and discoveries in varied fields such as healthcare and the environment. Most courses include lab classes that you can perform on the cloud, on VMware images, or by locally installing the required software. The courses are completely free of cost, and learners receive a certificate on passing the final exam.
  • Cloudera: Cloudera provides globally recognized certification for Big Data. Cloudera certifies true specialists who have demonstrated their abilities to execute at the highest level on both traditional exams and hands-on challenges with live data sets.
  • Coursera: Do you need to understand big data and how it will impact your business? This Specialization is for you. You will gain an understanding of what insights big data can provide through hands-on experience with the tools and systems used by big data scientists and engineers. Previous programming experience is not required! You will be guided through the basics of using Hadoop with MapReduce, Spark, Pig and Hive. By following along with provided code, you will experience how one can perform predictive modeling and leverage graph analytics to model problems. This specialization will prepare you to ask the right questions about data, communicate effectively with data scientists, and do basic exploration of large, complex datasets.