High-quality machine learning and deep learning content – that’s the piece de resistance our community loves. That’s the peg we hang our hat on at Analytics Vidhya and that has propelled our community to stardom in this field.
We love bringing the latest and greatest machine learning techniques, breakthroughs, and developments to you in the form of articles, courses, quizzes, and other mediums. In short, if you have a machine learning question, are stuck in your career, or simply want to follow the latest trends, Analytics Vidhya has the solution for you!
We take immense pride in the content that we churn out; it has truly helped millions of people enrich their knowledge in the machine learning space. The quality of our courses has been highly appreciated by the community and our blogs broke the views record in 2019 – all thanks to you, our beloved reader.
Before we set sail into the new dawn of 2020 (a leap year!), we wanted to share the best of the year with our amazing community. I have divided the most popular articles we published in 2019 into the categories below:
- Computer Vision
- Machine Learning Algorithms and Techniques
- Natural Language Processing (NLP)
- Reinforcement Learning
- Other Topics, such as Python, R, Statistics, etc.
Let’s dive straight in!
Computer Vision Articles
Computer Vision is revolutionizing sectors from agriculture to banking, from hospitality to security, and much more. The demand for computer vision professionals is off the charts! In this section, we’re thrilled to present the most popular computer vision articles from 2019 written by leading data scientists and deep learning experts at Analytics Vidhya.
“Deep learning models require hours or days to train, especially on our local machines”. That’s a widespread belief among a lot of data science enthusiasts.
This is a myth. So, we wrote this article to showcase that it’s possible to build your own neural network from the ground up in a matter of minutes without needing to lease out Google’s servers!
Fast.ai’s students trained a model on the ImageNet dataset in 18 minutes – and this article by Pulkit Sharma enables you to do something similar.
Pulkit first sets the tone by explaining how image classification models are typically designed. He breaks the process into 4 stages:
- Loading and pre-processing Data – 30% time
- Defining Model architecture – 10% time
- Training the model – 50% time
- Estimation of performance – 10% time
He then sets up the structure of the image data, since the data needs to be in a particular format to solve an image classification problem.
Getting hands-on is always the best way to learn, so this article has a really cool challenge to help you understand image classification. The problem statement solved here is to build a model that can classify a given set of images according to the apparel (shirt, trousers, shoes, socks, etc.).
It’s actually a real-world problem faced by many e-commerce retailers which makes it an even more interesting computer vision problem. In the end, solving this problem will give your learning an ultimate boost!
It’s a Record-Breaking Crowd! A Must-Read Tutorial to Build your First Crowd Counting Model using Deep Learning
I love this article. The applications of this computer vision use case are awesome.
We face a crowd pretty much everywhere we go, whether it is an international conference, a cricket match, a political rally, or a simple visit to the mall. It’s really difficult to know the headcount of people at specific intervals of time, especially when we calculate it manually. Well, this can be done using Deep Learning and Computer Vision.
In another awesome article by Pulkit Sharma, you will understand the different Computer Vision Techniques for Crowd Counting. Learn the architecture and training methods of CSRNet and build your own crowd counting model in Python.
Crowd counting has so many diverse applications and is already seeing adoption by organizations and government bodies.
It is a useful skill to add to your computer vision portfolio. Quite a number of industries will be looking for data scientists who can work with crowd counting algorithms. Learn it, experiment with it, and give yourself the gift of deep learning!
There are certain common challenges computer vision enthusiasts and even experts face in almost any project in this space:
- How do we clean image datasets? Images come in different shapes and sizes
- The ever-present problem of acquiring data. Should we collect more images before building our computer vision model?
- Is learning deep learning compulsory for building computer vision models? Can we not use machine learning techniques?
- Can we build a computer vision model on our own machine? Not everyone has access to GPUs and TPUs!
If you also have these questions running through your mind, then this article by Saurabh Pal will resolve all of the above and more for you.
Saurabh answers most of these questions using the awesome OpenCV library. It truly stands out like a beacon for computer vision tasks and is easily the most popular computer vision library around. You can try it hands-on and build your own applications while reading the article.
This is the perfect article if you’re new to computer vision.
Self-driving cars are fascinating but they are built for the ideal scenario – where roads are in complete symmetry and there are no sudden obstacles.
But in the real world, we rarely have infrastructure that meets the needs of a self-driving car. Here’s the good news – we have a computer vision framework that can help change that.
That’s Mask R-CNN – the state-of-the-art framework that we can use to build such a system. It’s a technique that can detect the exact shape of the road so our self-driving car system can safely navigate the sharp turns as well.
This article by Pulkit Sharma comes in handy if you want to build your own image segmentation model using Mask R-CNN! In the end, perhaps you can also try to integrate that into a self-driving car system! 🙂 Sounds super exciting, right?
Not all of us have unlimited resources like the big technology behemoths such as Google and Facebook. So how can we work with image data if not through the lens of deep learning?
We can leverage the power of machine learning! That’s right – we can use simple machine learning models like decision trees or Support Vector Machines (SVM). If we provide the right data and features, these machine learning models can perform adequately and can even be used as a benchmark solution.
So in this beginner-friendly article, Aishwarya Singh will help you understand the different ways in which you can generate features from images in Python. You can then use these methods in your favorite machine learning algorithms!
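To make the idea of generating features from images concrete, here is a minimal sketch in plain Python. It is not the article's code: the tiny 2x2 `image` and the helper names `grayscale_features` and `channel_means` are hypothetical, and in practice you would load real images with an imaging library. It shows two simple options: one feature per pixel (the mean of the three colour channels), or three features per image (the mean of each channel).

```python
# A tiny hypothetical 2x2 RGB image: rows -> pixels -> (R, G, B) values in 0..255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def grayscale_features(img):
    """Flatten an RGB image into one feature per pixel (mean of the 3 channels)."""
    return [sum(pixel) / 3 for row in img for pixel in row]


def channel_means(img):
    """Summarise an image with three features: the mean of each colour channel."""
    pixels = [pixel for row in img for pixel in row]
    return [sum(p[c] for p in pixels) / len(pixels) for c in range(3)]


pixel_features = grayscale_features(image)
colour_features = channel_means(image)
print(pixel_features)
print(colour_features)
```

Either list can be passed as one row of a feature matrix to a classical model such as a decision tree or an SVM.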
I feel this is a very important part of a data scientist’s toolkit given the rapid rise in the number of images being generated these days. We recommend you get started with this now!
It’s great to learn about image classification but what do you do if there are multiple object categories in an image? Making an image classification model is a good start. What if you want to take on a more challenging task – building a multi-label image classification model?
Well, there is a difference between multi-label and multi-class image classification, which our author Pulkit Sharma also clarifies here.
He didn’t want to explain a multi-label image classification model using a toy dataset, so he used posters of our favorite movies and TV series containing a variety of people. He guides you to build your very own multi-label image classification model to predict the different genres just by looking at the poster.
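The difference between multi-class and multi-label classification can be sketched in a few lines. This is an illustration under stated assumptions, not the article's model: the genre names and raw scores are made up, and the 0.5 threshold is a common default rather than a rule. In the multi-class case a softmax makes the classes compete so exactly one wins; in the multi-label case an independent sigmoid per class lets several genres be predicted at once.

```python
import math


def softmax(scores):
    """Multi-class: probabilities compete and sum to 1, so one class wins."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(scores):
    """Multi-label: each class gets its own independent probability in (0, 1)."""
    return [1 / (1 + math.exp(-s)) for s in scores]


genres = ["action", "comedy", "drama"]  # hypothetical labels
scores = [2.0, 1.5, -3.0]               # hypothetical raw model outputs

# Multi-class: pick the single most likely genre.
class_probs = softmax(scores)
multi_class = genres[class_probs.index(max(class_probs))]

# Multi-label: keep every genre whose probability clears a 0.5 threshold.
multi_label = [g for g, p in zip(genres, sigmoid(scores)) if p > 0.5]

print(multi_class)
print(multi_label)
```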
Machine Learning Articles
Machine learning is ubiquitous in the industry these days. Organizations around the world are scrambling to integrate machine learning into their functions and new opportunities for aspiring data scientists are growing multifold. So we have hand-picked incredible articles from our 2019 repository.
As a data scientist, do you hurriedly map predicted values on unseen data before even finishing your model? Well, this isn’t ideal and should be avoided in real-world scenarios.
In our industry, we consider different kinds of metrics to evaluate our models. The choice of metric completely depends on the type of model and the implementation plan of the model.
After you are finished building your model, these 11 metrics will help you in evaluating your model’s accuracy (based on the problem at hand):
- Confusion Matrix
- F1 Score
- Gain and Lift charts
- Kolmogorov Smirnov chart
- Area Under the ROC curve (AUC – ROC)
- Log Loss
- Gini Coefficient
- Concordant – Discordant ratio
- Root Mean Squared Error (RMSE)
- Root Mean Squared Logarithmic Error (RMSLE)
- R-Squared/Adjusted R-Squared
In addition, the metrics covered in this article are some of the most widely used evaluation metrics for classification and regression problems, so check them out and start using them!
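A few of the metrics listed above are simple enough to compute by hand. The sketch below uses plain Python with made-up labels and predictions; the function names are our own, and `f1_score` assumes there is at least one actual and one predicted positive (otherwise it would divide by zero). Libraries such as scikit-learn offer tested versions of these metrics for real projects.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(y_true, y_pred):
    """Root Mean Squared Error for regression targets."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error; inputs must be greater than -1."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


labels = [1, 0, 1, 1, 0, 0]  # hypothetical ground truth
preds = [1, 0, 0, 1, 1, 0]   # hypothetical model predictions
print(confusion_counts(labels, preds))
print(f1_score(labels, preds))
print(rmse([3, 5, 2], [2, 5, 4]))
```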
Recommendation engines are truly fascinating, and they rely on grouping similar items, products, and users together. This grouping, or segmenting, works across industries. And that’s what makes the concept of clustering such an important one in data science.
Clustering helps us understand our data in a unique way – by grouping things together into – you guessed it – clusters. In this article, Pulkit Sharma covers k-means clustering and its components comprehensively.
We look at clustering, why it matters, its applications and then deep dive into k-means clustering (including how to perform it in Python on a real-world dataset). We have a live coding window where you can build your own k-means clustering algorithm as well!
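For a feel of how k-means works before opening the article, here is a minimal sketch in plain Python. It is a simplified illustration, not the article's implementation: the points are made up, and initialising the centroids from the first `k` points is an assumption chosen to keep the example deterministic (real implementations usually pick starting centroids randomly or with k-means++).

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; returns (centroids, cluster index per point)."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


# Hypothetical data: two loose groups of points.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```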
I am sure this question must be fluttering in your mind:
What’s the use of learning the mathematics behind machine learning algorithms? We can easily use the widely available libraries in Python and R to build models!
This article will get this question out of the way right now – you need to understand the mathematics behind machine learning algorithms to become a data scientist. There is no way around it. It is an intrinsic part of a data scientist’s role and every recruiter and experienced machine learning professional will vouch for this.
So in this article, Sharoon Saxena discusses the mathematical aspects you need to know to become a machine learning expert, including linear algebra, probability, multivariate calculus, and statistics.
He also debunks a myth – mathematics in data science is not about crunching numbers, but about what is happening, why it’s happening, and how we can play around with different things to obtain the results we want.
This is a very (and I repeat, very) important learning path for becoming a champion data science professional.
This has been our most popular tutorial series in 2019 and our data science community loves it. So, here we are sharing with you the most popular GitHub articles for 2019 written by Pranav Dar!
Are you ready to take that next big step in your machine learning journey? Working on toy datasets and using popular data science libraries and frameworks is a good start. But if you truly want to stand out from the competition, you need to take a leap and differentiate yourself. A brilliant way to do this is to do a project on the latest breakthroughs in data science. So why wait? Get started today.
Natural Language Processing (NLP) Articles
From the super-efficient ULMFiT framework to Google’s BERT, NLP is truly in the midst of a golden era. Are you ready to be part of this revolution? Then gear up and sharpen your NLP skills with the best articles written by NLP experts from Analytics Vidhya.
A rapid increase in NLP adoption has happened largely thanks to the concept of transfer learning enabled through pretrained models. Transfer learning, in the context of NLP, is essentially the ability to train a model on one dataset and then adapt that model to perform different NLP functions on a different dataset.
This breakthrough has made things incredibly easy and simple for everyone, especially folks who don’t have the time or resources to build NLP models from scratch. It’s perfect for beginners as well who want to learn or transition into NLP.
In this article, Pranav Dar has showcased the top pretrained models in NLP. You can use them to start your NLP journey and replicate the state-of-the-art research in this field.
If you’re an NLP enthusiast, you’re going to love this section, which covers 5 state-of-the-art multi-purpose NLP model frameworks, including:
- Google’s BERT
- OpenAI’s GPT-2
Apart from this, you will also get sharp insights into word embeddings and other pre-trained models. This is a treat for all the NLP enthusiasts to go and indulge in!
Have you ever been stuck at work while a pulsating cricket match was going on? Building a chatbot that could fetch me the scores from the ongoing IPL (Indian Premier League) tournament would be a lifesaver.
Using the awesome Rasa stack for NLP, you can build a chatbot that you could use on your computer anytime. No more looking down at the phone and getting distracted.
And the cherry on top? You can deploy the chatbot to Slack, the de facto platform for team communication. That’s right – you could check the score anytime without having to visit any external site. Sounds too good an opportunity to pass up, right?
So get set and read this article by Mohd Sanad where you will get to learn the end-to-end process of building a chatbot using Rasa.
One thing that has always been a thorn in an NLP practitioner’s side is the inability (of machines) to understand the true meaning of a sentence. Yes, we are talking about context. Traditional NLP techniques and frameworks were great when asked to perform basic tasks. Things quickly went south when we tried to add context to the situation.
The NLP landscape has significantly changed in the last 18 months or so. NLP frameworks like Google’s BERT and Zalando’s Flair are able to parse through sentences and grasp the context in which they were written.
One of the biggest breakthroughs in this regard came thanks to ELMo, a state-of-the-art NLP framework developed by AllenNLP. ELMo is a novel way to represent words in vectors or embeddings. These word embeddings are helpful in achieving state-of-the-art (SOTA) results in several NLP tasks. NLP scientists globally have started using ELMo for various NLP tasks, both in research as well as the industry.
By the time you finish this article by Prateek Joshi, you too will have become a big ELMo fan – just as we did.
Reinforcement Learning Articles
Reinforcement learning is an intriguing and complex field. We at Analytics Vidhya strongly believe in the incredible potential of this domain, and the breakthroughs and research by behemoths like DeepMind support our view.
Games! The term itself is enough to ignite the spark of enthusiasm inside us – there is nothing quite like it!
So, when we read about the incredible algorithms DeepMind was coming up with (like AlphaGo and AlphaStar), we were hooked. This triggered our star author Ankit Choudhary to pen down this article that will help you design these systems on your own machine.
Deep reinforcement learning is relevant even if you’re not into gaming. The scope of Deep reinforcement learning is IMMENSE. This is a great time to enter into this field and make a career out of it.
This article will help you take your first steps into the world of deep reinforcement learning. We have used one of the most popular algorithms in reinforcement learning – deep Q-learning – to understand how deep RL works. And the icing on the cake? We will help you implement all our learning in an awesome case study using Python.
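The core of Q-learning is a single update rule, sketched below in its simpler tabular form. Note the swap: deep Q-learning, as covered in the article, replaces the lookup table with a neural network that estimates the Q-values, while this sketch keeps a plain dictionary. The two-state problem, the action names, and the `alpha` (learning rate) and `gamma` (discount factor) values are all hypothetical.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One Q-learning step:
    Q(s, a) += alpha * (reward + gamma * max over a' of Q(s', a') - Q(s, a)).
    """
    best_next = max(q_table[next_state].values())  # value of the best next action
    td_target = reward + gamma * best_next          # what Q(s, a) should move toward
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical two-state problem with actions "left" and "right".
q_table = {
    "start": {"left": 0.0, "right": 0.0},
    "goal": {"left": 0.0, "right": 1.0},
}

new_value = q_update(q_table, "start", "right", reward=1.0, next_state="goal")
print(new_value)
```

Repeating this update over many episodes of interaction is what lets the agent's value estimates converge toward useful ones.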
Other Popular Data Science Articles (Python, R, Statistics, etc.)
Data exploration comprises many things, such as variable identification, treating missing values, feature engineering, etc. Detecting and treating outliers is also a major cog in the data exploration stage. The quality of your inputs decides the quality of your output!
PyOD is one such library to detect outliers in your data. It provides access to more than 20 different algorithms to detect outliers and is compatible with both Python 2 and 3. An absolute gem!
In this article by Lakshay Arora you will get to understand outliers and learn to detect them using PyOD in Python. Well, you might ask why only PyOD? Here is the answer:
PyOD is a scalable Python toolkit for detecting outliers in multivariate data. It provides access to around 20 outlier detection algorithms under a single well-documented API.
Without further ado, we suggest you get on this incredible learning experience on implementing PyOD in Python.
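To see what outlier detection means in its simplest form, here is a minimal sketch using only Python's standard library. It deliberately does not use PyOD: it applies a basic z-score rule to a single variable, whereas PyOD provides far more capable algorithms for multivariate data. The readings and the threshold of 2.0 are assumptions made for illustration.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]


readings = [10, 12, 11, 13, 12, 11, 95]  # hypothetical data with one obvious outlier
outliers = zscore_outliers(readings)
print(outliers)
```

A limitation worth knowing: extreme values inflate the mean and standard deviation themselves, which is one reason dedicated toolkits exist.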
If you ask anyone about programming languages to learn in data science you will get suggestions to learn Python or R. But there are other groundbreaking programming languages that you can learn to master your skills as a data science professional.
Just a fair warning – this article can spark a debate in our community, but that’s even better! We love comparisons; whether it’s Nissan vs Skoda vs Suzuki or anything else, they always ignite a healthy discussion – so bring it on!
The article covers the top 6 programming languages you should know, including:
- Go (Golang)
Don’t you love how vast the field is for data science languages? Python and R are wonderful in their own right. But our aim here was to bring out other languages that we can use to perform data science tasks. To know more about each of these languages we suggest you read the article.
R offers a galaxy of packages for performing machine learning tasks, including ‘dplyr’ for data manipulation, ‘ggplot2’ for data visualization, ‘caret’ for building ML models, etc.
There are even R packages for specific functions, including credit risk scoring, scraping data from websites, econometrics, etc. In this article, we will showcase eight R packages that have gone under the radar among data scientists but are incredibly useful for performing specific machine learning tasks.
The R Packages we cover in this article include:
- Data Visualization
- Machine Learning
- Other Miscellaneous R Packages
- Bonus: More R Packages!
Isn’t that incredibly useful? I am sure this has got you super excited to know more, so jump into the article and take a deep dive into all of the R packages listed above!
Classifying time series data? Is that really possible? What could potentially be the use of doing that? These are just some of the questions you must have had when you read the title of this article.
The time-series data most of us are exposed to deals primarily with generating forecasts. Whether that’s predicting the demand or sales of a product, the count of passengers in an airline or the closing price of a particular stock, we are used to leveraging tried and tested time series techniques for forecasting requirements. And so it’s important to learn this concept.
Working with complex time series datasets is still a niche field, and it’s always helpful to expand your repertoire to include new ideas. With this article, Aishwarya Singh aims to introduce you to the novel concept of time series classification.
You will first understand what this topic means and its applications in the industry. But you won’t stop at the theory part – you will get your hands dirty by working on a time series dataset and performing binary time series classification. Learning by doing – this will help you understand the concept in a practical manner as well.
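As a rough illustration of binary time series classification, the sketch below labels a new series by finding the most similar labelled series (a 1-nearest-neighbour rule with Euclidean distance). This is a stand-in technique chosen for brevity, not the model built in the article, and it assumes all series have the same length. The training series and the "rising"/"falling" labels are made up.

```python
def euclidean(a, b):
    """Straight-line distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def classify_series(series, labelled_series):
    """1-nearest-neighbour: return the label of the closest training series."""
    _, nearest_label = min(
        labelled_series, key=lambda item: euclidean(series, item[0])
    )
    return nearest_label


# Hypothetical training data: (series, label) pairs for a binary problem.
training = [
    ([1, 2, 3, 4, 5], "rising"),
    ([2, 3, 4, 5, 6], "rising"),
    ([5, 4, 3, 2, 1], "falling"),
    ([6, 5, 4, 3, 2], "falling"),
]

prediction = classify_series([3, 4, 5, 6, 7], training)
print(prediction)
```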
When was the last time you learned a new Python trick? As data scientists, we are accustomed to working with familiar libraries and calling the same functions every time. It’s time to break the monotony!
Python isn’t just limited to Pandas, NumPy and scikit-learn (though they are absolutely essential in data science)! There is a whole world of Python tricks we can use to improve our code, speed up our data science tasks, and become much more efficient in writing code.
So let’s get a glimpse of the Python tricks (with examples) listed in the article by Lakshay Arora.
- Trick 1: zip: Combine Multiple Lists in Python: Quite often we end up writing complex for loops to combine more than one list together. Sounds familiar? Then you’ll love the zip function.
- Trick 3: category_encoders: Encode your Categorical Variables using 15 Different Encoding Schemes.
- Trick 4: progress_apply: Monitor the Time you Spend on Data Science Tasks.
- Trick 5: pandas_profiling: Generate a Detailed Report of your Dataset.
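Trick 1 in the list above is easy to try straight away. The sketch below, using made-up lists, compares an index-based loop with the built-in `zip` function, which pairs up elements from several lists in one pass. Keep in mind that `zip` stops at the shortest input.

```python
names = ["Asha", "Ben", "Chen"]  # hypothetical data
ages = [31, 27, 45]
cities = ["Pune", "Leeds", "Delhi"]

# Without zip: an index-based loop that is easy to get wrong.
combined_loop = []
for i in range(len(names)):
    combined_loop.append((names[i], ages[i], cities[i]))

# With zip: the same result in one line.
combined_zip = list(zip(names, ages, cities))

# zip also pairs naturally with dict() to build a lookup table.
age_by_name = dict(zip(names, ages))

print(combined_zip)
print(age_by_name)
```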
I am sure these have already got you excited. There are 5 more essential Python tricks listed in the article, each with examples, so check them out right away!
Get Started with PyTorch – Learn How to Build Quick & Accurate Neural Networks (with 4 Case Studies)!
PyTorch was one of the most popular frameworks in 2018. It quickly became the preferred go-to deep learning framework among researchers in both academia and the industry. After using PyTorch all throughout the year, we can confirm that it is a highly flexible and easy-to-use deep learning library.
PyTorch is a Python-based scientific computing package that is similar to NumPy but with the added power of GPUs. It is also a deep learning framework that provides maximum flexibility and speed when implementing and building deep neural network architectures.
In this article by Shivam Bansal, we will explore what PyTorch is all about. But your learning won’t stop with the theory – we will code through 4 different use cases and see how well PyTorch performs. Building deep learning models has never been this fun!
What an awesome year 2019 was! So many techniques to learn in the field of machine learning. I am sure you will find many more informative articles on any topic of your interest at Analytics Vidhya.
The best part? We don’t just give information about a topic – we encourage you to get your hands dirty and implement it in practice. And we are forever thankful to you, our readers, for believing in the content we generate.
What was your favorite article from 2019?