The Winning Approaches from codeFest 2018 – NLP, Computer Vision and Machine Learning!
Analytics Vidhya’s hackathons are one of the best ways to evaluate how far you’ve traveled in your data science journey. And what better way than to put your skills to the test against the top data scientists from around the globe?
Participating in these hackathons also helps you understand where you need to improve and what else you can learn to get a better score in the next competition. And a very popular demand after each hackathon is to see how the winning solution was designed and the thought process behind it. There’s a lot to learn from this, including how you can develop your own unique framework for future hackathons.
We are all about listening to our community, so we decided to curate the winning approaches from our recently concluded hackathon series, codeFest! This was a series of three hackathons in partnership with IIT-BHU, conducted between 31st August and 2nd September. The competition was intense, with more than 1,900 aspiring data scientists going head-to-head to grab the ultimate prize!
Each hackathon had a unique element to it, covering natural language processing, computer vision, and machine learning respectively.
It’s time to check out the winners’ approaches!
Linguipedia – Natural Language Processing
The participants were given a list of tweets from customers about various tech firms that manufacture and sell mobiles, computers, laptops, etc. The challenge was to identify the tweets that expressed a negative sentiment towards these companies or their products.
The metric used for evaluating the performance of the classification model was weighted F1-Score.
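For readers unfamiliar with the metric, here is a minimal sketch of how a weighted F1-score can be computed: the F1 of each class is averaged with weights proportional to how often that class appears in the true labels. The function name `weighted_f1` is hypothetical and is not part of the competition code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Toy labels: 1 = negative sentiment, 0 = everything else
print(weighted_f1([0, 0, 0, 1], [0, 0, 1, 1]))
```

Because the weights follow class frequency, the majority class dominates the score, which is worth keeping in mind on imbalanced tweet data.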
Abhinav and Abhishek have summarized their approach in a very intuitive manner, explaining everything from preprocessing and feature engineering to model building.
Preprocessing
- Converted words to lower case
- Replaced URLs with the word ‘URL‘ and @handle with the word ‘USER_MENTION‘
- Removed RT (retweet marker), dashes (–), and apostrophes (‘)
- Replaced #hashtag with hashtag
- Replaced more than two dots with a space, and multiple spaces with a single space
- Replaced emojis with either EMO_POS or EMO_NEG
- Stripped spaces and quotation marks (” and ‘) from the tweet
- Removed punctuation
- Collapsed runs of more than two repeated letters to two letters (funnnnny –> funny)
- Applied a stemmer
Feature extraction
- Kept the top 15,000 unigrams for the sparse vector representation (one-hot encoding) and the top 90,000 for the dense vector representation
- Used TF-IDF weighting for the sparse vector representation
- Used GloVe embeddings (trained on tweets) for the dense vectors
Classifiers used
- Naive Bayes
- Maximum entropy classifier
- Decision Tree
- Random Forest
- Support Vector Machine
- Multi-layer perceptron
- Convolutional neural networks (they experimented with 1, 2, 3 and 4 layers)
- LSTM (using the last layer obtained for classification)
- LSTM (with attention mechanism)
They tuned the hyperparameters of each of the above classifiers and found that the LSTM with attention mechanism produced the best result.
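The preprocessing steps listed earlier in this section can be sketched with a handful of regular expressions. This is only an illustration under stated assumptions: the function name `preprocess_tweet` and the emoticon lists are hypothetical, the emoticon lists are a tiny illustrative subset, and stemming is left out to keep the example free of third-party dependencies.

```python
import re

# Assumption: a small illustrative subset; the winners' actual lists are not given
POSITIVE_EMOTICONS = (":)", ":-)", ";)")
NEGATIVE_EMOTICONS = (":(", ":-(", ":'(")


def preprocess_tweet(tweet):
    text = tweet.lower()
    # Replace URLs and @handles with placeholder tokens
    text = re.sub(r"https?://\S+|www\.\S+", " URL ", text)
    text = re.sub(r"@\w+", " USER_MENTION ", text)
    # Drop the retweet marker and keep the hashtag word without '#'
    text = re.sub(r"\brt\b", " ", text)
    text = re.sub(r"#(\w+)", r"\1", text)
    # Map emoticons to sentiment placeholders
    for emoticon in POSITIVE_EMOTICONS:
        text = text.replace(emoticon, " EMO_POS ")
    for emoticon in NEGATIVE_EMOTICONS:
        text = text.replace(emoticon, " EMO_NEG ")
    # Replace runs of more than two dots with a space, strip quotes
    text = re.sub(r"\.{3,}", " ", text)
    text = re.sub(r"['\"‘’“”]", "", text)
    # Collapse 3+ repeated letters to two (funnnnny -> funny)
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)
    # Remove remaining punctuation (underscores in placeholders are kept)
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse multiple spaces and trim
    return re.sub(r"\s+", " ", text).strip()


print(preprocess_tweet("RT @apple: I looooove my new #iPhone!!! :) http://t.co/xyz"))
```

The order of the steps matters: URLs and emoticons are replaced before punctuation is removed, otherwise they would be broken into meaningless fragments.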
Ensemble
- They assigned a weight of one to each of Naive Bayes, Maximum entropy classifier, Decision Tree, Random Forest, XGBoost, SVM, and Multi-layer perceptron; a weight of two to the CNN; three to the LSTM (without attention); and five to the LSTM (with attention)
- These weights were obtained after hyperparameter tuning on a portion of the data (they divided the train dataset into three parts)
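One plausible way to combine classifiers with these weights is a weighted vote, sketched below. The write-up does not say exactly how the weights were applied, so treating them as vote counts is an assumption, and the dictionary keys and the `weighted_vote` function are hypothetical names.

```python
from collections import defaultdict

# Weights as listed in the write-up; the key names are hypothetical
VOTE_WEIGHTS = {
    "naive_bayes": 1, "max_entropy": 1, "decision_tree": 1,
    "random_forest": 1, "xgboost": 1, "svm": 1, "mlp": 1,
    "cnn": 2, "lstm": 3, "lstm_attention": 5,
}


def weighted_vote(predictions, weights=VOTE_WEIGHTS):
    """Return the label with the largest total weight.

    predictions maps a model name to the label that model predicted.
    On a tie, the label that was tallied first wins.
    """
    tally = defaultdict(float)
    for model, label in predictions.items():
        tally[label] += weights[model]
    return max(tally, key=tally.get)


# The three neural models (total weight 10) outvote the other seven (total weight 7)
votes = {name: 0 for name in VOTE_WEIGHTS}
votes.update({"cnn": 1, "lstm": 1, "lstm_attention": 1})
print(weighted_vote(votes))
```

With these weights the three neural models together can override all of the classical models, which reflects how much better they performed individually.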
Vista – Computer Vision
The Vista hackathon had a pretty intriguing problem statement. The participants had to build a model that counted the number of people in a given group selfie/photo. The dataset provided had already been split, wherein the training set consisted of images with coordinates of the bounding boxes and headcount for each image.
The evaluation metric for this competition was RMSE (root mean squared error) over the headcounts predicted for test images.
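As a quick refresher, RMSE is the square root of the mean of the squared differences between predicted and true values. A minimal sketch (the function name `rmse` is just illustrative):

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# True headcounts vs. predicted headcounts for three images
print(rmse([3, 5, 2], [3, 4, 4]))
```

Since the errors are squared before averaging, badly miscounting one crowded photo costs more than being off by one on several photos.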
Check out Deepak’s approach in his own words below:
- As this was an object detection problem, I implemented Mask R-CNN in Python 3, using Keras and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It’s based on a Feature Pyramid Network (FPN) and a ResNet101 backbone
Mask R-CNN and ResNet101
- Mask R-CNN is a two-stage framework:
- The first stage scans the image and generates proposals (areas likely to contain an object)
- The second stage classifies the proposals and generates bounding boxes and masks
Both stages are connected to the backbone structure.
- I used the ResNet101 backbone. The backbone is an FPN-style deep neural network. It consists of a bottom-up pathway, a top-down pathway, and lateral connections:
- The bottom-up pathway extracts features from raw images
- The top-down pathway generates a feature pyramid map that is similar in size to the bottom-up pathway
- Lateral connections are convolution and add operations between two corresponding levels of the two pathways
- I used the default 1024×1024 image size for better accuracy
- Apart from this, I also applied some data augmentation techniques to avoid overfitting and for better generalization
- I also used pre-trained weights trained on the MS-COCO dataset. MS-COCO is a large-scale object detection, segmentation, and captioning dataset. I used transfer learning and fine-tuning of the pre-trained weights to train my own custom Mask R-CNN model on the given dataset
- First, I trained only the “heads” layers. These include the Region Proposal Network (RPN), classifier, and mask heads of the network. Then I fine-tuned the layers of the ResNet101 backbone. ResNet consists of several blocks, of which I trained only block 4 and above
- Finally, I used a weighted majority voting to ensemble the best models and predict the final values
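To connect the detector to the competition target, the headcount for an image can be read off as the number of detections that clear a confidence threshold. The sketch below illustrates that idea only; the detection format (a list of dicts with a `score` key), the `count_heads` name, and the 0.7 threshold are all assumptions, not details from Deepak's solution.

```python
def count_heads(detections, min_score=0.7):
    """Count detections whose confidence is at least min_score.

    detections: list of dicts, each with a "score" in [0, 1]
    (a hypothetical format standing in for the detector's output).
    """
    return sum(1 for detection in detections if detection["score"] >= min_score)


# Three candidate boxes, one of which is too uncertain to count
sample = [{"score": 0.95}, {"score": 0.40}, {"score": 0.80}]
print(count_heads(sample))
```

The threshold directly trades missed people against false detections, so it is a natural candidate for tuning against RMSE on a validation split.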
Enigma – Machine Learning
As part of the Enigma competition, the target was to predict the number of upvotes on a question based on the other information provided. For every question, its tag, number of views received, number of answers, and the username and reputation of the question author were provided. Using this information, the participant had to predict the upvote count that the question would receive.
The evaluation metric for this competition was RMSE (root mean squared error). Below is the data dictionary for your reference:
|Variable|Definition|
|---|---|
|Tag|Anonymised tags representing question category|
|Reputation|Reputation score of question author|
|Answers|Number of times question has been answered|
|Username|Anonymised user id of question author|
|Views|Number of times question has been viewed|
|Upvotes|(Target) Number of upvotes for the question|
Here is Raj’s approach to cracking the Enigma hackathon:
My focus was on feature engineering, i.e., using the existing features to create new features. Below are some key features I cooked up:
- The first feature I created was the ratio of views to answers. I believe that this ratio is a better metric than the individual number of views or answers. Otherwise, a person with more answers and a high number of total views would get more credit than a person with few (but good) answers and overall fewer views
- The second feature is the ratio of Reputation to five times the number of answers. I added the factor 5, because the reputation was roughly 5 times the answers. This ratio intuitively makes more sense
- I created another feature using the views and reputation values. I took the absolute value of the difference between views and reputation
- I used a linear regression model with polynomial features
- I restricted the degree of the polynomial to 2, as a higher degree would make the model more flexible and increase the chances of overfitting
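The features described above, followed by a degree-2 polynomial expansion, can be sketched in plain Python. The function names are hypothetical, the guard against zero answers is an assumption (the write-up does not say how such rows were handled), and it is also an assumption that the expansion was applied to these engineered features.

```python
from itertools import combinations_with_replacement


def engineer_features(views, answers, reputation):
    # Assumption: treat zero answers as one to avoid division by zero
    safe_answers = max(answers, 1)
    return [
        views / safe_answers,             # views per answer
        reputation / (5 * safe_answers),  # reputation relative to 5x answers
        abs(views - reputation),          # absolute gap between views and reputation
    ]


def polynomial_features(values, degree=2):
    """Return the bias term plus all monomials of the inputs up to `degree`."""
    expanded = [1.0]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(values)), d):
            term = 1.0
            for index in combo:
                term *= values[index]
            expanded.append(term)
    return expanded


row = engineer_features(views=100, answers=4, reputation=20)
print(row)
print(len(polynomial_features(row)))  # 1 bias + 3 linear + 6 quadratic terms
```

A linear regression fitted on the expanded vector can then capture pairwise interactions between the engineered features while staying simple enough to limit overfitting.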
A big thank you to everyone for participating in codeFest 2018! This competition was all about quick and structured thinking, coding, experimentation, and finding the one approach that got you up the leaderboard. In short, what machine learning is all about!
Missed out this time? Don’t worry, you can check out all upcoming hackathons on our DataHack platform and register yourself today!