Text Classification & Word Representations using FastText (An NLP library by Facebook)

NSS 06 Jun, 2023

Introduction

If you put a status update on Facebook about purchasing a car, don’t be surprised if Facebook serves you a car ad on your screen. This is not black magic! This is Facebook leveraging text data to serve you better ads.

The picture below pokes fun at a common challenge in dealing with text data.

[Image: Facebook ad serving using NLP]

Well, the attempt above clearly failed to deliver the right ad. This makes it all the more important to capture the context in which a word is used, which is a common problem in Natural Language Processing (NLP) tasks.

A single word with the same spelling and pronunciation (a homonym) can be used in multiple contexts, and a potential solution to this problem is to compute word representations.

Now, imagine the challenge for Facebook. Facebook deals with an enormous amount of text data on a daily basis in the form of status updates, comments, etc., and it is all the more important for Facebook to utilise this text data to serve its users better. Using text generated by billions of users to compute word representations was a very time-consuming task until Facebook developed its own library, FastText, for word representations and text classification.

In this article, we will see how we can calculate Word Representations and perform Text Classification, all in a matter of seconds in comparison to the existing methods which took days to achieve the same performance.

What is FastText?

FastText is an open-source library for text representation and classification developed by Facebook’s AI Research (FAIR) team. It is designed to efficiently handle large amounts of text data and provides tools for text classification, word representation, and text similarity computation.

At its core, FastText uses the concept of word embeddings, which are dense vector representations of words in a continuous vector space. Word embeddings capture semantic and syntactic relationships between words based on their distributional properties in a given text corpus.

FastText extends the idea of word embeddings from whole words to subwords. Instead of treating words as atomic units, FastText breaks them down into smaller subword units, namely character n-grams. By doing so, it can capture morphological information and handle out-of-vocabulary words efficiently.

The training process of FastText involves learning these word and subword embeddings using either the skip-gram or the continuous bag of words (CBOW) model, typically with negative sampling. CBOW predicts a target word from its surrounding context words, skip-gram predicts the context words from the target word, and negative sampling helps train the model efficiently even with large vocabularies.

FastText supports both unsupervised and supervised learning tasks. In the unsupervised setting, it can learn word embeddings solely based on the distributional properties of words in the training corpus. In the supervised setting, it can perform text classification tasks, where it learns to classify text documents into predefined categories.

FastText has gained popularity due to its ability to handle large-scale text data efficiently. It has been used for various applications, including text classification, language identification, information retrieval, and text similarity computation.

In short, FastText is a library for efficient learning of word representations and sentence classification.

Uses of FastText

This library has gained a lot of traction in the NLP community and is a possible substitute for the gensim package, which provides word vector functionality among other things. If you are new to word vectors and word representations in general, I suggest you read this article first.

But the question that we should be really asking is – How is FastText different from gensim Word Vectors?

FastText differs in that word2vec treats every single word as the smallest unit whose vector representation is to be found, whereas FastText assumes a word to be formed of character n-grams. For example, with n between 3 and 5, sunny is composed of [sun, unn, nny], [sunn, unny], [sunny], and so on, where n could range from 1 to the length of the word. This new representation of a word gives fastText the following benefits over word2vec or GloVe.

  1. It helps in finding vector representations for rare words. Since rare words can still be broken into character n-grams, they can share these n-grams with common words. For example, for a model trained on a news dataset, medical terms such as disease names can be the rare words.
    [Image: Common and rare words for an NLP task]
  2. It can give vector representations for words not present in the dictionary (out-of-vocabulary, or OOV, words), since these can also be broken down into character n-grams. word2vec and GloVe both fail to provide vector representations for words not in the dictionary.
    For example, for a word like stupedofantabulouslyfantastic, which might never have appeared in any corpus, gensim might return either of the following: a) a zero vector, or b) a random vector with low magnitude. FastText, however, can produce vectors better than random by breaking the word into chunks and combining the vectors of those chunks into a final vector for the word. In this particular case, the final vector might be close to the vectors of fantastic and fantabulous.
  3. Character n-gram embeddings tend to outperform word2vec and GloVe on smaller datasets.
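
To make the idea of character n-grams concrete, here is a minimal Python sketch that splits a word into its n-grams and shows which ones two words share. It is an illustration only, not the library's own code: the function names char_ngrams and shared_ngrams are hypothetical, and the '<' and '>' boundary markers and the default range of 3 to 6 are assumptions made for this example.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`.

    The word is padded with '<' and '>' so that prefixes and suffixes
    get their own n-grams (an assumption made for this sketch).
    """
    padded = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the padded word.
        for start in range(len(padded) - n + 1):
            grams.append(padded[start:start + n])
    return grams


def shared_ngrams(word_a, word_b, min_n=3, max_n=6):
    """Return the n-grams that two words have in common, sorted."""
    grams_a = set(char_ngrams(word_a, min_n, max_n))
    grams_b = set(char_ngrams(word_b, min_n, max_n))
    return sorted(grams_a & grams_b)


# A rare or unseen word still shares pieces with more common words.
print(char_ngrams("sunny", min_n=3, max_n=4))
print(shared_ngrams("fantastic", "fantabulous", min_n=3, max_n=4))
```

Because an unseen word shares n-grams such as "fan" with words seen during training, a vector for it can be assembled from the vectors of those shared pieces.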

We will now look at the steps to install the fastText library below.

Installation of Fasttext

To make full use of the FastText library, please make sure you have the following requirements satisfied:

  1. OS – MacOS or Linux
  2. C++ compiler – gcc or clang
  3. Python 2.6+, numpy and scipy.

If you do not have the above pre-requisites, I urge you to go ahead and install the above dependencies first.

To install FastText, run the commands below:

  1. git clone https://github.com/facebookresearch/fastText.git
  2. cd fastText
  3. make

You can check whether FastText has been properly installed by typing the below command inside the FastText folder.
./fasttext

If everything was installed correctly then, you should see the list of available commands for FastText as the output.

Fasttext Implementation

As stated earlier, FastText was designed for two specific purposes: Word Representation Learning and Text Classification. We will see each of these in detail. Let us get started with learning word representations.

Learning Word Representations

Words in their natural form cannot be used directly for most Machine Learning tasks. One way to use them is to transform them into representations that capture some attributes of each word. It is analogous to describing a person as {'height': 5.10, 'weight': 75, 'colour': 'dusky', ...}, where height, weight, etc. are the attributes of the person. Similarly, word representations capture abstract attributes of words in such a way that similar words tend to have similar representations. There are primarily two methods used to develop word vectors: Skipgram and CBOW.

We will see how we can implement both these methods to learn vector representations for a sample text file using fasttext.

Learning word representations using Skipgram and CBOW models

  1.  Skipgram
    ./fasttext skipgram -input file.txt -output model
  2. CBOW
    ./fasttext cbow -input file.txt -output model

Let us go through the parts of the above commands one by one.

./fasttext – Invokes the FastText binary.
skipgram/cbow – Specifies whether the skipgram or the cbow model is used to create the word representations.
-input – The flag whose value is the name of the training file. Use this flag as is.
file.txt – A sample text file on which we wish to train the skipgram or cbow model. Change this to the name of your own text file.
-output – The flag whose value is the name of the model to be created. Use this flag as is.
model – The name of the model being created.

Running the above command will create two files named model.bin and model.vec. model.bin contains the model parameters, the dictionary and the hyperparameters, and can be used to compute word vectors. model.vec is a text file that contains the word vectors, one word per line.
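
Since model.vec is plain text, it is easy to load into Python without any extra library. The sketch below assumes the usual layout of such files: a first line holding the vocabulary size and the vector dimension, followed by one line per word with the word and its values separated by spaces. The sample content and the function name parse_vec are made up for illustration.

```python
# A tiny made-up stand-in for the contents of a model.vec file.
SAMPLE_VEC = """3 4
the 0.1 0.2 0.3 0.4
word -0.5 0.25 0.0 1.0
vectors 0.0 0.0 0.5 -0.5
"""


def parse_vec(text):
    """Parse .vec-style text into a dict mapping word -> list of floats."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    count, dim = int(header[0]), int(header[1])

    vectors = {}
    for line in lines[1:]:
        parts = line.split()
        word, values = parts[0], [float(v) for v in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        vectors[word] = values

    if len(vectors) != count:
        raise ValueError(f"header promised {count} words, found {len(vectors)}")
    return vectors


vectors = parse_vec(SAMPLE_VEC)
print(vectors["word"])
```

To use it on a real file, read the file's text first, for example with open("model.vec", encoding="utf-8").read(), and pass the result to parse_vec.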

Now that we have created our own word vectors, let’s see if we can do some common tasks with them, such as printing the word vector for a word, finding similar words, and solving analogies.

Print word vectors of a word

In order to get the word vectors for a word or set of words, save them in a text file. For example, here is a sample text file named queries.txt that contains some random words. We will get the vector representation of these words using the model we trained above.

./fasttext print-word-vectors model.bin < queries.txt

To check word vectors for a single word without saving into a file, you can do

echo "word" | ./fasttext print-word-vectors model.bin

Finding similar words

You can also find the words most similar to a given word. This functionality is provided by the nn command. Let’s see how we can find the most similar words to “happy”.

./fasttext nn model.bin

After typing the above command, the terminal will ask you to input a query word.

happy

by 0.183204
be 0.0822266
training 0.0522333
the 0.0404951
similar 0.036328
and 0.0248938
The 0.0229364
word 0.00767293
that 0.00138793
syntactic -0.00251774

The above is the result returned for the most similar words to happy. Interestingly, this feature could be used to correct spellings too. For example, when you enter a wrong spelling, it shows the correct spelling of the word if it occurred in the training file.

wrd

word 0.481091
words. 0.389373
words 0.370469
word2vec 0.354458
more 0.345805
and 0.333076
with 0.325603
in 0.268813
Word2vec 0.26591
or 0.263104
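
To get a feel for what a nearest-neighbour query does, here is a small Python sketch that ranks words by cosine similarity, a common choice for comparing word vectors. The three-dimensional vectors are invented toy values and the function names are hypothetical, so treat this as an illustration of the idea rather than a copy of what the nn command does internally.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def nearest(query, vectors, k=3):
    """Return the k words closest to `query`, excluding the query itself."""
    q = vectors[query]
    scored = [(word, cosine(q, vec)) for word, vec in vectors.items() if word != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, invented for this example.
toy = {
    "happy": [0.9, 0.1, 0.0],
    "glad": [0.8, 0.2, 0.1],
    "sad": [-0.7, 0.3, 0.1],
    "table": [0.0, 0.1, 0.9],
}

for word, score in nearest("happy", toy):
    print(word, round(score, 4))
```

With a model trained on a tiny file, as above, the neighbours are mostly frequent filler words; the ranking only becomes meaningful once the vectors come from a reasonably large corpus.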

Analogies

FastText word vectors can also be used for analogy tasks of the kind: what is to C as B is to A? Here, A, B and C are words.

The analogies functionality is provided by the analogies command. Let’s see this with the help of an example.

./fasttext analogies model.bin

The above command will ask you to input the words in the form A - B + C, but we just need to give three words separated by spaces.

happy sad angry

of 0.199229
the 0.187058
context 0.158968
a 0.151884
as 0.142561
The 0.136407
or 0.119725
on 0.117082
and 0.113304
be 0.0996916

The results above are mostly common filler words because the sample file is tiny. Training on a very large corpus will produce better results.
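
The A - B + C form of the query hints at what happens underneath: the three word vectors are combined by simple arithmetic, and the words closest to the result are returned. Here is a minimal Python sketch of that idea. The two-dimensional vectors are invented so that the arithmetic works out cleanly, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors, k=1):
    """Rank words by similarity to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words themselves are left out of the candidates.
    scored = [
        (word, cosine(target, vec))
        for word, vec in vectors.items()
        if word not in (a, b, c)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors: first axis ~ "maleness", second axis ~ "royalty" (invented).
toy = {
    "king": [0.8, 0.9],
    "man": [0.8, 0.1],
    "woman": [0.2, 0.1],
    "queen": [0.2, 0.9],
    "apple": [0.9, -0.4],
}

print(analogy("king", "man", "woman", toy))
```

Real word vectors are never this tidy, which is why the quality of analogy results depends so heavily on the size of the training corpus.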

Text Classification

As suggested by the name, text classification means tagging each document in a collection with a particular class. Sentiment analysis and email classification are classic examples of text classification. In this era of technology, millions of digital documents are being generated each day. It would cost a huge amount of time and human effort to sort them into reasonable categories like spam and non-spam, important and unimportant, and so on. The text classification techniques of NLP come to our rescue here. Let’s see how by doing hands-on practice on a sentiment analysis problem. I have taken the data for this analysis from Kaggle.

Before we jump into the execution, there is a word of caution about the training file. By default, each line of the text file on which we want to train our model should have the format    __label__<X> <Text>

Here, __label__ is a prefix to the class and <X> is the class assigned to the document. Also, there should not be quotes around the document, and everything in one document should be on one line.

[Image: Sample fastText file format]


In fact, the reason why I have selected this data for this article is that it is already available in exactly the required default format. If you are completely new to FastText and are implementing text classification in it for the very first time, I would strongly recommend using the data mentioned above.

In case your data uses some other format for the label, don’t worry. FastText will take care of it once you pass a suitable argument. We will see how to do it in a moment. Just stick with the article.
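
If your own data starts out as pairs of labels and raw text, a few lines of Python are enough to bring it into the format described above. The sketch below is only one way to do it: the function name to_fasttext_line and the sample reviews are made up, and the cleaning is limited to the rules mentioned in this article (no surrounding quotes, one document per line).

```python
def to_fasttext_line(label, text, prefix="__label__"):
    """Format one document as '<prefix><label> <text>' on a single line."""
    # Drop surrounding whitespace and quotes around the document.
    unquoted = text.strip().strip('"')
    # Collapse newlines, tabs and repeated spaces so the document fits on one line.
    cleaned = " ".join(unquoted.split())
    return f"{prefix}{label} {cleaned}"


# Made-up examples: (label, raw text) pairs.
reviews = [
    ("2", '"Great  product,\nworks well"'),
    ("1", "Stopped working after a week."),
]

lines = [to_fasttext_line(label, text) for label, text in reviews]
print("\n".join(lines))
```

Writing the resulting lines to a text file, one per line, gives a file that can be passed to the training command through -input.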

After this briefing about text classification, let’s move ahead to the implementation part. We will be using the train.ft.txt file to train the model and the test.ft.txt file to predict.

#training the classifier
./fasttext supervised -input train.ft.txt -output model_kaggle -label  __label__

Here, the parameters are the same as the ones used while creating word representations. The only additional parameter is -label, which tells FastText what prefix the labels carry. The file that you downloaded contains labels with the prefix __label__.

If you do not wish to use default parameters for training the model, then they can be specified during the training time. For example, if you explicitly want to specify the learning rate of the training process then you can use the argument -lr to specify the learning rate.

./fasttext supervised -input train.ft.txt -output model_kaggle -label  __label__ -lr 0.5

The other available parameters that can be tuned are –

  • -lr : learning rate [0.1]
  • -lrUpdateRate : change the rate of updates for the learning rate [100]
  • -dim : size of word vectors [100]
  • -ws : size of the context window [5]
  • -epoch : number of epochs [5]
  • -neg : number of negatives sampled [5]
  • -loss : loss function {ns, hs, softmax} [ns]
  • -thread : number of threads [12]
  • -pretrainedVectors : pretrained word vectors for supervised learning []
  • -saveOutput : whether output params should be saved [0]

The values in the square brackets [] represent the default values of the parameters passed.

# Testing the result
./fasttext test model_kaggle.bin test.ft.txt

N 400000
P@1 0.916
R@1 0.916

N is the number of examples
P@1 is the precision when only the top predicted label is considered
R@1 is the recall when only the top predicted label is considered
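
You may wonder why precision and recall come out identical here. The Python sketch below computes both on a handful of invented predictions, assuming the usual counting definition: precision at k is the share of predicted labels that are correct, and recall at k is the share of true labels that were predicted. The function name and the toy data are hypothetical.

```python
def precision_recall_at_k(gold, predicted, k=1):
    """Compute precision and recall when the top k predictions are kept.

    gold:      list of sets, the true labels of each example
    predicted: list of lists, labels ranked from most to least likely
    Assumes at least one example with at least one true label.
    """
    n_correct = n_predicted = n_gold = 0
    for true_labels, ranked in zip(gold, predicted):
        top_k = ranked[:k]
        n_correct += sum(1 for label in top_k if label in true_labels)
        n_predicted += len(top_k)
        n_gold += len(true_labels)
    return n_correct / n_predicted, n_correct / n_gold


# Four invented examples, each with exactly one true label.
gold = [{"1"}, {"2"}, {"2"}, {"1"}]
predicted = [["1", "2"], ["2", "1"], ["1", "2"], ["1", "2"]]

print(precision_recall_at_k(gold, predicted, k=1))  # (0.75, 0.75)
print(precision_recall_at_k(gold, predicted, k=2))  # (0.5, 1.0)
```

When every example has exactly one true label and we keep exactly one prediction, the number of predictions equals the number of true labels, so the two metrics coincide. Keeping more predictions raises recall at the cost of precision.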

# Predicting on the test dataset
./fasttext predict model_kaggle.bin test.ft.txt

# Predicting the top 3 labels
./fasttext predict model_kaggle.bin test.ft.txt 3

Computing Sentence Vectors (Supervised)

This model can also be used for computing the sentence vectors. Let us see how we can compute the sentence vectors by using the following commands.

echo "this is a sample sentence" | ./fasttext print-sentence-vectors model_kaggle.bin
0.008204 0.016523 -0.028591 -0.0019852 -0.0043028 0.044917 -0.055856 -0.057333 0.16713 0.079895 0.0034849 0.052638 -0.073566 0.10069 0.0098551 -0.016581 -0.023504 -0.027494 -0.070747 -0.028199 0.068043 0.082783 -0.033781 0.051088 -0.024244 -0.031605 0.091783 -0.029228 -0.017851 0.047316 0.013819 0.072576 -0.004047 -0.10553 -0.12998 0.021245 0.0019761 -0.0068286 0.021346 0.012595 0.0016618 0.02793 0.0088362 0.031308 0.035874 -0.0078695 0.019297 0.032703 0.015868 0.025272 -0.035632 0.031488 -0.027837 0.020735 -0.01791 -0.021394 0.0055139 0.009132 -0.0042779 0.008727 -0.034485 0.027236 0.091251 0.018552 -0.019416 0.0094632 -0.0040765 0.012285 0.0039224 -0.0024119 -0.0023406 0.0025112 -0.0022772 0.0010826 0.0006142 0.0009227 0.016582 0.011488 0.019017 -0.0043627 0.00014679 -0.003167 0.0016855 -0.002838 0.0050221 -0.00078066 0.0015846 -0.0018429 0.0016942 -0.04923 0.056873 0.019886 0.043118 -0.002863 -0.0087295 -0.033149 -0.0030569 0.0063657 0.0016887 -0.0022234
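
One simple way to think about a sentence vector is as the average of the vectors of its words. The Python sketch below shows that idea on toy data. It is a simplification and an assumption made for illustration, not a description of the exact computation inside FastText: words missing from the toy dictionary are simply skipped here, and the function name is hypothetical.

```python
def sentence_vector(sentence, vectors, dim):
    """Average the vectors of the known words in a whitespace-split sentence."""
    known = [vectors[token] for token in sentence.split() if token in vectors]
    if not known:
        return [0.0] * dim  # no known word: fall back to a zero vector
    # zip(*known) walks over the vectors one dimension at a time.
    return [sum(column) / len(known) for column in zip(*known)]


# Toy two-dimensional word vectors, invented for this example.
toy = {
    "this": [1.0, 0.0],
    "is": [0.0, 1.0],
    "sample": [1.0, 1.0],
}

print(sentence_vector("this is a sample", toy, dim=2))
```

The result always has the same length as a word vector, whatever the length of the sentence, which is what makes it convenient as input to a classifier.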

Pros and Cons of FastText

Like every library in development, it has its pros and cons. Let us state them explicitly.

Pros

  1. The library is remarkably fast in comparison to other methods that achieve the same accuracy. Here is the result published by the Facebook research team in support of this claim.
    [Image: Comparison of FastText with other word representation models]
  2. Sentence vectors (supervised) can be easily computed.
  3. fastText works better on small datasets in comparison to gensim.
  4. fastText outperforms gensim in terms of syntactic performance and fares equally well in terms of semantic performance.

Cons

  1. This is not a standalone library for NLP, since it requires another library for the pre-processing steps.
  2. Although this library has a Python implementation, it was not officially supported at the time of writing.

Projects

Now, it’s time to take the plunge and actually play with some other real datasets. So are you ready to take on the challenge? Accelerate your NLP journey with the following Practice Problems:

Practice Problem: Identify the Sentiments (identify the sentiment of tweets)
Practice Problem: Twitter Sentiment Analysis (detect hate speech in tweets)

Frequently Asked Questions

Q1. Is FastText a neural network?

A. Yes, FastText utilizes a neural network architecture. It employs a shallow neural network with a single hidden layer for training word and subword embeddings. The embeddings can be learned with either the skip-gram or the continuous bag of words (CBOW) model, typically with negative sampling. FastText is thus a neural network-based approach for efficient text representation and classification.

Q2. Which is better: BERT embeddings or FastText?

A. The choice between BERT embeddings and FastText depends on the specific task and requirements. BERT embeddings capture contextual information effectively, making them suitable for tasks like sentiment analysis and named entity recognition. FastText is more efficient for handling large-scale text data and can handle out-of-vocabulary words well. Ultimately, the selection should be based on the specific needs of the application.

End Notes

This article was aimed at making you aware of the FastText library as an alternative to the word2vec model and also letting you make your first vector representation and text classification model.

If you want to go deeper into the difference in performance between fastText and gensim, you can visit this link, where a researcher has carried out the comparison using a Jupyter notebook and some standard text datasets.

Please feel free to try out this library and share your experiences in the comments below.
