Introduction to Flair for NLP: A Simple yet Powerful State-of-the-Art NLP Library

Sharoon Saxena 16 Dec, 2019 • 13 min read

Introduction

The last couple of years have been incredible for Natural Language Processing (NLP) as a domain! We have seen multiple breakthroughs – ULMFiT, ELMo, Facebook’s PyText, and Google’s BERT, among many others. These have rapidly accelerated state-of-the-art research in NLP (and language modeling, in particular).

We can now predict the next sentence, given a sequence of preceding words.

What’s even more important is that machines are now beginning to understand the key element that had eluded them for so long.

Context! Understanding context has broken down barriers that had previously prevented NLP techniques from making headway. And today, we are going to talk about one such library – Flair.

Until now, words were represented either as sparse vectors or through word embeddings such as GloVe, BERT, and ELMo, and the results have been pretty impressive. But there’s always room for improvement, and Flair aims to fill that gap.

In this article, we will first understand what Flair is and the concept behind it. Then we’ll dive into implementing NLP tasks using Flair. Get ready to be impressed by its accuracy!

Please note that this article assumes familiarity with basic NLP concepts such as tokenization, PoS tagging, and word embeddings.

 

Table of contents

  1. What is ‘Flair’ Library?
  2. What Gives Flair the Edge?
  3. Introduction to Contextual String Embeddings for Sequence Labeling
  4. Performing NLP Tasks in Python using Flair
  5. What’s Next for Flair?

 

What is ‘Flair’ Library?

Flair is a simple natural language processing (NLP) library developed and open-sourced by Zalando Research. Flair’s framework builds directly on PyTorch, one of the best deep learning frameworks out there. The Zalando Research team has also released several pre-trained models for the following NLP tasks:

  1. Named-Entity Recognition (NER): recognising whether a word in the text represents a person, a location, or an organisation name.
  2. Parts-of-Speech Tagging (PoS): tagging every word in the given text with the “part of speech” it belongs to.
  3. Text Classification: classifying text based on a set of criteria (labels).
  4. Training Custom Models: building our own custom models.

All of this looks promising. But what truly caught my attention was seeing Flair outperform previous state-of-the-art results on several NLP tasks. Check out this table:

Note: The F1 score is an evaluation metric primarily used for classification tasks. It is often preferred over plain accuracy in machine learning projects because it takes the distribution of the classes into account.
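For reference, the F1 score is the harmonic mean of precision and recall:

F1 = 2 * (Precision * Recall) / (Precision + Recall)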

 

What Gives Flair the Edge?

There are plenty of awesome features packaged into the Flair library. Here’s my pick of the most prominent ones:

  1. It comprises popular and state-of-the-art word embeddings, such as GloVe, BERT, ELMo, Character Embeddings, etc. These are very easy to use thanks to the Flair API
  2. Flair’s interface allows us to combine different word embeddings and use them to embed documents. This in turn leads to a significant uptick in results
  3. ‘Flair Embedding’ is the signature embedding provided within the Flair library. It is powered by contextual string embeddings. We’ll understand this concept in detail in the next section
  4. Flair supports a number of languages – and is always looking to add new ones

 

Introduction to Contextual String Embeddings for Sequence Labeling

Context is so vital when working on NLP tasks. Learning to predict the next character based on previous characters forms the basis of sequence modeling.

Contextual String Embeddings leverage the internal states of a trained character language model to produce a novel type of word embedding. In simple terms, it uses the internal representations of a trained character-level model, so that the same word can have different meanings in different sentences.

Note: A language model (at the word or character level) is a probability distribution over sequences, such that every new word or character depends on the words or characters that came before it. Have a look here to know more about it.
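Concretely, a character-level language model factorizes the probability of a character sequence c1, …, cn as:

P(c1, ..., cn) = P(c1) * P(c2 | c1) * ... * P(cn | c1, ..., cn-1)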

There are two primary factors powering contextual string embeddings:

  1. The underlying model is trained on characters (without any explicit notion of words). In other words, it works similarly to character embeddings
  2. The embeddings are contextualised by their surrounding text. This implies that the same word can have different embeddings depending on the context. Quite similar to natural human language, isn’t it? The same word may have different meanings in different situations

Let’s look at an example to understand this:

  • Case 1: Reading a book
  • Case 2: Please book a train ticket

Explanation:

  • In case 1, book is a NOUN
  • In case 2, book is a VERB

Language is such a wonderful yet complex thing. You can read more about Contextual String Embeddings in this Research Paper.
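To see this in code, here is a minimal sketch (assuming Flair is installed) that embeds the word “book” in both sentences and prints part of each resulting vector – the two vectors differ because the contexts differ:

from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

# load a pre-trained contextual string embedding #
embedding = FlairEmbeddings('news-forward-fast')

s1 = Sentence('Reading a book')
s2 = Sentence('Please book a train ticket')
embedding.embed(s1)
embedding.embed(s2)

# 'book' is the 3rd token of s1 and the 2nd token of s2 #
print(s1.tokens[2].embedding[:5])
print(s2.tokens[1].embedding[:5])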

 

Performing NLP Tasks in Python using Flair

It’s time to put Flair to the test! We’ve seen what this awesome library is all about. Now let’s see firsthand how it works on our machines.

We’ll use Flair to perform all the below NLP tasks in Python:

  1. Text Classification using the Flair embeddings
  2. Part of Speech Tagging (PoS) and comparison with the NLTK library

 

Setting up the Environment

We will be using Google Colaboratory for running our code. One of the best things about Colab is that it provides GPU support for free! It is pretty handy for training deep learning models.

 

Why use Colab?

  • Completely free
  • Comes with pretty decent hardware configuration
  • It runs in your web browser, so even old machines with outdated hardware can use it
  • Connected to your Google Drive
  • Very well integrated with Github

All you need is a stable internet connection.

 

About the Dataset

We’ll be working on the Twitter Sentiment Analysis practice problem. Go ahead and download the dataset from there (you’ll need to register/log in first).

The problem statement posed by this challenge is:

The objective of this task is to detect hate speech in tweets. For the sake of simplicity, we say a tweet contains hate speech if it has a racist or sexist sentiment associated with it. So, the task is to classify racist or sexist tweets from other tweets.

 

1. Text Classification Using Flair Embeddings

Overview of steps:

Step 1: Import the data into the local environment of Colab

Step 2: Installing Flair

Step 3: Preparing text to work with Flair

Step 4: Word Embeddings with Flair

Step 5: Vectorizing the text

Step 6: Partitioning the data for Train and Test Sets

Step 7: Building the Model and Defining a Custom Evaluator (for F1 Score)

Step 8: Time for predictions!

 

Step 1: Import the data into the local environment of Colab

# Install the PyDrive wrapper & import libraries.
# This only needs to be done once per notebook.

!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials


# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Download a file based on its file ID.
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = '1GhyH4k9C4uPRnMAMKhJYOqa-V9Tqt4q8' ### File ID ###
data = drive.CreateFile({'id': file_id})
#print('Downloaded content "{}"'.format(data.GetContentString()))

You can find the file ID in the shareable link of the dataset file in the drive.

Importing the dataset into the Colab notebook:

import io
import pandas as pd
data = pd.read_csv(io.StringIO(data.GetContentString())) 
data.head()

All the emoticons and symbols have been removed from the data and the characters have been converted to lowercase. Additionally, our dataset has already been divided into train and test sets. You can download this clean dataset from here.

 

Step 2: Installing Flair

# download flair library #
import torch
!pip install flair
import flair

 

A Brief look at Flair Data Types

There are two types of objects central to this library – Sentence and Token objects. A Sentence holds a textual sentence and is essentially a list of Tokens:

from flair.data import Sentence
# create a sentence #
sentence = Sentence('Blogs of Analytics Vidhya are Awesome.')
# print the sentence to see what’s in it. #
print(sentence)
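Since a Sentence is essentially a list of Token objects, we can also iterate over it directly – a quick sketch:

# each Token in the Sentence can be inspected individually #
for token in sentence:
  print(token)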

 

Step 3: Preparing text to work with Flair

# extracting the tweet column #
text = data['tweet']
## txt is a list of tweets ##
txt = text.tolist()
print(txt[:10])

 

Step 4: Word Embeddings with Flair

Feel free to first go through this article if you’re new to word embeddings: An Intuitive Understanding of Word Embeddings.

## Importing the Embeddings ##
from flair.embeddings import WordEmbeddings
from flair.embeddings import CharacterEmbeddings
from flair.embeddings import StackedEmbeddings
from flair.embeddings import FlairEmbeddings
from flair.embeddings import BertEmbeddings
from flair.embeddings import ELMoEmbeddings

### Initialising embeddings (un-comment to use the others) ###
#glove_embedding = WordEmbeddings('glove')
#character_embeddings = CharacterEmbeddings()
flair_forward  = FlairEmbeddings('news-forward-fast')
flair_backward = FlairEmbeddings('news-backward-fast')
#bert_embedding = BertEmbeddings()
#elmo_embedding = ELMoEmbeddings()

stacked_embeddings = StackedEmbeddings(embeddings=[
                                                   flair_forward,
                                                   flair_backward
                                                  ])

You would have noticed that we just used some of the most popular word embeddings above. You can un-comment the relevant lines to use the other embeddings as well.

Now you might be asking – What in the world are “Stacked Embeddings”? Here, we can combine multiple embeddings to build a powerful word representation model without much complexity. Quite like ensembling, isn’t it?

We are using the stacked embedding of Flair only for reducing the computational time in this article. Feel free to play around with this and other embeddings by using any combination you like.

Testing the stacked embeddings:

# create a sentence #
sentence = Sentence('Analytics Vidhya blogs are Awesome .')
# embed the words in the sentence #
stacked_embeddings.embed(sentence)
for token in sentence:
  print(token.embedding)
# data type and size of the embedding (token holds the last word from the loop) #
print(type(token.embedding))
# storing the embedding size (length) #
z = token.embedding.size()[0]

 

Step 5: Vectorizing the text

We’ll be showcasing this using two approaches.

 

Mean of Word Embeddings within a Tweet

We will be calculating the following in this approach:

For each sentence:

  1. Generate the word embedding for each word
  2. Calculate the mean of the embeddings of all words to obtain the embedding of the sentence

from tqdm import tqdm ## tracks progress of loop ##

# creating a tensor for storing sentence embeddings #
s = torch.zeros(0,z)

# iterating Sentence (tqdm tracks progress) #
for tweet in tqdm(txt):   
  # empty tensor for words #
  w = torch.zeros(0,z)   
  sentence = Sentence(tweet)
  stacked_embeddings.embed(sentence)
  # for every word #
  for token in sentence:
    # storing word Embeddings of each word in a sentence #
    w = torch.cat((w,token.embedding.view(-1,z)),0)
  # storing sentence Embeddings (mean of embeddings of all words)   #
  s = torch.cat((s, w.mean(dim = 0).view(-1, z)),0)

Document Embedding: Vectorizing the entire Tweet

from flair.embeddings import DocumentPoolEmbeddings

### initialize the document embeddings, mode = mean ###
document_embeddings = DocumentPoolEmbeddings([
                                              flair_backward,
                                              flair_forward
                                             ])
# embed a sample sentence and store the size of the document embedding #
sentence = Sentence('Analytics Vidhya blogs are Awesome .')
document_embeddings.embed(sentence)
z = sentence.embedding.size()[0]

### Vectorising text ###
# creating a tensor for storing sentence embeddings
s = torch.zeros(0,z)
# iterating Sentences #
for tweet in tqdm(txt):   
  sentence = Sentence(tweet)
  document_embeddings.embed(sentence)
  # Adding Document embeddings to list #
  s = torch.cat((s, sentence.embedding.view(-1,z)),0)

You can choose either approach for your model. Now that our text is vectorised, we can feed it to our machine learning model!

Step 6: Partitioning the data for Train and Test Sets

## tensor to numpy array ##
X = s.numpy()   

## Train/test split (the first 31,962 rows are the labelled training tweets) ##
test = X[31962:,:]
train = X[:31962,:]

# extracting labels of the training set #
target = data['label'][data['label'].isnull()==False].values
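A quick sanity check on the shapes – the number of rows in train should match the number of labels in target:

print(train.shape, test.shape, target.shape)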

 

Step 7: Building the Model and Defining a Custom Evaluator (for F1 Score)

Defining the custom F1 evaluator for XGBoost:

import numpy as np
from sklearn.metrics import f1_score

def custom_eval(preds, dtrain):
    # true labels from the DMatrix #
    labels = dtrain.get_label().astype(int)
    # binarize the predicted probabilities at a 0.3 threshold #
    preds = (preds >= 0.3).astype(int)
    return [('f1_score', f1_score(labels, preds))]

Building the XGBoost model

import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

### Splitting training set ###
x_train, x_valid, y_train, y_valid = train_test_split(train, target,
                                                      random_state=42,
                                                      test_size=0.3)

### XGBoost compatible data ###
dtrain = xgb.DMatrix(x_train,y_train)         
dvalid = xgb.DMatrix(x_valid, label = y_valid)

### defining parameters ###
params = {
          'colsample_bytree': 0.5,
          'eta': 0.1,
          'max_depth': 8,
          'min_child_weight': 6,
          'objective': 'binary:logistic',
          'subsample': 0.9
          }

### Training the model ###
xgb_model = xgb.train(
                      params,
                      dtrain,
                      feval= custom_eval,
                      num_boost_round= 1000,
                      maximize=True,
                      evals=[(dvalid, "Validation")],
                      early_stopping_rounds=30
                      )

Our model has been trained and is ready for evaluation! Note: The parameters were taken from this Notebook.

 

Step 8: Time for predictions!

### Reformatting test set for XGB ###
dtest = xgb.DMatrix(test)

### Predicting ###
predict = xgb_model.predict(dtest) # predicting
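To convert these probabilities into submission-ready labels, here is a minimal sketch using the 0.2 threshold mentioned below (the 'id' column name and submission format are assumptions based on the practice problem):

### binarizing the predicted probabilities at the 0.2 threshold ###
sub = pd.DataFrame({'id': data['id'][31962:].values,   # assumed id column
                    'label': (predict >= 0.2).astype(int)})
sub.to_csv('submission.csv', index=False)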

I uploaded the predictions to the practice problem page with 0.2 as the probability threshold:

Word Embedding                                        F1-Score
GloVe                                                 0.53
flair-forward-fast                                    0.45
flair-backward-fast                                   0.48
Stacked (flair-forward-fast + flair-backward-fast)    0.54

 

Note: According to Flair’s official documentation, stacking the Flair embeddings with other embeddings often yields even better results. But there’s a catch:

It might take a VERY LONG time to compute on a CPU. I highly recommend leveraging a GPU for faster results. You can use the free one within Colab!

 

2. Part of Speech (POS) Tagging with Flair

We will be using a subset of the CoNLL-2003 dataset, which is a pre-tagged dataset in English. Download the dataset from here.

Overview of steps:

Step 1: Importing the dataset

Step 2: Extracting Sentences and PoS Tags from the dataset

Step 3: Tagging the text using NLTK and Flair

Step 4: Evaluating the PoS tags from NLTK and Flair against the tagged dataset

 

Step 1: Importing the dataset

### file was uploaded manually to local environment of Colab ###
data = open('pos-tagged_corpus.txt','r')
txt = data.read()
#print(txt)

The data file contains one word per line, with empty lines representing sentence boundaries.
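For illustration, the first few lines of a CoNLL-2003 style file look something like this (word, PoS tag, syntactic chunk tag, NER tag):

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O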

 

Step 2: Extracting Sentences and PoS Tags from the dataset

### converting the text into a list of lines (each line = a word with its tags) ###
txt = txt.split('\n')

### removing DOCSTART (document header) ###
txt = [x for x in txt if x != '-DOCSTART- -X- -X- O']
### check ###
for i in range(10):
  print(txt[i])
  print('-'*10)

### Extracting Sentences ###
# Initialize empty list for storing words
words = []
# initialize empty list for storing sentences #
corpus = []

for i in tqdm(txt):
  ## if blank sentence encountered ##
  if i =='':
    ## previous words form a sentence ##
    corpus.append(' '.join(words))
    ## Refresh Word list ##
    words = []
  else:
   ## word at index 0 ##
    words.append(i.split()[0])
  
# did it work? #
for i in range(10):
  print(corpus[i])
  print('-'*10)


### Extracting POS ###
# Initialize empty list for storing word pos
w_pos = []
#initialize empty list for storing sentence pos #
POS = []
for i in tqdm(txt):
  ## blank sentence = new line ##
  if i =='':
    ## previous words form a sentence POS ##
    POS.append(' '.join(w_pos))
    ## Refresh words list ##
    w_pos = []
  else:
    ## pos tag from index 1 ##
    w_pos.append(i.split()[1])
  
# did it work? #
for i in range(10):
  print(corpus[i])
  print(POS[i])

### Removing blanks from the sentence and pos lists ###
corpus = [x for x in corpus if x != '']
POS = [x for x in POS if x != '']

### Check ###
for i in range(10):
  print(corpus[i])
  print(POS[i])

We have extracted the essential aspects we require from the dataset. Let’s move on to step 3.

 

Step 3: Tagging the text using NLTK and Flair

  • Tagging using NLTK:

First, import the required libraries:

import nltk
nltk.download('tagsets')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk import word_tokenize

This will download all the necessary files to tag the text using NLTK.

### Tagging the corpus with NLTK ###
#for storing results#
nltk_pos = []
##for every sentence ##
for i in tqdm(corpus):
  # Tokenize sentence #
  text = word_tokenize(i)
  #tag Words#
  z = nltk.pos_tag(text)
  # store #
  nltk_pos.append(z)

The PoS tags are in this format:

[('token_1', 'tag_1'), ... , ('token_n', 'tag_n')]

 

Let’s extract the PoS tags from this:

### Extracting final pos by nltk in a list ###

tmp = []
nltk_result = []

## every tagged sentence ##
for i in tqdm(nltk_pos):
  tmp = []
  ## every word ##
  for j in i:
    ## append tag (from index 1) ##
    tmp.append(j[1])
  # join the tags of every sentence #
  nltk_result.append(' '.join(tmp))

### check ###
for i in range(10):
  print(nltk_result[i])
  print(corpus[i])

The NLTK tags are ready for business.

  • Turning our attention to Flair now

Importing the libraries first:

!pip install flair
from flair.data import Sentence
from flair.models import SequenceTagger

 

Tagging using Flair

# initiating object #
pos = SequenceTagger.load('pos-fast')

#for storing pos tagged string#
f_pos = []
## for every sentence ##
for i in tqdm(corpus):
  sentence = Sentence(i)
  pos.predict(sentence)
  ## append tagged sentence ##
  f_pos.append(sentence.to_tagged_string())

###check ###
for i in range(10):
  print(f_pos[i])
  print(corpus[i])

The result is in the below format:

token_1 <tag_1> token_2 <tag_2> ………………….. token_n <tag_n>

Note: We can use different taggers available within the Flair library. Feel free to tinker around and experiment. You can find the list here.
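For instance, swapping in the full-size PoS model (slower, but typically more accurate) is a one-line change – a sketch assuming the standard model names from that list:

# the full-size English PoS tagger instead of the '-fast' variant #
pos = SequenceTagger.load('pos')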

Extracting the sentence-wise tags as we did with NLTK:

import re

### Extracting POS tags ###
## in every sentence by index ##
for i in tqdm(range(len(f_pos))):
  ## for every word in the ith sentence ##
  for j in corpus[i].split():
    ## remove that word from the ith tagged sentence, leaving only the tags ##
    f_pos[i] = str(f_pos[i]).replace(j,"",1)

  ## Removing < > symbols ##
  for j in ['<','>']:
    f_pos[i] = str(f_pos[i]).replace(j,"")

  ## removing redundant spaces ##
  f_pos[i] = re.sub(' +', ' ', str(f_pos[i]))
  f_pos[i] = str(f_pos[i]).lstrip()

### check ###
for i in range(10):
  print(f_pos[i])
  print(corpus[i])

Aha! We have finally tagged the corpus and extracted the tags sentence-wise. Now we can remove all the punctuation and special symbols.

### Removing Symbols and redundant space ###

## in every sentence by index ##
for i in tqdm(range(len(corpus))):
  # Removing Symbols #
  corpus[i] = re.sub('[^a-zA-Z]', ' ', str(corpus[i]))
  POS[i] = re.sub('[^a-zA-Z]', ' ', str(POS[i]))
  f_pos[i] = re.sub('[^a-zA-Z]', ' ', str(f_pos[i]))
  nltk_result[i] = re.sub('[^a-zA-Z]', ' ', str(nltk_result[i]))

  ## Removing HYPH and SYM (the tags for hyphens and symbols) ##
  f_pos[i] = str(f_pos[i]).replace('HYPH',"")
  f_pos[i] = str(f_pos[i]).replace('SYM',"")
  POS[i] = str(POS[i]).replace('SYM',"")
  POS[i] = str(POS[i]).replace('HYPH',"")
  nltk_result[i] = str(nltk_result[i].replace('HYPH',''))
  nltk_result[i] = str(nltk_result[i].replace('SYM',''))                     

  ## Removing redundant space ##
  POS[i] = re.sub(' +', ' ', str(POS[i]))
  f_pos[i] = re.sub(' +', ' ', str(f_pos[i]))
  corpus[i] = re.sub(' +', ' ', str(corpus[i]))
  nltk_result[i] = re.sub(' +', ' ', str(nltk_result[i]))  

We have tagged the corpus using NLTK and Flair, extracted and removed all the unnecessary elements. Let’s see it for ourselves:

for i in range(1000):
  print('corpus   '+corpus[i])
  print('actual   '+POS[i])
  print('nltk     '+nltk_result[i])
  print('flair    '+f_pos[i])
  print('-'*50)

OUTPUT:

corpus  SOCCER JAPAN GET LUCKY WIN CHINA IN SURPRISE DEFEAT
actual  NN NNP VB NNP NNP NNP IN DT NN
nltk    NNP NNP NNP NNP NNP NNP NNP NNP NNP
flair   NNP NNP VBP JJ NN NNP IN NNP NNP
--------------------------------------------------
corpus  Nadim Ladki
actual  NNP NNP
nltk    NNP NNP
flair   NNP NNP
--------------------------------------------------
corpus  AL AIN United Arab Emirates
actual  NNP NNP NNP NNPS CD
nltk    NNP NNP NNP VBZ JJ
flair   NNP NNP NNP NNP CD

That looks convincing!

 

Step 4: Evaluating the PoS tags from NLTK and Flair against the tagged dataset

Here, we are doing word-wise evaluation of the tags with the help of a custom-made evaluator.

corpus  Japan coach Shu Kamo said The Syrian own goal proved lucky for us
actual  NNP NN NNP NNP VBD POS DT JJ JJ NN VBD JJ IN PRP
nltk    NNP VBP NNP NNP VBD DT JJ JJ NN VBD JJ IN PRP
flair   NNP NN NNP NNP VBD DT JJ JJ NN VBD JJ IN PRP

Note that in the example above, the actual tag sequence contains an extra POS tag (for the possessive marker stripped from the text), so it is longer than the NLTK and Flair outputs. Therefore, we will not consider sentences whose tag sequences are of unequal length.

### EVALUATION FUNCTION ###
def eval(x, y):
  # correct matches #
  count = 0
  # total comparisons made #
  comp = 0
  ## for every sentence index in the dataset ##
  for i in range(len(x)):
    ## only if the sentence lengths match ##
    if len(x[i].split()) == len(y[i].split()):
      ## compare the tag of each word ##
      for j in range(len(x[i].split())):
        if x[i].split()[j] == y[i].split()[j]:
          ## Match! ##
          count = count + 1
        comp = comp + 1
  return (count/comp)*100

Finally, we evaluate the POS tags from NLTK and Flair against the POS tags provided by the dataset.

print("nltk Score ", eval(POS, nltk_result))
print("Flair Score ", eval(POS, f_pos))

Our Result:

NLTK Score: 85.38654023442645

Flair Score: 90.96172124773179

Well, well, well. I can see why Flair has been getting so much attention in the NLP community.

 

End Notes

Flair clearly provides an edge in word embeddings and stacked word embeddings. These can be implemented without much hassle thanks to its high-level API. The Flair embedding is something to keep an eye on in the near future.

I love that the Flair library supports multiple languages. The developers are also currently working on “Frame Detection” using Flair. The future looks bright for this library.

I personally enjoyed working with and learning the ins and outs of this library. I hope you found the tutorial useful and will be using Flair to your advantage the next time you take up an NLP challenge.

