Parul Rajput — August 15, 2021

This article was published as a part of the Data Science Blogathon

Introduction

In our digital journey, we often encounter designs filled with words that represent an idea or convey a message. The words come in various sizes, shapes, and colors, and the size of each word reflects how often it occurs or how important it is to the writer’s idea. Such a design is known as a word cloud or tag cloud.

What exactly is a word cloud 💡?

A word cloud is a visualization technique for text data in which the more often a word occurs, the larger it is drawn, so the most frequent word gets the biggest font size. In this post, we will learn how to create a custom word cloud in Python.
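To make the idea concrete, here is a minimal sketch of the counting step behind a word cloud, using only the standard library. The sample sentence and the `scale_font` helper are made up for illustration; the sizing logic inside the wordcloud library is more involved than this linear mapping.

```python
from collections import Counter

# Made-up sample text, for illustration only
sample = "python cloud python data cloud python"

# Count how often each word occurs
counts = Counter(sample.split())


def scale_font(count, max_count, max_size=60, min_size=10):
    """Hypothetical helper: map a word count to a font size linearly."""
    return min_size + (max_size - min_size) * count / max_count


max_count = max(counts.values())
sizes = {word: scale_font(n, max_count) for word, n in counts.items()}

# Print the words from largest to smallest font size
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: count={counts[word]}, font size={size:.0f}")
```

The most frequent word ends up with the largest size, which is the behaviour the rest of this post relies on.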

The installation

Let’s start by installing the required packages.

Python has a third-party library called “wordcloud”, whose WordCloud class helps to generate word clouds. It is not part of the standard library, so it has to be installed first.

We can install this library by using the following command:

! pip install wordcloud

We will also use some basic libraries: ‘numpy’, ‘pandas’, ‘matplotlib’, and ‘pillow’. If you are new to Python, it will help to go through an introductory tutorial first.

  • Numpy: NumPy is a popular library for handling multidimensional arrays and matrices. In this post, we will use it to turn the mask image into an array that sets the shape of the word cloud.
  • Pandas: The pandas library is used for data analysis. Here we use it to load the dataset and extract the text from it.
  • Matplotlib: For visualisation, Python offers the matplotlib library. Here we use it to display and save the word cloud images.
  • Pillow: The pillow library, imported as PIL, is used for opening the mask images for word cloud generation.

The libraries can be imported with the code below:

 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt
 from PIL import Image
 from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

Dataset📦📦📦

As our sample dataset, we will use the Fake News classification dataset from Kaggle.
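The code in the following sections assumes a string named `text` and a DataFrame named `df` with a `text_without_stopwords` column, but the snippets shown never create them. Below is a minimal sketch of how they might be built. The tiny in-memory DataFrame stands in for the real data, and the CSV path in the comment is a placeholder, not the dataset's actual file name.

```python
import pandas as pd

# In a real run you would load the Kaggle CSV instead, for example:
# df = pd.read_csv("path/to/your/news.csv")  # placeholder path (assumption)

# Tiny stand-in DataFrame with made-up rows, for illustration only
df = pd.DataFrame(
    {"text_without_stopwords": ["trump said people", None, "people vote today"]}
)

# Drop missing rows and make sure every entry is a string before joining
articles = df["text_without_stopwords"].dropna().astype(str)

# One long string containing every article, separated by spaces
text = " ".join(articles)
print(text)
```

Dropping missing values first matters because joining a column that contains NaN raises a TypeError.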

Word Cloud Generation☁️

Let’s move on to the code section to generate a word cloud. This section will describe the different parameters to build a custom word cloud image in detail.

The functions used are:

  • WordCloud(): This class is imported from the wordcloud library. It creates the object that generates a word cloud.
  • .generate(): The .generate method takes one argument, the text data from which we want to create a word cloud. The syntax is .generate(text).
  • imshow(): The imshow function of matplotlib displays the image.

Now, let’s get started with a basic word cloud example:

# Create and generate a word cloud image
wordcloud = WordCloud().generate(text)
# Create the figure first, then display the generated image on it
plt.figure(figsize=[8,10])
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

 

[Image: basic word cloud generated from the news text]

Wow! We have successfully generated our first word cloud image. It shows that most of the articles/news data talk about “trump”, “said”, “one”, and “people”.

Next, we can change the max_font_size, max_words, and background_color of the word cloud.

# Change max_font_size, max_words and background_color
wordcloud = WordCloud(max_font_size=50, max_words=10, background_color="white").generate(text)
# Create the figure first, then display the image on it
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

 

[Image: word cloud limited to ten words on a white background]

 

In the above code, we have changed some parameters of the WordCloud class.

  • max_font_size: This argument defines the maximum font size for the biggest word. If None, the height of the image is used.
  • max_words: It specifies the maximum number of words to draw. The default is 200.
  • background_color: It sets the background color of the word cloud image. The default color is black.

To display the word cloud image, the .imshow() function of matplotlib.pyplot is used. In the above code, we pass it two arguments:

  • wordcloud: the word cloud object created in the previous step.
  • interpolation=”bilinear”: used to display a smoother image.

Let’s generate another word cloud, this time setting the width, height, random_state, background_color, colormap, collocations, and stopwords parameters:

# Create the stop word set
stopwords = set(STOPWORDS)
# Generate a word cloud image
wordcloud = WordCloud(width=3000, height=2000, random_state=1, background_color='black', colormap='Set2', collocations=False, stopwords=stopwords).generate(text)
# Display the generated image
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

The arguments are described below:

  • width/height: These arguments change the dimensions of the canvas. Here we set the width to 3000 and the height to 2000.
  • random_state: An int value that seeds the random number generator, so the same layout and colors are reproduced on every run.
  • background_color: It sets the background color of the word cloud image. It accepts color names and color codes.
  • colormap: This argument changes the colors of the words. The colors are drawn from the chosen Matplotlib colormap.
  • collocations: The collocations argument is set to False so that the word cloud doesn’t contain bigrams (two-word phrases), which would otherwise repeat words that already appear on their own.
  • stopwords: Stop words are words that are commonly used in the English language, such as ‘we’, ‘the’, ‘a’, ‘an’, etc. They carry little meaning, so we eliminate them. We already imported the STOPWORDS set from the wordcloud library.

The output of the above code:

[Image: word cloud of the news text on a black background]

Moving forward, we are going to create the custom shape word cloud.

To create a custom shape, a mask image is required, for example in PNG format. The word cloud is drawn inside the shape in this image. Such images can be found by searching for keywords such as “masking images for word cloud” in a search engine. Collections of mask images are also available as datasets, where you can explore different custom shapes.

In this post, we have used ‘cloud.png’ to create the custom image.
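If no suitable mask image is at hand, a simple shape can also be built directly as a NumPy array. The sketch below creates a circular mask. It relies on the convention used by the wordcloud library that pure white pixels (value 255) are masked out and all other pixels are free to draw on. The canvas size and radius are arbitrary choices for this example.

```python
import numpy as np

# Canvas size in pixels (arbitrary choice for this sketch)
height, width = 400, 400

# Start fully white: 255 marks pixels where no words may be placed
mask = np.full((height, width), 255, dtype=np.uint8)

# Row and column coordinates, used to test the distance from the centre
yy, xx = np.ogrid[:height, :width]
inside = (yy - height // 2) ** 2 + (xx - width // 2) ** 2 <= 180 ** 2

# Pixels inside the circle are set to 0, so words can be drawn there
mask[inside] = 0

print(mask.shape, mask[200, 200], mask[0, 0])
```

The resulting array can be passed as `mask=mask` to WordCloud in place of an array loaded with `np.array(Image.open(...))`.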

To create a custom shape, the WordCloud class has a mask argument that accepts a mask image. We load the ‘cloud.png’ image into a NumPy array and store it in a variable named mask. Here we also change some more arguments to create an attractive word cloud. The arguments are described below:

  • mask: Specifies the shape of the word cloud image. By default, the shape is a rectangle. We assign the mask variable created above to the mask parameter.
  • contour_width: This parameter sets the width of the outline drawn around the mask shape. Here we set it to 3.
  • contour_color: This parameter sets the color of that outline. It can be a color name or a color code. Here we use the color code ‘#023075’.

# Generate a word cloud image
stopwords = set(STOPWORDS)
mask = np.array(Image.open("../input/input-img/cloud.png"))
wordcloud = WordCloud(stopwords=stopwords,background_color='white', max_words=1000, mask=mask,contour_color='#023075',contour_width=3,colormap='rainbow').generate(' '.join(df['text_without_stopwords']))
# create image as cloud
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
# store to file
plt.savefig("cloud.png", format="png")
plt.show()

[Image: word cloud in the shape of a cloud]

Great!! We have just created a word cloud in the shape of the cloud. It looks like a cloud filled with words. Let’s create another word cloud by using the Twitter logo as our mask image.

# Generate a word cloud image
stopwords = set(STOPWORDS)
mask = np.array(Image.open("../input/input-img/Twitter.png"))
wordcloud = WordCloud(stopwords=stopwords, background_color="white", max_words=1000, mask=mask).generate(' '.join(df['text_without_stopwords']))
# create twitter image
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
# store to file
plt.savefig("twitter.png", format="png")
plt.show()

[Image: word cloud in the shape of the Twitter logo]

We can also generate a word cloud in the shape of a word (a combination of letters). Let’s create the next image in the shape of the word “NEWS”.

# Generate a word cloud image
stopwords = set(STOPWORDS)
mask = np.array(Image.open("../input/input-img/News_mask.PNG"))
# Note: when a mask is given, width and height are ignored and the mask's shape is used
wordcloud = WordCloud(width=3000, height=2000, random_state=1, background_color='white', colormap='Set2', collocations=False, stopwords=stopwords, mask=mask).generate(' '.join(df['text_without_stopwords']))
# create coloring from image (not applied here; it is used in the next example)
image_colors = ImageColorGenerator(mask)
plt.figure(figsize=[20,20])
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
# store to file
plt.savefig("news.png", format="png")
plt.show()

[Image: word cloud in the shape of the word “NEWS”]

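Looks good! Next, we create a similar word-shaped image with some parameters changed. This time the words are recolored with the colors of the mask image, using ImageColorGenerator and the recolor method.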

# Generate a word cloud image
stopwords = set(STOPWORDS)
mask = np.array(Image.open("../input/input-img/News_mask.PNG"))
wordcloud = WordCloud(stopwords=stopwords, background_color="white", mode="RGBA", max_words=1000, mask=mask).generate(' '.join(df['text_without_stopwords']))
# create coloring from image
image_colors = ImageColorGenerator(mask)
plt.figure(figsize=[20,20])
plt.imshow(wordcloud.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")
# store to file
plt.savefig("news1.png", format="png") 
plt.show()

Congratulations!!

We have designed word cloud images in different shapes. We also learned how to use a mask to give the word cloud any shape and how to color the words from an image. You can visit my account for more code. Please share your feedback in the comment box below.

The media shown in this article on word cloud in python are not owned by Analytics Vidhya and are used at the Author’s discretion.
