Data visualization (charts, graphs, infographics, and so on) helps businesses communicate important information, but what if your data is text-based? If you want a striking way to highlight the important points in a body of text, use a word cloud.
If you are not familiar with word clouds, a word cloud is an image made up of words, where the size of each word represents its frequency or importance. The bigger and bolder a word appears, the more often it is mentioned in the text and, by that measure, the more important it is. Word clouds are easy to read and simple to understand: the keywords stand out to the reader and are visually appealing.
However, you might be bored with the plain rectangular word cloud. What if I told you that a word cloud can also be customized to your liking? In this article, we explore how to generate a word cloud in Python in any shape that you desire. So, let’s get started.
If you want to see the full code for this article, please visit my GitHub. The first step is to install the package we will use, namely wordcloud. Open your terminal (Linux / macOS) or command prompt (Windows) and type:
$ pip install wordcloud
We will start by scraping an article from the web. If you are not familiar with web scraping, I suggest you read my previous article, Web Scraping News with 4 lines using Python. The scraping code below uses the newspaper package, which is published on PyPI as newspaper3k.
In this post, I will scrape the Wikipedia article titled ‘Ice cream’.
from newspaper import Article
article = Article('https://en.wikipedia.org/wiki/Ice_cream')
article.download()
article.parse()
We only need the text of the article, which is available as:
article.text
Simple Word Cloud
We will start by making a simple word cloud. The first step is to import the dependencies we will use.
from wordcloud import WordCloud
import matplotlib.pyplot as plt
Here we use the wordcloud library and matplotlib. The wordcloud library generates the word cloud, while matplotlib displays the result. After that, we create a WordCloud object, generate the cloud from the article text, and display it.
wc = WordCloud()
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()
And this is the result of the simple word cloud that we created.
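To make the "bigger means more frequent" idea concrete, here is a minimal standard-library sketch of the counting that sits behind a word cloud. It is not how the wordcloud library is implemented internally; the helper name `relative_frequencies` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Count words and scale the counts so the most frequent word gets 1.0."""
    # Lowercase the text and keep only runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    return {word: count / top for word, count in counts.items()}


# Hypothetical sample text standing in for article.text.
sample = "Ice cream is cold. Ice cream is sweet. Gelato is ice cream too."
freqs = relative_frequencies(sample)

# Print the words from most to least frequent.
for word, weight in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{word:<8}{weight:.2f}")
```

A word with weight 1.0 would be drawn largest, and the others proportionally smaller.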
In addition, the WordCloud constructor accepts several parameters, including:
background_color = background color of the image
max_words = maximum number of unique words to draw
stopwords = set of words to exclude
max_font_size = maximum font size
random_state = seed for the random number generator, so the result is the same each time it is generated
width = width of the output
height = height of the output
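As a rough illustration of what stopwords and max_words do to the word list, the sketch below filters and caps a frequency table using only the standard library. The names `MY_STOPWORDS` and `top_words` are hypothetical, and the real library does more processing than this.

```python
from collections import Counter

# Hypothetical stopword list; the STOPWORDS set shipped with wordcloud is much larger.
MY_STOPWORDS = {"is", "the", "a", "and", "too"}


def top_words(words, stopwords, max_words):
    """Drop stopwords, then keep only the max_words most frequent words."""
    kept = [w for w in words if w not in stopwords]
    return Counter(kept).most_common(max_words)


words = "ice cream is cold and ice cream is sweet and gelato is ice cream too".split()
print(top_words(words, MY_STOPWORDS, max_words=2))
```

Without the stopword filter, "is" would be among the most frequent words and would take one of the two slots.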
Let’s try using the parameters above. First, let’s import the stopword list provided by the wordcloud library:
from wordcloud import STOPWORDS
Then we enter the following code:
wc = WordCloud(background_color="white", max_words=2000,
stopwords=STOPWORDS, max_font_size=256,
random_state=42, width=500, height=500)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()
And this is the result.
Add Custom Font
We can also change the font. You can download fonts for personal use from the site dafont. Next, pass the path to the font file through the font_path parameter.
font_path = 'path/to/font'
wc = WordCloud(stopwords=STOPWORDS, font_path=font_path,
background_color="white", max_words=2000,
max_font_size=256, random_state=42,
width=500, height=500)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()
And this is the result.
Add Custom Mask
Next, we will add a mask to shape the word cloud. Keep in mind that the background of the mask image must be pure white; otherwise, the library treats the background as part of the shape and draws words on it. The background also cannot be transparent, because transparent pixels are treated as black. I will use the following image as a mask.
We need to add some dependencies to load the image.
from PIL import Image
import numpy as np
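Next, load the image as a NumPy array and pass it through the mask parameter. When a mask is given, wordcloud uses the shape of the mask and ignores width and height, so those two arguments are optional here.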
mask = np.array(Image.open('path/to/image'))
wc = WordCloud(stopwords=STOPWORDS, font_path=font_path,
mask=mask, background_color="white",
max_words=2000, max_font_size=256,
random_state=42, width=mask.shape[1],
height=mask.shape[0])
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()
And this is the result.
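If your mask image has a transparent background, one way around the caveat mentioned earlier is to blend it onto white before using it. The following is a minimal NumPy sketch under the assumption that the image has been loaded as an RGBA uint8 array; the helper name `flatten_alpha_to_white` is hypothetical and not part of wordcloud or PIL.

```python
import numpy as np


def flatten_alpha_to_white(rgba):
    """Blend an RGBA uint8 array onto a white background and return RGB."""
    rgb = rgba[..., :3].astype(float)
    alpha = rgba[..., 3:4].astype(float) / 255.0  # 0.0 means fully transparent
    blended = rgb * alpha + 255.0 * (1.0 - alpha)
    return blended.round().astype(np.uint8)


# Tiny 1x2 test image: one fully transparent pixel, one opaque black pixel.
rgba = np.array([[[0, 0, 0, 0], [0, 0, 0, 255]]], dtype=np.uint8)
mask_rgb = flatten_alpha_to_white(rgba)
print(mask_rgb[0, 0], mask_rgb[0, 1])
```

With a real file, the input array could come from `np.array(Image.open('path/to/image').convert('RGBA'))`, and the returned array could then be passed as the mask.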
Adjust Colors
We can also adjust the colors used in the word cloud. We are free to choose any colors we like, but in this article I will cover a few commonly used approaches. First, we will use just one color. To do that, we must define a color function.
def one_color_func(word=None, font_size=None,
                   position=None, orientation=None,
                   font_path=None, random_state=None):
    h = 160  # hue, 0 - 360
    s = 100  # saturation, 0 - 100
    l = 50   # lightness, 0 - 100
    return "hsl({}, {}%, {}%)".format(h, s, l)
The color format used is HSL (hue, saturation, lightness). You can use an HSL color picker to explore the available colors. Then, to build the word cloud, all we have to do is pass the function we created through the color_func parameter.
wc = WordCloud(stopwords=STOPWORDS, font_path=font_path,
mask=mask, background_color="white",
max_words=2000, max_font_size=256,
random_state=42, width=mask.shape[1],
height=mask.shape[0], color_func=one_color_func)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()
And the image will appear like this.
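To check what an HSL triple looks like before rendering a whole cloud, it can be converted to a hex color with the standard library. This is only a convenience sketch; the helper name `hsl_to_hex` is my own and is not part of wordcloud.

```python
import colorsys


def hsl_to_hex(h, s, l):
    """Convert hue (0-360) and saturation/lightness (0-100) to '#rrggbb'."""
    # colorsys works with values in 0..1 and takes lightness before saturation.
    r, g, b = colorsys.hls_to_rgb(h / 360, l / 100, s / 100)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255))


# The single color used in one_color_func.
print(hsl_to_hex(160, 100, 50))
```

The resulting hex string can be pasted into any color picker to preview the color.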
Apart from that, we can also produce similar colors by randomizing within a certain range. Here I randomize the lightness to vary the brightness of the color.
def similar_color_func(word=None, font_size=None,
                       position=None, orientation=None,
                       font_path=None, random_state=None):
    h = 40   # hue, 0 - 360
    s = 100  # saturation, 0 - 100
    l = random_state.randint(30, 70)  # lightness, 0 - 100
    return "hsl({}, {}%, {}%)".format(h, s, l)
Then, as before, pass the function through the color_func parameter.
wc = WordCloud(stopwords=STOPWORDS, font_path=font_path,
mask=mask, background_color="white",
max_words=2000, max_font_size=256,
random_state=42, width=mask.shape[1],
height=mask.shape[0], color_func=similar_color_func)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()
And the result will be like this.
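A color function can also be tried on its own, without rendering anything. The sketch below calls it directly with seeded generators to show that the same seed gives the same colors. It assumes, as I understand the library, that wordcloud passes a `random.Random` instance as random_state; the function is repeated here only so the snippet runs by itself.

```python
from random import Random


def similar_color_func(word=None, font_size=None,
                       position=None, orientation=None,
                       font_path=None, random_state=None):
    h = 40   # hue, 0 - 360
    s = 100  # saturation, 0 - 100
    l = random_state.randint(30, 70)  # lightness, both ends inclusive
    return "hsl({}, {}%, {}%)".format(h, s, l)


# Two generators created with the same seed produce the same sequence.
rng_a, rng_b = Random(42), Random(42)
first = [similar_color_func(random_state=rng_a) for _ in range(5)]
second = [similar_color_func(random_state=rng_b) for _ in range(5)]

print(first)
print(first == second)
```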
In addition, we can define a palette of several colors to choose from. For example:
def multi_color_func(word=None, font_size=None,
                     position=None, orientation=None,
                     font_path=None, random_state=None):
    # Each entry is [hue, saturation, lightness].
    colors = [[4, 77, 82],
              [25, 74, 85],
              [82, 43, 84],
              [158, 48, 79]]
    # randint includes both endpoints, hence len(colors) - 1.
    rand = random_state.randint(0, len(colors) - 1)
    return "hsl({}, {}%, {}%)".format(colors[rand][0], colors[rand][1], colors[rand][2])
Then pass the function through the color_func parameter.
wc = WordCloud(stopwords=STOPWORDS, font_path=font_path,
mask=mask, background_color="white",
max_words=2000, max_font_size=256,
random_state=42, width=mask.shape[1],
height=mask.shape[0], color_func=multi_color_func)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()
And the result will be like this.
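A slightly shorter variant of the palette idea picks a color with `choice` instead of computing an index. This is a sketch rather than the version used in this article: the names `PALETTE` and `palette_color_func` are hypothetical, and the fallback to a fresh generator is my own addition for calling the function outside wordcloud.

```python
from random import Random

# Hypothetical palette of (hue, saturation %, lightness %) triples,
# reusing the four colors from multi_color_func.
PALETTE = [(4, 77, 82), (25, 74, 85), (82, 43, 84), (158, 48, 79)]


def palette_color_func(word=None, font_size=None,
                       position=None, orientation=None,
                       font_path=None, random_state=None):
    # Fall back to a fresh generator if none is supplied.
    rng = random_state if random_state is not None else Random()
    h, s, l = rng.choice(PALETTE)
    return "hsl({}, {}%, {}%)".format(h, s, l)


rng = Random(0)
seen = {palette_color_func(random_state=rng) for _ in range(50)}
print(sorted(seen))
```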
Last but not least, we can generate colors based on the mask image. For this, we need a class provided by the wordcloud library.
from wordcloud import ImageColorGenerator
Then create the color generator from the mask and pass it through the color_func parameter.
mask_colors = ImageColorGenerator(mask)
wc = WordCloud(stopwords=STOPWORDS, font_path=font_path,
mask=mask, background_color="white",
max_words=2000, max_font_size=256,
random_state=42, width=mask.shape[1],
height=mask.shape[0], color_func=mask_colors)
wc.generate(article.text)
plt.imshow(wc, interpolation="bilinear")
plt.axis('off')
plt.show()
This is the final result.
As we can see, the color of the word cloud follows the color of the original image.
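To give an intuition for image-based coloring, the sketch below averages the color of a rectangular patch of an image, which is roughly what I understand ImageColorGenerator to do for the area each word covers. Treat that as an assumption rather than a description of the library's exact behavior; the helper name `mean_color` and the tiny test image are made up.

```python
import numpy as np


def mean_color(image, top, left, height, width):
    """Return the average RGB color of a rectangular patch as an 'rgb()' string."""
    patch = image[top:top + height, left:left + width, :3]
    r, g, b = patch.reshape(-1, 3).mean(axis=0)
    return "rgb({}, {}, {})".format(int(r), int(g), int(b))


# Hypothetical 2x2 image: top row red, bottom row blue.
image = np.array([[[255, 0, 0], [255, 0, 0]],
                  [[0, 0, 255], [0, 0, 255]]], dtype=np.uint8)

print(mean_color(image, 0, 0, 1, 2))  # patch covering only the red row
print(mean_color(image, 0, 0, 2, 2))  # patch covering the whole image
```

A word placed over a single-colored region takes that color, while a word spanning two regions ends up with a blend of both.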
That is it for this post; I hope you learned something new. If you have other ideas, please share them in the comments. In a future post, I will look at using word clouds for text analysis.