Shivani Sharma — August 31, 2021

This article was published as a part of the Data Science Blogathon


Keras originally grew as a convenient add-on to Theano. Although a lot of water has flowed under the bridge since then, Keras first began to support TensorFlow and then became a part of it entirely. However, this article is devoted not to the complicated fate of the framework, but to its capabilities.

Image 1 (Link Below)


Installing Keras is extremely easy because it is a regular Python package:

pip install keras

Now we can start analyzing it, but first, let’s talk about backends.


Backends

Backends are a major reason for the popularity of Keras: Keras can use several other frameworks as a backend. It works with TensorFlow by default, but if you want to use Theano, there are two ways to do it:

  1. Edit the keras.json configuration file located at $HOME/.keras/keras.json (or %USERPROFILE%\.keras\keras.json on Windows). We need the backend field:

{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}
  2. The second way is to set the environment variable KERAS_BACKEND, like so:

KERAS_BACKEND=theano python -c "from keras import backend"
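You can also check from Python which backend Keras actually picked up (assuming Keras is installed):

```python
from keras import backend as K

# Prints the name of the active backend, e.g. 'tensorflow' or 'theano',
# depending on your configuration.
print(K.backend())
```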

So Keras backends keep multiplying and will take over the world over time! (Though that is not certain.)

Practical example

It seems we can now take a [not-so] deep neural network as an example.

Image 2 (Link Below)


Training any machine learning model starts with data. Keras ships with several training datasets, but they already come in a form convenient for work and do not show the full power of Keras. Therefore, we will take a rawer dataset: 20 newsgroups – 20,000 news posts from Usenet groups (a mail exchange system from the 1990s, akin to FIDO, which may be a little more familiar to the reader), roughly evenly divided across 20 categories. We will train our network to distribute messages correctly across these newsgroups.

from sklearn.datasets import fetch_20newsgroups
newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')
Here is an example of the content of a document from the training sample:
newsgroups_train['data'][0]


Keras contains tools for convenient preprocessing of texts, pictures, and time series, in other words, the most common data types. Today we work with texts, so we need to break them down into tokens and bring them into matrix form.

from keras.preprocessing.text import Tokenizer

max_words = 1000
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(newsgroups_train['data'])
# now the tokenizer knows the dictionary for this corpus of texts
x_train = tokenizer.texts_to_matrix(newsgroups_train['data'], mode='binary')
x_test = tokenizer.texts_to_matrix(newsgroups_test['data'], mode='binary')
As a result, we got binary matrices of the following sizes:
x_train shape: (11314, 1000)
x_test shape: (7532, 1000)
We also need to convert the class labels to matrix form for training with cross-entropy. To do this, we translate each class number into a so-called one-hot vector:

import keras

num_classes = 20
y_train = keras.utils.to_categorical(newsgroups_train["target"], num_classes)
y_test = keras.utils.to_categorical(newsgroups_test["target"], num_classes)
At the output, we also get binary matrices of the following sizes:
y_train shape: (11314, 20)
y_test shape: (7532, 20)

As we can see, the sizes of these matrices partially coincide with the data matrices: the first coordinate is the number of documents in the training and test samples, while the second coordinate is the number of classes (20, as the name of the dataset suggests).
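The one-hot encoding that to_categorical performs is easy to sketch in plain NumPy (a toy illustration, not the Keras implementation):

```python
import numpy as np

def one_hot(labels, num_classes):
    """One row per label, with a 1 in that label's class column."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

print(one_hot([2, 0, 1], 3))
```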

That’s it, now we are ready to teach our network to classify news!


A model in Keras can be described in two main ways:

Sequential API

The first is a sequential, layer-by-layer description of the model, like this:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
or like this:
model = Sequential([
    Dense(512, input_shape=(max_words,)),
])

Functional API

Some time ago it became possible to use the functional API to create a model – the second way:

from keras.models import Model
from keras.layers import Input, Dense, Activation, Dropout

a = Input(shape=(max_words,))
b = Dense(512)(a)
b = Activation('relu')(b)
b = Dropout(0.5)(b)
b = Dense(num_classes)(b)
b = Activation('softmax')(b)
model = Model(inputs=a, outputs=b)
There is no fundamental difference between the two approaches; choose whichever you prefer.

Keras also allows you to save models in a human-readable form, as well as to instantiate a model back from such a description:
from keras.models import model_from_yaml
yaml_string = model.to_yaml()
model = model_from_yaml(yaml_string)

It is important to note that a model saved in text form (by the way, it can also be saved as JSON) does not contain the weights. To save and load the weights, use the functions save_weights and load_weights respectively.
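A minimal sketch of those two calls (the throwaway architecture and the file name are purely illustrative):

```python
from keras import Input
from keras.models import Sequential
from keras.layers import Dense

# A tiny model just to demonstrate the calls.
model = Sequential([Input(shape=(8,)), Dense(4)])

model.save_weights('model.weights.h5')  # weights only, not the architecture
model.load_weights('model.weights.h5')
```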


Model rendering

Visualization cannot be ignored. Keras has built-in visualization for models:

from keras.utils import plot_model
plot_model(model, to_file='model.png', show_shapes=True)

This code will save the following picture under the name model.png:

Model rendering

Here we additionally displayed the dimensions of the inputs and outputs of the layers. The None that comes first in each shape tuple is the batch size: since it is None, the batch size can be arbitrary.

The same graph can also be rendered inline in a Jupyter notebook:

from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
SVG(model_to_dot(model, show_shapes=True).create(prog='dot', format='svg'))

It is important to note that the visualization requires the Graphviz package as well as the Python package pydot.

pip install pydot-ng

The package Graphviz in Ubuntu is installed like this (in other Linux distributions it is similar):

apt install graphviz

On macOS (using the HomeBrew package system):

brew install graphviz

Preparing the model for work

So, we have formed our model. Now you need to prepare it for work:
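A minimal sketch of that preparation step, the compile call (the layer stack mirrors the functional-API model from earlier, and the hyperparameters follow the description below):

```python
from keras import Input
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

max_words, num_classes = 1000, 20  # sizes used earlier in the article

model = Sequential([
    Input(shape=(max_words,)),
    Dense(512),
    Activation('relu'),
    Dropout(0.5),
    Dense(num_classes),
    Activation('softmax'),
])

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```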


What do the parameters of the compile function mean? loss is the error function: in our case it is cross-entropy, the very function for which we prepared our labels as matrices. optimizer is the optimizer used: it could be ordinary stochastic gradient descent, but Adam shows better convergence on this problem. metrics are the metrics by which the quality of the model is measured: in our case it is accuracy, that is, the proportion of correctly guessed answers.


Custom loss

While Keras contains most of the popular error functions, your task might require something unique. To make your own loss, you need very little: just define a function that takes the vectors of correct and predicted answers and outputs one number per example. For practice, let's implement our own cross-entropy. To make it a bit different, we will introduce so-called clipping – cutting off the vector values from above and below.

from keras import backend as K
from keras.losses import categorical_crossentropy

eps = 1.0e-9

def custom_objective(y_true, y_pred):
    '''Yet another cross-entropy'''
    y_pred = K.clip(y_pred, eps, 1.0 - eps)  # clipping
    y_pred /= K.sum(y_pred, axis=-1, keepdims=True)  # renormalize
    cce = categorical_crossentropy(y_true, y_pred)
    return cce

Here y_true and y_pred are tensors, so the backend's tensor functions are used to process them.
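To see numerically what this loss computes, here is the same clip-renormalize-cross-entropy in plain NumPy (the arrays are illustrative):

```python
import numpy as np

eps = 1.0e-9

def clipped_crossentropy(y_true, y_pred):
    """NumPy version of the custom loss above: clip, renormalize, cross-entropy."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)
    return -(y_true * np.log(y_pred)).sum(axis=-1)

y_true = np.array([[0., 1.], [1., 0.]])
y_pred = np.array([[0.2, 0.8], [0.6, 0.4]])
print(clipped_crossentropy(y_true, y_pred))
```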

To use another loss function, it is enough to change the value of the loss parameter of compile, passing the object of our loss function there (in Python, functions are objects too, although that is a whole other story):
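For example (a toy one-layer model stands in for ours here, and custom_objective is repeated so the snippet runs on its own):

```python
from keras import Input, backend as K
from keras.models import Sequential
from keras.layers import Dense
from keras.losses import categorical_crossentropy

eps = 1.0e-9

def custom_objective(y_true, y_pred):
    # clip, renormalize, then take the usual cross-entropy
    y_pred = K.clip(y_pred, eps, 1.0 - eps)
    y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
    return categorical_crossentropy(y_true, y_pred)

model = Sequential([Input(shape=(1000,)), Dense(20, activation='softmax')])
model.compile(loss=custom_objective, optimizer='adam', metrics=['accuracy'])
```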


Training and testing

Finally, it’s time to train the model:

batch_size = 32  # hyperparameter values here are illustrative

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=5,
                    validation_split=0.1)

The fit method does exactly that. It accepts the training sample along with its labels – x_train and y_train; the batch size batch_size, which limits how many examples are fed in at a time; the number of training epochs (one epoch is one complete pass of the training sample through the model); and the fraction of the training sample to hold out for validation – validation_split.

This method returns a history of errors at each step of training.

Finally, testing. The method evaluate accepts a test sample along with its labels. The metric was set when preparing the model for work, so nothing else is needed. (But we will also specify the batch size.)

score = model.evaluate(x_test, y_test, batch_size=batch_size)


Callbacks

A few words also need to be said about such an important feature of Keras as callbacks. A lot of useful functionality is implemented through them. For example, if you have been training the network for a very long time, you need to understand when to stop once the error on the validation set has stopped decreasing. This functionality is called "early stopping".

from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss')
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=100,  # an illustrative upper bound; early stopping will cut it short
                    validation_split=0.1,
                    callbacks=[early_stopping])

Run an experiment and check how quickly early stopping kicks in on our example.


Also, as a callback, you can use saving of logs in a format convenient for Tensorboard (we talked about it in an article about Tensorflow; in short, it is a special utility for processing and visualizing information from Tensorflow logs).

from keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir='./logs', write_graph=True)
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=5,  # illustrative
                    validation_split=0.1,
                    callbacks=[tensorboard])

After the training is over (or even while it is in progress!), you can start Tensorboard, specifying the absolute path to the directory with the logs:

tensorboard --logdir=/path/to/logs

There you can see, for example, how the target metric changed on the validation set:


Image 3

Advanced graphs

Now let’s look at building a slightly more complex computation graph. A neural network can have many inputs and outputs, the input data can be transformed by various mappings. To reuse parts of complex graphs (in particular, for transfer learning), it makes sense to describe the model in a modular style that allows you to conveniently retrieve, save, and apply parts of the model to new input data.

It is most convenient to describe the model by mixing both methods – Functional API and Sequential API described earlier.

Let’s look at this approach using the Siamese Network model as an example. Similar models are actively used in practice to obtain vector representations with useful properties. For example, a similar model can be used to learn the mapping of photographs of faces into vectors such that vectors for similar faces will be close to each other. In particular, image search applications such as FindFace take advantage of this.

An illustration of the model can be seen in the diagram:


Image 4

Here the function G turns the input image into a vector, after which the distance between the vectors for a pair of images is calculated. If the images are from the same class, the distance should be minimized; if they are from different classes, it should be maximized.
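The article does not spell out the training loss; a common way to encode this objective is a contrastive loss, sketched here in NumPy (the margin value is an assumption):

```python
import numpy as np

def contrastive_loss(y_same, d, margin=1.0):
    """Pull same-class pairs (y_same=1) together, push different-class
    pairs (y_same=0) at least `margin` apart."""
    return np.mean(y_same * d**2 + (1 - y_same) * np.maximum(margin - d, 0)**2)

# Two hypothetical pairs: one same-class at distance 0.2, one different-class at 0.5.
print(contrastive_loss(np.array([1, 0]), np.array([0.2, 0.5])))  # 0.145
```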

After such a neural network is trained, we can represent an arbitrary image as a vector G(x) and use this representation either to find the nearest images or as a feature vector for other machine learning algorithms.
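Finding the nearest image then reduces to a nearest-neighbour search over the vectors; a toy NumPy sketch with made-up embeddings:

```python
import numpy as np

# Hypothetical 2-D embeddings G(x) for five images, plus a query embedding.
emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [1.0, 1.0]])
query = np.array([0.9, 0.1])

# The nearest image is the one with the smallest Euclidean distance to the query.
nearest = int(np.argmin(np.linalg.norm(emb - query, axis=1)))
print(nearest)  # 1
```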

First, we define in Keras a network that maps the input vector:

from keras.models import Sequential
from keras.layers import Dense

def create_base_network(input_dim):
    seq = Sequential()
    seq.add(Dense(128, input_shape=(input_dim,), activation='relu'))
    seq.add(Dense(128, activation='relu'))
    seq.add(Dense(128, activation='relu'))
    return seq

Please note that we have described the model using the Sequential API, but wrapped its creation in a function. Now we can create such a model by calling this function and apply it, Functional API-style, to the input data:

from keras.layers import Input

input_dim = 784  # 28x28 MNIST images, flattened below

base_network = create_base_network(input_dim)
input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))
processed_a = base_network(input_a)
processed_b = base_network(input_b)

Now the variables processed_a and processed_b are the vector representations obtained by applying the previously defined network to the input data.

It is necessary to calculate the distances between them. For this, Keras provides the wrapper Lambda, which represents any expression as a layer (Layer). Do not forget that we process data in batches, so all tensors always have an extra dimension responsible for the batch size.

from keras import backend as K
from keras.layers import Lambda

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))

distance = Lambda(euclidean_distance)([processed_a, processed_b])

Great, we got the distance between the internal representations; it remains to collect the inputs and the distance into one model.

model = Model([input_a, input_b], distance)

Thanks to the modular structure, we can use base_network separately, which is especially useful after training the model. How can we do that? Let's take a look at the layers of our model:

>>> model.layers
[<keras.engine.topology.InputLayer object at 0x7f238fdacb38>, <keras.engine.topology.InputLayer object at 0x7f238fdc34a8>, <keras.models.Sequential object at 0x7f239127c3c8>, <keras.layers.core.Lambda object at 0x7f238fddc4a8>]

We see that the third object in the list is of type models.Sequential. This is the model that maps the input image into a vector. To extract it and use it as a full-fledged model (you can retrain it, validate it, embed it in another graph), just pull it out of the list of layers:

>>> embedding_model = model.layers[2]

>>> embedding_model.layers
[<keras.layers.core.Dense object at 0x7f23c4e557f0>, <keras.layers.core.Dropout object at 0x7f238fe97908>, <keras.layers.core.Dense object at 0x7f238fe44898>, <keras.layers.core.Dropout object at 0x7f238fe449e8>, <keras.layers.core.Dense object at 0x7f238fe01f60>]

For example, for a Siamese network already trained on MNIST data, with base_model having an output dimension of two, you can visualize the vector representations as follows.

Let's load the data and convert the 28x28 images to flat vectors:

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_test = x_test.reshape(10000, 784)

Let's map the pictures into vectors using the previously extracted model:

embeddings = embedding_model.predict(x_test)

Now embeddings contains two-dimensional vectors, which can be plotted on a plane:
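A sketch of such a plot with matplotlib (synthetic stand-ins replace the embeddings and y_test labels computed above, so the snippet runs on its own):

```python
import matplotlib
matplotlib.use('Agg')  # render to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np

embeddings = np.random.randn(1000, 2)         # stand-in for embedding_model.predict(x_test)
labels = np.random.randint(0, 10, size=1000)  # stand-in for the MNIST labels y_test

plt.scatter(embeddings[:, 0], embeddings[:, 1], c=labels, cmap='tab10', s=5)
plt.colorbar()
plt.savefig('embeddings.png')
```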


Image 5


That's it, we have built our first Keras models! We hope the capabilities it provides have caught your interest, and that you will use them in your work.

The obvious advantages include the simplicity of creating models, which translates into a high speed of prototyping. Overall, this framework is becoming more and more popular:


Image 6

Usually, Keras can be recommended when you need to quickly build and test a network for a specific task. But if you need something complex, like a non-standard layer or parallelizing code across several GPUs, then it is better (and sometimes simply unavoidable) to use the underlying framework.


Image Sources

  1. Image 1 –
  2. Image 2 –
  3. Image 3 –
  4. Image 4 –
  5. Image 5 –
  6. Image 6 –

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
