Shivani Sharma — September 8, 2021

This article was published as a part of the Data Science Blogathon

### Introduction

There are many tutorials, video lectures, and other materials on the Web that discuss the basic principles of building neural networks: their architecture, learning strategies, and so on. Traditionally, a neural network is trained by presenting it with batches of images from the training sample and correcting its coefficients using backpropagation. One of the most popular tools for working with neural networks is Google’s TensorFlow library.

A neural network in TensorFlow is represented by a sequence of layer operations (such as matrix multiplication, convolution, and pooling). The layers of the neural network, together with the operations that correct the coefficients, form a computation graph.

In this case, training a neural network consists of presenting it with batches of objects, comparing the predicted classes with the true ones, calculating the error, and modifying the network’s coefficients.

At the same time, TensorFlow hides the technical details of training and the implementation of the algorithm that adjusts the coefficients. From the programmer’s point of view, one can mostly think only about the computation graph that produces “predictions”. Compare the graph of computations the programmer has in mind with the graph that, among other things, also adjusts the coefficients (Image 1).

What TensorFlow cannot do for the programmer is convert the input dataset into a form convenient for training a neural network, although the library has quite a few “basic blocks” for this.

This article shows how to use them to build an efficient pipeline that feeds input data to a neural network.

As an example problem, we will use the ImageNet dataset, recently published as an object detection competition on Kaggle. We will train the network to detect a single object per image: the one with the largest bounding box.

## Preparatory steps

The following assumes that you have:

• [Python] [python_org] installed. The examples were written for Python 2.7, but porting them to Python 3 should not be difficult.

• The [TensorFlow library and its Python interface] [install_tensorflow] installed.

```
import functools
import os

import numpy as np
import tensorflow as tf
```

## Data preprocessing

To load the data, we will use the mechanisms provided by `tf.data`, TensorFlow’s module for working with datasets.

For training and validation, we need a dataset that contains both images and their descriptions. But in the downloaded dataset, files with images and annotations are neatly arranged in different folders.

Therefore, we will make an iterator that iterates over the corresponding pairs.

```
ANNOTATION_DIR = os.path.join("Annotations", "DET")
IMAGES_DIR = os.path.join("Data", "DET")
IMAGES_EXT = "JPEG"


def image_annotation_iterator(dataset_path, subset="train"):
    """Yield (image_path, annotation_path) pairs for one subset."""
    annotations_root = os.path.join(dataset_path, ANNOTATION_DIR, subset)
    print(annotations_root)
    images_root = os.path.join(dataset_path, IMAGES_DIR, subset)
    print(images_root)
    for dir_path, _, file_names in os.walk(annotations_root):
        for annotation_file in file_names:
            path = os.path.join(dir_path, annotation_file)
            relpath = os.path.relpath(path, annotations_root)
            # splitext returns (root, ext): keep the root, swap the extension
            img_path = os.path.join(
                images_root,
                os.path.splitext(relpath)[0] + '.' + IMAGES_EXT
            )
            assert os.path.isfile(img_path), \
                "File {} doesn't exist".format(img_path)
            yield img_path, path
```

From this, you can already make a dataset and start “processing on the graph”, for example, extracting the file names from the dataset. We create a dataset:

```
files_dataset = tf.data.Dataset.from_generator(
    functools.partial(image_annotation_iterator, "./ILSVRC"),
    output_types=(tf.string, tf.string),
    output_shapes=(tf.TensorShape([]), tf.TensorShape([]))
)
```

To retrieve data from a dataset, we need an iterator. `make_one_shot_iterator` creates an iterator that passes over the data once, and `Iterator.get_next()` creates a tensor into which the data from the iterator is loaded.

```
iterator = files_dataset.make_one_shot_iterator()
next_elem = iterator.get_next()
```

Now you can create a session and “calculate the values” of the tensor:

```
with tf.Session() as sess:
    for i in range(10):
        element = sess.run(next_elem)
        print("{} {}".format(i, element))
```

A neural network, however, needs not file names but images in the form of three-channel matrices of the same shape, and the categories of these images in the form of one-hot vectors.

## Encode image categories

Parsing annotation files is not very interesting in itself. I used the BeautifulSoup package for this. The helper class `Annotation` can be initialized from a file path and stores a list of objects. First, we need to collect the list of categories in order to know `cat_max`, the size of the vector to encode, and to map each string category to a number in [0..cat_max]. Building such maps is not very interesting either, so from here on we will assume that the dictionaries `cat2id` and `id2cat` hold the forward and reverse mappings described above.
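The two pieces described above can be sketched as follows. This is a minimal illustration, assuming the annotations follow the PASCAL VOC XML layout (`object`, `name`, and `bndbox` tags). It uses the standard library's `xml.etree.ElementTree` rather than BeautifulSoup, and the names `main_object_category` and `build_category_maps` are hypothetical, not the author's helper class.

```python
import xml.etree.ElementTree as ET


def main_object_category(xml_text):
    """Return the category of the object with the largest bounding box.

    Returns None when the annotation contains no objects.
    """
    root = ET.fromstring(xml_text)
    best_category, best_area = None, -1
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        width = int(box.findtext("xmax")) - int(box.findtext("xmin"))
        height = int(box.findtext("ymax")) - int(box.findtext("ymin"))
        area = width * height
        if area > best_area:
            best_category, best_area = obj.findtext("name"), area
    return best_category


def build_category_maps(categories):
    """Map each distinct category string to an index and back."""
    cat2id = {cat: idx for idx, cat in enumerate(sorted(set(categories)))}
    id2cat = {idx: cat for cat, idx in cat2id.items()}
    return cat2id, id2cat


# Hypothetical annotation, used only for illustration.
SAMPLE_ANNOTATION = """
<annotation>
  <object>
    <name>n02084071</name>
    <bndbox><xmin>10</xmin><ymin>10</ymin><xmax>50</xmax><ymax>40</ymax></bndbox>
  </object>
  <object>
    <name>n02121808</name>
    <bndbox><xmin>0</xmin><ymin>0</ymin><xmax>200</xmax><ymax>100</ymax></bndbox>
  </object>
</annotation>
"""

main_category = main_object_category(SAMPLE_ANNOTATION)
cat2id, id2cat = build_category_maps(["n02121808", "n02084071", "n02121808"])
```

Sorting the categories before numbering them keeps the mapping stable between runs, which matters when a saved model is loaded later.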

The following function converts an annotation file name into an encoded category vector.

Note that one extra category is added for the background, because some images have no marked objects.

```
def ann_file2one_hot(ann_file):
    # Annotation is the helper class described above
    annotation = Annotation(ann_file)
    category = annotation.main_object().cls
    result = np.zeros(len(cat2id) + 1)
    # Categories missing from cat2id fall into the extra background slot
    result[cat2id.get(category, len(cat2id))] = 1
    return result
```

Let’s apply the transformation to the dataset:

```
dataset = files_dataset.map(
    lambda img_file_tensor, ann_file_tensor:
    (img_file_tensor, tf.py_func(ann_file2one_hot, [ann_file_tensor], tf.float64))
)
```

The `map` method returns a new dataset in which the function is applied to each element of the original dataset. The function is not actually applied until we start iterating over the resulting dataset.

Note that we wrapped our function in `tf.py_func`. The wrapper is needed because the transformation function receives tensors as parameters, not the values they contain, while our Python code has to work with the actual strings.

TensorFlow has a rich library for working with images. Let’s use it to load them. We need to read the file, decode it into a matrix, bring the matrix to a standard size (for example, the average size), and normalize the values in this matrix.

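```
def image_parser(file_name):
    # Read the raw bytes of the JPEG file
    image_data = tf.read_file(file_name)
    image_parsed = tf.image.decode_jpeg(image_data, channels=3)
    # Bring every image to one size (crop-or-pad is assumed here);
    # 482x415 matches the shapes in the output shown below
    image_parsed = tf.image.resize_image_with_crop_or_pad(image_parsed, 482, 415)
    image_parsed = tf.cast(image_parsed, dtype=tf.float16)
    image_parsed = tf.image.per_image_standardization(image_parsed)
    return image_parsed
```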
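For intuition, the normalization step can be written out in NumPy. This is a sketch of the formula documented for `tf.image.per_image_standardization`, `(x - mean) / max(stddev, 1 / sqrt(N))`, and not TensorFlow's own implementation. The function name `standardize_image` is hypothetical.

```python
import numpy as np


def standardize_image(image):
    """Scale an image to zero mean and (approximately) unit variance."""
    image = np.asarray(image, dtype=np.float32)
    mean = image.mean()
    # The lower bound on the deviation avoids division by zero
    # for uniform images.
    adjusted_stddev = max(image.std(), 1.0 / np.sqrt(image.size))
    return (image - mean) / adjusted_stddev


# A hypothetical 2x2 RGB image with values 0..11.
sample = np.arange(12, dtype=np.float32).reshape(2, 2, 3)
standardized = standardize_image(sample)
```

After this step, every image has a comparable value range regardless of its original brightness or contrast.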

Unlike `ann_file2one_hot`, `image_parser` receives `file_name` as a tensor and is built entirely from TensorFlow operations, so it does not need a wrapper. Let’s add it to the earlier snippet:

```
dataset = files_dataset.map(
    lambda img_file_tensor, ann_file_tensor:
    (
        image_parser(img_file_tensor),
        tf.py_func(ann_file2one_hot, [ann_file_tensor], tf.float64)
    )
)
```

Let’s check that our calculation graph produces something meaningful:

```
iterator = dataset.make_one_shot_iterator()
next_elem = iterator.get_next()
print(type(next_elem))
with tf.Session() as sess:
    for i in range(3):
        # Each element is an (image, label) pair
        element = sess.run(next_elem)
        print("{} {} {}".format(i, element[0].shape, element[1].shape))
```

The loop should print:

```
0 (482, 415, 3) (201,)
1 (482, 415, 3) (201,)
2 (482, 415, 3) (201,)
```

As a rule, at the very beginning, you should divide the dataset into 2 or 3 parts for training/validation/testing. We will use the division into training and validation datasets from the downloaded archive.

## Designing a computation graph

We will train a convolutional neural network (CNN) with Adam, an improved variant of stochastic gradient descent. To do this, we need to combine our examples into batches. In addition, to take advantage of multiprocessing (and, ideally, a GPU for training), you can enable background prefetching of data:

```
BATCH_SIZE = 16
dataset = dataset.batch(BATCH_SIZE)
dataset = dataset.prefetch(2)
```

This combines the examples into batches of `BATCH_SIZE` elements and prefetches 2 such batches in the background.
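To make the batching behavior concrete, here is a plain-Python sketch of what grouping a stream into batches means. It is an illustration only, not how `tf.data` implements `batch`, and the helper name `batched` is hypothetical. Note that the final batch may hold fewer elements than the batch size when the dataset size is not a multiple of it.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of up to batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Ten elements in batches of four: the last batch is shorter.
batches = list(batched(range(10), 4))
batch_sizes = [len(batch) for batch in batches]
```

Here `batch_sizes` ends with a batch of two elements, which is worth keeping in mind when accuracy is later computed per batch.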

During training, we want to periodically run validation on a sample that is not involved in training. So we need to repeat all the manipulations above for one more dataset.

Fortunately, all of them can be combined into a function, for example `dataset_from_file_iterator`, which we use to create two datasets:

```
train_dataset = dataset_from_file_iterator(
    functools.partial(image_annotation_iterator, "./ILSVRC", subset="train"),
    cat2id,
    BATCH_SIZE
)
valid_dataset = ... # the same only subset = "val"
```

But since we want to keep using the same computation graph for training and validation, we will create a more flexible iterator, one that can be reinitialized.

```
iterator = tf.data.Iterator.from_structure(
    train_dataset.output_types,
    train_dataset.output_shapes
)
train_initializer_op = iterator.make_initializer(train_dataset)
valid_initializer_op = iterator.make_initializer(valid_dataset)
```

Later, by running one or the other of these operations, we can switch the iterator from one dataset to the other.

```
# config and graph are assumed to be defined elsewhere
with tf.Session(config=config, graph=graph) as sess:
    sess.run(train_initializer_op)
    # Training
    # ...
    sess.run(valid_initializer_op)
```

For now, we need to describe our neural network, but we will not delve into this issue.

We will assume that the function `semi_alexnet_v1(images_batch, num_labels)` builds the desired architecture and returns a tensor with the output values predicted by the neural network.

Let’s define the loss function, the accuracy count, and the optimization operation:

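```
img_batch, label_batch = iterator.get_next()
# +1 for the background class, assuming num_labels sets the logits size,
# so that the logits match the one-hot label vector
logits = semi_alexnet_v1.semi_alexnet_v1(img_batch, len(cat2id) + 1)
loss = tf.losses.softmax_cross_entropy(
    logits=logits, onehot_labels=label_batch)
labels = tf.argmax(label_batch, axis=1)
predictions = tf.argmax(logits, axis=1)
# Number of correct predictions in the batch
correct_predictions = tf.reduce_sum(tf.to_float(tf.equal(labels, predictions)))
# Adam with default settings; minimize() returns the training operation
optimizer = tf.train.AdamOptimizer().minimize(loss)
```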

## Training and validation cycle

Now you can start training:

```
from tqdm import tqdm

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    sess.run(tf.global_variables_initializer())
    sess.run(train_initializer_op)
    counter = tqdm()
    total = 0.
    correct = 0.
    try:
        while True:
            opt, l, c_batch = sess.run([optimizer, loss, correct_predictions])
            total += BATCH_SIZE
            correct += c_batch
            counter.set_postfix({
                "loss": "{:.6}".format(l),
                "accuracy": correct/total
            })
            counter.update(BATCH_SIZE)
    except tf.errors.OutOfRangeError:
        # Raised when the iterator has passed over the whole dataset
        print("Finished training")
```

Above, we create a session, initialize the global and local variables in the graph, and initialize the iterator with the training data. [tqdm] [tqdm] is not part of the training process; it is just a handy tool for visualizing progress.
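One detail worth noting: the loop above adds `BATCH_SIZE` to `total` on every step, which slightly understates the accuracy if the last batch is smaller than `BATCH_SIZE`. Below is a small sketch of an accumulator that counts the actual number of examples instead. The class `RunningAccuracy` is hypothetical and independent of TensorFlow.

```python
class RunningAccuracy(object):
    """Accumulate the fraction of correct predictions over batches."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, labels, predictions):
        """Add one batch of integer class labels and predictions."""
        self.correct += sum(
            1 for label, pred in zip(labels, predictions) if label == pred
        )
        # Count the real batch length, not a fixed batch size
        self.total += len(labels)

    @property
    def value(self):
        return float(self.correct) / self.total if self.total else 0.0


accuracy = RunningAccuracy()
accuracy.update([1, 0, 2, 2], [1, 0, 0, 2])  # full batch: 3 of 4 correct
accuracy.update([3, 1], [3, 1])              # smaller final batch: 2 of 2
```

In the TensorFlow loop, the inputs to `update` would come from evaluating the `labels` and `predictions` tensors with `sess.run`.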

Within the same session, we also run the validation. The validation loop looks very similar; the main difference is that the optimization operation is not run.

```
with tf.Session() as sess:
    # Train
    # ...
    # Validate
    counter = tqdm()
    sess.run(valid_initializer_op)
    total = 0.
    correct = 0.
    try:
        while True:
            l, correct_batch = sess.run([loss, correct_predictions])
            total += BATCH_SIZE
            correct += correct_batch
            counter.set_postfix({
                "loss": "{:.6}".format(l),
                "valid accuracy": correct/total
            })
            counter.update(BATCH_SIZE)
    except tf.errors.OutOfRangeError:
        print("Finished validation")
```

#### Epochs and checkpoints

A single pass through all the images is certainly not enough for training, so the training and validation code above has to be executed in a loop (within one session): either for a fixed number of iterations, or for as long as training keeps improving the model. One pass through the entire dataset is traditionally called an epoch.
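The second option, training for as long as it helps, can be sketched in plain Python. The callables `train_one_epoch` and `validate` are hypothetical stand-ins for the training and validation loops above. The `patience` rule (stop after a given number of epochs without improvement in validation accuracy) is one common choice, not something this article prescribes.

```python
def train_until_no_improvement(train_one_epoch, validate, max_epochs, patience):
    """Run epochs until validation accuracy stops improving.

    Returns (best_accuracy, epochs_run).
    """
    best_accuracy = float("-inf")
    epochs_without_improvement = 0
    epochs_run = 0
    for _ in range(max_epochs):
        train_one_epoch()
        accuracy = validate()
        epochs_run += 1
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_accuracy, epochs_run


# Hypothetical validation accuracies, one per epoch, for illustration.
fake_accuracies = iter([0.30, 0.45, 0.50, 0.49, 0.48, 0.60])
best, epochs = train_until_no_improvement(
    train_one_epoch=lambda: None,
    validate=lambda: next(fake_accuracies),
    max_epochs=10,
    patience=2,
)
```

With these made-up numbers the loop stops after the second epoch without improvement, so it never reaches the later, higher value. A larger `patience` trades extra training time for a lower chance of stopping too early.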

To guard against unexpected interruptions in training, and to be able to use the model later, you need to save it. To do this, create an object of the class `tf.train.Saver` when building the execution graph, and save the state of the model during training.

```
# Create the graph
# ...
saver = tf.train.Saver()
# Create the session
with tf.Session() as sess:
    for i in range(EPOCHS):
        # Train
        # ...
        # Validate
        # ...
        # Save the model state after each epoch
        saver.save(sess, "checkpoint/name")
```

## Conclusion

We learned how to create datasets and transform them using both tensor functions and ordinary functions written in Python. We learned how to load images in a background loop without trying to load them all into memory or save them in an uncompressed form. We also learned how to save the trained model. By applying some of the steps described above and loading the saved model, you can build a program that recognizes images.

### References

1. https://habrastorage.org/webt/4d/ui/dt/4duidtqhdydft4ys2ahttbj9ysm.png 