Getting started with the basic tasks of Computer Vision

Prathamesh Dinkar 27 Oct, 2021

This article was published as a part of the Data Science Blogathon

If you plan to work with images or videos, Computer Vision is worth learning. Computer Vision (CV) is a branch of artificial intelligence (AI) that enables computers to extract meaningful information from images, videos, and other visual inputs, and to act on that information. Examples include self-driving cars, automatic traffic management, surveillance, and image-based quality inspection.

What is OpenCV?

OpenCV is a library primarily aimed at computer vision. It provides most of the tools you will need while working with Computer Vision (CV). The ‘Open’ stands for Open Source and ‘CV’ stands for Computer Vision.

What will I learn?

This article covers the basics you need to get started with computer vision using the OpenCV library: reading and displaying images, drawing on them, blending, transforming, preprocessing, and detecting features. All the code and data are present here.

Reading and displaying the images

First, let’s see how to read an image and display it, the most basic operations in CV.

Reading the Image:

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img=cv.imread('../input/images-for-computer-vision/tiger1.jpg') # The module was imported as cv, so use cv.imread

‘img’ holds the image as a NumPy array. Let’s print its type and shape:

print(type(img))
print(img.shape)

The NumPy array has a shape of (667, 1200, 3), where:

667 is the image height, 1200 is the image width, and 3 is the number of channels.

The image has three color channels, hence the 3. Note that OpenCV loads the channels in BGR order by default, whereas Matplotlib expects RGB, so we have to convert the array to RGB before displaying it.

Displaying the Image:

# Converting image from BGR to RGB for displaying
img_convert=cv.cvtColor(img, cv.COLOR_BGR2RGB)
plt.imshow(img_convert)
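
To make the channel order concrete, here is a minimal sketch that uses only NumPy and a hypothetical 1x2 image (not the tiger picture). It assumes that swapping the first and last channel is all a BGR to RGB conversion needs to do for a 3-channel image, which is equivalent to reversing the last axis of the array.

```python
import numpy as np

# A hypothetical 1x2 image stored in BGR order:
# the first pixel is pure blue, the second pixel is pure red.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels, giving RGB order.
rgb = bgr[..., ::-1]

# The shape is always (height, width, channels).
height, width, channels = bgr.shape
print(height, width, channels)  # 1 2 3
print(rgb[0, 0].tolist())       # [0, 0, 255] -> blue, now in RGB order
```

If an image shows up with blue and red swapped, a missing or doubled conversion like this one is the usual suspect.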

Drawing over Image

We can draw lines, shapes, and text on an image.

# Rectangle
color=(240,150,240) # Color of the rectangle
cv.rectangle(img, (100,100),(300,300),color,thickness=10, lineType=8) ## For filled rectangle, use thickness = -1
## (100,100) are (x,y) coordinates for the top left point of the rectangle and (300, 300) are (x,y) coordinates for the bottom right point

# Circle
color=(150,255,50) # Channel values must lie in the range 0-255
cv.circle(img, (650,350),100, color,thickness=10) ## For filled circle, use thickness = -1
## (650, 350) are (x,y) coordinates for the center of the circle and 100 is the radius

# Text
color=(50,200,100)
font=cv.FONT_HERSHEY_SCRIPT_COMPLEX
cv.putText(img, 'Save Tigers',(200,150), font, 5, color,thickness=5, lineType=cv.LINE_AA) ## (200,150) is the bottom-left corner of the text; LINE_AA draws anti-aliased strokes

# Converting BGR to RGB
img_convert=cv.cvtColor(img, cv.COLOR_BGR2RGB)
plt.imshow(img_convert)

Blending Images

We can also blend two or more images with OpenCV. An image is nothing but numbers, and you can add, subtract, multiply and divide numbers and thus images. One thing to note is that the size of the images should be the same.

# For plotting multiple images at once
def myplot(images,titles):
    fig, axs=plt.subplots(1,len(images),sharey=True)
    fig.set_figwidth(15)
    for img,ax,title in zip(images,axs,titles):
        if img.shape[-1]==3:
            img=cv.cvtColor(img, cv.COLOR_BGR2RGB) # OpenCV reads images as BGR, so convert them back to RGB
        else:
            img=cv.cvtColor(img, cv.COLOR_GRAY2BGR) # Grayscale image: copy the single channel into three so it renders in gray
        ax.imshow(img)
        ax.set_title(title)

img1 = cv.imread('../input/images-for-computer-vision/tiger1.jpg')
img2 = cv.imread('../input/images-for-computer-vision/horse.jpg')

# Resizing the img1
img1_resize = cv.resize(img1, (img2.shape[1], img2.shape[0]))

# Adding, Subtracting, Multiplying and Dividing Images
img_add = cv.add(img1_resize, img2)
img_subtract = cv.subtract(img1_resize, img2)
img_multiply = cv.multiply(img1_resize, img2)
img_divide = cv.divide(img1_resize, img2)

# Blending Images
img_blend = cv.addWeighted(img1_resize, 0.3, img2, 0.7, 0) ## 30% tiger and 70% horse
myplot([img1_resize, img2], ['Tiger','Horse'])
myplot([img_add, img_subtract, img_multiply, img_divide, img_blend], ['Addition', 'Subtraction', 'Multiplication', 'Division', 'Blending'])

The multiplied image is almost white and the divided image is almost black. A value of 255 is white and 0 is black, and OpenCV clips (saturates) results to that range. Multiplying two pixel values usually gives a number far above 255, so the result saturates to white or close to it. Dividing two pixel values of similar magnitude gives a small number close to 0 or 1, so the result is black or close to it.
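
The arithmetic behind these results can be sketched with NumPy alone. The pixel values below are hypothetical, and the `saturate` helper is an illustration written for this example (round, then clip to 0-255), not an OpenCV function. It is meant to mimic the saturating behavior of `cv.add`, `cv.multiply`, `cv.divide`, and `cv.addWeighted` on 8-bit images.

```python
import numpy as np

# Three hypothetical pixel values from each of two 8-bit images.
a = np.array([[200, 100, 10]], dtype=np.uint8)
b = np.array([[100, 50, 40]], dtype=np.uint8)

def saturate(values):
    """Round to the nearest integer and clip to the 8-bit range 0-255."""
    return np.clip(np.rint(values), 0, 255).astype(np.uint8)

# Work in floating point so intermediate results are not truncated.
fa, fb = a.astype(np.float64), b.astype(np.float64)

added = saturate(fa + fb)                # 300 is clipped to 255
multiplied = saturate(fa * fb)           # products far exceed 255 -> white
divided = saturate(fa / fb)              # small quotients -> near black
blended = saturate(0.3 * fa + 0.7 * fb)  # weighted sum stays in range

# Plain uint8 addition in NumPy wraps around instead of saturating.
wrapped = a + b

print(added.tolist())       # [[255, 150, 50]]
print(multiplied.tolist())  # [[255, 255, 255]]
print(divided.tolist())     # [[2, 2, 0]]
print(blended.tolist())     # [[130, 65, 31]]
print(wrapped.tolist())     # [[44, 150, 50]]
```

The last line shows why adding images with the `+` operator on NumPy arrays can give different results from `cv.add`: 200 + 100 wraps around to 44 instead of saturating at 255.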

Image Transformation

Image transformation includes translating, rotating, scaling, shearing, and flipping an image.

img=cv.imread('../input/images-for-computer-vision/tiger1.jpg')
height, width, _=img.shape # NumPy shape is (rows, columns, channels), i.e. (height, width, channels)

# Translating
M_translate=np.float32([[1,0,200],[0,1,100]]) # 200 => translation along x-axis and 100 => translation along y-axis
img_translate=cv.warpAffine(img,M_translate,(width,height)) # The output size is given as (width, height)

# Rotating
center=(width/2,height/2) # (x, y) coordinates of the center of rotation
M_rotate=cv.getRotationMatrix2D(center, angle=90, scale=1)
img_rotate=cv.warpAffine(img,M_rotate,(width,height))

# Scaling
scale_percent = 50
scaled_width = int(width * scale_percent / 100)
scaled_height = int(height * scale_percent / 100)
dim = (scaled_width, scaled_height)
img_scale = cv.resize(img, dim, interpolation = cv.INTER_AREA)

# Flipping
img_flip=cv.flip(img,1) # 0: flip around the x-axis (upside down), 1: flip around the y-axis (mirror image), -1: flip around both axes

# Shearing
srcTri = np.array( [[0, 0], [img.shape[1] - 1, 0], [0, img.shape[0] - 1]] ).astype(np.float32)
dstTri = np.array( [[0, img.shape[1]*0.33], [img.shape[1]*0.85, img.shape[0]*0.25], [img.shape[1]*0.15, img.shape[0]*0.7]] ).astype(np.float32)
warp_mat = cv.getAffineTransform(srcTri, dstTri) # Affine matrix that maps the three source points to the three destination points
img_warp = cv.warpAffine(img, warp_mat, (width, height))

myplot([img, img_translate, img_rotate, img_scale, img_flip, img_warp],
       ['Original Image', 'Translated Image', 'Rotated Image', 'Scaled Image', 'Flipped Image', 'Sheared Image'])
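
To see what the 2x3 matrices passed to `cv.warpAffine` actually do, the following sketch applies them to a few points with NumPy. The helper names `apply_affine` and `rotation_matrix` are hypothetical and exist only for this illustration. `rotation_matrix` follows the formula given in the OpenCV documentation for `cv.getRotationMatrix2D`, as far as I understand it, so treat it as an approximation of that function rather than a replacement.

```python
import numpy as np

def apply_affine(M, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    pts = np.asarray(points, dtype=np.float64)
    # Append a column of ones so the translation column of M takes effect.
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return homogeneous @ np.asarray(M, dtype=np.float64).T

def rotation_matrix(center, angle_deg, scale=1.0):
    """Rotation about `center`; positive angles are counter-clockwise on screen."""
    cx, cy = center
    alpha = scale * np.cos(np.radians(angle_deg))
    beta = scale * np.sin(np.radians(angle_deg))
    return np.array([[alpha, beta, (1 - alpha) * cx - beta * cy],
                     [-beta, alpha, beta * cx + (1 - alpha) * cy]])

# Translation: every point moves 200 px right and 100 px down.
M_translate = np.float32([[1, 0, 200], [0, 1, 100]])
moved = apply_affine(M_translate, [(0, 0), (10, 0), (0, 5)])
print(moved.tolist())  # [[200.0, 100.0], [210.0, 100.0], [200.0, 105.0]]

# Rotation by 90 degrees about (600, 333): the center itself stays put,
# and a point one pixel to its right ends up one pixel above it.
M_rotate = rotation_matrix((600, 333), 90)
rotated = apply_affine(M_rotate, [(600, 333), (601, 333)])
print(np.round(rotated).tolist())
```

Because image coordinates have the y-axis pointing down, "one pixel above" means a smaller y value.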

Image Preprocessing

Thresholding: In binary thresholding, pixel values less than or equal to the threshold become 0 (black), and pixel values greater than the threshold become 255 (white).

I am taking the threshold to be 150, but you can choose any other number as well.

# For visualising the filters
import plotly.graph_objects as go
from plotly.subplots import make_subplots
def plot_3d(img1, img2, titles):
    fig = make_subplots(rows=1, cols=2,
                    specs=[[{'is_3d': True}, {'is_3d': True}]],
                    subplot_titles=[titles[0], titles[1]],
                    )
    x, y=np.mgrid[0:img1.shape[0], 0:img1.shape[1]]
    fig.add_trace(go.Surface(x=x, y=y, z=img1[:,:,0]), row=1, col=1)
    fig.add_trace(go.Surface(x=x, y=y, z=img2[:,:,0]), row=1, col=2)
    fig.update_traces(contours_z=dict(show=True, usecolormap=True,
                                  highlightcolor="limegreen", project_z=True))
    fig.show()

img=cv.imread('../input/images-for-computer-vision/simple_shapes.png')
# Pixel values greater than the threshold become 255; all others become 0
_,img_threshold=cv.threshold(img,150,255,cv.THRESH_BINARY)
plot_3d(img, img_threshold, ['Original Image', 'Threshold Image=150'])

After thresholding, every value greater than 150 becomes 255, and every other value becomes 0.
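
The rule is simple enough to reproduce with NumPy. The sketch below uses a small hypothetical array of pixel values instead of the shapes image, and mirrors what `cv.THRESH_BINARY` does with a threshold of 150 and a maximum value of 255.

```python
import numpy as np

# Hypothetical pixel values on both sides of the threshold.
pixels = np.array([[0, 149, 150],
                   [151, 200, 255]], dtype=np.uint8)

thresh, maxval = 150, 255

# Strictly greater than the threshold -> maxval, everything else -> 0.
binary = np.where(pixels > thresh, maxval, 0).astype(np.uint8)

print(binary.tolist())  # [[0, 0, 0], [255, 255, 255]]
```

Note that a pixel exactly equal to the threshold (150 here) is set to 0.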

Filtering: Image filtering changes the appearance of an image by changing the values of its pixels. Each type of filter computes the new pixel value from a specific mathematical formula. I will not go into the math in detail here, but I will show how each filter works by visualizing it in 3D. If you are interested in the math behind the filters, you can check this.

img=cv.imread('../input/images-for-computer-vision/simple_shapes.png')

# Gaussian Filter
ksize=(11,11) # Both should be odd numbers
img_gaussian=cv.GaussianBlur(img, ksize,0) # sigma=0 lets OpenCV derive the standard deviation from the kernel size
plot_3d(img, img_gaussian, ['Original Image','Gaussian Image'])

# Median Filter
ksize=11
img_medianblur=cv.medianBlur(img,ksize)
plot_3d(img, img_medianblur, ['Original Image','Median blur'])

# Bilateral Filter
img_bilateralblur=cv.bilateralFilter(img,d=5, sigmaColor=50, sigmaSpace=5)
myplot([img, img_bilateralblur],['Original Image', 'Bilateral blur Image'])
plot_3d(img, img_bilateralblur, ['Original Image','Bilateral blur'])

Gaussian Filter: Blurs an image, removing fine detail and noise. For more details, you can read this.

Median Filter: A nonlinear filter that is useful for reducing impulsive, or salt-and-pepper, noise.

Bilateral Filter: Edge-preserving, noise-reducing smoothing.

In simple words, filters reduce or remove noise, which is a random variation of brightness or color. This process is called smoothing.
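
The difference between a linear filter and the median filter shows up clearly on a tiny one-dimensional example. The sketch below is a simplification: it uses a hypothetical row of pixels with one "salt" spike, a window of 3, and a plain box mean as a stand-in for a linear blur such as the Gaussian filter (which uses weighted averages instead).

```python
import numpy as np

# A flat row of pixels with a single impulsive ("salt") outlier.
signal = np.array([10, 10, 10, 255, 10, 10, 10], dtype=np.float64)

def sliding_windows(values, ksize):
    """Return one window of length ksize centered on each element."""
    pad = ksize // 2
    padded = np.pad(values, pad, mode='edge')  # repeat the border values
    return np.stack([padded[i:i + ksize] for i in range(len(values))])

windows = sliding_windows(signal, 3)

# Linear smoothing spreads the spike over its neighbors.
mean_filtered = windows.mean(axis=1)

# The median ignores the outlier as long as it is a minority in the window.
median_filtered = np.median(windows, axis=1)

print(np.round(mean_filtered, 1).tolist())
print(median_filtered.tolist())  # [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
```

The mean leaves a smeared bump where the spike was, while the median removes it completely, which is why the median filter is the usual choice for salt-and-pepper noise.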

Feature Detection

Feature detection is a method for making local decisions at every image point by computing abstractions of image information. For example, for an image of a face, the features are eyes, nose, lips, ears, etc. and we try to identify these features.

Let’s first try to identify the edges of an image.

Edge Detection

img=cv.imread('../input/images-for-computer-vision/simple_shapes.png')
img_canny1=cv.Canny(img,50, 200)
# Smoothing the img before feeding it to canny
filter_img=cv.GaussianBlur(img, (7,7), 0)
img_canny2=cv.Canny(filter_img,50, 200)
myplot([img, img_canny1, img_canny2],
       ['Original Image', 'Canny Edge Detector(Without Smoothing)', 'Canny Edge Detector(With Smoothing)'])

Here we are using the Canny edge detector, an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986. I will not go into the details of how Canny works; the key point is that it is used to extract edges. To know more about how it works, you can check this.

Before detecting edges with the Canny method, we smooth the image to remove noise. As you can see from the image, we get cleaner edges after smoothing.
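
Edge detectors look for places where intensity changes sharply. The sketch below illustrates only that core idea on a hypothetical synthetic image: it computes the gradient magnitude and applies a single threshold. It is not the Canny algorithm, which additionally performs non-maximum suppression and hysteresis thresholding with two thresholds (the 50 and 200 above).

```python
import numpy as np

# Hypothetical 5x6 image: dark left half, bright right half.
img = np.zeros((5, 6), dtype=np.float64)
img[:, 3:] = 255

# Central-difference gradients along rows (gy) and columns (gx).
gy, gx = np.gradient(img)

# Gradient magnitude is large only where the intensity changes.
magnitude = np.hypot(gx, gy)

# A single illustrative threshold (Canny uses two, with hysteresis).
edges = magnitude > 100

print(edges.astype(int)[0].tolist())  # [0, 0, 1, 1, 0, 0]
```

The edge is two pixels wide here because the central difference responds on both sides of the step; thinning such responses to a single pixel is one of the jobs of Canny's non-maximum suppression stage.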

Contours

img=cv.imread('../input/images-for-computer-vision/simple_shapes.png')
img_copy=img.copy()
img_gray=cv.cvtColor(img,cv.COLOR_BGR2GRAY)
_,img_binary=cv.threshold(img_gray,50,200,cv.THRESH_BINARY)
# Eroding and dilating for smooth contours
kernel=np.ones((10,10),np.uint8) # The structuring element must be an array; a (10,10) tuple is not a 10x10 kernel
img_binary_erode=cv.erode(img_binary,kernel, iterations=5)
img_binary_dilate=cv.dilate(img_binary,kernel, iterations=5)
contours,hierarchy=cv.findContours(img_binary,cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE) # Contours are extracted from the thresholded image
cv.drawContours(img, contours,-1,(0,0,255),3) # Draws all contours (index -1) in red on the image, in place
myplot([img_copy, img], ['Original Image', 'Contours in the Image'])

Erosion: Uses a structuring element to probe and shrink the shapes contained in the image.

Dilation: Adds pixels to the boundaries of objects in an image; it is the opposite of erosion.
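
For a binary image and a square structuring element, erosion and dilation reduce to taking the minimum or maximum over each neighborhood. The sketch below shows this with NumPy on a hypothetical 7x7 image. The `erode` and `dilate` functions here are simplified illustrations (odd kernel sizes only), not the OpenCV implementations.

```python
import numpy as np

def _slide(binary, ksize, reducer, border):
    """Apply reducer (np.min or np.max) over every ksize x ksize neighborhood."""
    pad = ksize // 2
    padded = np.pad(binary, pad, constant_values=border)
    out = np.empty_like(binary)
    for r in range(binary.shape[0]):
        for c in range(binary.shape[1]):
            out[r, c] = reducer(padded[r:r + ksize, c:c + ksize])
    return out

def erode(binary, ksize=3):
    # A pixel survives only if its whole neighborhood is foreground.
    return _slide(binary, ksize, np.min, border=1)

def dilate(binary, ksize=3):
    # A pixel is set if any pixel in its neighborhood is foreground.
    return _slide(binary, ksize, np.max, border=0)

# Hypothetical image: a 3x3 white square in the middle of a 7x7 canvas.
square = np.zeros((7, 7), dtype=np.uint8)
square[2:5, 2:5] = 1

eroded = erode(square)    # shrinks to the single center pixel
dilated = dilate(square)  # grows to a 5x5 square
opened = dilate(eroded)   # erosion followed by dilation ("opening")

print(int(eroded.sum()), int(dilated.sum()))  # 1 25
print(np.array_equal(opened, square))         # True
```

Eroding and then dilating removes specks smaller than the kernel while restoring larger shapes to roughly their original size, which is why the pair is often used to clean up a binary image before contour extraction.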

Hulls

img=cv.imread('../input/images-for-computer-vision/simple_shapes.png') # Load in color so the hulls can be drawn in red
img_gray=cv.cvtColor(img,cv.COLOR_BGR2GRAY)
_,threshold=cv.threshold(img_gray,50,255,cv.THRESH_BINARY)
contours,hierarchy=cv.findContours(threshold,cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
hulls=[cv.convexHull(c) for c in contours] # One convex hull per contour
cv.drawContours(img, hulls,-1,(0,0,255),2) # Draws the hulls on the image, in place
plt.imshow(cv.cvtColor(img, cv.COLOR_BGR2RGB)) # Convert BGR to RGB for Matplotlib
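
A convex hull is the smallest convex polygon that encloses a set of points, so any dents in a contour are bridged over. The pure-Python sketch below computes a hull with Andrew's monotone chain method on a hypothetical L-shaped contour. It is shown only to illustrate the idea; the algorithm `cv.convexHull` uses internally may differ.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices of a set of (x, y) points (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half_hull(sequence):
        chain = []
        for p in sequence:
            # Drop the last vertex while it would make a non-left turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]

# Hypothetical L-shaped contour; (2, 2) is the inner (concave) corner.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
hull = convex_hull(l_shape)

print(hull)  # [(0, 0), (4, 0), (4, 2), (2, 4), (0, 4)]
```

The concave corner (2, 2) is dropped from the result, so the hull cuts straight across the notch of the L.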

Summary

We saw how to read and display an image, draw shapes and text over it, blend two images, transform an image (rotating, scaling, translating, etc.), threshold and filter images using Gaussian, median, and bilateral blur, and detect features using Canny edge detection, contours, and convex hulls.

I have only scratched the surface of the computer vision world. The field evolves every day, but the basics remain the same, so a solid understanding of these concepts will help you go further in it.
