How to Apply Convolutional Neural Networks in TensorFlow

Are you looking for a way to apply convolutional neural networks in TensorFlow? Stay with us throughout the article!

Convolutional neural networks (CNNs) are a form of artificial neural network specifically designed to process pixel input, and they are used in image recognition and processing. Images are simply matrices containing numeric values; a CNN takes these matrices as input and learns the patterns in them. In this article, we will discuss convolutional neural networks in more detail, including how to convert an image into suitable data for a CNN by covering pooling, padding, and filtering.

Before diving into convolutional neural networks, it is highly recommended to read the articles on neural networks for regression and neural networks for classification to get an idea of how to build neural networks using TensorFlow.

Getting started with Convolutional Neural Networks in TensorFlow

Convolutional neural networks process data through many layers of arrays. Applications such as image identification and facial recognition use this kind of neural network. The primary difference between a CNN and a general neural network is that a CNN operates directly on the images it receives as input and learns the relevant features itself, whereas other approaches typically rely on features that were extracted beforehand. A CNN receives each image as a two-dimensional array (or a stack of such arrays for color images). Unlike other neural networks, a CNN applies filtering, pooling, padding, and flattening to the input.

Before going into the details of how convolutional neural networks work, let us first understand what an image is.

What is an image and what are the types of images?

An image is simply an array that contains numeric values. These values combine in a specific order to form something visually meaningful. Based on the number of arrays and the range of numeric values, images can be divided into three main categories.

• Binary Image
• Grayscale image
• Colored Image

Let us now understand each type of these images.

A binary image, as the name suggests, contains only two values (0 or 1), so every region of the image is either fully black or fully white. In simple words, such an image has only two colors. For example, see the sample image below:

As you can see, the binary image contains only two colors, and the values in the matrix are also either 1 or 0.

A grayscale image contains a range of values (from 0 to 255), and these values represent the intensity of gray. The darkest value, 0, is black and the brightest value, 255, is white. For example, see the sample image below:

As you can see, a grayscale image contains values in the range 0 to 255, where each value represents the intensity of gray.

A colored image is also known as an RGB (Red, Green, Blue) image. It is similar to a grayscale image but has three layers of arrays stacked on one another, where each array represents one of the three primary colors. For example, see the sample image below:

As you can see, the image contains three different layers, and each layer represents one of the primary colors.
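To make the three image types concrete, here is a minimal sketch that builds a tiny example of each as a NumPy array. The pixel values and the 3×3 size are made up purely for illustration.

```python
import numpy as np

# Binary image: every pixel is either 0 (black) or 1 (white)
binary_image = np.array([[0, 1, 0],
                         [1, 1, 1],
                         [0, 1, 0]], dtype=np.uint8)

# Grayscale image: a single 2D array with intensities from 0 (black) to 255 (white)
grayscale_image = np.array([[  0,  64, 128],
                            [ 64, 128, 192],
                            [128, 192, 255]], dtype=np.uint8)

# Color (RGB) image: three 2D arrays stacked along the last axis
rgb_image = np.zeros((3, 3, 3), dtype=np.uint8)
rgb_image[..., 0] = 255  # fill only the red channel, so every pixel is pure red

print(binary_image.shape)     # (3, 3)
print(grayscale_image.shape)  # (3, 3)
print(rgb_image.shape)        # (3, 3, 3)
```

The only structural difference between a grayscale and a color image is the extra last axis that holds the three channels.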

How to build a convolutional neural network for images?

Now we will discuss the major steps needed to build a convolutional neural network:

1. Preprocessing of the image.
2. Applying filters on the image with padding and without padding.
3. Applying different types of poolings.
4. Flattening the 2-dimensional array so that the neural network can take the values.
5. Building a neural network with the input layer, hidden layers, and output layer.
6. Compiling the neural network model.
7. Training the neural network model.
8. Testing and evaluating the results of the model.

Now, let us learn some of the terms that we have used in the above steps in more detail.

What are filters in CNN?

Convolutional neural networks use filters to identify spatial patterns in images, such as edges, by identifying variations in the image’s intensity values.

A high-frequency image is one in which the intensity of the pixels varies significantly, whereas a low-frequency image is one in which the intensity is essentially constant. An image typically contains both high-frequency and low-frequency components. The edges of an object correspond to the high-frequency components because pixel values change rapidly at the edges. High-pass filters are used to enhance the high-frequency parts of an image.

We will now use Python to apply filters to an image in order to detect its edges. You can get access to the dataset and the source code from my GitHub account. Let us first import the required modules.

``````# importing the required modules
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import cv2
import numpy as np``````

Let us now import the image.

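``````# importing the image
# NOTE: 'image.jpg' is a placeholder path (the original file name is not given); replace it with your own image
image = mpimg.imread('image.jpg')

# showing the image
plt.imshow(image)``````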

Output:

Let us now define some filters to detect the edges in the above image.

``````# Sobel 3x3 filter -- horizontal edge detection
sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

# Sobel 3x3 filter -- vertical edge detection
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])``````

Now, let us apply these filters to the above image.

``````# converting the image to grayscale (assumes `image` is the RGB array loaded above)
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

# filter the image using the Sobel filters
filtered_image1 = cv2.filter2D(gray, -1, sobel_y)
filtered_image2 = cv2.filter2D(gray, -1, sobel_x)

# plotting the filtered images
f, ax = plt.subplots(1, 2, figsize=(16, 4))

# plotting the horizontal edge detections
ax[0].set_title('horizontal edge detection')
ax[0].imshow(filtered_image1, cmap='gray')

# plotting the vertical edge detections
ax[1].set_title('vertical edge detection')
ax[1].imshow(filtered_image2, cmap='gray')``````

Output:

As you can see, the filters have detected the horizontal and the vertical edges in the image.

What is padding in CNN?

Convolution is a mathematical technique that combines two signals to create a third signal, and it is essential to many common image-processing operators in artificial neural networks. By combining two arrays of numbers, typically of different sizes but of the same dimensionality, convolution produces a third array of numbers with that same dimensionality.

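Applying a filter to an image simply means sliding a small matrix (the kernel) over the image and, at each position, multiplying the overlapping values element by element and summing the results. For example, see the sample below: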
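The sliding multiply-and-sum operation can be sketched in a few lines of NumPy. This is a minimal illustration rather than how TensorFlow or OpenCV implement it internally; the helper name `filter2d_valid` and the 5×5 test image are made up for this example. Like CNN layers, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def filter2d_valid(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the current window and the kernel, then sum
            window = image[i:i + kh, j:j + kw]
            output[i, j] = np.sum(window * kernel)
    return output

# Hypothetical 5x5 image whose intensity grows by 1 from column to column
image = np.arange(25, dtype=float).reshape(5, 5)

# Vertical-edge Sobel kernel
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

result = filter2d_valid(image, kernel)
print(image.shape, result.shape)  # (5, 5) (3, 3)
```

Without padding, a 3×3 kernel shrinks each side of the image by 2 pixels, which is why the 5×5 input becomes a 3×3 output.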

As you can see, the size of the image shrinks after applying a filter. This is where padding comes in. Padding refers to adding empty pixels (usually zeros) around an image’s border. When using a convolutional filter, padding preserves the size of the original image and allows the filter to fully cover the edge pixels. For example, see the same filtering as above, this time with padding added around the image.

As you can see, applying padding and then filtering images doesn’t affect the original size of the image.
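The effect of padding on the output size can be checked with a small NumPy sketch. It assumes zero padding of one pixel and a 3×3 averaging kernel, both chosen only for illustration.

```python
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0  # simple 3x3 averaging filter (illustrative choice)

# Add a one-pixel border of zeros around the image
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

# Slide the kernel over the padded image; the output keeps the original size
output = np.zeros_like(image)
for i in range(image.shape[0]):
    for j in range(image.shape[1]):
        output[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)

print(image.shape, padded.shape, output.shape)  # (5, 5) (7, 7) (5, 5)
```

In Keras, the same behavior is selected with `padding='same'` on a `Conv2D` layer, while the default `padding='valid'` applies no padding.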

What is pooling in CNN?

Convolutional neural networks use a technique called pooling to generalize the features that the convolutional filters have extracted, enabling the network to recognize features regardless of where they are in the image. The primary goal of the pooling layer is to gradually reduce the spatial size of its input, which lowers the network’s computational load. In other words, pooling downsamples the feature map by shrinking it and passing only the most important information on to the CNN’s subsequent layers.

There are two main types of pooling.

• Max Pooling: In max pooling, we keep only the maximum value of each pooling window and pass it to the next layer.
• Average Pooling: In average pooling, we take the average of the values in each pooling window and pass it to the next layer.

The following figures show max pooling and average pooling.
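Both pooling types can be sketched with NumPy as follows. The helper `pool2d` and the 4×4 feature map are hypothetical and only meant to illustrate non-overlapping 2×2 pooling.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride equals size)."""
    h, w = feature_map.shape
    # Crop so that height and width are divisible by the window size
    cropped = feature_map[:h - h % size, :w - w % size]
    # Split the map into (rows, size, cols, size) blocks
    windows = cropped.reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    return windows.mean(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 4],
                        [5, 7, 6, 8],
                        [9, 2, 1, 0],
                        [3, 4, 5, 6]], dtype=float)

max_pooled = pool2d(feature_map, mode="max")      # values: [[7, 8], [9, 6]]
avg_pooled = pool2d(feature_map, mode="average")  # values: [[4, 5], [4.5, 3]]
print(max_pooled)
print(avg_pooled)
```

Either way, the 4×4 input is reduced to a 2×2 output, so the next layer has four times fewer values to process.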

Now that we have a basic understanding of the terms used in convolutional neural networks, let us move on to the full architecture of the CNN.

The full architecture of convolutional neural networks

A convolutional neural network has an input layer, an output layer, and numerous hidden layers, much like any other neural network. ReLU is mostly used as the activation function in a CNN’s hidden layers. The ReLU function replaces any negative values that appear in the matrix as a result of processing with zero and leaves positive values unchanged.
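As a quick illustration, ReLU can be written in one line of NumPy. The feature map values below are made up for the example.

```python
import numpy as np

def relu(x):
    """Replace negative values with zero and keep positive values unchanged."""
    return np.maximum(0, x)

# Hypothetical output of a convolution, containing negative values
feature_map = np.array([[-8.0, 0.0, 8.0],
                        [-2.5, 3.0, -1.0]])

activated = relu(feature_map)
print(activated)
```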

First, we apply filtering, padding, and pooling to the image to extract its important features. After the filtering and pooling phase, the final step before feeding the result to the neural network is to flatten the matrix. Flattening, which is a necessary part of building a convolutional neural network, converts the pooled feature map created during the pooling stage into a one-dimensional vector. In other words, flattening turns an N×N matrix into a one-dimensional array of length N×N.
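Flattening is just a reshape. The following sketch uses a made-up 2×2 feature map with 3 channels to show what happens to the shape.

```python
import numpy as np

# Hypothetical pooled feature map: 2x2 spatial size with 3 channels
pooled = np.arange(2 * 2 * 3).reshape(2, 2, 3)

# Flatten everything into a single one-dimensional vector
flat = pooled.reshape(-1)

print(pooled.shape, flat.shape)  # (2, 2, 3) (12,)
```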

The figure below shows the full architecture of a CNN.

Implementation of Convolutional Neural Networks in TensorFlow

So far, we have discussed the theory behind convolutional neural networks. In this section, we will implement a CNN using TensorFlow on colored images. For demonstration purposes, we will use a built-in dataset from the TensorFlow module. You can get access to the source code from my GitHub account.

Exploring the dataset

As we said, we will be using a built-in dataset from the TensorFlow module: the CIFAR-10 images. The dataset consists of 60000 32×32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. Let us first import the dataset.

``````# importing the tensorflow module
import tensorflow as tf

# importing the training and testing dataset
(x_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()``````

As you can see, we have loaded the testing and training dataset. It will take some time to download the dataset.

Let us also find the shape of the testing and training dataset.

``````# printing the shape of the dataset
print(x_train.shape, X_test.shape)``````

Output:

The first value is the total number of images, and the second and third values give the size of each image, which in our case is 32×32. The last value, 3, is the number of color channels, which means the images are colored images.

Let us now plot the first five items from the dataset.

``````# size of the plots
plt.figure(figsize=(10,8))

# for loop for 5 images
for i in range(5):

    # plotting the images
    plt.subplot(1, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i])
plt.show()``````

Output:

As you can see, there are various items in our dataset.

As we know, the values in the matrices range from 0 to 255. Before moving on to the neural network, let us scale the images to values between 0 and 1.

``````# scaling the dataset to the range [0, 1]
x_train = x_train / 255.0
X_test = X_test / 255.0``````

Building Convolutional Neural Networks in TensorFlow

Now, we will use TensorFlow to build a complete convolutional neural network. We will build it step by step.

The very first step is to build the first convolutional layer, which also acts as the input layer because it specifies the size of the images.

``````# initializing Convolutional Neural networks in Tensorflow
model = tf.keras.models.Sequential()

# creating the input layer for the images
model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))``````

As you can see, we have specified the shape of the images in the input layer of the CNN. The next step is to use pooling and convolution layers.

``````# pooling the images
# (2x2 max pooling is assumed here; the original pooling call is missing from the source)
model.add(tf.keras.layers.MaxPooling2D((2, 2)))

# convolutional layer
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))

# pooling a second time
model.add(tf.keras.layers.MaxPooling2D((2, 2)))

# convolutional layer again
model.add(tf.keras.layers.Conv2D(128, (3, 3), activation='relu'))``````
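It can help to trace how the spatial size of the feature maps changes through these layers. The sketch below is plain Python arithmetic, not TensorFlow code. It assumes 3×3 convolutions without padding and with stride 1 (the Keras `Conv2D` defaults) and a 2×2 pooling step after each of the first two convolutional layers; the pooling size is an assumption, and the helper names are made up for this example.

```python
def conv_output_size(size, kernel=3, padding=0, stride=1):
    """Output width/height of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, pool=2):
    """Output width/height of non-overlapping pooling along one dimension."""
    return size // pool

size = 32                       # CIFAR-10 images are 32x32
size = conv_output_size(size)   # first 3x3 convolution  -> 30
size = pool_output_size(size)   # first 2x2 pooling      -> 15
size = conv_output_size(size)   # second 3x3 convolution -> 13
size = pool_output_size(size)   # second 2x2 pooling     -> 6
size = conv_output_size(size)   # third 3x3 convolution  -> 4

# The last convolutional layer has 128 filters, so flattening would give:
flattened = size * size * 128
print(size, flattened)  # 4 2048
```

Calling `model.summary()` on the real model is the reliable way to confirm these shapes.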

The last step before feeding the values to the fully connected layers is to flatten them.

``````# flattening the feature maps into a one-dimensional vector
model.add(tf.keras.layers.Flatten())``````

Now, the rest is similar to the neural networks for classification. We will just add a hidden layer and an output layer with 10 nodes, as there are 10 classes in total.

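``````# hidden layer with 32 nodes (ReLU activation is assumed; the original call is missing from the source)
model.add(tf.keras.layers.Dense(32, activation='relu'))

# output layer with 10 nodes (no softmax, because the loss is configured with from_logits=True)
model.add(tf.keras.layers.Dense(10))``````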

Now, the convolutional neural network is ready. We can train the model on the training dataset.

``````# compiling the model
# (the 'adam' optimizer is an assumption; the original optimizer line is missing from the source)
model.compile(optimizer='adam',
              # cross entropy as loss function
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# fitting the model on the training data
model.fit(x_train, y_train, epochs=10)``````
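To see what the loss function computes, here is a minimal NumPy sketch of sparse categorical cross-entropy applied to raw logits. It is an illustration of the formula, not TensorFlow's actual implementation; the function name, logits, and labels are made up for this example.

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean cross-entropy between integer labels and unnormalized scores."""
    # Subtract the row-wise maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    # Log-softmax: log of the predicted probability of every class
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

# Two hypothetical samples with three classes each
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])  # integer class ids, as in y_train

loss = sparse_categorical_crossentropy_from_logits(logits, labels)
print(loss)
```

"Sparse" means the labels are plain integers rather than one-hot vectors, and `from_logits=True` tells the loss to apply the softmax itself, which is why the output layer needs no activation.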

Training the model will take some time.

Testing and evaluating the performance of the CNN model

Now, we will make predictions using the testing dataset, and based on the prediction, we will evaluate the performance of the model.

``````# finding the test accuracy
test_acc = model.evaluate(X_test,  y_test)``````

As you can see, we have evaluated the performance of the model. Let us now print out the accuracy score to see how good the predictions were.

``````# accuracy of the model
print(test_acc[1])``````

Output:

As you can see, we get an accuracy score of 71%. You can increase the accuracy by using hyperparameter tuning methods.
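The accuracy score is simply the fraction of images whose predicted class matches the true class. The sketch below shows the idea with NumPy on made-up logits; with the real model, the logits would come from something like `model.predict(X_test)`.

```python
import numpy as np

# Hypothetical logits for four images and three classes
logits = np.array([[0.2, 3.1, -1.0],
                   [1.5, 0.3,  0.1],
                   [0.0, 0.4,  2.2],
                   [2.0, 2.5,  0.1]])
y_true = np.array([1, 0, 2, 0])

# The predicted class is the index of the largest logit in each row
predicted = np.argmax(logits, axis=1)

# Accuracy is the share of correct predictions
accuracy = np.mean(predicted == y_true)
print(predicted, accuracy)
```

Here three of the four predictions are correct, so the accuracy is 0.75.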

Summary

A convolutional neural network (CNN) is a type of artificial neural network used in image recognition and processing that is specifically designed to process pixel data. In this article, we discussed how we can build CNN models and classify images. We also learned about different terms and concepts that are required in order to fully understand convolutional neural networks.
