8. Transfer Learning and Pre-Trained Models

Overview

This chapter introduces the concept of pre-trained models and utilizing them for different applications from those for which they were trained, known as transfer learning. By the end of this chapter, you will be able to apply feature extraction to pre-trained models, exploit pre-trained models for image classification, and apply fine-tuning to pre-trained models to classify images of flowers and cars into their respective classes. We will see that this achieves the same task that we completed in the previous chapter but with greater accuracy and shorter training times.

Introduction

In the previous chapter, we learned how to create a Convolutional Neural Network (CNN) from scratch with Keras. We experimented with different architectures by adding more convolutional and Dense layers and changing the activation function. We compared the performance of each model by classifying images of cars and flowers into their respective classes and comparing their accuracies.

In real-world projects, however, you will rarely code a convolutional neural network from scratch. Instead, you will tweak and train existing networks to suit your requirements. This chapter will introduce you to the important concepts of transfer learning and pre-trained networks (also known as pre-trained models), both of which are widely used in the industry.

Rather than building a CNN from scratch, we will pass images through pre-trained models to classify them, and we will tweak these models to make them more flexible. The models we will use in this chapter are called VGG16 and ResNet50; we will discuss them later in this chapter. Before we start working with pre-trained models, we need to understand transfer learning.

Pre-Trained Networks and Transfer Learning

Humans learn by experience. We apply the knowledge we gain in one situation to similar situations we face in the future. Suppose you want to learn how to drive an SUV. You have never driven an SUV; all you know is how to drive a small hatchback car.

The dimensions of the SUV are considerably larger than those of the hatchback, so navigating the SUV in traffic will surely be a challenge. Still, some basic systems (such as the clutch, accelerator, and brakes) are similar to those of the hatchback. So, all the knowledge you acquired while driving the hatchback will be of great help when you learn to drive the big SUV.

This is precisely what transfer learning is. By definition, transfer learning is a concept in machine learning in which we store and use the knowledge gained in one activity while learning another, similar activity. The hatchback-to-SUV example fits this definition perfectly.

Suppose we want to know whether a picture is of a dog or a cat. Here, we have two approaches. One is to build a deep learning model from scratch and then pass new pictures to the network. The other is to use a pre-trained deep learning model that has already been built using images of cats and dogs, instead of creating a neural network from scratch.

Using a pre-trained model saves us computational time and resources. It can also have some unforeseen advantages. For example, almost all pictures of dogs and cats will contain other objects, such as trees, the sky, and furniture. We can even use the pre-trained network to identify such objects.

So, a pre-trained network is a saved network (a neural network, in the case of deep learning) that was trained on a very large dataset, most often on an image classification problem. To work with a pre-trained network, we need to understand the concepts of feature extraction and fine-tuning.

Feature Extraction

To understand feature extraction, we need to revisit the architecture of a convolutional neural network.

You may recall that the full architecture of a CNN, at a high level, consists of the following components:

  • A convolution layer
  • A pooling and flattening layer
  • An Artificial Neural Network (ANN)

The following figure shows a complete CNN architecture:

Figure 8.1: CNN architecture

Now, let's divide this architecture into two parts. The first part contains everything but the ANN, while the second part only contains the ANN. The following figure shows a split CNN architecture:

Figure 8.2: CNN split architecture – convolutional base and classifier

The first part is called a convolutional base while the second part is called the classifier.

In feature extraction, we keep reusing the convolutional base and only change the classifier. So, we preserve the learnings of the convolutional layers, and we can place different classifiers on top of them. A classifier might distinguish dogs from cats or bikes from cars, or even classify medical X-ray images into tumors, infections, and so on. The following diagram shows a convolutional base reused by different classifiers:

Figure 8.3: Reusable convolutional base layer

The obvious next question is, can't we reuse the classifier too, like the base layer? The general answer is no. The reason is that learning from the convolutional base is likely to be more generic and, therefore, more reusable. However, the learning of the classifier is mostly specific to the classes that the model was trained on. Therefore, it is advisable to only reuse the convolutional base layer and not the classifier.

The amount of generalized learning from a convolutional base layer depends on the depth of the layer. For example, in the case of a cat, the initial layers of the model learn about general traits such as edges and the background, while the higher layers may learn more about specific details such as eyes, ears, or the shape of the nose. So, if your new dataset is something very different from the original dataset—for example, if you wish to identify fruit instead of cats—then it is better to only use some initial layers of the convolutional base layer rather than using the whole layer.
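
As a minimal sketch of feature extraction in Keras (the classifier sizes and the cats-versus-dogs task here are illustrative assumptions, not part of the exercises that follow), we can load the VGG16 convolutional base without its classifier by passing include_top=False and place a new classifier on top:

    from keras.applications.vgg16 import VGG16
    from keras.models import Sequential
    from keras.layers import Flatten, Dense

    # Load only the convolutional base; include_top=False drops the classifier
    conv_base = VGG16(weights='imagenet', include_top=False, \
                      input_shape=(224, 224, 3))

    # Place a new, task-specific classifier on top of the reused base
    model = Sequential()
    model.add(conv_base)
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))  # for example, cats versus dogs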

Freezing convolutional layers: One of the most important aspects of working with pre-trained networks is understanding the concept of freezing some of their layers. Freezing essentially means that we stop the weights of some convolutional layers from being updated. Since we are using a pre-trained network, we need the information stored in its initial layers; if that information is updated while training the network, we might lose the general concepts that the pre-trained network has already learned. When we add a classifier (ANN) on top of the network, its many Dense layers are randomly initialized, and backpropagation from these layers can completely destroy what the initial layers of the network have learned.

To avoid this information decay, we freeze some layers. This is done by making the layers non-trainable. The process of freezing some layers and training others is called fine-tuning a network.
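
In Keras, freezing is done by setting a layer's trainable attribute to False. The following is a minimal sketch, assuming conv_base is a loaded convolutional base such as the one in the previous sketch:

    # Freeze every layer of the base so that backpropagation
    # cannot alter its pre-trained weights
    for layer in conv_base.layers:
        layer.trainable = False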

Fine-Tuning a Pre-Trained Network

Fine-tuning means tweaking our neural network in such a way that it becomes more relevant to the task at hand. We freeze some of the initial layers of the network so that we don't lose the generic, useful information stored there. Then, once the classifier has been trained, we unfreeze some layers and tweak them a little so that they fit even better to the problem at hand. Suppose we have a pre-trained network that identifies animals. If we want to identify specific animals, such as dogs and cats, we tweak the upper layers a little so that they learn what dogs and cats look like. This is like reusing the whole pre-trained network and then training new layers on images of dogs and cats. We will do a similar activity by using a pre-built network and adding a classifier on top of it, which will be trained on pictures of dogs and cats.

There is a three-point system for fine-tuning a network (a code sketch follows this list):

  1. Add a classifier (ANN) on top of a pre-trained system.
  2. Freeze the convolutional base and train the network.
  3. Unfreeze part of the convolutional base and train it jointly with the added classifier.
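
The following is a hedged sketch of this three-point system, reusing model and conv_base from the earlier sketches; train_data and val_data are assumed, hypothetical data generators, and the epoch counts, learning rate, and number of unfrozen layers are illustrative choices rather than prescribed values:

    from keras.optimizers import Adam

    # Points 1 and 2: with the classifier added and the base frozen,
    # train only the new classifier
    model.compile(optimizer='adam', loss='binary_crossentropy', \
                  metrics=['accuracy'])
    model.fit_generator(train_data, epochs=5, validation_data=val_data)

    # Point 3: unfreeze the top few layers of the base and train them
    # jointly with the classifier, using a low learning rate so that the
    # pre-trained weights are only gently adjusted
    for layer in conv_base.layers[-4:]:
        layer.trainable = True
    model.compile(optimizer=Adam(1e-5), loss='binary_crossentropy', \
                  metrics=['accuracy'])
    model.fit_generator(train_data, epochs=5, validation_data=val_data)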

The ImageNet Dataset

In practical work, you almost never need to build a base convolutional model on your own; you will almost always use pre-trained models. But where does the training data come from? For visual computing, the answer is ImageNet. The ImageNet dataset is a large visual database used in visual object recognition. It consists of more than 14 million images labeled with object names, spanning more than 20,000 categories.

Some Pre-Trained Networks in Keras

The following pre-trained networks can be used as base convolutional layers, on top of which you fit a classifier (ANN):

  • VGG16
  • Inception V3
  • Xception
  • ResNet50
  • MobileNet

Different vendors have created the preceding pre-trained networks. For example, ResNet50 was created by Microsoft, while Inception V3 and MobileNet were created by Google. In this chapter, we will be working with the VGG16 and ResNet50 models.

VGG16 is a convolutional neural network model with 16 layers that was proposed by K. Simonyan and A. Zisserman from the University of Oxford. The model was submitted to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014, a challenge used to benchmark state-of-the-art models on the ImageNet dataset. ResNet50 is another convolutional neural network, with 50 layers, that was trained on the ImageNet dataset; it won first place in the ILSVRC in 2015.
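
All of these networks are exposed by the keras.applications module. As a minimal sketch, each can be loaded with its ImageNet weights in one line (the weights are downloaded on first use, which may take a while):

    from keras.applications import VGG16, InceptionV3, Xception, \
                                   ResNet50, MobileNet

    vgg16 = VGG16(weights='imagenet')            # Oxford VGG, 16 layers
    inception = InceptionV3(weights='imagenet')  # Google
    xception = Xception(weights='imagenet')      # Google
    resnet = ResNet50(weights='imagenet')        # Microsoft, 50 layers
    mobilenet = MobileNet(weights='imagenet')    # Google, mobile-friendly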

Now that we understand what these networks are, we will practice utilizing these pre-trained neural networks to classify an image of a slice of pizza with the VGG16 model.

Note

All the exercises and activities in this chapter will be developed in Jupyter notebooks. Please download this book's GitHub repository, along with all the prepared templates, from https://packt.live/2uI63CC.

Exercise 8.01: Identifying an Image Using the VGG16 Network

We have a picture of a slice of pizza. We will use the VGG16 network to process and identify the image. Before completing the following steps, ensure you have downloaded the pizza image from GitHub and saved it to your working directory:

  1. Import the libraries:

    import numpy as np

    from keras.applications.vgg16 import VGG16

    from keras.preprocessing import image

    from keras.applications.vgg16 import preprocess_input

  2. Initiate the model (this may take a while):

    classifier = VGG16()

    Note

    The last layer of predictions (Dense) has 1,000 values. This means that VGG16 has a total of 1,000 labels and our image will be one out of those 1,000 labels.

  3. Load the image. '../Data/Prediction/pizza.jpg' is the path of the image on our system; it may be different on your system:

    new_image= image.load_img('../Data/Prediction/pizza.jpg', \

                              target_size=(224, 224))

    new_image

    The following figure shows the output of the preceding code:

    Figure 8.4: An image of a slice of pizza

    The target size should be 224x224 since VGG16 only accepts (224,224).

  4. Change the image to an array by using the img_to_array function:

    transformed_image = image.img_to_array(new_image)

    transformed_image.shape

    The preceding code produces the following output:

    (224, 224, 3)

  5. The image has to be in a four-dimensional form for VGG16 to allow further processing. Expand the dimension of the image, as follows:

    transformed_image = np.expand_dims(transformed_image, axis=0)

    transformed_image.shape

    The preceding code produces the following output:

    (1, 224, 224, 3)

  6. Preprocess the image using the preprocess_input function:

    transformed_image = preprocess_input(transformed_image)

    transformed_image

    The following figure shows the output of the preceding code:

    Figure 8.5: A screenshot of image preprocessing

  7. Create the predictor variable:

    y_pred = classifier.predict(transformed_image)

    y_pred

  8. Check the shape of the prediction. It should be (1,1000). It's 1,000 because VGG16 was trained on 1,000 categories of images. The predictor variable shows the probability of our image belonging to each of those categories:

    y_pred.shape

    The preceding code produces the following output:

    (1, 1000)

  9. Print the top five probabilities of what our image is by using the decode_predictions function, passing it the predictor variable, y_pred, and the number of predictions (and corresponding labels) to output:

    from keras.applications.vgg16 import decode_predictions

    decode_predictions(y_pred,top=5)

    The preceding code produces the following output:

    [[('n07873807', 'pizza', 0.97680503),

      ('n07871810', 'meat_loaf', 0.012848727),

      ('n07880968', 'burrito', 0.0019428912),

      ('n04270147', 'spatula', 0.0019108421),

      ('n03887697', 'paper_towel', 0.0009799759)]]

    The first column of the array is the internal code number. The second is the possible label, while the third is the probability of the image being the label.

  10. Put the predictions in a human-readable form by printing the most probable label from the result of the decode_predictions function:

    label = decode_predictions(y_pred)

    """

    The most likely result is retrieved, that is, the label with the highest probability

    """

    decoded_label = label[0][0]

    # The classification is printed

    print('%s (%.2f%%)' % (decoded_label[1], \

          decoded_label[2]*100 ))

    The preceding code produces the following output:

    pizza (97.68%)

In this exercise, the model predicted, with 97.68% probability, that the picture is of pizza. Clearly, this high probability means that an object relatively similar to our picture is present in the ImageNet dataset and that our algorithm has successfully identified the image.

Note

To access the source code for this specific section, please refer to https://packt.live/3dXqdsQ.

You can also run this example online at https://packt.live/3dZMZAq.

In the following activity, we will put our knowledge to practice by using the VGG16 network to classify an image of a motorbike.

Activity 8.01: Using the VGG16 Network to Train a Deep Learning Network to Identify Images

You are given an image of a motorbike. Use the VGG16 network to predict the image. Before you start, ensure that you have downloaded the image (test_image_1) to your working directory. To complete this activity, follow these steps:

  1. Import the required libraries, along with the VGG16 network.
  2. Initiate the pre-trained VGG16 model.
  3. Load the image that is going to be classified.
  4. Preprocess the image by applying the transformations.
  5. Create a predictor variable to predict the image.
  6. Label the image and classify it.

    Note

    The solution for this activity can be found via this link.

With that, we have completed this activity. Unlike in Chapter 7, Computer Vision with Convolutional Neural Networks, we did not build a CNN from scratch; instead, we used a pre-trained model and simply uploaded a picture to be classified. From this, we can see that the image is predicted, with 84.33% probability, to be a moped. In the next exercise, we'll work with an image for which there is no matching label in the ImageNet database.

Exercise 8.02: Classifying Images That Are Not Present in the ImageNet Database

Now, let's work with an image that is not part of the 1000 labels in our VGG16 network. In this exercise, we will work with an image of a stick insect, and there are no labels for stick insects in our pre-trained network. Let's see what results we get:

  1. Import the numpy library and the necessary Keras libraries:

    import numpy as np

    from keras.applications.vgg16 import VGG16

    from keras.preprocessing import image

    from keras.applications.vgg16 import preprocess_input

  2. Initiate the model and print a summary of the model:

    classifier = VGG16()

    classifier.summary()

    classifier.summary() shows us the architecture of the network. The points to note are that it has a four-dimensional input shape (None, 224, 224, 3) and that it comprises 13 convolutional layers and three Dense layers (hence the 16 in VGG16). The following figure shows the last four layers of the output:

    Figure 8.6: Summary of the image using the VGG16 classifier

    Note

    The last layer of predictions (Dense) has 1000 values. This means that VGG16 has a total of 1000 labels and that our image will be one out of those 1000 labels.

  3. Load the image. '../Data/Prediction/stick_insect.jpg' is the path of the image on our system. It will be different on your system:

    new_image = \

    image.load_img('../Data/Prediction/stick_insect.jpg', \

                   target_size=(224, 224))

    new_image

    The following figure shows the output of the preceding code:

    Figure 8.7: Sample stick insect image for prediction

    The target size should be 224x224 since VGG16 only accepts (224,224).

  4. Change the image to an array by using the img_to_array function:

    transformed_image = image.img_to_array(new_image)

    transformed_image.shape

  5. The image must be in a four-dimensional form for VGG16 to allow further processing. Expand the dimension of the image along the 0th axis using the expand_dims function:

    transformed_image = np.expand_dims(transformed_image, axis=0)

    transformed_image.shape

  6. Preprocess the image using the preprocess_input function:

    transformed_image = preprocess_input(transformed_image)

    transformed_image

    The following figure shows the output of the preceding code:

    Figure 8.8: Screenshot showing a few instances of image preprocessing

  7. Create the predictor variable:

    y_pred = classifier.predict(transformed_image)

    y_pred

    The following figure shows the output of the preceding code:

    Figure 8.9: Creating the predictor variable

  8. Check the shape of the prediction. It should be (1,1000) because, as we mentioned previously, VGG16 was trained on 1,000 categories of images. The predictor variable shows the probabilities of our image being each of those categories:

    y_pred.shape

    The preceding code produces the following output:

    (1, 1000)

  9. Select the top five probabilities of what our image label is out of the 1000 labels that the VGG16 network has:

    from keras.applications.vgg16 import decode_predictions

    decode_predictions(y_pred, top=5)

    The preceding code produces the following output:

    [[('n02231487', 'walking_stick', 0.30524516),

      ('n01775062', 'wolf_spider', 0.26035702),

      ('n03804744', 'nail', 0.14323168),

      ('n01770081', 'harvestman', 0.066652186),

      ('n01773549', 'barn_spider', 0.03670299)]]

    The first column of the array is an internal code number. The second is the label, while the third is the probability of the image being the label.

  10. Put the predictions in a human-readable format by printing the most probable label from the result of the decode_predictions function:

    label = decode_predictions(y_pred)

    """

    The most likely result is retrieved, that is, the label with the highest probability

    """

    decoded_label = label[0][0]

    # The classification is printed

    print('%s (%.2f%%)' % (decoded_label[1], decoded_label[2]*100 ))

    The preceding code produces the following output:

    walking_stick (30.52%)

    Here, you can see that the network predicted, with 30.52% probability, that our image was a walking stick. Clearly, the image is not a walking stick but a stick insect; out of all the labels that the VGG16 network contains, a walking stick is the closest thing to a stick insect. The following image is that of a walking stick:

Figure 8.10: Walking stick

To avoid such outputs, we could freeze the existing layers of VGG16 and add our own layers, then train those new layers on images of walking sticks and stick insects so that we can obtain better output.

If you have a large number of walking stick and stick insect images, you could perform a similar task to improve the model's ability to classify the images into their respective classes. You could then test the model by rerunning the previous exercise.

Note

To access the source code for this specific section, please refer to https://packt.live/31I7bnR.

You can also run this example online at https://packt.live/31Hv1QE.

To understand this in detail, let's work on a different example, where we freeze the existing layers of the network and add our own layer, trained on images of cars and flowers. This will help the network improve its accuracy in classifying images of cars and flowers.

Exercise 8.03: Fine-Tuning the VGG16 Model

Let's work on fine-tuning the VGG16 model. In this exercise, we will freeze the network and remove the last layer of VGG16, which has 1000 labels in it. After removing the last layer, we will build a new flower-car classifier ANN, just like we did in Chapter 7, Computer Vision with Convolutional Neural Networks, and will connect this ANN to VGG16 instead of the original one with 1000 labels. Essentially, what we will do is replace the last layer of VGG16 with a user-defined layer.

Before we begin, ensure you have downloaded the image datasets from this book's GitHub repository to your own working directory. You will need a training_set folder and a test_set folder to test your model. Each of these folders will contain a cars folder, containing car images, and a flowers folder, containing flower images.

The steps for completing this exercise are as follows:

Note

Unlike the original model, which had 1,000 labels (1,000 different object categories), this new fine-tuned model will only have images of flowers or cars. So, whatever image you provide as input, the model will categorize it as a flower or a car based on its prediction probability.

  1. Import the numpy library, TensorFlow's random library, and the necessary Keras libraries:

    import numpy as np

    import keras

    from keras.layers import Dense

    from tensorflow import random

  2. Initiate the VGG16 model:

    vgg_model = keras.applications.vgg16.VGG16()

  3. Check the model summary:

    vgg_model.summary()

    The following figure shows the output of the preceding code:

    Figure 8.11: Model summary after initiating the model

  4. Remove the last layer, labeled predictions in the preceding figure. Create a new Keras model of the Sequential class and iterate through all the layers of the VGG model, adding each of them to the new model except for the last layer:

    last_layer = str(vgg_model.layers[-1])

    np.random.seed(42)

    random.set_seed(42)

    classifier= keras.Sequential()

    for layer in vgg_model.layers:

        if str(layer) != last_layer:

            classifier.add(layer)

    Here, we have created a new model named classifier instead of using vgg_model. All the layers of vgg_model, except the last layer, have been included in classifier.

  5. Print the summary of the newly created model:

    classifier.summary()

    The following figure shows the output of the preceding code:

    Figure 8.12: Rechecking the summary after removing the last layer

    The last layer of prediction (Dense) has been deleted.

  6. Freeze the layers by iterating through the layers and setting the trainable parameter to False:

    for layer in classifier.layers:

        layer.trainable=False

  7. Add a new output layer of size 1 with a sigmoid activation function and print the model summary:

    classifier.add(Dense(1, activation='sigmoid'))

    classifier.summary()

    The following figure shows the output of the preceding code:

    Figure 8.13: Rechecking the summary after adding the new layer

    Now, the last layer is the newly created user-defined layer.

  8. Compile the network with an adam optimizer and binary cross-entropy loss and compute the accuracy during training:

    classifier.compile(optimizer='adam', loss='binary_crossentropy', \

                       metrics=['accuracy'])

    Create some training and test data generators, just like we did in Chapter 7, Computer Vision with Convolutional Neural Networks. Rescale the training and test images by 1/255 so that all the values are between 0 and 1. Set the following parameters for the training data generators only: shear_range=0.2, zoom_range=0.2, and horizontal_flip=True.

  9. Next, create a training set from the training set folder. '../Data/Dataset/training_set' is the folder where our data is placed. Our CNN model expects an image size of 224x224, so the same size should be passed here too. batch_size is the number of images in a single batch, which is 32. class_mode is binary since we are creating a binary classifier.

    Note

    Unlike in Chapter 7, Computer Vision with Convolutional Neural Networks, where the image size was 64x64, VGG16 needs an image size of 224x224.

    Finally, fit the model to the training data:

    from keras.preprocessing.image import ImageDataGenerator

    generate_train_data = \

    ImageDataGenerator(rescale = 1./255,\

                       shear_range = 0.2,\

                       zoom_range = 0.2,\

                       horizontal_flip = True)

    generate_test_data = ImageDataGenerator(rescale =1./255)

    training_dataset = \

    generate_train_data.flow_from_directory(\

        '../Data/Dataset/training_set',\

        target_size = (224, 224),\

        batch_size = 32,\

        class_mode = 'binary')

    test_dataset = \

    generate_test_data.flow_from_directory(\

        '../Data/Dataset/test_set',\

        target_size = (224, 224),\

        batch_size = 32,\

        class_mode = 'binary')

    classifier.fit_generator(training_dataset,\

                             steps_per_epoch = 100,\

                             epochs = 10,\

                             validation_data = test_dataset,\

                             validation_steps = 30,\

                             shuffle=False)

    Here, steps_per_epoch = 100 sets the number of batches of training images drawn per epoch; set validation_steps = 30 and shuffle = False:

    100/100 [==============================] - 2083s 21s/step - loss: 0.5513 - acc: 0.7112 - val_loss: 0.3352 - val_acc: 0.8539

  10. Predict the new image (the code is the same as it was in Chapter 7, Computer Vision with Convolutional Neural Networks). First, load the image from '../Data/Prediction/test_image_2.jpg' and set the target size to (224, 224) since the VGG16 model accepts images of that size.

    from keras.preprocessing import image

    new_image = \

    image.load_img('../Data/Prediction/test_image_2.jpg', \

                   target_size = (224, 224))

    new_image

    At this point, you can view the image by executing the code new_image and the class labels by running training_dataset.class_indices.

    Next, preprocess the image, first by converting it into an array using the img_to_array function and rescaling it by 1/255 to match the training generators, then by adding another dimension along the 0th axis using the expand_dims function. Finally, make the prediction using the predict method of the classifier and print the output in a human-readable format:

    new_image = image.img_to_array(new_image)

    # Rescale by 1/255 so that prediction matches the training preprocessing
    new_image = new_image / 255.

    new_image = np.expand_dims(new_image, axis = 0)

    result = classifier.predict(new_image)

    # The sigmoid output is a probability, so threshold it at 0.5
    if result[0][0] >= 0.5:

        prediction = 'It is a flower'

    else:

        prediction = 'It is a car'

    print(prediction)

    The preceding code produces the following output:

    It is a car

  11. As a final step, you can save the classifier by running classifier.save('car-flower-classifier.h5').
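
As a short usage sketch (assuming the file saved in the previous step and the preprocessed new_image from step 10), the saved classifier can later be reloaded and used for prediction without retraining:

    from keras.models import load_model

    # Reload the fine-tuned flower-car classifier from disk
    classifier = load_model('car-flower-classifier.h5')
    result = classifier.predict(new_image)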

Here, we can see that the algorithm has classified the image correctly by identifying the car. We simply used a pre-built VGG16 model for image classification, tweaking its layers and molding it to our requirements. This is a very powerful technique for image classification.

Note

To access the source code for this specific section, please refer to https://packt.live/2ZxCqzA

This section does not currently have an online interactive example, and will need to be run locally.

In the next exercise, we will utilize a different pre-trained model, known as ResNet50, and demonstrate how to classify images with this model.

Exercise 8.04: Image Classification with ResNet

Finally, before closing this chapter, let's work on an exercise with the ResNet50 network. We'll use an image of a Nascar racer and classify it with the network. Follow these steps to complete this exercise:

  1. Import the necessary libraries:

    import numpy as np

    from keras.applications.resnet50 import ResNet50, preprocess_input

    from keras.preprocessing import image

  2. Initiate the ResNet50 model and print the summary of the model:

    classifier = ResNet50()

    classifier.summary()

    The following figure shows the output of the preceding code:

    Figure 8.14: A summary of the model

  3. Load the image. '../Data/Prediction/test_image_3.jpg' is the path of the image on our system. It will be different on your system:

    new_image = \

    image.load_img('../Data/Prediction/test_image_3.jpg', \

                   target_size=(224, 224))

    new_image

    The following figure shows the output of the preceding code:

    Figure 8.15: Sample Nascar racer image for prediction

    Note that the target size should be 224x224 since ResNet50 only accepts (224,224).

  4. Change the image to an array by using the img_to_array function:

    transformed_image = image.img_to_array(new_image)

    transformed_image.shape

  5. The image has to be in a four-dimensional form for ResNet50 to allow further processing. Expand the dimension along the 0th axis using the expand_dims function:

    transformed_image = np.expand_dims(transformed_image, axis=0)

    transformed_image.shape

  6. Preprocess the image using the preprocess_input function:

    transformed_image = preprocess_input(transformed_image)

    transformed_image

  7. Create the predictor variable by using the classifier to predict the image using its predict method:

    y_pred = classifier.predict(transformed_image)

    y_pred

  8. Check the shape of the prediction. It should be (1,1000):

    y_pred.shape

    The preceding code produces the following output:

    (1, 1000)

  9. Select the top five probabilities of what our image is by using the decode_predictions function, passing the predictor variable, y_pred, and the number of predictions (and corresponding labels) to output:

    from keras.applications.resnet50 import decode_predictions

    decode_predictions(y_pred, top=5)

    The preceding code produces the following output:

    [[('n04037443', 'racer', 0.8013074),

      ('n04285008', 'sports_car', 0.06431753),

      ('n02974003', 'car_wheel', 0.024077434),

      ('n02504013', 'Indian_elephant', 0.019822922),

      ('n04461696', 'tow_truck', 0.007778575)]]

    The first column of the array is an internal code number. The second is the label, while the third is the probability of the image being the label.

  10. Put the predictions in a human-readable format by printing the most probable label from the result of the decode_predictions function:

    label = decode_predictions(y_pred)

    """

    The most likely result is retrieved, that is, the label with the highest probability

    """

    decoded_label = label[0][0]

    # The classification is printed

    print('%s (%.2f%%)' % (decoded_label[1], \

          decoded_label[2]*100 ))

    The preceding code produces the following output:

    racer (80.13%)

Here, the model clearly shows (with a probability of 80.13%) that the picture is that of a racer. This is the power of pre-trained models, and Keras gives us the flexibility to use and tweak these models.

Note

To access the source code for this specific section, please refer to https://packt.live/2BzvTMK.

You can also run this example online at https://packt.live/3eWelJh.

In the next activity, we will classify another image using the pre-trained ResNet50 model.

Activity 8.02: Image Classification with ResNet

Now, let's work on an activity that uses another pre-trained network, known as ResNet. We have an image of a television located at ../Data/Prediction/test_image_4. We will use the ResNet50 network to predict the image. To implement this activity, follow these steps:

  1. Import the required libraries.
  2. Initiate the ResNet model.
  3. Load the image that needs to be classified.
  4. Preprocess the image by applying the appropriate transformations.
  5. Create a predictor variable to predict the image.
  6. Label the image and classify it.

    Note

    The solution for this activity can be found via this link.

So, the network says, with close to 100% probability, that the image is that of a television. This time, we used a pre-trained ResNet50 model to classify the image of a television and obtained results similar to those we obtained when using the VGG16 model to predict the image of a slice of pizza.

Summary

In this chapter, we covered the concept of transfer learning and how it relates to pre-trained networks. We used this knowledge to predict various images with the pre-trained deep learning networks VGG16 and ResNet50. We learned how to take advantage of such pre-trained networks using techniques such as feature extraction and fine-tuning to train models faster and more accurately. Finally, we learned the powerful technique of tweaking existing models to make them work with our own datasets. Building our own ANN on top of an existing CNN is one of the most powerful techniques used in the industry.

In the next chapter, we will learn about sequential modeling and sequential memory by looking at some real-life cases involving Google Assistant. We will learn how sequential modeling relates to Recurrent Neural Networks (RNNs), examine the vanishing gradient problem in detail, and see why an LSTM overcomes it better than a simple RNN. We will apply what we have learned to time series problems by predicting stock trends with fair accuracy.