Appendix – The Deep Learning with Keras Workshop

Appendix

1. Introduction to Machine Learning with Keras

Activity 1.01: Adding Regularization to the Model

In this activity, we will utilize the same logistic regression model from the scikit-learn package. This time, however, we will add regularization to the model and search for the optimum regularization parameter, a process often called hyperparameter tuning. After training the models, we will test the predictions and compare the model evaluation metrics to those produced by the baseline model and the model without regularization.

  1. Load the feature data from Exercise 1.03, Appropriate Representation of the Data, and the target data from Exercise 1.02, Cleaning the Data:

    import pandas as pd

    feats = pd.read_csv('../data/OSI_feats_e3.csv')

    target = pd.read_csv('../data/OSI_target_e2.csv')

  2. Create training and test datasets. Train the model using the training dataset. This time, however, use part of the training dataset for validation in order to choose the most appropriate hyperparameter.

    Once again, we will use test_size = 0.2, which means that 20% of the data will be reserved for testing. The size of our validation set is determined by how many validation folds we use. If we do 10-fold cross-validation, this equates to reserving 10% of the training dataset to validate our model on. Each fold uses a different 10% of the training dataset, and the average error across all folds is used to compare models with different hyperparameters. Assign a value to the random_state variable so that the split is reproducible:

    from sklearn.model_selection import train_test_split

    test_size = 0.2

    random_state = 13

    X_train, X_test, y_train, y_test = \

    train_test_split(feats, target, test_size=test_size, \

                     random_state=random_state)
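
    For intuition about the validation folds described above, the following sketch (illustrative only, not part of the activity's solution) shows how a 10-fold split divides the training set; in step 4, LogisticRegressionCV performs this splitting and the averaging of errors internally:

    from sklearn.model_selection import KFold
    kf = KFold(n_splits=10)
    for fold, (train_idx, val_idx) in enumerate(kf.split(X_train), start=1):
        # each fold trains on roughly 90% of the training data and
        # validates on the remaining 10%
        print(f'Fold {fold}: train on {len(train_idx)} rows, '
              f'validate on {len(val_idx)} rows')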

  3. Check the dimensions of the DataFrames:

    print(f'Shape of X_train: {X_train.shape}')

    print(f'Shape of y_train: {y_train.shape}')

    print(f'Shape of X_test: {X_test.shape}')

    print(f'Shape of y_test: {y_test.shape}')

    The preceding code produces the following output:

    Shape of X_train: (9864, 68)

    Shape of y_train: (9864, 1)

    Shape of X_test: (2466, 68)

    Shape of y_test: (2466, 1)

  4. Next, instantiate the models. Try two types of regularization parameters, l1 and l2, with 10-fold cross-validation. Vary the regularization parameter from 1x10^-2 to 1x10^6, spaced evenly in logarithmic space, to observe how the parameter affects the results:

    import numpy as np

    from sklearn.linear_model import LogisticRegressionCV

    Cs = np.logspace(-2, 6, 9)

    model_l1 = LogisticRegressionCV(Cs=Cs, penalty='l1', \

                                    cv=10, solver='liblinear', \

                                    random_state=42, max_iter=10000)

    model_l2 = LogisticRegressionCV(Cs=Cs, penalty='l2', cv=10, \

                                    random_state=42, max_iter=10000)

    Note

    For a logistic regression model with the l1 regularization penalty, a solver that supports l1 must be used; here, we use the liblinear solver.
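
    As a quick check on the search grid, np.logspace(-2, 6, 9) generates nine candidate values spaced evenly on a logarithmic scale; printing them is optional but makes the grid explicit:

    # candidate values: 0.01, 0.1, 1, 10, 100, 1000, 10000, 100000, 1000000
    print(np.logspace(-2, 6, 9))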

  5. Next, fit the models to the training data:

    model_l1.fit(X_train, y_train['Revenue'])

    model_l2.fit(X_train, y_train['Revenue'])

    The following figure shows the output of the preceding code:

    Figure 1.37: Output of the fit command indicating all of the model training parameters

  6. Here, we can see the value of the regularization parameter that was selected for each of the two models. The parameter chosen is the one that produced the model with the lowest average cross-validation error:

    print(f'Best hyperparameter for l1 regularization model: \

    {model_l1.C_[0]}')

    print(f'Best hyperparameter for l2 regularization model: \

    {model_l2.C_[0]}')

    The preceding code produces the following output:

    Best hyperparameter for l1 regularization model: 1000000.0

    Best hyperparameter for l2 regularization model: 1.0

    Note

    The C_ attribute is only available once the model has been trained because it is set once the best parameter from the cross-validation process has been determined.
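
    If you are curious how each candidate value of C performed, the per-fold cross-validation scores can be inspected through the scores_ attribute; the following is an optional sketch that assumes model_l1 has been fitted as in the previous steps:

    # scores_ maps the positive class label to an array of shape (n_folds, n_Cs)
    fold_scores = list(model_l1.scores_.values())[0]
    mean_scores = fold_scores.mean(axis=0)
    for C, score in zip(Cs, mean_scores):
        print(f'C = {C:g}: mean CV accuracy = {score:.4f}')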

  7. To evaluate the performance of the models, make predictions on the test set, which we'll compare against the true values:

    y_pred_l1 = model_l1.predict(X_test)

    y_pred_l2 = model_l2.predict(X_test)

  8. To compare these models, calculate the evaluation metrics. First, look at the accuracy of the model:

    from sklearn import metrics

    accuracy_l1 = metrics.accuracy_score(y_pred=y_pred_l1, \

                                         y_true=y_test)

    accuracy_l2 = metrics.accuracy_score(y_pred=y_pred_l2, \

                                         y_true=y_test)

    print(f'Accuracy of the model with l1 regularization is \

    {accuracy_l1*100:.4f}%')

    print(f'Accuracy of the model with l2 regularization is \

    {accuracy_l2*100:.4f}%')

    The preceding code produces the following output:

    Accuracy of the model with l1 regularization is 89.2133%

    Accuracy of the model with l2 regularization is 89.2944%

  9. Also, look at the other evaluation metrics:

    precision_l1, recall_l1, fscore_l1, _ = \

    metrics.precision_recall_fscore_support(y_pred=y_pred_l1, \

                                            y_true=y_test, \

                                            average='binary')

    precision_l2, recall_l2, fscore_l2, _ = \

    metrics.precision_recall_fscore_support(y_pred=y_pred_l2, \

                                            y_true=y_test, \

                                            average='binary')

    print(f'l1\nPrecision: {precision_l1:.4f}\nRecall: \

    {recall_l1:.4f}\nfscore: {fscore_l1:.4f}\n\n')

    print(f'l2\nPrecision: {precision_l2:.4f}\nRecall: \

    {recall_l2:.4f}\nfscore: {fscore_l2:.4f}')

    The preceding code produces the following output:

    l1

    Precision: 0.7300

    Recall: 0.4078

    fscore: 0.5233

    l2

    Precision: 0.7350

    Recall: 0.4106

    fscore: 0.5269

  10. Observe the values of the coefficients once the model has been trained:

    coef_list = [f'{feature}: {coef}' for coef, \

                 feature in sorted(zip(model_l1.coef_[0], \

                                   X_train.columns.values.tolist()))]

    for item in coef_list:

        print(item)

    Note

    The coef_ attribute is only available once the model has been trained because it is set once the best parameter from the cross-validation process has been determined.

    The following figure shows the output of the preceding code:

    Figure 1.38: The feature column names and the value of their respective coefficients for the model with l1 regularization

  11. Do the same for the model with an l2 regularization parameter type:

    coef_list = [f'{feature}: {coef}' for coef, \

                 feature in sorted(zip(model_l2.coef_[0], \

                                       X_train.columns.values.tolist()))]

    for item in coef_list:

        print(item)

    The following figure shows the output of the preceding code:

Figure 1.39: The feature column names and the value of their respective coefficients for the model with l2 regularization

Note

To access the source code for this specific section, please refer to https://packt.live/2VIoe5M.

This section does not currently have an online interactive example, and will need to be run locally.

2. Machine Learning versus Deep Learning

Activity 2.01: Creating a Logistic Regression Model Using Keras

In this activity, we are going to create a basic model using the Keras library. The model we build will classify users of a website into those who will purchase a product and those who will not. To do this, we will utilize the same online shopping purchasing intention dataset that we used previously and attempt to predict the same variable that we did in Chapter 1, Introduction to Machine Learning with Keras.

Perform the following steps to complete this activity:

  1. Open a Jupyter notebook from the start menu to implement this activity. Load the online shopping purchasing intention datasets, which you can download from the GitHub repository. We will use the pandas library to load the data, so import pandas first. Ensure you have saved the CSV files to an appropriate data folder for this chapter, or change the paths in your code to point to where you saved the files.

    import pandas as pd

    feats = pd.read_csv('../data/OSI_feats.csv')

    target = pd.read_csv('../data/OSI_target.csv')

  2. For the purposes of this activity, we will not perform any further preprocessing. As we did in the previous chapter, we will split the dataset into training and testing and leave the testing until the very end when we evaluate our models. We will reserve 20% of our data for testing by setting the test_size=0.2 parameter, and we will create a random_state parameter so that we can recreate the results:

    from sklearn.model_selection import train_test_split

    test_size = 0.2

    random_state = 42

    X_train, X_test, y_train, y_test = \

    train_test_split(feats, target, test_size=test_size, \

                     random_state=random_state)

  3. Set a seed in numpy and tensorflow for reproducibility. Begin creating the model by initializing a model of the Sequential class:

    from keras.models import Sequential

    import numpy as np

    from tensorflow import random

    np.random.seed(random_state)

    random.set_seed(random_state)

    model = Sequential()

  4. To add a fully connected layer to the model, add a layer of the Dense class. Here, we include the number of nodes in the layer. In our case, this will be one since we are performing binary classification and our desired output is zero or one. Also, specify the input dimensions, which is only done on the first layer of the model. It is there to indicate the format of the input data. Pass the number of features:

    from keras.layers import Dense

    model.add(Dense(1, input_dim=X_train.shape[1]))

  5. Add a sigmoid activation function to the output of the previous layer to replicate the logistic regression algorithm:

    from keras.layers import Activation

    model.add(Activation('sigmoid'))

  6. Once we have all the model components in the correct order, we must compile the model so that all the learning processes are configured. Use the adam optimizer, a binary_crossentropy for the loss, and track the accuracy of the model by passing the parameter into the metrics argument:

    model.compile(optimizer='adam', loss='binary_crossentropy', \

                  metrics=['accuracy'])

  7. Print the model summary to verify the model is as we expect it to be:

    print(model.summary())

    The following figure shows the output of the preceding code:

    Figure 2.19: A summary of the model
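
    Because the model consists of a single Dense node, the summary should report one weight per input feature plus one bias, that is, X_train.shape[1] + 1 trainable parameters.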

  8. Next, fit the model using the fit method of the model class. Provide the training data, as well as the number of epochs and how much data to use for validation after each epoch:

    history = model.fit(X_train, y_train['Revenue'], epochs=10, \

                        validation_split=0.2, shuffle=False)

    The following figure shows the output of the preceding code:

    Figure 2.20: Using the fit method on the model

  9. The values for the loss and accuracy have been stored within the history variable. Plot the values for each using the loss and accuracy we tracked after each epoch:

    import matplotlib.pyplot as plt

    %matplotlib inline

    # Plot training and validation accuracy values

    plt.plot(history.history['accuracy'])

    plt.plot(history.history['val_accuracy'])

    plt.title('Model accuracy')

    plt.ylabel('Accuracy')

    plt.xlabel('Epoch')

    plt.legend(['Train', 'Validation'], loc='upper left')

    plt.show()

    # Plot training and validation loss values

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.title('Model loss')

    plt.ylabel('Loss')

    plt.xlabel('Epoch')

    plt.legend(['Train', 'Validation'], loc='upper left')

    plt.show()

    The following plots show the output of the preceding code:

    Figure 2.21: The loss and accuracy while fitting the model

  10. Finally, evaluate the model on the test data we held out from the beginning, which will give an objective evaluation of the performance of the model:

    test_loss, test_acc = model.evaluate(X_test, y_test['Revenue'])

    print(f'The loss on the test set is {test_loss:.4f} \

    and the accuracy is {test_acc*100:.3f}%')

    The output of the preceding code can be found below. Here, the model predicts the purchasing intention of users in the test dataset and evaluates the performance by comparing it to the real values in y_test. Evaluating the model on the test dataset produces loss and accuracy values that we can print out:

    2466/2466 [==============================] - 0s 15us/step

    The loss on the test set is 0.3632 and the accuracy is 86.902%

    Note

    To access the source code for this specific section, please refer to

    https://packt.live/3dVTQLe.

    You can also run this example online at https://packt.live/2ZxEhV4.

3. Deep Learning with Keras

Activity 3.01: Building a Single-Layer Neural Network for Performing Binary Classification

In this activity, we will compare the results of a logistic regression model and single-layer neural networks of different node sizes and different activation functions. The dataset we will use represents the normalized test results of aircraft propeller inspections, while the class represents whether they passed or failed a manual visual inspection. We will create models to predict the results of the manual inspection when given the automated test results. Follow these steps to complete this activity:

  1. Load all the required packages:

    # import required packages from Keras

    from keras.models import Sequential

    from keras.layers import Dense, Activation

    import numpy as np

    import pandas as pd

    from tensorflow import random

    from sklearn.model_selection import train_test_split

    # import required packages for plotting

    import matplotlib.pyplot as plt

    import matplotlib

    %matplotlib inline

    import matplotlib.patches as mpatches

    # import the function for plotting decision boundary

    from utils import plot_decision_boundary
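
    The plot_decision_boundary function is expected to be available in a utils.py file alongside the notebook. If you do not have it, a minimal sketch of such a helper (an assumed implementation; the version that accompanies the book may differ) is as follows:

    import numpy as np
    import matplotlib.pyplot as plt
    def plot_decision_boundary(pred_func, X, y):
        # build a grid that covers the range of the two input features
        x_min, x_max = X.iloc[:, 0].min() - 0.5, X.iloc[:, 0].max() + 0.5
        y_min, y_max = X.iloc[:, 1].min() - 0.5, X.iloc[:, 1].max() + 0.5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), \
                             np.arange(y_min, y_max, 0.01))
        # predict the class probability for every grid point, threshold at 0.5
        Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
        Z = (np.array(Z) > 0.5).astype(int).reshape(xx.shape)
        # draw the decision regions and overlay the training points
        plt.contourf(xx, yy, Z, alpha=0.5, cmap=plt.cm.coolwarm)
        plt.scatter(X.iloc[:, 0], X.iloc[:, 1], c=y.values.ravel(), \
                    s=40, edgecolor='k', cmap=plt.cm.coolwarm)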

  2. Set up a seed:

    """

    define a seed for random number generator so the result will be reproducible

    """

    seed = 1

  3. Load the simulated dataset and print the size of X and Y and the number of examples:

    """

    load the dataset, print the shapes of input and output and the number of examples

    """

    feats = pd.read_csv('../data/outlier_feats.csv')

    target = pd.read_csv('../data/outlier_target.csv')

    print("X size = ", feats.shape)

    print("Y size = ", target.shape)

    print("Number of examples = ", feats.shape[0])

    Expected output:

    X size = (3359, 2)

    Y size = (3359, 1)

    Number of examples = 3359

  4. Plot the dataset. The x and y coordinates of each point will be the two input features. The color of each record represents the pass/fail result:

    class_1=plt.scatter(feats.loc[target['Class']==0,'feature1'], \

                        feats.loc[target['Class']==0,'feature2'], \

                        c="red", s=40, edgecolor='k')

    class_2=plt.scatter(feats.loc[target['Class']==1,'feature1'], \

                        feats.loc[target['Class']==1,'feature2'], \

                        c="blue", s=40, edgecolor='k')

    plt.legend((class_1, class_2),('Fail','Pass'))

    plt.xlabel('Feature 1')

    plt.ylabel('Feature 2')

    The following image shows the output of the preceding code:

    Figure 3.19: Simulated training data points

  5. Build the logistic regression model, which will be a one-node sequential model with no hidden layers and a sigmoid activation function:

    np.random.seed(seed)

    random.set_seed(seed)

    model = Sequential()

    model.add(Dense(1, activation='sigmoid', input_dim=2))

    model.compile(optimizer='sgd', loss='binary_crossentropy')

  6. Fit the model to the training data:

    model.fit(feats, target, batch_size=5, epochs=100, verbose=1, \

              validation_split=0.2, shuffle=False)

    Expected output:

    The loss on the validation set after 100 epochs = 0.3537:

    Figure 3.20: The loss details of the last 5 epochs out of 100

  7. Plot the decision boundary on the training data:

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plot_decision_boundary(lambda x: model.predict(x), feats, target)

    plt.title("Logistic Regression")

    The following image shows the output of the preceding code:

    Figure 3.21: The decision boundary of the logistic regression model

    The linear decision boundary of the logistic regression model is obviously unable to capture the circular boundary between the two classes and predicts every example as a pass.

  8. Create a neural network with one hidden layer with three nodes and a relu activation function and an output layer with one node and a sigmoid activation function. Finally, compile the model:

    np.random.seed(seed)

    random.set_seed(seed)

    model = Sequential()

    model.add(Dense(3, activation='relu', input_dim=2))

    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='sgd', loss='binary_crossentropy')

  9. Fit the model to the training data:

    model.fit(feats, target, batch_size=5, epochs=200, verbose=1, \

              validation_split=0.2, shuffle=False)

    Expected output:

    The loss that's evaluated on the validation set after 200 epochs = 0.0260:

    Figure 3.22: The loss details of the last 5 epochs out of 200

  10. Plot the decision boundary that was created:

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plot_decision_boundary(lambda x: model.predict(x), feats, target)

    plt.title("Decision Boundary for Neural Network with "\

              "hidden layer size 3")

    The following image shows the output of the preceding code:

    Figure 3.23: The decision boundary for the neural network with a hidden layer size of 3 and a ReLU activation function

    Having three processing units instead of one dramatically improved the capability of the model in capturing the non-linear boundary between the two classes. Notice that the loss value decreased drastically in comparison to the previous step.

  11. Create a neural network with one hidden layer with six nodes and a relu activation function and an output layer with one node and a sigmoid activation function. Finally, compile the model:

    np.random.seed(seed)

    random.set_seed(seed)

    model = Sequential()

    model.add(Dense(6, activation='relu', input_dim=2))

    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='sgd', loss='binary_crossentropy')

  12. Fit the model to the training data:

    model.fit(feats, target, batch_size=5, epochs=400, verbose=1, \

              validation_split=0.2, shuffle=False)

    Expected output:

    The loss after 400 epochs = 0.0231:

    Figure 3.24: The loss details of the last 5 epochs out of 400

  13. Plot the decision boundary:

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plot_decision_boundary(lambda x: model.predict(x), feats, target)

    plt.title("Decision Boundary for Neural Network with "\

              "hidden layer size 6")

    The following image shows the output of the preceding code:

    Figure 3.25: The decision boundary for the neural network with a hidden layer size of 6 and the ReLU activation function

    By doubling the number of units in the hidden layer, the decision boundary of the model gets closer to a true circular shape, and the loss value decreases even further in comparison to the previous step.

  14. Create a neural network with one hidden layer with three nodes and a tanh activation function and an output layer with one node and a sigmoid activation function. Finally, compile the model:

    np.random.seed(seed)

    random.set_seed(seed)

    model = Sequential()

    model.add(Dense(3, activation='tanh', input_dim=2))

    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='sgd', loss='binary_crossentropy')

  15. Fit the model to the training data:

    model.fit(feats, target, batch_size=5, epochs=200, verbose=1, \

              validation_split=0.2, shuffle=False)

    Expected output:

    The loss after 200 epochs = 0.0426:

    Figure 3.26: The loss details of the last 5 epochs out of 200

  16. Plot the decision boundary:

    plot_decision_boundary(lambda x: model.predict(x), feats, target)

    plt.title("Decision Boundary for Neural Network with "\

              "hidden layer size 3")

    The following image shows the output of the preceding code:

    Figure 3.27: The decision boundary for the neural network with a hidden layer size of 3 and the tanh activation function

    Using the tanh activation function has eliminated the sharp edges in the decision boundary; in other words, it has made the boundary smoother. However, the model is not performing better, since we can see an increase in the loss value. We still achieved similar loss values when evaluating the model, despite the fact that, as mentioned previously, learning with the tanh activation is slower than with relu.

  17. Create a neural network with one hidden layer with six nodes and a tanh activation function and an output layer with one node and a sigmoid activation function. Finally, compile the model:

    np.random.seed(seed)

    random.set_seed(seed)

    model = Sequential()

    model.add(Dense(6, activation='tanh', input_dim=2))

    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='sgd', loss='binary_crossentropy')

  18. Fit the model to the training data:

    model.fit(feats, target, batch_size=5, epochs=400, verbose=1, \

              validation_split=0.2, shuffle=False)

    Expected output:

    The loss after 400 epochs = 0.0215:

    Figure 3.28: The loss details of the last 5 epochs out of 400

  19. Plot the decision boundary:

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plot_decision_boundary(lambda x: model.predict(x), feats, target)

    plt.title("Decision Boundary for Neural Network with "\

              "hidden layer size 6")

    The following image shows the output of the preceding code:

Figure 3.29: The decision boundary for the neural network with a hidden layer size of 6 and the tanh activation function

Again, using the tanh activation function instead of relu and adding more nodes to the hidden layer has smoothed the curves of the decision boundary further, fitting the training data better. We should be careful not to add too many nodes to the hidden layer, as we may begin to overfit the data. This can be observed in the evaluation results, where the accuracy of the neural network with six nodes is slightly lower than that of the network with three.

Note

To access the source code for this specific section, please refer to https://packt.live/3iv0wn1.

You can also run this example online at https://packt.live/2BqumZt.

Activity 3.02: Advanced Fibrosis Diagnosis with Neural Networks

In this activity, you are going to use a real dataset to predict whether a patient has advanced fibrosis based on measurements such as age, gender, and BMI. The dataset consists of information for 1,385 patients who underwent treatment dosages for hepatitis C. For each patient, 28 different attributes are available, as well as a class label, which can only take two values: 1, indicating advanced fibrosis, and 0, indicating no advanced fibrosis. This is a binary/two-class classification problem with an input dimension equal to 28.

In this activity, you will implement different deep neural network architectures to perform this classification, plot the trends in training error rates and test error rates, and determine how many epochs the final classifier needs to be trained for. Follow these steps to complete this activity:

  1. Import all the necessary libraries and load the dataset using the pandas read_csv function:

    import pandas as pd

    import numpy as np

    from tensorflow import random

    from sklearn.model_selection import train_test_split

    from sklearn.preprocessing import StandardScaler

    from keras.models import Sequential

    from keras.layers import Dense

    import matplotlib.pyplot as plt

    import matplotlib

    %matplotlib inline

    X = pd.read_csv('../data/HCV_feats.csv')

    y = pd.read_csv('../data/HCV_target.csv')

  2. Print the number of records and features in the feature dataset and the number of unique classes in the target dataset:

    print("Number of Examples in the Dataset = ", X.shape[0])

    print("Number of Features for each example = ", X.shape[1])

    print("Possible Output Classes = ", \

          y['AdvancedFibrosis'].unique())

    Expected output:

    Number of Examples in the Dataset = 1385

    Number of Features for each example = 28

    Possible Output Classes = [0 1]

  3. Scale the data using the StandardScaler function so that each feature has zero mean and unit variance. Following this, split the dataset into the training and test sets:

    seed = 1

    np.random.seed(seed)

    sc = StandardScaler()

    X = pd.DataFrame(sc.fit_transform(X), columns=X.columns)

    X_train, X_test, y_train, y_test = \

    train_test_split(X, y, test_size=0.2, random_state=seed)

    # Print the information regarding dataset sizes

    print(X_train.shape)

    print(y_train.shape)

    print(X_test.shape)

    print(y_test.shape)

    print ("Number of examples in training set = ", X_train.shape[0])

    print ("Number of examples in test set = ", X_test.shape[0])

    Expected output:

    (1108, 28)

    (1108, 1)

    (277, 28)

    (277, 1)

    Number of examples in training set = 1108

    Number of examples in test set = 277

  4. Implement a deep neural network with one hidden layer of size 3 and a tanh activation function, an output layer with one node, and a sigmoid activation function. Finally, compile the model and print out a summary of the model:

    np.random.seed(seed)

    random.set_seed(seed)

    # define the keras model

    classifier = Sequential()

    classifier.add(Dense(units = 3, activation = 'tanh', \

                         input_dim=X_train.shape[1]))

    classifier.add(Dense(units = 1, activation = 'sigmoid'))

    classifier.compile(optimizer = 'sgd', loss = 'binary_crossentropy', \

                       metrics = ['accuracy'])

    classifier.summary()

    The following image shows the output of the preceding code:

    Figure 3.30: The architecture of the neural network
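
    As a quick check on the summary in Figure 3.30, the parameter counts can be worked out by hand: the hidden layer has 28 x 3 weights plus 3 biases (87 parameters), and the output layer has 3 weights plus 1 bias (4 parameters), giving 91 trainable parameters in total.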

  5. Fit the model to the training data:

    history=classifier.fit(X_train, y_train, batch_size = 20, \

                           epochs = 100, validation_split=0.1, \

                           shuffle=False)

  6. Plot the training loss and validation loss for every epoch:

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    Expected output:

    Figure 3.31: A plot of the training error rate and test error rate while training the model

  7. Print the values of the best accuracy reached on the training set and on the validation set, as well as the loss and accuracy evaluated on the test dataset:

    print(f"Best Accuracy on training set = \

    {max(history.history['accuracy'])*100:.3f}%")

    print(f"Best Accuracy on validation set = \

    {max(history.history['val_accuracy'])*100:.3f}%")

    test_loss, test_acc = \

    classifier.evaluate(X_test, y_test['AdvancedFibrosis'])

    print(f'The loss on the test set is {test_loss:.4f} and \

    the accuracy is {test_acc*100:.3f}%')

    The following shows the output of the preceding code:

    Best Accuracy on training set = 52.959%

    Best Accuracy on validation set = 58.559%

    277/277 [==============================] - 0s 25us/step

    The loss on the test set is 0.6885 and the accuracy is 55.235%

  8. Implement a deep neural network with two hidden layers of sizes 4 and 2 with a tanh activation function, an output layer with one node, and a sigmoid activation function. Finally, compile the model and print out a summary of the model:

    np.random.seed(seed)

    random.set_seed(seed)

    # define the keras model

    classifier = Sequential()

    classifier.add(Dense(units = 4, activation = 'tanh', \

                         input_dim = X_train.shape[1]))

    classifier.add(Dense(units = 2, activation = 'tanh'))

    classifier.add(Dense(units = 1, activation = 'sigmoid'))

    classifier.compile(optimizer = 'sgd', loss = 'binary_crossentropy', \

                       metrics = ['accuracy'])

    classifier.summary()

    Figure 3.32: The architecture of the neural network

  9. Fit the model to the training data:

    history=classifier.fit(X_train, y_train, batch_size = 20, \

                           epochs = 100, validation_split=0.1, \

                           shuffle=False)

  10. Plot the training and validation error plots for the model with two hidden layers of sizes 4 and 2:

    # plot training error and test error plots

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    Expected output:

    Figure 3.33: A plot of the training error and test error rates while training the model

  11. Print the values of the best accuracy achieved on the training set and on the validation set, as well as the loss and accuracy evaluated on the test dataset:

    print(f"Best Accuracy on training set = \

    {max(history.history['accuracy'])*100:.3f}%")

    print(f"Best Accuracy on validation set = \

    {max(history.history['val_accuracy'])*100:.3f}%")

    test_loss, test_acc = \

    classifier.evaluate(X_test, y_test['AdvancedFibrosis'])

    print(f'The loss on the test set is {test_loss:.4f} and \

    the accuracy is {test_acc*100:.3f}%')

    The following shows the output of the preceding code:

    Best Accuracy on training set = 57.272%

    Best Accuracy on validation set = 54.054%

    277/277 [==============================] - 0s 41us/step

    The loss on the test set is 0.7016 and the accuracy is 49.819%

    Note

    To access the source code for this specific section, please refer to https://packt.live/2BrIRMF.

    You can also run this example online at https://packt.live/2NUl22A.

4. Evaluating Your Model with Cross-Validation Using Keras Wrappers

Activity 4.01: Model Evaluation Using Cross-Validation for an Advanced Fibrosis Diagnosis Classifier

In this activity, we are going to use what we learned in this topic to train and evaluate a deep learning model using k-fold cross-validation. We will use the model that resulted in the best test error rate from the previous activity, and the goal will be to compare the cross-validation error rate with the error rate from the training set/test set approach. The dataset we will use is the hepatitis C dataset, on which we will build a classification model to predict which patients have advanced fibrosis. Follow these steps to complete this activity:

  1. Load the dataset and print the number of records and features in the dataset, as well as the number of possible classes in the target dataset:

    # Load the dataset

    import pandas as pd

    X = pd.read_csv('../data/HCV_feats.csv')

    y = pd.read_csv('../data/HCV_target.csv')

    # Print the sizes of the dataset

    print("Number of Examples in the Dataset = ", X.shape[0])

    print("Number of Features for each example = ", X.shape[1])

    print("Possible Output Classes = ", \

          y['AdvancedFibrosis'].unique())

    Here's the expected output:

    Number of Examples in the Dataset = 1385

    Number of Features for each example = 28

    Possible Output Classes = [0 1]

  2. Define the function that returns the Keras model. First, import the necessary libraries for Keras. Inside the function, instantiate the sequential model and add two dense layers, with the first of size 4 and the second of size 2, both with tanh activation functions. Add the output layer with a sigmoid activation function. Compile the model and return the model from the function:

    from keras.models import Sequential

    from keras.layers import Dense

    # Create the function that returns the keras model

    def build_model():

        model = Sequential()

        model.add(Dense(4, input_dim=X.shape[1], activation='tanh'))

        model.add(Dense(2, activation='tanh'))

        model.add(Dense(1, activation='sigmoid'))

        model.compile(loss='binary_crossentropy', optimizer='adam', \

                      metrics=['accuracy'])

        return model

  3. Scale the training data using the StandardScaler function. Set the seed so that the model is reproducible. Define the n_folds, epochs, and batch_size hyperparameters. Then, build the Keras wrapper with scikit-learn, define the cross-validation iterator, perform k-fold cross-validation, and store the scores:

    # import required packages

    import numpy as np

    from tensorflow import random

    from keras.wrappers.scikit_learn import KerasClassifier

    from sklearn.model_selection import StratifiedKFold

    from sklearn.model_selection import cross_val_score

    from sklearn.preprocessing import StandardScaler

    sc = StandardScaler()

    X = pd.DataFrame(sc.fit_transform(X), columns=X.columns)

    """

    define a seed for random number generator so the result will be reproducible

    """

    seed = 1

    np.random.seed(seed)

    random.set_seed(seed)

    """

    determine the number of folds for k-fold cross-validation, number of epochs and batch size

    """

    n_folds = 5

    epochs = 100

    batch_size = 20

    # build the scikit-learn interface for the keras model

    classifier = KerasClassifier(build_fn=build_model, \

                                 epochs=epochs, \

                                 batch_size=batch_size, \

                                 verbose=1, shuffle=False)

    # define the cross-validation iterator

    kfold = StratifiedKFold(n_splits=n_folds, shuffle=True, \

                            random_state=seed)

    """

    perform the k-fold cross-validation and store the scores in results

    """

    results = cross_val_score(classifier, X, y, cv=kfold)

  4. For each of the folds, print the accuracy stored in the results parameter:

    # print accuracy for each fold

    for f in range(n_folds):

        print("Test accuracy at fold ", f+1, " = ", results[f])

    print("\n")

    """

    print overall cross-validation accuracy plus the standard deviation of the accuracies

    """

    print("Final Cross-validation Test Accuracy:", results.mean())

    print("Standard Deviation of Final Test Accuracy:", results.std())

    Here's the expected output:

    Test accuracy at fold 1 = 0.5198556184768677

    Test accuracy at fold 2 = 0.4693140685558319

    Test accuracy at fold 3 = 0.512635350227356

    Test accuracy at fold 4 = 0.5740072131156921

    Test accuracy at fold 5 = 0.5523465871810913

    Final Cross-Validation Test Accuracy: 0.5256317675113678

    Standard Deviation of Final Test Accuracy: 0.03584760640500936

    Note

    To access the source code for this specific section, please refer to https://packt.live/3eWgR2b.

    You can also run this example online at https://packt.live/3iBYtOi.

Activity 4.02: Model Selection Using Cross-Validation for the Advanced Fibrosis Diagnosis Classifier

In this activity, we are going to improve our classifier for the hepatitis C dataset by using cross-validation for model selection and hyperparameter selection. Follow these steps to complete this activity:

  1. Import all the required packages and load the dataset. Scale the dataset using the StandardScaler function:

    # import the required packages

    from keras.models import Sequential

    from keras.layers import Dense

    from keras.wrappers.scikit_learn import KerasClassifier

    from sklearn.model_selection import StratifiedKFold

    from sklearn.model_selection import cross_val_score

    import numpy as np

    import pandas as pd

    from sklearn.preprocessing import StandardScaler

    from tensorflow import random

    # Load the dataset

    X = pd.read_csv('../data/HCV_feats.csv')

    y = pd.read_csv('../data/HCV_target.csv')

    sc = StandardScaler()

    X = pd.DataFrame(sc.fit_transform(X), columns=X.columns)

  2. Define three functions, each returning a different Keras model. The first model should have three hidden layers of size 4, the second model should have two hidden layers, the first of size 4 and the second of size 2, and the third model should have two hidden layers of size 8. Use function parameters for the activation functions and optimizers so that they can be passed through to the model. The goal is to find out which of these three models leads to the lowest cross-validation error rate:

    # Create the function that returns the keras model 1

    def build_model_1(activation='relu', optimizer='adam'):

        # create model 1

        model = Sequential()

        model.add(Dense(4, input_dim=X.shape[1], \

                        activation=activation))

        model.add(Dense(4, activation=activation))

        model.add(Dense(4, activation=activation))

        model.add(Dense(1, activation='sigmoid'))

        # Compile model

        model.compile(loss='binary_crossentropy', \

                      optimizer=optimizer, metrics=['accuracy'])

        return model

    # Create the function that returns the keras model 2

    def build_model_2(activation='relu', optimizer='adam'):

        # create model 2

        model = Sequential()

        model.add(Dense(4, input_dim=X.shape[1], \

                        activation=activation))

        model.add(Dense(2, activation=activation))

        model.add(Dense(1, activation='sigmoid'))

        # Compile model

        model.compile(loss='binary_crossentropy', \

                      optimizer=optimizer, metrics=['accuracy'])

        return model

    # Create the function that returns the keras model 3

    def build_model_3(activation='relu', optimizer='adam'):

        # create model 3

        model = Sequential()

        model.add(Dense(8, input_dim=X.shape[1], \

                        activation=activation))

        model.add(Dense(8, activation=activation))

        model.add(Dense(1, activation='sigmoid'))

        # Compile model

        model.compile(loss='binary_crossentropy', \

                      optimizer=optimizer, metrics=['accuracy'])

        return model

    Write the code that will loop over the three models and perform 5-fold cross-validation. Set the seed so that the models are reproducible and define the n_folds, batch_size, and epochs hyperparameters. Store the results from applying the cross_val_score function when training the models:

    """

    define a seed for random number generator so the result will be reproducible

    """

    seed = 2

    np.random.seed(seed)

    random.set_seed(seed)

    """

    determine the number of folds for k-fold cross-validation, number of epochs and batch size

    """

    n_folds = 5

    batch_size=20

    epochs=100

    # define the list to store cross-validation scores

    results_1 = []

    # define the possible options for the model

    models = [build_model_1, build_model_2, build_model_3]

    # loop over models

    for m in range(len(models)):

        # build the scikit-learn interface for the keras model

        classifier = KerasClassifier(build_fn=models[m], \

                                     epochs=epochs, \

                                     batch_size=batch_size, \

                                     verbose=0, shuffle=False)

        # define the cross-validation iterator

        kfold = StratifiedKFold(n_splits=n_folds, shuffle=True, \

                                random_state=seed)

        """

        perform the k-fold cross-validation and store the scores

        in result

        """

        result = cross_val_score(classifier, X, y, cv=kfold)

        # add the scores to the results list

        results_1.append(result)

    # Print cross-validation score for each model

    for m in range(len(models)):

        print("Model", m+1,"Test Accuracy =", results_1[m].mean())

    Here's an example output. In this instance, Model 2 has the best cross-validation test accuracy, as you can see below:

    Model 1 Test Accuracy = 0.4996389865875244

    Model 2 Test Accuracy = 0.5148014307022095

    Model 3 Test Accuracy = 0.5097472846508027

  3. Choose the model with the highest accuracy score and repeat step 2 by iterating over the epochs = [100, 200] and batches = [10, 20] values and performing 5-fold cross-validation:

    """

    define a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # determine the number of folds for k-fold cross-validation

    n_folds = 5

    # define possible options for epochs and batch_size

    epochs = [100, 200]

    batches = [10, 20]

    # define the list to store cross-validation scores

    results_2 = []

    # loop over all possible pairs of epochs, batch_size

    for e in range(len(epochs)):

        for b in range(len(batches)):

            # build the scikit-learn interface for the keras model

            classifier = KerasClassifier(build_fn=build_model_2, \

                                         epochs=epochs[e], \

                                         batch_size=batches[b], \

                                         verbose=0)

            # define the cross-validation iterator

            kfold = StratifiedKFold(n_splits=n_folds, shuffle=True, \

                                    random_state=seed)

            # perform the k-fold cross-validation.

            # store the scores in result

            result = cross_val_score(classifier, X, y, cv=kfold)

            # add the scores to the results list

            results_2.append(result)

    """

    Print cross-validation score for each possible pair of epochs, batch_size

    """

    c = 0

    for e in range(len(epochs)):

        for b in range(len(batches)):

            print("batch_size =", batches[b],", epochs =", epochs[e], \

                  ", Test Accuracy =", results_2[c].mean())

            c += 1

    Here's an example output:

    batch_size = 10 , epochs = 100 , Test Accuracy = 0.5010830342769623

    batch_size = 20 , epochs = 100 , Test Accuracy = 0.5126353740692139

    batch_size = 10 , epochs = 200 , Test Accuracy = 0.5176895320416497

    batch_size = 20 , epochs = 200 , Test Accuracy = 0.5075812220573426

    In this case, the batch_size=10, epochs=200 pair has the best cross-validation test accuracy.

  4. Choose the batch size and epochs with the highest accuracy score and repeat step 3 by iterating over the optimizers = ['rmsprop', 'adam','sgd'] and activations = ['relu', 'tanh'] values and performing 5-fold cross-validation:

    """

    define a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    """

    determine the number of folds for k-fold cross-validation, number of epochs and batch size

    """

    n_folds = 5

    batch_size = 10

    epochs = 200

    # define the list to store cross-validation scores

    results_3 = []

    # define possible options for optimizer and activation

    optimizers = ['rmsprop', 'adam','sgd']

    activations = ['relu', 'tanh']

    # loop over all possible pairs of optimizer, activation

    for o in range(len(optimizers)):

        for a in range(len(activations)):

            optimizer = optimizers[o]

            activation = activations[a]

            # build the scikit-learn interface for the keras model

            classifier = KerasClassifier(build_fn=build_model_2, \

                                         epochs=epochs, \

                                         batch_size=batch_size, \

                                         verbose=0, shuffle=False)

            # define the cross-validation iterator

            kfold = StratifiedKFold(n_splits=n_folds, shuffle=True, \

                                    random_state=seed)

            # perform the k-fold cross-validation.

            # store the scores in result

            result = cross_val_score(classifier, X, y, cv=kfold)

            # add the scores to the results list

            results_3.append(result)

    """

    Print cross-validation score for each possible pair of optimizer, activation

    """

    c = 0

    for o in range(len(optimizers)):

        for a in range(len(activations)):

            print("activation = ", activations[a],", optimizer = ", \

                  optimizers[o], ", Test accuracy = ", \

                  results_3[c].mean())

            c += 1

    Here's the expected output:

    activation = relu , optimizer = rmsprop ,

    Test accuracy = 0.5234657049179077

    activation = tanh , optimizer = rmsprop ,

    Test accuracy = 0.49602887630462644

    activation = relu , optimizer = adam ,

    Test accuracy = 0.5039711117744445

    activation = tanh , optimizer = adam ,

    Test accuracy = 0.4989169597625732

    activation = relu , optimizer = sgd ,

    Test accuracy = 0.48953068256378174

    activation = tanh , optimizer = sgd ,

    Test accuracy = 0.5191335678100586

    Here, the activation='relu' and optimizer='rmsprop' pair has the best cross-validation test accuracy. Also, the activation='tanh' and optimizer='sgd' pair results in the second-best performance.

    Note

    To access the source code for this specific section, please refer to https://packt.live/2D3AIhD.

    You can also run this example online at https://packt.live/2NUpiiC.

Activity 4.03: Model Selection Using Cross-validation on a Traffic Volume Dataset

In this activity, you are going to practice model selection using cross-validation one more time. Here, we are going to use a simulated dataset in which the target variable represents the volume of traffic (in cars/hour) across a city bridge, and the features are various normalized traffic-related measurements, such as the time of day and the traffic volume on the previous day. Our goal is to build a model that predicts the traffic volume across the city bridge given the various features. Follow these steps to complete this activity:

  1. Import all the required packages and load the dataset:

    # import the required packages

    from keras.models import Sequential

    from keras.layers import Dense

    from keras.wrappers.scikit_learn import KerasRegressor

    from sklearn.model_selection import KFold

    from sklearn.model_selection import cross_val_score

    from sklearn.preprocessing import StandardScaler

    from sklearn.pipeline import make_pipeline

    import numpy as np

    import pandas as pd

    from tensorflow import random

  2. Load the dataset, print the input and output size for the feature dataset, and print the possible classes in the target dataset. Also, print the range of the output:

    # Load the dataset

    X = pd.read_csv('../data/traffic_volume_feats.csv')

    y = pd.read_csv('../data/traffic_volume_target.csv')

    # Print the sizes of input data and output data

    print("Input data size = ", X.shape)

    print("Output size = ", y.shape)

    # Print the range for output

    print(f"Output Range = ({y['Volume'].min()}, \

    { y['Volume'].max()})")

    Here's the expected output:

    Input data size = (10000, 10)

    Output size = (10000, 1)

    Output Range = (0.000000, 584.000000)

  3. Define three functions, each returning a different Keras model. The first model should have one hidden layer of size 10, the second model should have two hidden layers of size 10, and the third model should have three hidden layers of size 10. Use function parameters for the optimizers so that they can be passed through to the model. The goal is to find out which of these three models leads to the lowest cross-validation error rate:

    # Create the function that returns the keras model 1

    def build_model_1(optimizer='adam'):

        # create model 1

        model = Sequential()

        model.add(Dense(10, input_dim=X.shape[1], activation='relu'))

        model.add(Dense(1))

        # Compile model

        model.compile(loss='mean_squared_error', optimizer=optimizer)

        return model

    # Create the function that returns the keras model 2

    def build_model_2(optimizer='adam'):

        # create model 2

        model = Sequential()

        model.add(Dense(10, input_dim=X.shape[1], activation='relu'))

        model.add(Dense(10, activation='relu'))

        model.add(Dense(1))

        # Compile model

        model.compile(loss='mean_squared_error', optimizer=optimizer)

        return model

    # Create the function that returns the keras model 3

    def build_model_3(optimizer='adam'):

        # create model 3

        model = Sequential()

        model.add(Dense(10, input_dim=X.shape[1], activation='relu'))

        model.add(Dense(10, activation='relu'))

        model.add(Dense(10, activation='relu'))

        model.add(Dense(1))

        # Compile model

        model.compile(loss='mean_squared_error', optimizer=optimizer)

        return model
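
    Note that, unlike the classifiers built earlier, the output layer in each of these models is a single Dense node with no activation function (that is, a linear activation), and the loss is the mean squared error rather than cross-entropy, since this is a regression problem.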

  4. Write the code that will loop over the three models and perform 5-fold cross-validation. Set the seed so that the models are reproducible and define the n_folds hyperparameters. Store the results from applying the cross_val_score function when training the models:

    """

    define a seed for random number generator so the result will be reproducible

    """

    seed = 1

    np.random.seed(seed)

    random.set_seed(seed)

    # determine the number of folds for k-fold cross-validation

    n_folds = 5

    # define the list to store cross-validation scores

    results_1 = []

    # define the possible options for the model

    models = [build_model_1, build_model_2, build_model_3]

    # loop over models

    for i in range(len(models)):

        # build the scikit-learn interface for the keras model

        regressor = KerasRegressor(build_fn=models[i], epochs=100, \

                                   batch_size=50, verbose=0, \

                                   shuffle=False)

        """

        build the pipeline of transformations so for each fold training

        set will be scaled and test set will be scaled accordingly.

        """

        model = make_pipeline(StandardScaler(), regressor)

        # define the cross-validation iterator

        kfold = KFold(n_splits=n_folds, shuffle=True, \

                      random_state=seed)

        # perform the k-fold cross-validation.

        # store the scores in result

        result = cross_val_score(model, X, y, cv=kfold)

        # add the scores to the results list

        results_1.append(result)

    # Print cross-validation score for each model

    for i in range(len(models)):

        print("Model ", i+1," test error rate = ", \

              abs(results_1[i].mean()))

    The following is the expected output:

    Model 1 test error rate = 25.48777518749237

    Model 2 test error rate = 25.30460816860199

    Model 3 test error rate = 25.390239462852474

    Model 2 (a two-layer neural network) has the lowest test error rate.

  5. Choose the model with the lowest test error rate and repeat step 4 while iterating over epochs = [80, 100] and batches = [50, 25] and performing 5-fold cross-validation:

    """

    define a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # determine the number of folds for k-fold cross-validation

    n_folds = 5

    # define the list to store cross-validation scores

    results_2 = []

    # define possible options for epochs and batch_size

    epochs = [80, 100]

    batches = [50, 25]

    # loop over all possible pairs of epochs, batch_size

    for i in range(len(epochs)):

        for j in range(len(batches)):

            # build the scikit-learn interface for the keras model

            regressor = KerasRegressor(build_fn=build_model_2, \

                                       epochs=epochs[i], \

                                       batch_size=batches[j], \

                                       verbose=0, shuffle=False)

            """

            build the pipeline of transformations so for each fold

            training set will be scaled and test set will be scaled

            accordingly.

            """

            model = make_pipeline(StandardScaler(), regressor)

            # define the cross-validation iterator

            kfold = KFold(n_splits=n_folds, shuffle=True, \

                          random_state=seed)

            # perform the k-fold cross-validation.

            # store the scores in result

            result = cross_val_score(model, X, y, cv=kfold)

            # add the scores to the results list

            results_2.append(result)

    """

    Print cross-validation score for each possible pair of epochs, batch_size

    """

    c = 0

    for i in range(len(epochs)):

        for j in range(len(batches)):

            print("batch_size = ", batches[j],\

                  ", epochs = ", epochs[i], \

                  ", Test error rate = ", abs(results_2[c].mean()))

            c += 1

    Here's the expected output:

    batch_size = 50 , epochs = 80 , Test error rate = 25.270704221725463

    batch_size = 25 , epochs = 80 , Test error rate = 25.309741401672362

    batch_size = 50 , epochs = 100 , Test error rate = 25.095393986701964

    batch_size = 25 , epochs = 100 , Test error rate = 25.24592453837395

    The batch_size=50 and epochs=100 pair has the lowest test error rate.

  6. Choose the batch size and epochs with the lowest test error rate and repeat step 5 by iterating over optimizers = ['adam', 'sgd', 'rmsprop'] and performing 5-fold cross-validation:

    """

    define a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # determine the number of folds for k-fold cross-validation

    n_folds = 5

    # define the list to store cross-validation scores

    results_3 = []

    # define the possible options for the optimizer

    optimizers = ['adam', 'sgd', 'rmsprop']

    # loop over optimizers

    for i in range(len(optimizers)):

        optimizer=optimizers[i]

        # build the scikit-learn interface for the keras model

        regressor = KerasRegressor(build_fn=build_model_2, \

                                   epochs=100, batch_size=50, \

                                   verbose=0, shuffle=False)

        """

        build the pipeline of transformations so for each fold training

        set will be scaled and test set will be scaled accordingly.

        """

        model = make_pipeline(StandardScaler(), regressor)

        # define the cross-validation iterator

        kfold = KFold(n_splits=n_folds, shuffle=True, \

                      random_state=seed)

        # perform the k-fold cross-validation.

        # store the scores in result

        result = cross_val_score(model, X, y, cv=kfold)

        # add the scores to the results list

        results_3.append(result)

    # Print cross-validation score for each optimizer

    for i in range(len(optimizers)):

        print("optimizer=", optimizers[i]," test error rate = ", \

              abs(results_3[i].mean()))

    Here's the expected output:

    optimizer= adam test error rate = 25.391812739372256

    optimizer= sgd test error rate = 25.140230269432067

    optimizer= rmsprop test error rate = 25.217947859764102

    optimizer='sgd' has the lowest test error rate, so we should proceed with this particular model.

    Note

    To access the source code for this specific section, please refer to https://packt.live/31TcYaD.

    You can also run this example online at https://packt.live/3iq6iqb.

5. Improving Model Accuracy

Activity 5.01: Weight Regularization on an Avila Pattern Classifier

In this activity, you will build a Keras model to perform classification on the Avila pattern dataset according to the given network architecture and hyperparameter values. The goal is to apply different types of weight regularization to the model, that is, L1 and L2, and observe how each type changes the result. Follow these steps to complete this activity:
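As a reminder (not part of the original activity text), L2 regularization adds a penalty of lambda multiplied by the sum of the squared weights to the loss, while L1 regularization adds lambda multiplied by the sum of the absolute values of the weights; the lambda values varied in the following steps control how strongly large weights are penalized.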

  1. Load the dataset and split the dataset into a training set and a test set:

    # Load the dataset

    import pandas as pd

    X = pd.read_csv('../data/avila-tr_feats.csv')

    y = pd.read_csv('../data/avila-tr_target.csv')

    """

    Split the dataset into training set and test set with a 0.8-0.2 ratio

    """

    from sklearn.model_selection import train_test_split

    seed = 1

    X_train, X_test, y_train, y_test = \

    train_test_split(X, y, test_size=0.2, random_state=seed)

  2. Define a Keras sequential model with three hidden layers, the first of size 10, the second of size 6, and the third of size 4. Finally, compile the model:

    """

    define a seed for random number generator so the result will be reproducible

    """

    import numpy as np

    from tensorflow import random

    np.random.seed(seed)

    random.set_seed(seed)

    # define the keras model

    from keras.models import Sequential

    from keras.layers import Dense

    model_1 = Sequential()

    model_1.add(Dense(10, input_dim=X_train.shape[1], \

                      activation='relu'))

    model_1.add(Dense(6, activation='relu'))

    model_1.add(Dense(4, activation='relu'))

    model_1.add(Dense(1, activation='sigmoid'))

    model_1.compile(loss='binary_crossentropy', optimizer='sgd', \

                    metrics=['accuracy'])

  3. Fit the model to the training data to perform the classification, saving the results of the training process:

    history=model_1.fit(X_train, y_train, batch_size = 20, epochs = 100, \

                        validation_data=(X_test, y_test), \

                        verbose=0, shuffle=False)

  4. Plot the trends in the training and validation error by importing the necessary plotting libraries and using the loss and validation loss values saved in the variable that was created when the model was fit. Print the maximum validation accuracy:

    import matplotlib.pyplot as plt

    import matplotlib

    %matplotlib inline

    # plot training error and test error

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylim(0,1)

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    # print the best accuracy reached on the test set

    print("Best Accuracy on Validation Set =", \

          max(history.history['val_accuracy']))

    The following is the expected output:

    Figure 5.13: A plot of the training error and validation error during training for the model without regularization

    The validation loss keeps decreasing along with the training loss. Despite having no regularization, this is a fairly good example of the training process since the bias and variance are fairly low.

  5. Redefine the model, adding L2 regularizers with lambda=0.01 to each hidden layer of the model. Repeat steps 3 and 4 to train the model and plot the training error and validation error:

    """

    set up a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # define the keras model with l2 regularization with lambda = 0.01

    from keras.regularizers import l2

    l2_param = 0.01

    model_2 = Sequential()

    model_2.add(Dense(10, input_dim=X_train.shape[1], \

                      activation='relu', \

                      kernel_regularizer=l2(l2_param)))

    model_2.add(Dense(6, activation='relu', \

                      kernel_regularizer=l2(l2_param)))

    model_2.add(Dense(4, activation='relu', \

                      kernel_regularizer=l2(l2_param)))

    model_2.add(Dense(1, activation='sigmoid'))

    model_2.compile(loss='binary_crossentropy', optimizer='sgd', \

                    metrics=['accuracy'])

    # train the model using training set while evaluating on test set

    history=model_2.fit(X_train, y_train, batch_size = 20, epochs = 100, \

                        validation_data=(X_test, y_test), \

                        verbose=0, shuffle=False)

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylim(0,1)

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    # print the best accuracy reached on the test set

    print("Best Accuracy on Validation Set =", \

          max(history.history['val_accuracy']))

    The following is the expected output:

    Figure 5.14: A plot of the training error and validation error during training for the model with L2 weight regularization (lambda=0.01)

    As shown in the preceding plot, the test error almost plateaus after decreasing to a certain value. The gap between the training error and the validation error at the end of the training process (the variance) is slightly smaller, which is indicative of reduced overfitting of the model to the training examples.

  6. Repeat the previous step with lambda=0.1 for the L2 parameter—redefine the model with the new lambda parameter, fit the model to the training data, and repeat step 4 to plot the training error and validation error:

    """

    set up a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    from keras.regularizers import l2

    l2_param = 0.1

    model_3 = Sequential()

    model_3.add(Dense(10, input_dim=X_train.shape[1], \

                      activation='relu', \

                      kernel_regularizer=l2(l2_param)))

    model_3.add(Dense(6, activation='relu', \

                      kernel_regularizer=l2(l2_param)))

    model_3.add(Dense(4, activation='relu', \

                      kernel_regularizer=l2(l2_param)))

    model_3.add(Dense(1, activation='sigmoid'))

    model_3.compile(loss='binary_crossentropy', optimizer='sgd', \

                    metrics=['accuracy'])

    # train the model using training set while evaluating on test set

    history=model_3.fit(X_train, y_train, batch_size = 20, \

                        epochs = 100, validation_data=(X_test, y_test), \

                        verbose=0, shuffle=False)

    # plot training error and test error

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylim(0,1)

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    # print the best accuracy reached on the test set

    print("Best Accuracy on Validation Set =", \

          max(history.history['val_accuracy']))

    The following is the expected output:

    Figure 5.15: A plot of the training error and validation error during training for the model with L2 weight regularization (lambda=0.1)

    The training and validation error quickly plateau and are much higher than they were for the models we created with a lower L2 parameter, indicating that we have penalized the model so much that it has not had the flexibility to learn the underlying function of the training data. Following this, we will reduce the value of the regularization parameter to prevent it from penalizing the model as much.

  7. Repeat the previous step, this time with lambda=0.005. Repeat step 4 to plot the training error and validation error:

    """

    set up a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # define the keras model with l2 regularization with lambda = 0.005

    from keras.regularizers import l2

    l2_param = 0.005

    model_4 = Sequential()

    model_4.add(Dense(10, input_dim=X_train.shape[1], \

                      activation='relu', \

                      kernel_regularizer=l2(l2_param)))

    model_4.add(Dense(6, activation='relu', \

                      kernel_regularizer=l2(l2_param)))

    model_4.add(Dense(4, activation='relu', \

                      kernel_regularizer=l2(l2_param)))

    model_4.add(Dense(1, activation='sigmoid'))

    model_4.compile(loss='binary_crossentropy', optimizer='sgd', \

                    metrics=['accuracy'])

    # train the model using training set while evaluating on test set

    history=model_4.fit(X_train, y_train, batch_size = 20, \

                        epochs = 100, validation_data=(X_test, y_test), \

                        verbose=0, shuffle=False)

    # plot training error and test error

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylim(0,1)

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    # print the best accuracy reached on the test set

    print("Best Accuracy on Validation Set =", \

          max(history.history['val_accuracy']))

    The following is the expected output:

    Figure 5.16: A plot of the training error and validation error during training for the model with L2 weight regularization (lambda=0.005)

    This value of the L2 regularization parameter achieves the highest validation accuracy of all the L2-regularized models, though it is slightly lower than the accuracy of the model without regularization. Again, the test error does not increase significantly after it has decreased to a certain value, which indicates that the model is not overfitting the training examples. L2 weight regularization with lambda=0.005 seems to achieve the lowest validation error while still preventing the model from overfitting.

  8. Add L1 regularizers with lambda=0.01 to the hidden layers of your model. Redefine the model with the new lambda parameter, fit the model to the training data, and repeat step 4 to plot the training error and validation error:

    """

    set up a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # define the keras model with l1 regularization with lambda = 0.01

    from keras.regularizers import l1

    l1_param = 0.01

    model_5 = Sequential()

    model_5.add(Dense(10, input_dim=X_train.shape[1], \

                      activation='relu', \

                      kernel_regularizer=l1(l1_param)))

    model_5.add(Dense(6, activation='relu', \

                      kernel_regularizer=l1(l1_param)))

    model_5.add(Dense(4, activation='relu', \

                      kernel_regularizer=l1(l1_param)))

    model_5.add(Dense(1, activation='sigmoid'))

    model_5.compile(loss='binary_crossentropy', optimizer='sgd', \

                    metrics=['accuracy'])

    # train the model using training set while evaluating on test set

    history=model_5.fit(X_train, y_train, batch_size = 20, \

                        epochs = 100, validation_data=(X_test, y_test), \

                        verbose=0, shuffle=True)

    # plot training error and test error

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylim(0,1)

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    # print the best accuracy reached on the test set

    print("Best Accuracy on Validation Set =", \

          max(history.history['val_accuracy']))

    The following is the expected output:

    Figure 5.17: A plot of the training error and validation error during training for the model with L1 weight regularization (lambda=0.01)

  9. Repeat the previous step with lambda=0.005 for the L1 parameter—redefine the model with the new lambda parameter, fit the model to the training data, and repeat step 4 to plot the training error and validation error:

    """

    set up a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # define the keras model with l1 regularization with lambda = 0.005

    from keras.regularizers import l1

    l1_param = 0.005

    model_6 = Sequential()

    model_6.add(Dense(10, input_dim=X_train.shape[1], \

                      activation='relu', \

                      kernel_regularizer=l1(l1_param)))

    model_6.add(Dense(6, activation='relu', \

                      kernel_regularizer=l1(l1_param)))

    model_6.add(Dense(4, activation='relu', \

                      kernel_regularizer=l1(l1_param)))

    model_6.add(Dense(1, activation='sigmoid'))

    model_6.compile(loss='binary_crossentropy', optimizer='sgd', \

                    metrics=['accuracy'])

    # train the model using training set while evaluating on test set

    history=model_6.fit(X_train, y_train, batch_size = 20, \

                        epochs = 100, validation_data=(X_test, y_test), \

                        verbose=0, shuffle=False)

    # plot training error and test error

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylim(0,1)

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    # print the best accuracy reached on the test set

    print("Best Accuracy on Validation Set =", \

           max(history.history['val_accuracy']))

    The following is the expected output:

    Figure 5.18: The plot of the training error and validation error during training for the model with L1 weight regularization (lambda=0.005)

    It seems that L1 weight regularization with lambda=0.005 achieves a lower test error while still preventing the model from overfitting; the value of lambda=0.01 is too restrictive and prevents the model from learning the underlying function of the training data.

  10. Add L1 and L2 regularizers with an L1 of lambda=0.005 and an L2 of lambda = 0.005 to the hidden layers of your model. Then, repeat step 4 to plot the training error and validation error:

    """

    set up a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    """

    define the keras model with l1_l2 regularization with l1_lambda = 0.005 and l2_lambda = 0.005

    """

    from keras.regularizers import l1_l2

    l1_param = 0.005

    l2_param = 0.005

    model_7 = Sequential()

    model_7.add(Dense(10, input_dim=X_train.shape[1], \

                activation='relu', \

                kernel_regularizer=l1_l2(l1=l1_param, l2=l2_param)))

    model_7.add(Dense(6, activation='relu', \

                      kernel_regularizer=l1_l2(l1=l1_param, \

                                               l2=l2_param)))

    model_7.add(Dense(4, activation='relu', \

                      kernel_regularizer=l1_l2(l1=l1_param, \

                                               l2=l2_param)))

    model_7.add(Dense(1, activation='sigmoid'))

    model_7.compile(loss='binary_crossentropy', optimizer='sgd', \

                    metrics=['accuracy'])

    # train the model using training set while evaluating on test set

    history=model_7.fit(X_train, y_train, batch_size = 20, \

                        epochs = 100, validation_data=(X_test, y_test), \

                        verbose=0, shuffle=True)

    # plot training error and test error

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylim(0,1)

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    # print the best accuracy reached on the test set

    print("Best Accuracy on Validation Set =", \

           max(history.history['val_accuracy']))

    The following is the expected output:

Figure 5.19: A plot of the training error and validation error during training for the model with L1 lambda equal to 0.005 and L2 lambda equal to 0.005

While the combination of L1 and L2 regularization successfully prevents the model from overfitting (the variance of the model is very low), the accuracy obtained on the validation data is not as high as that of the model trained without regularization, or of the models trained with L2 regularization (lambda=0.005) or L1 regularization (lambda=0.005) individually.
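As a rough sketch of how the models could be compared side by side, the following assumes that each History object returned by fit was kept under its own name, for example history_1 to history_7 (the solution above reuses the single name history):

    # hypothetical summary, assuming history_1 ... history_7 were saved

    histories = {'no regularization': history_1, \

                 'l2 lambda=0.01': history_2, \

                 'l2 lambda=0.1': history_3, \

                 'l2 lambda=0.005': history_4, \

                 'l1 lambda=0.01': history_5, \

                 'l1 lambda=0.005': history_6, \

                 'l1_l2 lambda=0.005': history_7}

    for name, h in histories.items():

        print(name, ": best validation accuracy =", \

              max(h.history['val_accuracy']))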

Note

To access the source code for this specific section, please refer to https://packt.live/31BUf34.

You can also run this example online at https://packt.live/38n291s.

Activity 5.02: Dropout Regularization on the Traffic Volume Dataset

In this activity, you will start with the model from Activity 4.03, Model Selection Using Cross-Validation on a Traffic Volume Dataset, of Chapter 4, Evaluating Your Model with Cross-Validation Using Keras Wrappers. You will use the training set/test set approach to train and evaluate the model, plot the trends in training error and the generalization error, and observe the model overfitting the data examples. Then, you will attempt to improve model performance by addressing the overfitting issue through the use of dropout regularization. In particular, you will try to find out which layers you should add dropout regularization to and what rate value will improve this specific model the most. Follow these steps to complete this exercise:

  1. Load the dataset using the pandas read_csv function, split the dataset into a training set and test set into an 80-20 ratio using train_test_split, and scale the input data using StandardScaler:

    # Load the dataset

    import pandas as pd

    X = pd.read_csv('../data/traffic_volume_feats.csv')

    y = pd.read_csv('../data/traffic_volume_target.csv')

    """

    Split the dataset into training set and test set with an 80-20 ratio

    """

    from sklearn.model_selection import train_test_split

    seed=1

    X_train, X_test, y_train, y_test = \

    train_test_split(X, y, test_size=0.2, random_state=seed)

  2. Set a seed so that the model can be reproduced. Next, define a Keras sequential model with two hidden layers of size 10, both with ReLU activation functions. Add an output layer with no activation function and compile the model with the given hyperparameters:

    """

    define a seed for random number generator so the result will be reproducible

    """

    import numpy as np

    from tensorflow import random

    np.random.seed(seed)

    random.set_seed(seed)

    from keras.models import Sequential

    from keras.layers import Dense

    # create model

    model_1 = Sequential()

    model_1.add(Dense(10, input_dim=X_train.shape[1], \

                      activation='relu'))

    model_1.add(Dense(10, activation='relu'))

    model_1.add(Dense(1))

    # Compile model

    model_1.compile(loss='mean_squared_error', optimizer='rmsprop')

  3. Train the model on the training data with the given hyperparameters:

    # train the model using training set while evaluating on test set

    history=model_1.fit(X_train, y_train, batch_size = 50, \

                        epochs = 200, validation_data=(X_test, y_test), \

                        verbose=0)

  4. Plot the trends for the training error and validation error. Print the lowest error reached on the training and validation sets:

    import matplotlib.pyplot as plt

    import matplotlib

    %matplotlib inline

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    # plot training error and test error plots

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylim((0, 25000))

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    # print the lowest error reached on the training and validation sets

    print("Lowest error on training set = ", \

          min(history.history['loss']))

    print("Lowest error on validation set = ", \

          min(history.history['val_loss']))

    The following is the expected output:

    Lowest error on training set = 24.673954981565476

    Lowest error on validation set = 25.11553382873535

    Figure 5.20: A plot of the training error and validation error during training for the model without regularization

    There is a very small gap between the training error and the validation error, which indicates a low-variance model - a good sign.

  5. Redefine the model by creating the same model architecture. However, this time, add a dropout regularization with rate=0.1 to the first hidden layer of your model. Repeat step 3 to train the model on the training data and repeat step 4 to plot the trends for the training and validation errors. Then, print the lowest error reached on the training and validation sets:

    """

    define a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    from keras.layers import Dropout

    # create model

    model_2 = Sequential()

    model_2.add(Dense(10, input_dim=X_train.shape[1], \

                      activation='relu'))

    model_2.add(Dropout(0.1))

    model_2.add(Dense(10, activation='relu'))

    model_2.add(Dense(1))

    # Compile model

    model_2.compile(loss='mean_squared_error', \

                    optimizer='rmsprop')

    # train the model using training set while evaluating on test set

    history=model_2.fit(X_train, y_train, batch_size = 50, \

                        epochs = 200, validation_data=(X_test, y_test), \

                        verbose=0, shuffle=False)

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylim((0, 25000))

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    # print the lowest error reached on the training and validation sets

    print("Lowest error on training set = ", \

          min(history.history['loss']))

    print("Lowest error on validation set = ", \

          min(history.history['val_loss']))

    The following is the expected output:

    Lowest error on training set = 407.8203821182251

    Lowest error on validation set = 54.58488750457764

    Figure 5.21: A plot of the training error and validation error during training for the model with dropout regularization (rate=0.1) in the first layer

    The validation error is lower than the training error because dropout is applied only during training and not during evaluation; the low validation error indicates that the model is not overfitting the training data.

  6. Repeat the previous step, this time adding dropout regularization with rate=0.1 to both hidden layers of your model. Repeat step 3 to train the model on the training data and repeat step 4 to plot the trends for the training and validation errors. Then, print the lowest error reached on the training and validation sets:

    """

    define a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # create model

    model_3 = Sequential()

    model_3.add(Dense(10, input_dim=X_train.shape[1], \

                      activation='relu'))

    model_3.add(Dropout(0.1))

    model_3.add(Dense(10, activation='relu'))

    model_3.add(Dropout(0.1))

    model_3.add(Dense(1))

    # Compile model

    model_3.compile(loss='mean_squared_error', \

                    optimizer='rmsprop')

    # train the model using training set while evaluating on test set

    history=model_3.fit(X_train, y_train, batch_size = 50, \

                        epochs = 200, validation_data=(X_test, y_test), \

                        verbose=0, shuffle=False)

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylim((0, 25000))

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    # print the lowest error reached on the training and validation sets

    print("Lowest error on training set = ", \

          min(history.history['loss']))

    print("Lowest error on validation set = ", \

          min(history.history['val_loss']))

    The following is the expected output:

    Lowest error on training set = 475.9299939632416

    Lowest error on validation set = 61.646054649353026

    Figure 5.22: A plot of the training error and validation error during training for the model with dropout regularization (rate=0.1) in both layers

    The gap between the training error and validation error is slightly higher here, mostly due to the increase in the training error as a result of the additional regularization on the second hidden layer of the model.

  7. Repeat the previous step, this time adding dropout regularization with rate=0.2 in the first layer and rate=0.1 in the second layer of your model. Repeat step 3 to train the model on the training data and repeat step 4 to plot the trends for the training and validation errors. Then, print the lowest error reached on the training and validation sets:

    """

    define a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # create model

    model_4 = Sequential()

    model_4.add(Dense(10, input_dim=X_train.shape[1], \

                      activation='relu'))

    model_4.add(Dropout(0.2))

    model_4.add(Dense(10, activation='relu'))

    model_4.add(Dropout(0.1))

    model_4.add(Dense(1))

    # Compile model

    model_4.compile(loss='mean_squared_error', optimizer='rmsprop')

    # train the model using training set while evaluating on test set

    history=model_4.fit(X_train, y_train, batch_size = 50, epochs = 200, \

                        validation_data=(X_test, y_test), verbose=0)

    matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

    plt.plot(history.history['loss'])

    plt.plot(history.history['val_loss'])

    plt.ylim((0, 25000))

    plt.ylabel('loss')

    plt.xlabel('epoch')

    plt.legend(['train loss', 'validation loss'], loc='upper right')

    # print the lowest error reached on the training and validation sets

    print("Lowest error on training set = ", \

          min(history.history['loss']))

    print("Lowest error on validation set = ", \

          min(history.history['val_loss']))

    The following is the expected output:

    Lowest error on training set = 935.1562484741211

    Lowest error on validation set = 132.39965686798095

Figure 5.23: A plot of training errors and validation errors while training the model with dropout regularization, with rate=0.2 in the first layer and rate 0.1 in the second layer

The gap between the training error and the validation error is slightly larger here due to the increased regularization. The original model was not overfitting in this case, so adding regularization simply increased the error on both the training and validation datasets.

Note

To access the source code for this specific section, please refer to https://packt.live/38mtDo7.

You can also run this example online at https://packt.live/31Isdmu.

Activity 5.03: Hyperparameter Tuning on the Avila Pattern Classifier

In this activity, you will build a Keras model similar to those in the previous activities, but this time, you will add regularization methods to your model as well. Then, you will use scikit-learn optimizers to perform tuning on the model hyperparameters, including the hyperparameters of the regularizers. Follow these steps to complete this activity:

  1. Load the dataset and import the libraries:

    # Load The dataset

    import pandas as pd

    X = pd.read_csv('../data/avila-tr_feats.csv')

    y = pd.read_csv('../data/avila-tr_target.csv')

  2. Define a function that returns a Keras model with three hidden layers, the first of size 10, the second of size 6, and the third of size 4, and apply L2 weight regularization and a ReLU activation function on each hidden layer. Compile the model with the given parameters and return it from the function:

    # Create the function that returns the keras model

    from keras.models import Sequential

    from keras.layers import Dense

    from keras.regularizers import l2

    def build_model(lambda_parameter):

        model = Sequential()

        model.add(Dense(10, input_dim=X.shape[1], \

                        activation='relu', \

                        kernel_regularizer=l2(lambda_parameter)))

        model.add(Dense(6, activation='relu', \

                        kernel_regularizer=l2(lambda_parameter)))

        model.add(Dense(4, activation='relu', \

                        kernel_regularizer=l2(lambda_parameter)))

        model.add(Dense(1, activation='sigmoid'))

        model.compile(loss='binary_crossentropy', \

                      optimizer='sgd', metrics=['accuracy'])

        return model

  3. Set a seed, use a scikit-learn wrapper to wrap the model that we created in the previous step, and define the hyperparameters to scan. Finally, perform GridSearchCV() on the model using the hyperparameter's grid and fit the model:

    from keras.wrappers.scikit_learn import KerasClassifier

    from sklearn.model_selection import GridSearchCV

    """

    define a seed for random number generator so the result will be reproducible

    """

    import numpy as np

    from tensorflow import random

    seed = 1

    np.random.seed(seed)

    random.set_seed(seed)

    # create the Keras wrapper with scikit learn

    model = KerasClassifier(build_fn=build_model, verbose=0, \

                            shuffle=False)

    # define all the possible values for each hyperparameter

    lambda_parameter = [0.01, 0.5, 1]

    epochs = [50, 100]

    batch_size = [20]

    """

    create the dictionary containing all possible values of hyperparameters

    """

    param_grid = dict(lambda_parameter=lambda_parameter, \

                      epochs=epochs, batch_size=batch_size)

    # perform 5-fold cross-validation for each combination of hyperparameters and store the results

    grid_search = GridSearchCV(estimator=model, \

                               param_grid=param_grid, cv=5)

    results_1 = grid_search.fit(X, y)

  4. Print the results for the best cross-validation score that's stored within the variable we created in the fit process. Iterate through all the parameters and print the mean of the accuracy across all the folds, the standard deviation of the accuracy, and the parameters themselves:

    print("Best cross-validation score =", results_1.best_score_)

    print("Parameters for Best cross-validation score=", \

          results_1.best_params_)

    # print the results for all evaluated hyperparameter combinations

    accuracy_means = results_1.cv_results_['mean_test_score']

    accuracy_stds = results_1.cv_results_['std_test_score']

    parameters = results_1.cv_results_['params']

    for p in range(len(parameters)):

        print("Accuracy %f (std %f) for params %r" % \

              (accuracy_means[p], accuracy_stds[p], parameters[p]))

    The following is the expected output:

    Best cross-validation score = 0.7673058390617371

    Parameters for Best cross-validation score= {'batch_size': 20,

    'epochs': 100, 'lambda_parameter': 0.01}

    Accuracy 0.764621 (std 0.004330) for params {'batch_size': 20,

    'epochs': 50, 'lambda_parameter': 0.01}

    Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

    'epochs': 50, 'lambda_parameter': 0.5}

    Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

    'epochs': 50, 'lambda_parameter': 1}

    Accuracy 0.767306 (std 0.015872) for params {'batch_size': 20,

    'epochs': 100, 'lambda_parameter': 0.01}

    Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

    'epochs': 100, 'lambda_parameter': 0.5}

    Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

    'epochs': 100, 'lambda_parameter': 1}

  5. Repeat step 3 using GridSearchCV(), lambda_parameter = [0.001, 0.01, 0.05, 0.1], batch_size = [20], and epochs = [100]. Fit the model to the training data using 5-fold cross-validation and print the results for the entire grid:

    """

    define a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # create the Keras wrapper with scikit learn

    model = KerasClassifier(build_fn=build_model, verbose=0, shuffle=False)

    # define all the possible values for each hyperparameter

    lambda_parameter = [0.001, 0.01, 0.05, 0.1]

    epochs = [100]

    batch_size = [20]

    """

    create the dictionary containing all possible values of hyperparameters

    """

    param_grid = dict(lambda_parameter=lambda_parameter, \

                      epochs=epochs, batch_size=batch_size)

    """

    search the grid, perform 5-fold cross-validation for each possible combination, store the results

    """

    grid_search = GridSearchCV(estimator=model, \

                               param_grid=param_grid, cv=5)

    results_2 = grid_search.fit(X, y)

    # print the results for best cross-validation score

    print("Best cross-validation score =", results_2.best_score_)

    print("Parameters for Best cross-validation score =", \

          results_2.best_params_)

    # print the results for the entire grid

    accuracy_means = results_2.cv_results_['mean_test_score']

    accuracy_stds = results_2.cv_results_['std_test_score']

    parameters = results_2.cv_results_['params']

    for p in range(len(parameters)):

        print("Accuracy %f (std %f) for params %r" % \

              (accuracy_means[p], accuracy_stds[p], parameters[p]))

    The following is the expected output:

    Best cross-validation score = 0.786385428905487

    Parameters for Best cross-validation score = {'batch_size': 20,

    'epochs': 100, 'lambda_parameter': 0.001}

    Accuracy 0.786385 (std 0.010177) for params {'batch_size': 20,

    'epochs': 100, 'lambda_parameter': 0.001}

    Accuracy 0.693960 (std 0.084994) for params {'batch_size': 20,

    'epochs': 100, 'lambda_parameter': 0.01}

    Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

    'epochs': 100, 'lambda_parameter': 0.05}

    Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

    'epochs': 100, 'lambda_parameter': 0.1}

  6. Redefine a function that returns a Keras model with three hidden layers, the first of size 10, the second of size 6, and the third of size 4, and apply dropout regularization and a ReLU activation function on each hidden layer. Compile the model with the given parameters and return it from the function:

    # Create the function that returns the keras model

    from keras.layers import Dropout

    def build_model(rate):

        model = Sequential()

        model.add(Dense(10, input_dim=X.shape[1], activation='relu'))

        model.add(Dropout(rate))

        model.add(Dense(6, activation='relu'))

        model.add(Dropout(rate))

        model.add(Dense(4, activation='relu'))

        model.add(Dropout(rate))

        model.add(Dense(1, activation='sigmoid'))

        model.compile(loss='binary_crossentropy', \

                      optimizer='sgd', metrics=['accuracy'])

        return model

  7. Use rate = [0, 0.1, 0.2] and epochs = [50, 100] and perform GridSearchCV() on the model. Fit the model to the training data using 5-fold cross-validation and print the results for the entire grid:

    """

    define a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # create the Keras wrapper with scikit learn

    model = KerasClassifier(build_fn=build_model, verbose=0,shuffle=False)

    # define all the possible values for each hyperparameter

    rate = [0, 0.1, 0.2]

    epochs = [50, 100]

    batch_size = [20]

    """

    create the dictionary containing all possible values of hyperparameters

    """

    param_grid = dict(rate=rate, epochs=epochs, batch_size=batch_size)

    """

    perform 5-fold cross-validation for 10 randomly selected combinations, store the results

    """

    grid_seach = GridSearchCV(estimator=model, \

                              param_grid=param_grid, cv=5)

    results_3 = grid_seach.fit(X, y)

    # print the results for best cross-validation score

    print("Best cross-validation score =", results_3.best_score_)

    print("Parameters for Best cross-validation score =", \

          results_3.best_params_)

    # print the results for the entire grid

    accuracy_means = results_3.cv_results_['mean_test_score']

    accuracy_stds = results_3.cv_results_['std_test_score']

    parameters = results_3.cv_results_['params']

    for p in range(len(parameters)):

        print("Accuracy %f (std %f) for params %r" % \

              (accuracy_means[p], accuracy_stds[p], parameters[p]))

    The following is the expected output:

    Best cross-validation score= 0.7918504476547241

    Parameters for Best cross-validation score= {'batch_size': 20,

    'epochs': 100, 'rate': 0}

    Accuracy 0.786769 (std 0.008255) for params {'batch_size': 20,

    'epochs': 50, 'rate': 0}

    Accuracy 0.764717 (std 0.007691) for params {'batch_size': 20,

    'epochs': 50, 'rate': 0.1}

    Accuracy 0.752637 (std 0.013546) for params {'batch_size': 20,

    'epochs': 50, 'rate': 0.2}

    Accuracy 0.791850 (std 0.008519) for params {'batch_size': 20,

    'epochs': 100, 'rate': 0}

    Accuracy 0.779291 (std 0.009504) for params {'batch_size': 20,

    'epochs': 100, 'rate': 0.1}

    Accuracy 0.767306 (std 0.005773) for params {'batch_size': 20,

    'epochs': 100, 'rate': 0.2}

  8. Repeat the previous step using rate = [0.0, 0.05, 0.1] and epochs = [100]. Fit the model to the training data using 5-fold cross-validation and print the results for the entire grid:

    """

    define a seed for random number generator so the result will be reproducible

    """

    np.random.seed(seed)

    random.set_seed(seed)

    # create the Keras wrapper with scikit learn

    model = KerasClassifier(build_fn=build_model, verbose=0, shuffle=False)

    # define all the possible values for each hyperparameter

    rate = [0.0, 0.05, 0.1]

    epochs = [100]

    batch_size = [20]

    """

    create the dictionary containing all possible values of hyperparameters

    """

    param_grid = dict(rate=rate, epochs=epochs, batch_size=batch_size)

    """

    perform 5-fold cross-validation for 10 randomly selected combinations, store the results

    """

    grid_seach = GridSearchCV(estimator=model, \

                              param_grid=param_grid, cv=5)

    results_4 = grid_seach.fit(X, y)

    # print the results for best cross-validation score

    print("Best cross-validation score =", results_4.best_score_)

    print("Parameters for Best cross-validation score =", \

          results_4.best_params_)

    # print the results for the entire grid

    accuracy_means = results_4.cv_results_['mean_test_score']

    accuracy_stds = results_4.cv_results_['std_test_score']

    parameters = results_4.cv_results_['params']

    for p in range(len(parameters)):

        print("Accuracy %f (std %f) for params %r" % \

              (accuracy_means[p], accuracy_stds[p], parameters[p]))

    The following is the expected output:

    Best cross-validation score= 0.7862895488739013

    Parameters for Best cross-validation score= {'batch_size': 20,

    'epochs': 100, 'rate': 0.0}

    Accuracy 0.786290 (std 0.013557) for params {'batch_size': 20,

    'epochs': 100, 'rate': 0.0}

    Accuracy 0.786098 (std 0.005184) for params {'batch_size': 20,

    'epochs': 100, 'rate': 0.05}

    Accuracy 0.772004 (std 0.013733) for params {'batch_size': 20,

    'epochs': 100, 'rate': 0.1}

    Note

    To access the source code for this specific section, please refer to https://packt.live/2D7HN0L.

    This section does not currently have an online interactive example and will need to be run locally.

6. Model Evaluation

Activity 6.01: Computing the Accuracy and Null Accuracy of a Neural Network When We Change the Train/Test Split

In this activity, we will see that our null accuracy and accuracy will be affected by changing the train/test split. To implement this, the part of the code where the train/test split was defined has to be changed. We will use the same dataset that we used in Exercise 6.02, Computing Accuracy and Null Accuracy with APS Failure for Scania Trucks Data. Follow these steps to complete this activity:

  1. Import the required libraries. Load the dataset using the pandas read_csv function and look at the first five rows of the dataset:

    # Import the libraries

    import numpy as np

    import pandas as pd

    # Load the Data

    X = pd.read_csv("../data/aps_failure_training_feats.csv")

    y = pd.read_csv("../data/aps_failure_training_target.csv")

    # Use the head function to get a glimpse data

    X.head()

    The following table shows the output of the preceding code:

    Figure 6.13: Initial five rows of the dataset

  2. Change the test_size and random_state from 0.20 to 0.3 and 42 to 13, respectively:

    # Split the data into training and testing sets

    from sklearn.model_selection import train_test_split

    seed = 13

    X_train, X_test, y_train, y_test = \

    train_test_split(X, y, test_size=0.3, random_state=seed)

    Note

    If you use a different random_state, you may get a different train/test split, which may yield slightly different final results.

  3. Scale the data using the StandardScaler function and use the scaler to scale the test data. Convert both into pandas DataFrames:

    # Initialize StandardScaler

    from sklearn.preprocessing import StandardScaler

    sc = StandardScaler()

    # Transform the training data

    X_train = sc.fit_transform(X_train)

    X_train = pd.DataFrame(X_train, columns=X_test.columns)

    # Transform the testing data

    X_test = sc.transform(X_test)

    X_test = pd.DataFrame(X_test, columns = X_train.columns)

    Note

    The sc.fit_transform() function learns the scaling parameters from the training data and transforms it, returning a NumPy array; sc.transform() then applies the same scaling to the test data so that no information from the test set leaks into preprocessing. Since we may need the data later for analysis as a DataFrame object, the pd.DataFrame() function converts the arrays back into DataFrames.

  4. Import the libraries that are required to build a neural network architecture:

    # Import the relevant Keras libraries

    from keras.models import Sequential

    from keras.layers import Dense

    from keras.layers import Dropout

    from tensorflow import random

  5. Initiate the Sequential class:

    # Initiate the Model with Sequential Class

    np.random.seed(seed)

    random.set_seed(seed)

    model = Sequential()

  6. Add five Dense layers to the network with Dropout. Set the first hidden layer so that it has a size of 64 with a dropout rate of 0.5, the second hidden layer so that it has a size of 32 with a dropout rate of 0.4, the third hidden layer so that it has a size of 16 with a dropout rate of 0.3, the fourth hidden layer so that it has a size of 8 with a dropout rate of 0.2, and the final hidden layer so that it has a size of 4 with a dropout rate of 0.1. Set all the activation functions to ReLU:

    # Add the hidden dense layers and with dropout Layer

    model.add(Dense(units=64, activation='relu', \

                    kernel_initializer='uniform', \

                    input_dim=X_train.shape[1]))

    model.add(Dropout(rate=0.5))

    model.add(Dense(units=32, activation='relu', \

                    kernel_initializer='uniform'))

    model.add(Dropout(rate=0.4))

    model.add(Dense(units=16, activation='relu', \

                    kernel_initializer='uniform'))

    model.add(Dropout(rate=0.3))

    model.add(Dense(units=8, activation='relu', \

                    kernel_initializer='uniform'))

    model.add(Dropout(rate=0.2))

    model.add(Dense(units=4, activation='relu', \

                    kernel_initializer='uniform'))

    model.add(Dropout(rate=0.1))

  7. Add an output Dense layer with a sigmoid activation function:

    # Add Output Dense Layer

    model.add(Dense(units=1, activation='sigmoid', \

                    kernel_initializer='uniform'))

    Note

    Since the output is binary, we are using the sigmoid function. If the output is multiclass (that is, more than two classes), then the softmax function should be used.
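    For example, here is a minimal standalone sketch (not part of this activity's solution) of how a three-class output would be set up with a softmax output layer and a categorical loss. The layer sizes and the input dimension of 10 are arbitrary placeholders:

    from keras.models import Sequential

    from keras.layers import Dense

    # hypothetical three-class classifier; targets would be one-hot encoded

    multiclass_model = Sequential()

    multiclass_model.add(Dense(units=16, activation='relu', \

                               input_dim=10))

    multiclass_model.add(Dense(units=3, activation='softmax'))

    multiclass_model.compile(optimizer='adam', \

                             loss='categorical_crossentropy', \

                             metrics=['accuracy'])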

  8. Compile the network and fit the model. The metric that's being used here is accuracy:

    # Compile the Model

    model.compile(optimizer='adam', loss='binary_crossentropy', \

                  metrics=['accuracy'])

    Note

    The metric name, which in our case is accuracy, is defined in the preceding code.

  9. Fit the model with 100 epochs, a batch size of 20, and a validation split of 0.2:

    # Fit the Model

    model.fit(X_train, y_train, epochs=100, batch_size=20, \

              verbose=1, validation_split=0.2, shuffle=False)

  10. Evaluate the model on the test dataset and print out the values for the loss and accuracy:

    test_loss, test_acc = model.evaluate(X_test, y_test)

    print(f'The loss on the test set is {test_loss:.4f} and \

    the accuracy is {test_acc*100:.4f}%')

    The preceding code produces the following output:

    18000/18000 [==============================] - 0s 19us/step

    The loss on the test set is 0.0766 and the accuracy is 98.9833%

    The model returns an accuracy of 98.9833%. But is it good enough? We can only get the answer to this question by comparing it against the null accuracy.

  11. Now, compute the null accuracy. The null accuracy can be calculated using the value_counts function of the pandas library, which we used in Exercise 6.01, Calculating Null Accuracy on a Pacific Hurricanes Dataset, of this chapter:

    # Use the value_count function to calculate distinct class values

    y_test['class'].value_counts()

    The preceding code produces the following output:

    0 17700

    1 300

    Name: class, dtype: int64

  12. Calculate the null accuracy:

    # Calculate the null accuracy

    y_test['class'].value_counts(normalize=True).loc[0]

    The preceding code produces the following output:

    0.9833333333333333

    The null accuracy is 98.33%, so the model's accuracy of 98.98% is only a marginal improvement over always predicting the majority class.

    Note

    To access the source code for this specific section, please refer to https://packt.live/3eY7y1E.

    You can also run this example online at https://packt.live/2BzBO4n.

Activity 6.02: Calculating the ROC Curve and AUC Score

The ROC curve and AUC score are an effective way to easily evaluate the performance of a binary classifier. In this activity, we will plot the ROC curve and calculate the AUC score of a model. We will use the same dataset and train the same model that we used in Exercise 6.03, Deriving and Computing Metrics Based on a Confusion Matrix. Continue with the same APS failure data, plot the ROC curve, and compute the AUC score of the model. Follow these steps to complete this activity:

  1. Import the necessary libraries and load the data using the pandas read_csv function:

    # Import the libraries

    import numpy as np

    import pandas as pd

    # Load the Data

    X = pd.read_csv("../data/aps_failure_training_feats.csv")

    y = pd.read_csv("../data/aps_failure_training_target.csv")

  2. Split the data into training and test datasets using the train_test_split function:

    from sklearn.model_selection import train_test_split

    seed = 42

    X_train, X_test, y_train, y_test = \

    train_test_split(X, y, test_size=0.20, random_state=seed)

  3. Scale the feature data so that it has a mean of 0 and a standard deviation of 1 using the StandardScaler function. Fit the scaler in the training data and apply it to the test data:

    from sklearn.preprocessing import StandardScaler

    sc = StandardScaler()

    # Transform the training data

    X_train = sc.fit_transform(X_train)

    X_train = pd.DataFrame(X_train,columns=X_test.columns)

    # Transform the testing data

    X_test = sc.transform(X_test)

    X_test = pd.DataFrame(X_test,columns=X_train.columns)

  4. Import the Keras libraries that are required for creating the model. Instantiate a Keras model of the Sequential class and add five hidden layers to the model, including dropout for each layer. The first hidden layer should have a size of 64 and a dropout rate of 0.5. The second hidden layer should have a size of 32 and a dropout rate of 0.4. The third hidden layer should have a size of 16 and a dropout rate of 0.3. The fourth hidden layer should have a size of 8 and a dropout rate of 0.2. The final hidden layer should have a size of 4 and a dropout rate of 0.1. All the hidden layers should have ReLU activation functions and set kernel_initializer = 'uniform'. Add a final output layer to the model with a sigmoid activation function. Compile the model by calculating the accuracy metric during the training process:

    # Import the relevant Keras libraries

    from keras.models import Sequential

    from keras.layers import Dense

    from keras.layers import Dropout

    from tensorflow import random

    np.random.seed(seed)

    random.set_seed(seed)

    model = Sequential()

    # Add the hidden dense layers with dropout Layer

    model.add(Dense(units=64, activation='relu', \

                    kernel_initializer='uniform', \

                    input_dim=X_train.shape[1]))

    model.add(Dropout(rate=0.5))

    model.add(Dense(units=32, activation='relu', \

                    kernel_initializer='uniform'))

    model.add(Dropout(rate=0.4))

    model.add(Dense(units=16, activation='relu', \

                    kernel_initializer='uniform'))

    model.add(Dropout(rate=0.3))

    model.add(Dense(units=8, activation='relu', \

              kernel_initializer='uniform'))

    model.add(Dropout(rate=0.2))

    model.add(Dense(units=4, activation='relu', \

                    kernel_initializer='uniform'))

    model.add(Dropout(rate=0.1))

    # Add Output Dense Layer

    model.add(Dense(units=1, activation='sigmoid', \

                    kernel_initializer='uniform'))

    # Compile the Model

    model.compile(optimizer='adam', loss='binary_crossentropy', \

                  metrics=['accuracy'])

  5. Fit the model to the training data by training for 100 epochs with batch_size=20 and with validation_split=0.2:

    model.fit(X_train, y_train, epochs=100, batch_size=20, \

              verbose=1, validation_split=0.2, shuffle=False)

  6. Once the model has finished fitting to the training data, create a variable that is the result of the model's prediction on the test data using the model's predict_proba method:

    y_pred_prob = model.predict_proba(X_test)

  7. Import roc_curve from scikit-learn and run the following code:

    from sklearn.metrics import roc_curve

    fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)

    Here, fpr is the false positive rate (1 - specificity), tpr is the true positive rate (sensitivity), and thresholds contains the threshold values of y_pred_prob at which each point on the curve was computed.
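    As an optional check (not part of the original solution), the point on the ROC curve for any single threshold can be reproduced directly from the predictions. This sketch assumes a threshold of 0.5 and that the target column is named class, as in the previous activity:

    import numpy as np

    # hypothetical check of a single point on the ROC curve at a 0.5 threshold

    y_true = y_test['class'].values

    y_pred_class = (y_pred_prob >= 0.5).astype(int).ravel()

    tp = np.sum((y_true == 1) & (y_pred_class == 1))

    fp = np.sum((y_true == 0) & (y_pred_class == 1))

    tn = np.sum((y_true == 0) & (y_pred_class == 0))

    fn = np.sum((y_true == 1) & (y_pred_class == 0))

    print("True positive rate (sensitivity) =", tp / (tp + fn))

    print("False positive rate (1 - specificity) =", fp / (fp + tn))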

  8. Run the following code to plot the ROC curve using matplotlib.pyplot:

    import matplotlib.pyplot as plt

    plt.plot(fpr, tpr)

    plt.title("ROC Curve for APS Failure")

    plt.xlabel("False Positive rate (1-Specificity)")

    plt.ylabel("True Positive rate (Sensitivity)")

    plt.grid(True)

    plt.show()

    The following plot shows the output of the preceding code:

    Figure 6.14: ROC curve of the APS failure dataset

  9. Calculate the AUC score using the roc_auc_score function:

    from sklearn.metrics import roc_auc_score

    roc_auc_score(y_test,y_pred_prob)

    The following is the output of the preceding code:

    0.944787151628455

    The AUC score of 94.4479% suggests that our model performs excellently, as per the generally accepted ranges for AUC scores.

    Note

    To access the source code for this specific section, please refer to https://packt.live/2NUOgyh.

    You can also run this example online at https://packt.live/2As33NH.

7. Computer Vision with Convolutional Neural Networks

Activity 7.01: Amending Our Model with Multiple Layers and the Use of softmax

Let's try and improve the performance of our image classification algorithm. There are many ways to improve its performance, and one of the most straightforward ways is by adding multiple ANN layers to the model, which we will learn about in this activity. We will also change the activation from sigmoid to softmax. Then, we can compare the result with that of the previous exercise. Follow these steps to complete this activity:

  1. Import the numpy library and the necessary Keras libraries and classes:

    # Import the Libraries

    from keras.models import Sequential

    from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

    import numpy as np

    from tensorflow import random

  2. Now, initiate the model with the Sequential class:

    # Initiate the classifier

    seed = 1

    np.random.seed(seed)

    random.set_seed(seed)

    classifier=Sequential()

  3. Add the first layer of the CNN, set the input shape to (64, 64, 3), the dimension of each image, and the activation function as a ReLU. Then, add 32 feature detectors of size (3, 3). Add two additional convolutional layers with 32 feature detectors of size (3, 3), also with ReLU activation functions:

    classifier.add(Conv2D(32,(3,3),input_shape=(64,64,3),\

                   activation='relu'))

    classifier.add(Conv2D(32,(3,3),activation = 'relu'))

    classifier.add(Conv2D(32,(3,3),activation = 'relu'))

    Here, 32, (3, 3) means that there are 32 feature detectors, each of size 3x3. As a good practice, start with 32 filters; you can add layers with 64 or 128 filters later.

  4. Now, add the pooling layer with an image size of 2x2:

    classifier.add(MaxPool2D(pool_size=(2,2)))

  5. Flatten the output of the pooling layer by adding a flattening layer to the CNN model:

    classifier.add(Flatten())

  6. Add the first dense layer of the ANN. Here, 128 is the number of output nodes of the layer. As a good practice, 128 is a good starting point, and powers of two are generally preferred. The activation function is ReLU:

    classifier.add(Dense(units=128,activation='relu'))

  7. Add three more layers to the ANN of the same size, 128, along with ReLU activation functions:

    classifier.add(Dense(128,activation='relu'))

    classifier.add(Dense(128,activation='relu'))

    classifier.add(Dense(128,activation='relu'))

  8. Add the output layer of the ANN. Replace the sigmoid function with softmax:

    classifier.add(Dense(units=1,activation='softmax'))

  9. Compile the network with an Adam optimizer and compute the accuracy during the training process:

    # Compile The network

    classifier.compile(optimizer='adam', loss='binary_crossentropy', \

                       metrics=['accuracy'])

  10. Create training and test data generators. Rescale the training and test images by 1/255 so that all the values are between 0 and 1. Set these parameters for the training data generators only – shear_range=0.2, zoom_range=0.2, and horizontal_flip=True:

    from keras.preprocessing.image import ImageDataGenerator

    train_datagen = ImageDataGenerator(rescale = 1./255, \

                                       shear_range = 0.2, \

                                       zoom_range = 0.2, \

                                       horizontal_flip = True)

    test_datagen = ImageDataGenerator(rescale = 1./255)

  11. Create a training set from the training set folder. '../dataset/training_set' is the folder where our data has been placed. Our CNN model has an image size of 64x64, so the same size should be passed here too. batch_size is the number of images in a single batch, which is 32. class_mode is set to binary since we are working on binary classifiers:

    training_set = \

    train_datagen.flow_from_directory('../dataset/training_set', \

                                      target_size = (64, 64), \

                                      batch_size = 32, \

                                      class_mode = 'binary')

  12. Repeat the previous step for the test set by setting the folder to the location of the test images, that is, '../dataset/test_set':

    test_set = \

    test_datagen.flow_from_directory('../dataset/test_set', \

                                     target_size = (64, 64), \

                                     batch_size = 32, \

                                     class_mode = 'binary')

  13. Finally, fit the data. Set the steps_per_epoch to 10000 and the validation_steps to 2500. The following step might take some time to execute:

    classifier.fit_generator(training_set, steps_per_epoch = 10000, \

                             epochs = 2, validation_data = test_set, \

                             validation_steps = 2500, shuffle=False)

    The preceding code produces the following output:

    Epoch 1/2

    10000/10000 [==============================] - 2452s 245ms/step - loss: 8.1783 - accuracy: 0.4667 - val_loss: 11.4999 - val_accuracy: 0.4695

    Epoch 2/2

    10000/10000 [==============================] - 2496s 250ms/step - loss: 8.1726 - accuracy: 0.4671 - val_loss: 10.5416 - val_accuracy: 0.4691

    Note that the accuracy has decreased to 46.91%. This is a consequence of the new softmax activation function: softmax normalizes across a layer's units, so with a single output unit it always returns 1 and the model effectively predicts the same class for every image. A minimal sketch of two corrected output configurations follows this activity's source links.

    Note

    To access the source code for this specific section, please refer to https://packt.live/3gj0TiA.

    You can also run this example online at https://packt.live/2VIDj7e.
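As a side note, the drop in accuracy above comes from pairing softmax with a single output unit. The following is a minimal sketch (not part of the original activity) of two output configurations that avoid this, assuming classifier is the Sequential CNN built in steps 2 to 7; only the output layer and the compile call change.

    from keras.layers import Dense

    # Option 1 (sketch): a single sigmoid unit with binary cross-entropy,
    # matching the binary class_mode used by the data generators
    classifier.add(Dense(units=1, activation='sigmoid'))
    classifier.compile(optimizer='adam', loss='binary_crossentropy', \
                       metrics=['accuracy'])

    # Option 2 (sketch): two softmax units with categorical cross-entropy;
    # this also requires class_mode='categorical' in flow_from_directory
    # classifier.add(Dense(units=2, activation='softmax'))
    # classifier.compile(optimizer='adam', loss='categorical_crossentropy', \
    #                    metrics=['accuracy'])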

Activity 7.02: Classifying a New Image

In this activity, you will try to classify another new image, just like we did in the preceding exercise. The image hasn't been exposed to the algorithm, so we will use this activity to test our algorithm. You can run any of the algorithms in this chapter (although the one that gets the highest accuracy is preferred) and then use the model to classify your images. Follow these steps to complete this activity:

  1. Run one of the algorithms from this chapter.
  2. Load the image and process it. 'test_image_2.jpg' is the path of the test image; change the path in the code to the location where you have saved the image:

    from keras.preprocessing import image

    new_image = \

    image.load_img('../test_image_2.jpg', target_size = (64, 64))

    new_image

  3. You can view the class labels using the following code:

    training_set.class_indices

  4. Process the image by converting it into a numpy array using the img_to_array function. Then, add an additional dimension along the 0th axis using numpy's expand_dims function:

    new_image = image.img_to_array(new_image)

    new_image = np.expand_dims(new_image, axis = 0)

  5. Predict the new image by calling the predict method of the classifier:

    result = classifier.predict(new_image)

  6. Use the class_indices attribute of the training set together with an if…else statement to map the 0 or 1 output of the prediction to a class label (a more general mapping that reads the labels directly from class_indices is sketched after this activity's source links):

    if result[0][0] == 1:

        prediction = 'It is a flower'

    else:

        prediction = 'It is a car'

    print(prediction)

    The preceding code produces the following output:

    It is a flower

    test_image_2 is an image of a flower and was predicted to be a flower.

    Note

    To access the source code for this specific section, please refer to https://packt.live/38ny95E.

    You can also run this example online at https://packt.live/2VIM4Ow.
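As an optional alternative to the hard-coded if…else in step 6, the label mapping can be read directly from the generator. This is a sketch only; it assumes training_set, classifier, and the processed new_image from the steps above, and that the classifier ends in a single sigmoid unit. The actual label names depend on your dataset's folder names.

    # Sketch only: invert the class_indices dictionary to look up the label
    labels = {index: name for name, index in training_set.class_indices.items()}
    # `new_image` is the processed array from step 4
    result = classifier.predict(new_image)
    # Threshold the single sigmoid output to get a class index of 0 or 1
    predicted_index = int(result[0][0] > 0.5)
    print('It is a ' + labels[predicted_index])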

8. Transfer Learning and Pre-Trained Models

Activity 8.01: Using the VGG16 Network to Train a Deep Learning Network to Identify Images

Use the VGG16 network to predict the image given (test_image_1). Before you start, ensure that you have downloaded the image (test_image_1) to your working directory. Follow these steps to complete this activity:

  1. Import the numpy library and the necessary Keras libraries:

    import numpy as np

    from keras.applications.vgg16 import VGG16, preprocess_input

    from keras.preprocessing import image

  2. Initiate the model (note that, at this point, you can also view the architecture of the network, as shown in the following code):

    classifier = VGG16()

    classifier.summary()

    classifier.summary() shows us the architecture of the network. The following points should be noted: it has a four-dimensional input shape (None, 224, 224, 3), 13 convolutional layers, and 3 fully connected (dense) layers.

    The last four layers of the output are as follows:

    Figure 8.16: The architecture of the network

  3. Load the image. '../Data/Prediction/test_image_1.jpg' is the path of the image on our system. It will be different on your system:

    new_image = \

    image.load_img('../Data/Prediction/test_image_1.jpg', \

                   target_size=(224, 224))

    new_image

    The following figure shows the output of the preceding code:

    Figure 8.17: The sample motorbike image

    The target size should be 224x224 since VGG16 only accepts images of size (224, 224).

  4. Change the image into an array by using the img_to_array function:

    transformed_image = image.img_to_array(new_image)

    transformed_image.shape

    The preceding code provides the following output:

    (224, 224, 3)

  5. The image should be in a four-dimensional form for VGG16 to allow further processing. Expand the dimension of the image, as follows:

    transformed_image = np.expand_dims(transformed_image, axis=0)

    transformed_image.shape

    The preceding code provides the following output:

    (1, 224, 224, 3)

  6. Preprocess the image:

    transformed_image = preprocess_input(transformed_image)

    transformed_image

    The following figure shows the output of the preceding code:

    Figure 8.18: Image preprocessing

  7. Create the predictor variable:

    y_pred = classifier.predict(transformed_image)

    y_pred

    The following figure shows the output of the preceding code:

    Figure 8.19: Creating the predictor variable

  8. Check the shape of the prediction. It should be (1, 1000). It's 1000 because, as we mentioned previously, the ImageNet database has 1000 categories of images. The predictor variable shows the probability of our image belonging to each of those categories:

    y_pred.shape

    The preceding code provides the following output:

    (1, 1000)

  9. Print the top five predictions for our image using the decode_predictions function, passing it the predictor variable, y_pred, and the number of predictions (and corresponding labels) to output:

    from keras.applications.vgg16 import decode_predictions

    decode_predictions(y_pred, top=5)

    The preceding code provides the following output:

    [[('n03785016', 'moped', 0.8433369),

      ('n03791053', 'motor_scooter', 0.14188054),

      ('n03127747', 'crash_helmet', 0.007004856),

      ('n03208938', 'disk_brake', 0.0022349996),

      ('n04482393', 'tricycle', 0.0007717237)]]

    The first column of the array is an internal code number. The second is the label, while the third is the probability of the image being the label.

  10. Transform the predictions into a human-readable format. We need to extract the most probable label from the output, as follows:

    label = decode_predictions(y_pred)

    """

    Most likely result is retrieved, for example, the highest probability

    """

    decoded_label = label[0][0]

    # The classification is printed

    print('%s (%.2f%%)' % (decoded_label[1], decoded_label[2]*100 ))

    The preceding code provides the following output:

    moped (84.33%)

    Here, we can see that the model assigns an 84.33% probability to the picture being a moped. A moped is close enough to a motorbike, and this probably reflects how motorbike-like images are labeled in the ImageNet dataset.

    Note

    To access the source code for this specific section, please refer to https://packt.live/2C4nqRo.

    You can also run this example online at https://packt.live/31JMPL4.

Activity 8.02: Image Classification with ResNet

In this activity, we will use another pre-trained network, known as ResNet. We have an image of a television located at ../Data/Prediction/test_image_4. We will use the ResNet50 network to predict the image. Follow these steps to complete this activity:

  1. Import the numpy library and the necessary Keras libraries:

    import numpy as np

    from keras.applications.resnet50 import ResNet50, preprocess_input

    from keras.preprocessing import image

  2. Initiate the ResNet50 model and print a summary of the model:

    classifier = ResNet50()

    classifier.summary()

    classifier.summary() shows us the architecture of the network. The following points should be noted:

    Figure 8.20: The last four layers of the output

    Note

    The last layer, predictions (Dense), has 1000 values. This means that ResNet50 has a total of 1000 labels and that our image will be classified as one of them.

  3. Load the image. '../Data/Prediction/test_image_4.jpg' is the path of the image on our system. It will be different on your system:

    new_image = \

    image.load_img('../Data/Prediction/test_image_4.jpg', \

                   target_size=(224, 224))

    new_image

    The following is the output of the preceding code:

    Figure 8.21: A sample image of a television

    The target size should be 224x224 since ResNet50 only accepts (224,224).

  4. Change the image into an array by using the img_to_array function:

    transformed_image = image.img_to_array(new_image)

    transformed_image.shape

  5. The image has to be in a four-dimensional form for ResNet50 to allow further processing. Expand the dimensions of the image along the 0th axis using the expand_dims function:

    transformed_image = np.expand_dims(transformed_image, axis=0)

    transformed_image.shape

  6. Preprocess the image using the preprocess_input function:

    transformed_image = preprocess_input(transformed_image)

    transformed_image

  7. Create the predictor variable by calling the classifier's predict method on the image:

    y_pred = classifier.predict(transformed_image)

    y_pred

  8. Check the shape of the prediction. It should be (1, 1000):

    y_pred.shape

    The preceding code provides the following output:

    (1, 1000)

  9. Select the top five predictions for our image using the decode_predictions function, passing the predictor variable, y_pred, as the argument along with the number of predictions (and corresponding labels) to output. A reusable helper that wraps this whole prediction pipeline for both VGG16 and ResNet50 is sketched after this activity's source links:

    from keras.applications.resnet50 import decode_predictions

    decode_predictions(y_pred, top=5)

    The preceding code provides the following output:

    [[('n04404412', 'television', 0.99673873),

      ('n04372370', 'switch', 0.0009829825),

      ('n04152593', 'screen', 0.00095111143),

      ('n03782006', 'monitor', 0.0006477369),

      ('n04069434', 'reflex_camera', 8.5398955e-05)]]

    The first column of the array is an internal code number. The second is the label, while the third is the probability of the image matching the label.

  10. Put the predictions into a human-readable format by printing the most probable label from the result of the decode_predictions function:

    label = decode_predictions(y_pred)

    """

    Most likely result is retrieved, for example,

    the highest probability

    """

    decoded_label = label[0][0]

    # The classification is printed

    print('%s (%.2f%%)' % (decoded_label[1], decoded_label[2]*100 ))

    The preceding code produces the following output:

    television (99.67%)

    Note

    To access the source code for this specific section, please refer to https://packt.live/38rEe0M.

    You can also run this example online at https://packt.live/2YV5xxo.
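Because the VGG16 and ResNet50 workflows differ only in which keras.applications module supplies the model, preprocess_input, and decode_predictions, the steps above can be wrapped in a small helper. The classify_image function below is a sketch of this idea, not part of the original activity, and the file paths shown are only examples that will differ on your system.

    import numpy as np
    from keras.preprocessing import image
    from keras.applications import vgg16, resnet50

    def classify_image(img_path, model, module, top=5):
        # Load and resize the image to the 224x224 input both networks expect
        img = image.load_img(img_path, target_size=(224, 224))
        # Convert to an array and add the batch dimension
        x = np.expand_dims(image.img_to_array(img), axis=0)
        # Each applications module ships its own preprocessing function
        x = module.preprocess_input(x)
        # Predict and decode the top ImageNet labels
        return module.decode_predictions(model.predict(x), top=top)

    # Example usage (paths are placeholders):
    # print(classify_image('../Data/Prediction/test_image_1.jpg', \
    #                      vgg16.VGG16(), vgg16))
    # print(classify_image('../Data/Prediction/test_image_4.jpg', \
    #                      resnet50.ResNet50(), resnet50))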

9. Sequential Modeling with Recurrent Neural Networks

Activity 9.01: Predicting the Trend of Amazon's Stock Price Using an LSTM with 50 Units (Neurons)

In this activity, we will examine the stock price of Amazon for the last 5 years—from January 1, 2014, to December 31, 2018. In doing so, we will try to predict and forecast the company's future trend for January 2019 using an RNN and LSTM. We have the actual values for January 2019, so we can compare our predictions to the actual values later. Follow these steps to complete this activity:

  1. Import the required libraries:

    import numpy as np

    import matplotlib.pyplot as plt

    import pandas as pd

    from tensorflow import random

  2. Import the dataset using the pandas read_csv function and look at the first five rows of the dataset using the head method:

    dataset_training = pd.read_csv('../AMZN_train.csv')

    dataset_training.head()

    The following figure shows the output of the preceding code:

    Figure 9.24: The first five rows of the dataset

  3. We are going to make our prediction using the Open stock price; therefore, select the Open stock price column from the dataset and print the values:

    training_data = dataset_training[['Open']].values

    training_data

    The preceding code produces the following output:

    array([[ 398.799988],

           [ 398.290009],

           [ 395.850006],

           ...,

           [1454.199951],

           [1473.349976],

           [1510.800049]])

  4. Then, perform feature scaling by normalizing the data using MinMaxScaler and setting the range of the features so that they have a minimum value of zero and a maximum value of one. Use the fit_transform method of the scaler on the training data:

    from sklearn.preprocessing import MinMaxScaler

    sc = MinMaxScaler(feature_range = (0, 1))

    training_data_scaled = sc.fit_transform(training_data)

    training_data_scaled

    The preceding code produces the following output:

    array([[0.06523313],

           [0.06494233],

           [0.06355099],

           ...,

           [0.66704299],

           [0.67796271],

           [0.69931748]])

  5. Create the training samples so that each one contains the 60 timesteps preceding the current instance. We chose 60 because it gives the model enough history to pick up the trend; technically, this can be any number, but 60 works well in practice. The upper bound of 1258 is the number of rows (records) in the training set. A generic windowing helper is sketched after this activity's source links:

    X_train = []

    y_train = []

    for i in range(60, 1258):

        X_train.append(training_data_scaled[i-60:i, 0])

        y_train.append(training_data_scaled[i, 0])

    X_train, y_train = np.array(X_train), np.array(y_train)

  6. Reshape the data to add an extra dimension to the end of X_train using NumPy's reshape function:

    X_train = np.reshape(X_train, (X_train.shape[0], \

                         X_train.shape[1], 1))

  7. Import the following libraries to build the RNN:

    from keras.models import Sequential

    from keras.layers import Dense, LSTM, Dropout

  8. Set the seed and initiate the sequential model, as follows:

    seed = 1

    np.random.seed(seed)

    random.set_seed(seed)

    model = Sequential()

  9. Add an LSTM layer to the network with 50 units, set the return_sequences argument to True, and set the input_shape argument to (X_train.shape[1], 1). Add three additional LSTM layers, each with 50 units, and set the return_sequences argument to True for the first two. Add a final output layer of size 1:

    model.add(LSTM(units = 50, return_sequences = True, \

              input_shape = (X_train.shape[1], 1)))

    # Adding a second LSTM layer

    model.add(LSTM(units = 50, return_sequences = True))

    # Adding a third LSTM layer

    model.add(LSTM(units = 50, return_sequences = True))

    # Adding a fourth LSTM layer

    model.add(LSTM(units = 50))

    # Adding the output layer

    model.add(Dense(units = 1))

  10. Compile the network with an adam optimizer and use Mean Squared Error for the loss. Fit the model to the training data for 100 epochs with a batch size of 32:

    # Compiling the RNN

    model.compile(optimizer = 'adam', loss = 'mean_squared_error')

    # Fitting the RNN to the Training set

    model.fit(X_train, y_train, epochs = 100, batch_size = 32)

  11. Load and process the test data (which is treated as actual data here) and select the column representing the value of Open stock data:

    dataset_testing = pd.read_csv('../AMZN_test.csv')

    actual_stock_price = dataset_testing[['Open']].values

    actual_stock_price

  12. Concatenate the data since we will need 60 previous instances to get the stock price for each day. Therefore, we will need both the training and test data:

    total_data = pd.concat((dataset_training['Open'], \

                            dataset_testing['Open']), axis = 0)

  13. Reshape and scale the input to prepare the test data. Note that we are predicting the January monthly trend, which has 21 financial days, so in order to prepare the test set, we take the lower bound value as 60 and the upper bound value as 81. This ensures that the difference of 21 is maintained:

    inputs = total_data[len(total_data) \

             - len(dataset_testing) - 60:].values

    inputs = inputs.reshape(-1,1)

    inputs = sc.transform(inputs)

    X_test = []

    for i in range(60, 81):

        X_test.append(inputs[i-60:i, 0])

    X_test = np.array(X_test)

    X_test = np.reshape(X_test, (X_test.shape[0], \

                                 X_test.shape[1], 1))

    predicted_stock_price = model.predict(X_test)

    predicted_stock_price = \

    sc.inverse_transform(predicted_stock_price)

  14. Visualize the results by plotting the actual stock price and plotting the predicted stock price:

    # Visualizing the results

    plt.plot(actual_stock_price, color = 'green', \

             label = 'Real Amazon Stock Price',ls='--')

    plt.plot(predicted_stock_price, color = 'red', \

             label = 'Predicted Amazon Stock Price',ls='-')

    plt.title('Predicted Stock Price')

    plt.xlabel('Time in days')

    plt.ylabel('Real Stock Price')

    plt.legend()

    plt.show()

    Please note that your results may differ slightly from the actual stock price of Amazon.

    Expected output:

Figure 9.25: Real versus predicted stock prices

As shown in the preceding plot, the trends of the predicted and real prices are very similar; the lines have matching peaks and troughs. This is possible because of LSTM's ability to remember sequenced data. A traditional feedforward neural network would not have been able to forecast this result. This is the true power of LSTMs and RNNs.

Note

To access the source code for this specific section, please refer to https://packt.live/3goQO3I.

You can also run this example online at https://packt.live/2VIMq7O.
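The windowing loop in step 5 can also be expressed as a reusable function. The make_windows helper below is a sketch, not part of the original activity; it takes the scaled single-column array and a window size and returns the (X, y) arrays already reshaped for the LSTM layers.

    import numpy as np

    def make_windows(series, window_size=60):
        # Each X row holds `window_size` consecutive scaled values;
        # y is the value that immediately follows that window
        X, y = [], []
        for i in range(window_size, len(series)):
            X.append(series[i - window_size:i, 0])
            y.append(series[i, 0])
        X, y = np.array(X), np.array(y)
        # Add the trailing feature dimension expected by the LSTM layers
        return np.reshape(X, (X.shape[0], X.shape[1], 1)), y

    # Example usage with the scaled training data from step 4:
    # X_train, y_train = make_windows(training_data_scaled, window_size=60)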

Activity 9.02: Predicting Amazon's Stock Price with Added Regularization

In this activity, we will examine the stock price of Amazon over the last 5 years, from January 1, 2014, to December 31, 2018. In doing so, we will try to predict and forecast the company's future trend for January 2019 using RNNs and an LSTM. We have the actual values for January 2019, so we will be able to compare our predictions with the actual values later. Initially, we predicted the trend of Amazon's stock price using an LSTM with 50 units (or neurons). In this activity, we will also add dropout regularization and compare the results with Activity 9.01, Predicting the Trend of Amazon's Stock Price Using an LSTM with 50 Units (Neurons). Follow these steps to complete this activity:

  1. Import the required libraries:

    import numpy as np

    import matplotlib.pyplot as plt

    import pandas as pd

    from tensorflow import random

  2. Import the dataset using the pandas read_csv function and look at the first five rows of the dataset using the head method:

    dataset_training = pd.read_csv('../AMZN_train.csv')

    dataset_training.head()

  3. We are going to make our prediction using the Open stock price; therefore, select the Open stock price column from the dataset and print the values:

    training_data = dataset_training[['Open']].values

    training_data

    The preceding code produces the following output:

    array([[ 398.799988],

           [ 398.290009],

           [ 395.850006],

           ...,

           [1454.199951],

           [1473.349976],

           [1510.800049]])

  4. Then, perform feature scaling by normalizing the data using MinMaxScaler and setting the range of the features so that they have a minimum value of zero and a maximum value of one. Use the fit_transform method of the scaler on the training data:

    from sklearn.preprocessing import MinMaxScaler

    sc = MinMaxScaler(feature_range = (0, 1))

    training_data_scaled = sc.fit_transform(training_data)

    training_data_scaled

    The preceding code produces the following output:

    array([[0.06523313],

           [0.06494233],

           [0.06355099],

           ...,

           [0.66704299],

           [0.67796271],

           [0.69931748]])

  5. Create the training samples so that each one contains the 60 timesteps preceding the current instance. We chose 60 because it gives the model enough history to pick up the trend; technically, this can be any number, but 60 works well in practice. The upper bound of 1258 is the number of rows (records) in the training set:

    X_train = []

    y_train = []

    for i in range(60, 1258):

        X_train.append(training_data_scaled[i-60:i, 0])

        y_train.append(training_data_scaled[i, 0])

    X_train, y_train = np.array(X_train), np.array(y_train)

  6. Reshape the data to add an extra dimension to the end of X_train using NumPy's reshape function:

    X_train = np.reshape(X_train, (X_train.shape[0], \

                                   X_train.shape[1], 1))

  7. Import the following Keras libraries to build the RNN:

    from keras.models import Sequential

    from keras.layers import Dense, LSTM, Dropout

  8. Set the seed and initiate the sequential model, as follows:

    seed = 1

    np.random.seed(seed)

    random.set_seed(seed)

    model = Sequential()

  9. Add an LSTM layer to the network with 50 units, set the return_sequences argument to True, and set the input_shape argument to (X_train.shape[1], 1). Add dropout to the model with rate=0.2. Add three additional LSTM layers, each with 50 units, and set the return_sequences argument to True for the first two. After each LSTM layer, add a dropout with rate=0.2. Add a final output layer of size 1:

    model.add(LSTM(units = 50, return_sequences = True, \

                   input_shape = (X_train.shape[1], 1)))

    model.add(Dropout(0.2))

    # Adding a second LSTM layer and some Dropout regularization

    model.add(LSTM(units = 50, return_sequences = True))

    model.add(Dropout(0.2))

    # Adding a third LSTM layer and some Dropout regularization

    model.add(LSTM(units = 50, return_sequences = True))

    model.add(Dropout(0.2))

    # Adding a fourth LSTM layer and some Dropout regularization

    model.add(LSTM(units = 50))

    model.add(Dropout(0.2))

    # Adding the output layer

    model.add(Dense(units = 1))

  10. Compile the network with an adam optimizer and use Mean Squared Error for the loss. Fit the model to the training data for 100 epochs with a batch size of 32:

    # Compiling the RNN

    model.compile(optimizer = 'adam', loss = 'mean_squared_error')

    # Fitting the RNN to the Training set

    model.fit(X_train, y_train, epochs = 100, batch_size = 32)

  11. Load and process the test data (which is treated as actual data here) and select the column representing the value of Open stock data:

    dataset_testing = pd.read_csv('../AMZN_test.csv')

    actual_stock_price = dataset_testing[['Open']].values

    actual_stock_price

  12. Concatenate the data since we will need 60 previous instances to get the stock price for each day. Therefore, we will need both the training and test data:

    total_data = pd.concat((dataset_training['Open'], \

                            dataset_testing['Open']), axis = 0)

  13. Reshape and scale the input to prepare the test data. Note that we are predicting the January monthly trend, which has 21 financial days, so in order to prepare the test set, we take the lower bound value as 60 and the upper bound value as 81. This ensures that the difference of 21 is maintained:

    inputs = total_data[len(total_data) \

             - len(dataset_testing) - 60:].values

    inputs = inputs.reshape(-1,1)

    inputs = sc.transform(inputs)

    X_test = []

    for i in range(60, 81):

        X_test.append(inputs[i-60:i, 0])

    X_test = np.array(X_test)

    X_test = np.reshape(X_test, (X_test.shape[0], \

                                 X_test.shape[1], 1))

    predicted_stock_price = model.predict(X_test)

    predicted_stock_price = \

    sc.inverse_transform(predicted_stock_price)

  14. Visualize the results by plotting the actual stock price and plotting the predicted stock price:

    # Visualizing the results

    plt.plot(actual_stock_price, color = 'green', \

             label = 'Real Amazon Stock Price',ls='--')

    plt.plot(predicted_stock_price, color = 'red', \

             label = 'Predicted Amazon Stock Price',ls='-')

    plt.title('Predicted Stock Price')

    plt.xlabel('Time in days')

    plt.ylabel('Real Stock Price')

    plt.legend()

    plt.show()

Please note that your results may differ slightly from the actual stock price.

Expected output:

Figure 9.26: Real versus predicted stock prices

In the following figure, the first plot displays the predicted output of the model with regularization from Activity 9.02, and the second displays the predicted output without regularization from Activity 9.01. As you can see, the model with dropout regularization does not fit the data as accurately. So, in this case, it is better not to use regularization, or to use dropout regularization with a lower rate; a sketch that makes the dropout rate a configurable parameter follows this activity's source links:

Figure 9.27: Comparing the results of Activity 9.01 and Activity 9.02

Note

To access the source code for this specific section, please refer to https://packt.live/2YTpxR7.

You can also run this example online at https://packt.live/3dY5Bku.
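To experiment with a lower dropout rate, as suggested above, the model-building code from step 9 can take the rate as a parameter. The build_lstm_model function below is a sketch of this idea, not the book's original code.

    from keras.models import Sequential
    from keras.layers import Dense, LSTM, Dropout

    def build_lstm_model(input_timesteps, units=50, dropout_rate=0.2):
        # Four stacked LSTM layers with a configurable dropout rate
        model = Sequential()
        model.add(LSTM(units=units, return_sequences=True, \
                       input_shape=(input_timesteps, 1)))
        model.add(Dropout(dropout_rate))
        model.add(LSTM(units=units, return_sequences=True))
        model.add(Dropout(dropout_rate))
        model.add(LSTM(units=units, return_sequences=True))
        model.add(Dropout(dropout_rate))
        model.add(LSTM(units=units))
        model.add(Dropout(dropout_rate))
        model.add(Dense(units=1))
        model.compile(optimizer='adam', loss='mean_squared_error')
        return model

    # Example: a lighter regularization setting to compare against rate=0.2
    # model = build_lstm_model(X_train.shape[1], dropout_rate=0.1)
    # model.fit(X_train, y_train, epochs=100, batch_size=32)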

Activity 9.03: Predicting the Trend of Amazon's Stock Price Using an LSTM with an Increasing Number of LSTM Neurons (100 Units)

In this activity, we will examine the stock price of Amazon over the last 5 years, from January 1, 2014, to December 31, 2018. We will try to predict and forecast the company's future trend for January 2019 using RNNs with four LSTM layers, each with 100 units. We have the actual values for January 2019, so we will be able to compare our predictions with the actual values later. You can also compare the output difference with Activity 9.01, Predicting the Trend of Amazon's Stock Price Using an LSTM with 50 Units (Neurons). Follow these steps to complete this activity:

  1. Import the required libraries:

    import numpy as np

    import matplotlib.pyplot as plt

    import pandas as pd

    from tensorflow import random

  2. Import the dataset using the pandas read_csv function and look at the first five rows of the dataset using the head method:

    dataset_training = pd.read_csv('../AMZN_train.csv')

    dataset_training.head()

  3. We are going to make our prediction using the Open stock price; therefore, select the Open stock price column from the dataset and print the values:

    training_data = dataset_training[['Open']].values

    training_data

  4. Then, perform feature scaling by normalizing the data using MinMaxScaler and setting the range of the features so that they have a minimum value of zero and a maximum value of one. Use the fit_transform method of the scaler on the training data:

    from sklearn.preprocessing import MinMaxScaler

    sc = MinMaxScaler(feature_range = (0, 1))

    training_data_scaled = sc.fit_transform(training_data)

    training_data_scaled

  5. Create the training samples so that each one contains the 60 timesteps preceding the current instance. We chose 60 because it gives the model enough history to pick up the trend; technically, this can be any number, but 60 works well in practice. The upper bound of 1258 is the number of rows (records) in the training set:

    X_train = []

    y_train = []

    for i in range(60, 1258):

        X_train.append(training_data_scaled[i-60:i, 0])

        y_train.append(training_data_scaled[i, 0])

    X_train, y_train = np.array(X_train), np.array(y_train)

  6. Reshape the data to add an extra dimension to the end of X_train using NumPy's reshape function:

    X_train = np.reshape(X_train, (X_train.shape[0], \

                                   X_train.shape[1], 1))

  7. Import the following Keras libraries to build the RNN:

    from keras.models import Sequential

    from keras.layers import Dense, LSTM, Dropout

  8. Set the seed and initiate the sequential model:

    seed = 1

    np.random.seed(seed)

    random.set_seed(seed)

    model = Sequential()

  9. Add an LSTM layer to the network with 100 units, set the return_sequences argument to True, and set the input_shape argument to (X_train.shape[1], 1). Add three additional LSTM layers, each with 100 units, and set the return_sequences argument to True for the first two. Add a final output layer of size 1:

    model.add(LSTM(units = 100, return_sequences = True, \

                   input_shape = (X_train.shape[1], 1)))

    # Adding a second LSTM layer

    model.add(LSTM(units = 100, return_sequences = True))

    # Adding a third LSTM layer

    model.add(LSTM(units = 100, return_sequences = True))

    # Adding a fourth LSTM layer

    model.add(LSTM(units = 100))

    # Adding the output layer

    model.add(Dense(units = 1))

  10. Compile the network with an adam optimizer and use Mean Squared Error for the loss. Fit the model to the training data for 100 epochs with a batch size of 32:

    # Compiling the RNN

    model.compile(optimizer = 'adam', loss = 'mean_squared_error')

    # Fitting the RNN to the Training set

    model.fit(X_train, y_train, epochs = 100, batch_size = 32)

  11. Load and process the test data (which is treated as actual data here) and select the column representing the value of Open stock data:

    dataset_testing = pd.read_csv('../AMZN_test.csv')

    actual_stock_price = dataset_testing[['Open']].values

    actual_stock_price

  12. Concatenate the data since we will need 60 previous instances to get the stock price for each day. Therefore, we will need both the training and test data:

    total_data = pd.concat((dataset_training['Open'], \

                            dataset_testing['Open']), axis = 0)

  13. Reshape and scale the input to prepare the test data. Note that we are predicting the January monthly trend, which has 21 financial days, so in order to prepare the test set, we take the lower bound value as 60 and the upper bound value as 81. This ensures that the difference of 21 is maintained:

    inputs = total_data[len(total_data) \

             - len(dataset_testing) - 60:].values

    inputs = inputs.reshape(-1,1)

    inputs = sc.transform(inputs)

    X_test = []

    for i in range(60, 81):

        X_test.append(inputs[i-60:i, 0])

    X_test = np.array(X_test)

    X_test = np.reshape(X_test, (X_test.shape[0], \

                                 X_test.shape[1], 1))

    predicted_stock_price = model.predict(X_test)

    predicted_stock_price = \

    sc.inverse_transform(predicted_stock_price)

  14. Visualize the results by plotting the actual stock price and plotting the predicted stock price:

    plt.plot(actual_stock_price, color = 'green', \

             label = 'Actual Amazon Stock Price',ls='--')

    plt.plot(predicted_stock_price, color = 'red', \

             label = 'Predicted Amazon Stock Price',ls='-')

    plt.title('Predicted Stock Price')

    plt.xlabel('Time in days')

    plt.ylabel('Real Stock Price')

    plt.legend()

    plt.show()

    Please note that your results may differ slightly from the actual stock price.

    Expected output:

Figure 9.28: Real versus predicted stock prices

So, if we compare the results of the LSTM with 50 units (from Activity 9.01, Predicting the Trend of Amazon's Stock Price Using an LSTM with 50 Units (Neurons)) and the LSTM with 100 units in this activity, we can see that the 100-unit LSTM captures the trend more closely. Also, note that the LSTM with 100 units takes more computational time to train than the LSTM with 50 units, so a trade-off needs to be considered in such cases (a short sketch for quantifying this comparison with RMSE follows this activity's source links):

Figure 9.29: Comparing the real versus predicted stock price with 50 and 100 units

Note

To access the source code for this specific section, please refer to https://packt.live/31NQkQy.

You can also run this example online at https://packt.live/2ZCZ4GR.
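The comparison between the 50-unit and 100-unit models above is visual. As an optional, rough check, the fit over the 21 predicted trading days can also be quantified with RMSE. The snippet below is a sketch; it assumes actual_stock_price and predicted_stock_price are the arrays produced in steps 11 to 13 and would be run once per model being compared.

    import numpy as np
    from sklearn.metrics import mean_squared_error

    # Lower RMSE means the predicted January 2019 prices track the real ones
    # more closely; compare the value across the 50-unit and 100-unit models
    rmse = np.sqrt(mean_squared_error(actual_stock_price, \
                                      predicted_stock_price))
    print(f'RMSE over the 21 trading days: {rmse:.2f}')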