# 1. Introduction to Machine Learning with Keras

## Activity 1.01: Adding Regularization to the Model

In this activity, we will utilize the same logistic regression model from the scikit-learn package. This time, however, we will add regularization to the model and search for the optimum regularization parameter - a process often called **hyperparameter tuning**. After training the models, we will test the predictions and compare the model evaluation metrics to the ones that were produced by the baseline model and the model without regularization.

- Load the feature data from *Exercise 1.03*, *Appropriate Representation of the Data*, and the target data from *Exercise 1.02*, *Cleaning the Data*:
import pandas as pd

feats = pd.read_csv('../data/OSI_feats_e3.csv')

target = pd.read_csv('../data/OSI_target_e2.csv')

- Create a **test** and a **train** dataset. Train the models using the training dataset. This time, however, use part of the **training** dataset for validation in order to choose the most appropriate hyperparameter.

Once again, we will use **test_size = 0.2**, which means that **20%** of the data will be reserved for testing. The size of our validation set will be determined by how many validation folds we have. If we do **10-fold cross-validation**, this equates to reserving **10%** of the **training** dataset to validate our model on. Each fold will use a different **10%** of the **training** dataset, and the average error across all folds is used to compare models with different hyperparameters. Assign an arbitrary fixed value to the **random_state** variable so that the split is reproducible:
from sklearn.model_selection import train_test_split

test_size = 0.2

random_state = 13

X_train, X_test, y_train, y_test = train_test_split(feats, target, test_size=test_size, random_state=random_state)

- Check the dimensions of the DataFrames:
print(f'Shape of X_train: {X_train.shape}')

print(f'Shape of y_train: {y_train.shape}')

print(f'Shape of X_test: {X_test.shape}')

print(f'Shape of y_test: {y_test.shape}')

The preceding code produces the following output:

Shape of X_train: (9864, 68)

Shape of y_train: (9864, 1)

Shape of X_test: (2466, 68)

Shape of y_test: (2466, 1)
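The fold arithmetic behind 10-fold cross-validation can be sketched in plain Python. This is only an illustration under stated assumptions: `kfold_sizes` is a hypothetical helper, not a scikit-learn function, and because scikit-learn stratifies folds by class for classifiers, the real fold sizes may differ by a row or so.

```python
def kfold_sizes(n_samples, n_folds):
    """Return the validation-fold sizes for an even k-way split of n_samples rows."""
    base, extra = divmod(n_samples, n_folds)
    # The first `extra` folds take one additional row so that every row is used once.
    return [base + 1 if i < extra else base for i in range(n_folds)]


n_train = 9864  # rows in X_train, taken from the shapes printed above
sizes = kfold_sizes(n_train, 10)
for i, val_size in enumerate(sizes, start=1):
    print(f"fold {i}: validate on {val_size} rows, fit on {n_train - val_size} rows")
```

Each fold validates on roughly 10% of the training rows and fits on the remaining 90%, which is the proportion quoted in the step above.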

- Next, instantiate the models. Try two types of regularization penalty, **l1** and **l2**, with 10-fold cross-validation. Vary the regularization parameter from 1 × 10⁻² to 1 × 10⁶, equally spaced in logarithmic space, to observe how it affects the results:
import numpy as np

from sklearn.linear_model import LogisticRegressionCV

Cs = np.logspace(-2, 6, 9)

model_l1 = LogisticRegressionCV(Cs=Cs, penalty='l1', cv=10, solver='liblinear', random_state=42, max_iter=10000)

model_l2 = LogisticRegressionCV(Cs=Cs, penalty='l2', cv=10, random_state=42, max_iter=10000)
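To make the search grid concrete, the sketch below reproduces the nine values that `np.logspace(-2, 6, 9)` generates, using only the standard library. Keep in mind that in scikit-learn `C` is the inverse of the regularization strength, so the smallest value in the grid corresponds to the strongest regularization.

```python
import math

# Equivalent of np.logspace(-2, 6, 9): nine exponents evenly spaced from -2 to 6.
start, stop, num = -2, 6, 9
step = (stop - start) / (num - 1)
Cs = [10.0 ** (start + i * step) for i in range(num)]

for C in Cs:
    print(f"C = {C:g}")
```

Consecutive candidates differ by a factor of 10, so the grid covers eight orders of magnitude with only nine model fits per fold.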

Note

For a logistic regression model with the **l1** penalty, the default **lbfgs** solver cannot be used; a solver that supports **l1**, such as **liblinear**, must be specified.

- Next, fit the models to the training data:
model_l1.fit(X_train, y_train['Revenue'])

model_l2.fit(X_train, y_train['Revenue'])

The following figure shows the output of the preceding code:

- Here, we can see the value of the regularization parameter that was selected for each of the two models. The value chosen is the one that produced the model with the lowest cross-validation error:
print(f'Best hyperparameter for l1 regularization model: {model_l1.C_[0]}')

print(f'Best hyperparameter for l2 regularization model: {model_l2.C_[0]}')

The preceding code produces the following output:

Best hyperparameter for l1 regularization model: 1000000.0

Best hyperparameter for l2 regularization model: 1.0

Note

The **C_** attribute is only available once the model has been trained because it is set once the best parameter from the cross-validation process has been determined.

- To evaluate the performance of the models, make predictions on the **test** set, which we'll compare against the **true** values:
y_pred_l1 = model_l1.predict(X_test)

y_pred_l2 = model_l2.predict(X_test)

- To compare these models, calculate the evaluation metrics. First, look at the accuracy of the model:
from sklearn import metrics

accuracy_l1 = metrics.accuracy_score(y_pred=y_pred_l1, y_true=y_test)

accuracy_l2 = metrics.accuracy_score(y_pred=y_pred_l2, y_true=y_test)

print(f'Accuracy of the model with l1 regularization is {accuracy_l1*100:.4f}%')

print(f'Accuracy of the model with l2 regularization is {accuracy_l2*100:.4f}%')

The preceding code produces the following output:

Accuracy of the model with l1 regularization is 89.2133%

Accuracy of the model with l2 regularization is 89.2944%

- Also, look at the other evaluation metrics:
precision_l1, recall_l1, fscore_l1, _ = metrics.precision_recall_fscore_support(y_pred=y_pred_l1, y_true=y_test, average='binary')

precision_l2, recall_l2, fscore_l2, _ = metrics.precision_recall_fscore_support(y_pred=y_pred_l2, y_true=y_test, average='binary')

print(f'l1\nPrecision: {precision_l1:.4f}\nRecall: {recall_l1:.4f}\nfscore: {fscore_l1:.4f}\n\n')

print(f'l2\nPrecision: {precision_l2:.4f}\nRecall: {recall_l2:.4f}\nfscore: {fscore_l2:.4f}')

The preceding code produces the following output:

l1

Precision: 0.7300

Recall: 0.4078

fscore: 0.5233

l2

Precision: 0.7350

Recall: 0.4106

fscore: 0.5269
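As a quick sanity check on the numbers above, the F-score with the default `beta=1` is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall values in the output; `f_score` is a helper written for this illustration, not part of scikit-learn.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, fscore) as printed in the output above
reported = {
    "l1": (0.7300, 0.4078, 0.5233),
    "l2": (0.7350, 0.4106, 0.5269),
}

for name, (precision, recall, fscore) in reported.items():
    recomputed = f_score(precision, recall)
    print(f"{name}: recomputed fscore = {recomputed:.4f}, reported = {fscore:.4f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the low recall of both models keeps the F-score close to 0.52 even though precision is above 0.73.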

- Observe the values of the coefficients once the model has been trained:
coef_list = [f'{feature}: {coef}' for coef, feature in sorted(zip(model_l1.coef_[0], X_train.columns.values.tolist()))]

for item in coef_list:

    print(item)

Note

The **coef_** attribute is only available once the model has been trained because it is set during fitting, after the best parameter from the cross-validation process has been determined.

The following figure shows the output of the preceding code:

- Do the same for the model with the **l2** regularization penalty:
coef_list = [f'{feature}: {coef}' for coef, feature in sorted(zip(model_l2.coef_[0], X_train.columns.values.tolist()))]

for item in coef_list:

    print(item)

The following figure shows the output of the preceding code:

Note

To access the source code for this specific section, please refer to https://packt.live/2VIoe5M.

This section does not currently have an online interactive example, and will need to be run locally.

# 2. Machine Learning versus Deep Learning

## Activity 2.01: Creating a Logistic Regression Model Using Keras

In this activity, we are going to create a basic model using the Keras library. The model that we will build will classify users of a website into those that will purchase a product from a website and those that will not. To do this, we will utilize the same online shopping purchasing intention dataset that we did previously and attempt to predict the same variables that we did in *Chapter 1*, *Introduction to Machine Learning with Keras*.

Perform the following steps to complete this activity:

- Open a Jupyter notebook from the start menu to implement this activity. Load the online shopping purchasing intention datasets, which you can download from the GitHub repository. We will use the **pandas** library for data loading, so import it first. Ensure you have saved the CSV files to an appropriate data folder for this chapter. Alternatively, you can change the paths to the files in your code:
import pandas as pd

feats = pd.read_csv('../data/OSI_feats.csv')

target = pd.read_csv('../data/OSI_target.csv')

- For the purposes of this activity, we will not perform any further preprocessing. As we did in the previous chapter, we will split the dataset into training and testing sets and leave the testing set until the very end, when we evaluate our models. We will reserve **20%** of our data for testing by setting the **test_size=0.2** parameter, and we will set the **random_state** parameter so that we can recreate the results:
from sklearn.model_selection import train_test_split

test_size = 0.2

random_state = 42

X_train, X_test, y_train, y_test = train_test_split(feats, target, test_size=test_size, random_state=random_state)

- Set a seed in **numpy** and **tensorflow** for reproducibility. Begin creating the model by initializing a model of the **Sequential** class:
from keras.models import Sequential

import numpy as np

from tensorflow import random

np.random.seed(random_state)

random.set_seed(random_state)

model = Sequential()

- To add a fully connected layer to the model, add a layer of the **Dense** class. Here, we include the number of nodes in the layer. In our case, this will be one since we are performing binary classification and our desired output is **zero** or **one**. Also, specify the input dimensions, which is only done on the first layer of the model. It is there to indicate the format of the input data. Pass the number of features:
from keras.layers import Dense

model.add(Dense(1, input_dim=X_train.shape[1]))

- Add a sigmoid activation function to the output of the previous layer to replicate the **logistic regression** algorithm:
from keras.layers import Activation

model.add(Activation('sigmoid'))
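To see why a single **Dense** node followed by a sigmoid replicates logistic regression, the following sketch computes the same forward pass by hand. The weights, bias, and input values are hypothetical and chosen only for illustration; in the real model they are learned during training.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_sigmoid(features, weights, bias):
    """Forward pass of Dense(1) followed by a sigmoid: p = sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical learned parameters and one hypothetical input row.
weights = [0.5, -1.0, 0.25]
bias = 0.1
sample = [2.0, 1.0, 4.0]

p = dense_sigmoid(sample, weights, bias)
print(f"predicted probability = {p:.4f}, predicted class = {int(p >= 0.5)}")
```

The weighted sum plus bias is exactly the linear predictor of logistic regression, and the sigmoid turns it into a probability that is thresholded at 0.5 to obtain the class.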

- Once we have all the model components in the correct order, we must compile the model so that the learning process is configured. Use the **adam** optimizer and **binary_crossentropy** for the loss, and track the accuracy of the model by passing it to the **metrics** argument:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
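The **binary_crossentropy** loss named above can be written out in a few lines. This is a simplified sketch rather than the Keras implementation: the function name, the clipping constant `eps`, and the sample labels and probabilities are assumptions made for illustration.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so that log() stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]                      # hypothetical true classes
confident = [0.9, 0.1, 0.8, 0.2]           # mostly correct, confident predictions
uninformative = [0.5, 0.5, 0.5, 0.5]       # a model that always predicts 0.5

print(f"confident model loss:     {binary_crossentropy(labels, confident):.4f}")
print(f"uninformative model loss: {binary_crossentropy(labels, uninformative):.4f}")
```

A model that always outputs 0.5 scores a loss of ln 2 (about 0.693), which is a useful reference point when reading the loss values reported during training.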

- Print the model summary to verify the model is as we expect it to be:
print(model.summary())

The following figure shows the output of the preceding code:

- Next, fit the model using the **fit** method of the **model** class. Provide the training data, as well as the number of epochs and how much data to use for validation after each epoch:
history = model.fit(X_train, y_train['Revenue'], epochs=10, validation_split=0.2, shuffle=False)
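With `validation_split`, Keras holds out the last fraction of the rows it is given, taken before any shuffling. The sketch below mimics that slicing to show how many rows end up in each part. `validation_split_indices` is a hypothetical helper, the rounding rule is an assumption about Keras internals, and the row count of 9,864 assumes the same 80/20 split of the 12,330-row dataset used earlier.

```python
import math


def validation_split_indices(n_samples, validation_split):
    """Return (training rows, validation rows); validation is the tail of the data."""
    split_at = int(math.floor(n_samples * (1.0 - validation_split)))
    return range(0, split_at), range(split_at, n_samples)


train_idx, val_idx = validation_split_indices(9864, 0.2)
print(f"rows used for weight updates: {len(train_idx)}")
print(f"rows held out for validation: {len(val_idx)} "
      f"(positions {val_idx[0]} to {val_idx[-1]})")
```

Because the held-out rows are always the tail of the array, the order of the rows matters: data sorted by class or by time would produce a validation set that does not resemble the training portion.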

The following figure shows the output of the preceding code:

- The values for the loss and accuracy have been stored within the **history** variable. Plot the values for each using the loss and accuracy we tracked after each epoch:
import matplotlib.pyplot as plt

%matplotlib inline

# Plot training and validation accuracy values

plt.plot(history.history['accuracy'])

plt.plot(history.history['val_accuracy'])

plt.title('Model accuracy')

plt.ylabel('Accuracy')

plt.xlabel('Epoch')

plt.legend(['Train', 'Validation'], loc='upper left')

plt.show()

# Plot training and validation loss values

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.title('Model loss')

plt.ylabel('Loss')

plt.xlabel('Epoch')

plt.legend(['Train', 'Validation'], loc='upper left')

plt.show()

The following plots show the output of the preceding code:

- Finally, evaluate the model on the test data we held out from the beginning, which will give an objective evaluation of the performance of the model:
test_loss, test_acc = model.evaluate(X_test, y_test['Revenue'])

print(f'The loss on the test set is {test_loss:.4f} and the accuracy is {test_acc*100:.3f}%')

The output of the preceding code can be found below. Here, the model predicts the purchasing intention of users in the test dataset and evaluates the performance by comparing it to the real values in **y_test**. Evaluating the model on the test dataset produces loss and accuracy values that we can print out:
2466/2466 [==============================] - 0s 15us/step

The loss on the test set is 0.3632 and the accuracy is 86.902%

Note

To access the source code for this specific section, please refer to

You can also run this example online at https://packt.live/2ZxEhV4.

# 3. Deep Learning with Keras

## Activity 3.01: Building a Single-Layer Neural Network for Performing Binary Classification

In this activity, we will compare the results of a logistic regression model and single-layer neural networks of different node sizes and different activation functions. The dataset we will use represents the normalized test results of aircraft propeller inspections, while the class represents whether they passed or failed a manual visual inspection. We will create models to predict the results of the manual inspection when given the automated test results. Follow these steps to complete this activity:

- Load all the required packages:
# import required packages from Keras

from keras.models import Sequential

from keras.layers import Dense, Activation

import numpy as np

import pandas as pd

from tensorflow import random

from sklearn.model_selection import train_test_split

# import required packages for plotting

import matplotlib.pyplot as plt

import matplotlib

%matplotlib inline

import matplotlib.patches as mpatches

# import the function for plotting decision boundary

from utils import plot_decision_boundary

- Set up a **seed**:
"""

define a seed for random number generator so the result will be reproducible

"""

seed = 1

- Load the simulated dataset and print the size of **X** and **Y** and the number of examples:
"""

load the dataset, print the shapes of input and output and the number of examples

"""

feats = pd.read_csv('../data/outlier_feats.csv')

target = pd.read_csv('../data/outlier_target.csv')

print("X size = ", feats.shape)

print("Y size = ", target.shape)

print("Number of examples = ", feats.shape[0])

**Expected output**:
X size = (3359, 2)

Y size = (3359, 1)

Number of examples = 3359

- Plot the dataset. The x and y coordinates of each point will be the two input features. The color of each record represents the **pass**/**fail** result:
class_1=plt.scatter(feats.loc[target['Class']==0,'feature1'], feats.loc[target['Class']==0,'feature2'], c="red", s=40, edgecolor='k')

class_2=plt.scatter(feats.loc[target['Class']==1,'feature1'], feats.loc[target['Class']==1,'feature2'], c="blue", s=40, edgecolor='k')

plt.legend((class_1, class_2),('Fail','Pass'))

plt.xlabel('Feature 1')

plt.ylabel('Feature 2')

The following image shows the output of the preceding code:

- Build the **logistic regression** model, which will be a one-node sequential model with no hidden layers and a **sigmoid activation** function:
np.random.seed(seed)

random.set_seed(seed)

model = Sequential()

model.add(Dense(1, activation='sigmoid', input_dim=2))

model.compile(optimizer='sgd', loss='binary_crossentropy')

- Fit the model to the training data:
model.fit(feats, target, batch_size=5, epochs=100, verbose=1, validation_split=0.2, shuffle=False)

**Expected output**: The loss on the validation set after **100** epochs = **0.3537**.

- Plot the decision boundary on the training data:
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plot_decision_boundary(lambda x: model.predict(x), feats, target)

plt.title("Logistic Regression")

The following image shows the output of the preceding code:

The linear decision boundary of the logistic regression model is obviously unable to capture the circular decision boundary between the two classes and predicts all the results as a passed result.

- Create a neural network with one hidden layer with three nodes and a **relu activation function** and an output layer with one node and a **sigmoid activation function**. Finally, compile the model:
np.random.seed(seed)

random.set_seed(seed)

model = Sequential()

model.add(Dense(3, activation='relu', input_dim=2))

model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='sgd', loss='binary_crossentropy')

- Fit the model to the training data:
model.fit(feats, target, batch_size=5, epochs=200, verbose=1, validation_split=0.2, shuffle=False)

**Expected output**: The loss that's evaluated on the validation set after **200** epochs = **0.0260**.

- Plot the decision boundary that was created:
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plot_decision_boundary(lambda x: model.predict(x), feats, target)

plt.title("Decision Boundary for Neural Network with hidden layer size 3")

The following image shows the output of the preceding code:

Having three processing units instead of one dramatically improved the capability of the model in capturing the non-linear boundary between the two classes. Notice that the loss value decreased drastically in comparison to the previous step.

- Create a neural network with one hidden layer with six nodes and a **relu activation function** and an output layer with one node and a **sigmoid activation function**. Finally, compile the model:
np.random.seed(seed)

random.set_seed(seed)

model = Sequential()

model.add(Dense(6, activation='relu', input_dim=2))

model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='sgd', loss='binary_crossentropy')

- Fit the model to the training data:
model.fit(feats, target, batch_size=5, epochs=400, verbose=1, validation_split=0.2, shuffle=False)

**Expected output**: The loss after **400** epochs = **0.0231**.

- Plot the decision boundary:
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plot_decision_boundary(lambda x: model.predict(x), feats, target)

plt.title("Decision Boundary for Neural Network with hidden layer size 6")

The following image shows the output of the preceding code:

By doubling the number of units in the hidden layer, the decision boundary of the model gets closer to a true circular shape, and the loss value is decreased even more in comparison to the previous step.

- Create a neural network with one hidden layer with three nodes and a **tanh activation function** and an output layer with one node and a **sigmoid activation function**. Finally, compile the model:
np.random.seed(seed)

random.set_seed(seed)

model = Sequential()

model.add(Dense(3, activation='tanh', input_dim=2))

model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='sgd', loss='binary_crossentropy')

- Fit the model to the training data:
model.fit(feats, target, batch_size=5, epochs=200, verbose=1, validation_split=0.2, shuffle=False)

**Expected output**: The loss after **200** epochs = **0.0426**.

- Plot the decision boundary:
plot_decision_boundary(lambda x: model.predict(x), feats, target)

plt.title("Decision Boundary for Neural Network with hidden layer size 3")

The following image shows the output of the preceding code:

Using the **tanh** activation function has eliminated the sharp edges in the decision boundary. In other words, it has made the decision boundary smoother. However, the model is not performing better since we can see an increase in the loss value. We achieved similar loss and accuracy scores when we evaluated on the test dataset, even though, as mentioned previously, learning the parameters with **tanh** is slower than with **relu**.

- Create a neural network with one hidden layer with six nodes and a **tanh activation function** and an output layer with one node and a **sigmoid activation function**. Finally, compile the model:
np.random.seed(seed)

random.set_seed(seed)

model = Sequential()

model.add(Dense(6, activation='tanh', input_dim=2))

model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='sgd', loss='binary_crossentropy')

- Fit the model to the training data:
model.fit(feats, target, batch_size=5, epochs=400, verbose=1, validation_split=0.2, shuffle=False)

**Expected output**: The loss after **400** epochs = **0.0215**.

- Plot the decision boundary:
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plot_decision_boundary(lambda x: model.predict(x), feats, target)

plt.title("Decision Boundary for Neural Network with hidden layer size 6")

The following image shows the output of the preceding code:

Again, using the **tanh** activation function instead of **relu** and adding more nodes to our hidden layer has smoothed the curves on the decision boundary more, fitting the training data better according to the accuracy of the training data. We should be careful not to add too many nodes to the hidden layer as we may begin to overfit the data. This can be observed by evaluating the test set, where there is a slight decrease in the accuracy of the neural network with six nodes compared to a neural network with three.
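One way to build intuition for why **tanh** gives smoother boundaries than **relu** is to compare the two functions and their slopes around zero. The sketch below is a plain-Python illustration with helper names chosen for this example; it is not taken from the activity's code.

```python
import math


def relu(z):
    return max(0.0, z)


def relu_slope(z):
    # The slope jumps from 0 to 1 at z = 0, which creates a kink.
    return 1.0 if z > 0 else 0.0


def tanh_slope(z):
    # The slope of tanh changes continuously: 1 - tanh(z)^2.
    return 1.0 - math.tanh(z) ** 2


for z in (-2.0, -0.5, -0.01, 0.01, 0.5, 2.0):
    print(f"z = {z:5.2f} | relu = {relu(z):.3f} (slope {relu_slope(z):.0f}) | "
          f"tanh = {math.tanh(z):6.3f} (slope {tanh_slope(z):.3f})")
```

Each **relu** unit contributes a piecewise-linear segment, so a handful of units produce a boundary made of straight edges, whereas the continuously varying slope of **tanh** bends the boundary gradually.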

Note

To access the source code for this specific section, please refer to https://packt.live/3iv0wn1.

You can also run this example online at https://packt.live/2BqumZt.

## Activity 3.02: Advanced Fibrosis Diagnosis with Neural Networks

In this activity, you are going to use a real dataset to predict whether a patient has advanced fibrosis based on measurements such as age, gender, and BMI. The dataset consists of information for 1,385 patients who underwent treatment for hepatitis C. For each patient, **28** different attributes are available, as well as a class label, which can only take two values: **1**, indicating advanced fibrosis, and **0**, indicating no advanced fibrosis. This is a binary/two-class classification problem with an input dimension equal to 28.

In this activity, you will implement different deep neural network architectures to perform this classification, plot the trends in training error rates and test error rates, and determine how many epochs the final classifier needs to be trained for. Follow these steps to complete this activity:

- Import all the necessary libraries and load the dataset using the pandas **read_csv** function:
import pandas as pd

import numpy as np

from tensorflow import random

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from keras.models import Sequential

from keras.layers import Dense

import matplotlib.pyplot as plt

import matplotlib

%matplotlib inline

X = pd.read_csv('../data/HCV_feats.csv')

y = pd.read_csv('../data/HCV_target.csv')

- Print the number of **records** and **features** in the **feature** dataset and the number of unique classes in the **target** dataset:
print("Number of Examples in the Dataset = ", X.shape[0])

print("Number of Features for each example = ", X.shape[1])

print("Possible Output Classes = ", y['AdvancedFibrosis'].unique())

**Expected output**:
Number of Examples in the Dataset = 1385

Number of Features for each example = 28

Possible Output Classes = [0 1]

- Normalize the data and scale it. Following this, split the dataset into the **training** and **test** sets:
seed = 1

np.random.seed(seed)

sc = StandardScaler()

X = pd.DataFrame(sc.fit_transform(X), columns=X.columns)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)

# Print the information regarding dataset sizes

print(X_train.shape)

print(y_train.shape)

print(X_test.shape)

print(y_test.shape)

print ("Number of examples in training set = ", X_train.shape[0])

print ("Number of examples in test set = ", X_test.shape[0])

**Expected output**:
(1108, 28)

(1108, 1)

(277, 28)

(277, 1)

Number of examples in training set = 1108

Number of examples in test set = 277
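The scaling that **StandardScaler** applies to each feature column can be sketched as follows: subtract the column mean and divide by the population standard deviation. The `standardize` helper and the sample values are hypothetical and serve only as an illustration of the transformation.

```python
import math


def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]


ages = [32.0, 45.0, 51.0, 60.0]  # hypothetical values for one feature column
scaled = standardize(ages)
print([round(v, 3) for v in scaled])

scaled_mean = sum(scaled) / len(scaled)
scaled_std = math.sqrt(sum((v - scaled_mean) ** 2 for v in scaled) / len(scaled))
print(f"mean after scaling = {scaled_mean:.6f}, std after scaling = {scaled_std:.6f}")
```

Putting all 28 features on the same scale keeps any single feature with large raw values from dominating the weighted sums inside the network.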

- Implement a deep neural network with one hidden layer of size **3** with a **tanh activation function** and an output layer with one node and a **sigmoid activation function**. Finally, compile the model and print out a summary of the model:
np.random.seed(seed)

random.set_seed(seed)

# define the keras model

classifier = Sequential()

classifier.add(Dense(units = 3, activation = 'tanh', input_dim=X_train.shape[1]))

classifier.add(Dense(units = 1, activation = 'sigmoid'))

classifier.compile(optimizer = 'sgd', loss = 'binary_crossentropy', metrics = ['accuracy'])

classifier.summary()

The following image shows the output of the preceding code:

- Fit the model to the training data:
history=classifier.fit(X_train, y_train, batch_size = 20, epochs = 100, validation_split=0.1, shuffle=False)

- Plot the **training error rate** and **test error rate** (here, the training and validation loss) for every epoch:
plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

**Expected output**:

- Print the values of the best accuracy that was reached on the training set and on the validation set, as well as the **loss** and **accuracy** that were evaluated on the **test** dataset:
print(f"Best Accuracy on training set = {max(history.history['accuracy'])*100:.3f}%")

print(f"Best Accuracy on validation set = {max(history.history['val_accuracy'])*100:.3f}%")

test_loss, test_acc = classifier.evaluate(X_test, y_test['AdvancedFibrosis'])

print(f'The loss on the test set is {test_loss:.4f} and the accuracy is {test_acc*100:.3f}%')

The following shows the output of the preceding code:

Best Accuracy on training set = 52.959%

Best Accuracy on validation set = 58.559%

277/277 [==============================] - 0s 25us/step

The loss on the test set is 0.6885 and the accuracy is 55.235%

- Implement a deep neural network with two hidden layers of sizes **4** and **2** with a **tanh activation function** and an output layer with one node and a **sigmoid activation function**. Finally, compile the model and print out a summary of the model:
np.random.seed(seed)

random.set_seed(seed)

# define the keras model

classifier = Sequential()

classifier.add(Dense(units = 4, activation = 'tanh', input_dim = X_train.shape[1]))

classifier.add(Dense(units = 2, activation = 'tanh'))

classifier.add(Dense(units = 1, activation = 'sigmoid'))

classifier.compile(optimizer = 'sgd', loss = 'binary_crossentropy', metrics = ['accuracy'])

classifier.summary()
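The parameter totals that `classifier.summary()` reports can be checked by hand: each **Dense** layer has one weight per input-output pair plus one bias per node. The sketch below applies that rule to the two architectures in this activity; `dense_param_count` is a helper written for this illustration, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters in a stack of Dense layers: weights plus one bias per node."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# 28 input features, then the hidden layer sizes, then the single output node.
one_hidden = dense_param_count([28, 3, 1])      # (28*3 + 3) + (3*1 + 1)
two_hidden = dense_param_count([28, 4, 2, 1])   # (28*4 + 4) + (4*2 + 2) + (2*1 + 1)

print(f"one hidden layer of size 3:        {one_hidden} parameters")   # 91
print(f"two hidden layers of sizes 4 and 2: {two_hidden} parameters")  # 129
```

Both networks are very small, so any difference in their accuracy is unlikely to come from sheer capacity alone.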

- Fit the model to the training data:
history=classifier.fit(X_train, y_train, batch_size = 20, epochs = 100, validation_split=0.1, shuffle=False)

- Plot training and test error plots with two hidden layers of size 4 and 2. Print the best accuracy that was reached on the training and test sets:
# plot training error and test error plots

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

**Expected output**:

- Print the values of the best accuracy that was achieved on the **training** set and on the **validation** set, as well as the **loss** and **accuracy** that were evaluated on the test dataset:
print(f"Best Accuracy on training set = {max(history.history['accuracy'])*100:.3f}%")

print(f"Best Accuracy on validation set = {max(history.history['val_accuracy'])*100:.3f}%")

test_loss, test_acc = classifier.evaluate(X_test, y_test['AdvancedFibrosis'])

print(f'The loss on the test set is {test_loss:.4f} and the accuracy is {test_acc*100:.3f}%')

The following shows the output of the preceding code:

Best Accuracy on training set = 57.272%

Best Accuracy on validation set = 54.054%

277/277 [==============================] - 0s 41us/step

The loss on the test set is 0.7016 and the accuracy is 49.819%

Note

To access the source code for this specific section, please refer to https://packt.live/2BrIRMF.

You can also run this example online at https://packt.live/2NUl22A.

# 4. Evaluating Your Model with Cross-Validation Using Keras Wrappers

## Activity 4.01: Model Evaluation Using Cross-Validation for an Advanced Fibrosis Diagnosis Classifier

In this activity, we are going to use what we learned in this topic to train and evaluate a deep learning model using **k-fold cross-validation**. We will use the model that resulted in the best test error rate from the previous activity and the goal will be to compare the cross-validation error rate with the training set/test set approach error rate. The dataset we will use is the hepatitis C dataset, in which we will build a classification model to predict which patients get advanced fibrosis. Follow these steps to complete this activity:

- Load the dataset and print the number of records and features in the dataset, as well as the number of possible classes in the target dataset:
# Load the dataset

import pandas as pd

X = pd.read_csv('../data/HCV_feats.csv')

y = pd.read_csv('../data/HCV_target.csv')

# Print the sizes of the dataset

print("Number of Examples in the Dataset = ", X.shape[0])

print("Number of Features for each example = ", X.shape[1])

print("Possible Output Classes = ", \

y['AdvancedFibrosis'].unique())

Here's the expected output:

Number of Examples in the Dataset = 1385

Number of Features for each example = 28

Possible Output Classes = [0 1]

- Define the function that returns the Keras model. First, import the necessary libraries for Keras. Inside the function, instantiate the sequential model and add two dense layers, with the first of **size 4** and the second of **size 2**, both with **tanh activation** functions. Add the output layer with a **sigmoid activation** function. Compile the model and return the model from the function:
from keras.models import Sequential

from keras.layers import Dense

# Create the function that returns the keras model

def build_model():

    model = Sequential()

    model.add(Dense(4, input_dim=X.shape[1], activation='tanh'))

    model.add(Dense(2, activation='tanh'))

    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    return model

- Scale the training data using the **StandardScaler** class. Set the seed so that the model is reproducible. Define the **n_folds**, **epochs**, and **batch_size** hyperparameters. Then, build the Keras wrapper with scikit-learn, define the **cross-validation** iterator, perform **k-fold cross-validation**, and store the scores:
# import required packages

import numpy as np

from tensorflow import random

from keras.wrappers.scikit_learn import KerasClassifier

from sklearn.model_selection import StratifiedKFold

from sklearn.model_selection import cross_val_score

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X = pd.DataFrame(sc.fit_transform(X), columns=X.columns)

"""

define a seed for random number generator so the result will be reproducible

"""

seed = 1

np.random.seed(seed)

random.set_seed(seed)

"""

determine the number of folds for k-fold cross-validation, number of epochs and batch size

"""

n_folds = 5

epochs = 100

batch_size = 20

# build the scikit-learn interface for the keras model

classifier = KerasClassifier(build_fn=build_model, epochs=epochs, batch_size=batch_size, verbose=1, shuffle=False)

# define the cross-validation iterator

kfold = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)

"""

perform the k-fold cross-validation and store the scores in results

"""

results = cross_val_score(classifier, X, y, cv=kfold)
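The idea behind a stratified split is that every fold keeps roughly the same class proportions as the full dataset. The sketch below is a deliberately simplified stand-in for **StratifiedKFold**, not its actual algorithm: `stratified_folds` and the balanced toy labels are assumptions made for illustration.

```python
import random


def stratified_folds(labels, n_folds, seed):
    """Assign row indices to folds so that each class is spread evenly across folds."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled rows of this class out to the folds one at a time.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


labels = [0] * 50 + [1] * 50  # hypothetical, perfectly balanced target column
folds = stratified_folds(labels, n_folds=5, seed=1)

for i, test_idx in enumerate(folds, start=1):
    positives = sum(labels[j] for j in test_idx)
    print(f"fold {i}: {len(test_idx)} test rows, {positives} of class 1")
```

In cross-validation, each fold serves once as the test portion while the model is refit on the rows of the other folds, which is why `cross_val_score` returns one accuracy per fold.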

- For each of the folds, print the accuracy stored in the **results** variable:
# print accuracy for each fold

for f in range(n_folds):

    print("Test accuracy at fold ", f+1, " = ", results[f])

print("\n")

"""

print overall cross-validation accuracy plus the standard deviation of the accuracies

"""

print("Final Cross-validation Test Accuracy:", results.mean())

print("Standard Deviation of Final Test Accuracy:", results.std())

Here's the expected output:

Test accuracy at fold 1 = 0.5198556184768677

Test accuracy at fold 2 = 0.4693140685558319

Test accuracy at fold 3 = 0.512635350227356

Test accuracy at fold 4 = 0.5740072131156921

Test accuracy at fold 5 = 0.5523465871810913

Final Cross-validation Test Accuracy: 0.5256317675113678

Standard Deviation of Final Test Accuracy: 0.03584760640500936
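The two summary figures can be recomputed from the per-fold accuracies listed above. The sketch below uses only the standard library and assumes the population standard deviation, which is what the NumPy `std` method computes by default; the result should land close to the printed values, with small differences in the last digits possible.

```python
import math

# Per-fold accuracies copied from the expected output above.
fold_accuracies = [
    0.5198556184768677,
    0.4693140685558319,
    0.512635350227356,
    0.5740072131156921,
    0.5523465871810913,
]

n = len(fold_accuracies)
mean_accuracy = sum(fold_accuracies) / n
# Population standard deviation: divide by n rather than n - 1.
std_accuracy = math.sqrt(sum((a - mean_accuracy) ** 2 for a in fold_accuracies) / n)

print(f"mean accuracy across folds = {mean_accuracy:.4f}")
print(f"standard deviation         = {std_accuracy:.4f}")
```

A spread of several percentage points between folds is a reminder that a single train/test split can give a noticeably optimistic or pessimistic accuracy for the same model.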

Note

To access the source code for this specific section, please refer to https://packt.live/3eWgR2b.

You can also run this example online at https://packt.live/3iBYtOi.

## Activity 4.02: Model Selection Using Cross-Validation for the Advanced Fibrosis Diagnosis Classifier

In this activity, we are going to improve our classifier for the hepatitis C dataset by using cross-validation for model selection and hyperparameter selection. Follow these steps to complete this activity:

- Import all the required packages and load the dataset. Scale the dataset using the **StandardScaler** class:
# import the required packages

from keras.models import Sequential

from keras.layers import Dense

from keras.wrappers.scikit_learn import KerasClassifier

from sklearn.model_selection import StratifiedKFold

from sklearn.model_selection import cross_val_score

import numpy as np

import pandas as pd

from sklearn.preprocessing import StandardScaler

from tensorflow import random

# Load the dataset

X = pd.read_csv('../data/HCV_feats.csv')

y = pd.read_csv('../data/HCV_target.csv')

sc = StandardScaler()

X = pd.DataFrame(sc.fit_transform(X), columns=X.columns)

- Define three functions, each returning a different Keras model. The first model should have three hidden layers of **size 4**, the second model should have two hidden layers, the first of **size 4** and the second of **size 2**, and the third model should have two hidden layers of **size 8**. Use function parameters for the activation functions and optimizers so that they can be passed through to the model. The goal is to find out which of these three models leads to the lowest cross-validation error rate:

# Create the function that returns the keras model 1

def build_model_1(activation='relu', optimizer='adam'):

    # create model 1

    model = Sequential()

    model.add(Dense(4, input_dim=X.shape[1], \

                    activation=activation))

    model.add(Dense(4, activation=activation))

    model.add(Dense(4, activation=activation))

    model.add(Dense(1, activation='sigmoid'))

    # Compile model

    model.compile(loss='binary_crossentropy', \

                  optimizer=optimizer, metrics=['accuracy'])

    return model

# Create the function that returns the keras model 2

def build_model_2(activation='relu', optimizer='adam'):

    # create model 2

    model = Sequential()

    model.add(Dense(4, input_dim=X.shape[1], \

                    activation=activation))

    model.add(Dense(2, activation=activation))

    model.add(Dense(1, activation='sigmoid'))

    # Compile model

    model.compile(loss='binary_crossentropy', \

                  optimizer=optimizer, metrics=['accuracy'])

    return model

# Create the function that returns the keras model 3

def build_model_3(activation='relu', optimizer='adam'):

    # create model 3

    model = Sequential()

    model.add(Dense(8, input_dim=X.shape[1], \

                    activation=activation))

    model.add(Dense(8, activation=activation))

    model.add(Dense(1, activation='sigmoid'))

    # Compile model

    model.compile(loss='binary_crossentropy', \

                  optimizer=optimizer, metrics=['accuracy'])

    return model

- Write the code that will loop over the three models and perform **5-fold cross-validation**. Set the seed so that the models are reproducible and define the **n_folds**, **batch_size**, and **epochs** hyperparameters. Store the results from applying the **cross_val_score** function when training the models:

"""

define a seed for random number generator so the result will be reproducible

"""

seed = 2

np.random.seed(seed)

random.set_seed(seed)

"""

determine the number of folds for k-fold cross-validation, number of epochs and batch size

"""

n_folds = 5

batch_size=20

epochs=100

# define the list to store cross-validation scores

results_1 = []

# define the possible options for the model

models = [build_model_1, build_model_2, build_model_3]

# loop over models

for m in range(len(models)):

    # build the scikit-learn interface for the keras model

    classifier = KerasClassifier(build_fn=models[m], \

                                 epochs=epochs, \

                                 batch_size=batch_size, \

                                 verbose=0, shuffle=False)

    # define the cross-validation iterator

    kfold = StratifiedKFold(n_splits=n_folds, shuffle=True, \

                            random_state=seed)

    """

    perform the k-fold cross-validation and store the scores

    in result

    """

    result = cross_val_score(classifier, X, y, cv=kfold)

    # add the scores to the results list

    results_1.append(result)

# Print cross-validation score for each model

for m in range(len(models)):

    print("Model", m+1,"Test Accuracy =", results_1[m].mean())

Here's an example output. In this instance, **Model 2** has the best cross-validation test accuracy, as you can see below:

Model 1 Test Accuracy = 0.4996389865875244

Model 2 Test Accuracy = 0.5148014307022095

Model 3 Test Accuracy = 0.5097472846508027
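The `StratifiedKFold` iterator used in the loop above keeps the class proportions roughly equal in every fold. The following is a minimal pure-Python sketch of that idea, not scikit-learn's actual implementation; the function name `stratified_fold_indices` and the toy `labels` list are hypothetical.

```python
import random
from collections import defaultdict


def stratified_fold_indices(labels, n_splits, seed):
    """Split sample indices into n_splits folds, balancing classes per fold."""
    rng = random.Random(seed)

    # group the sample indices by class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # deal each class's shuffled indices round-robin across the folds
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


# Hypothetical balanced binary target with 20 samples
labels = [0] * 10 + [1] * 10
folds = stratified_fold_indices(labels, n_splits=5, seed=2)

for number, test_idx in enumerate(folds, start=1):
    positives = sum(labels[i] for i in test_idx)
    print("fold", number, "size =", len(test_idx), "positives =", positives)
```

Each fold serves once as the held-out set while the remaining folds form the training set, which is the splitting pattern that `cross_val_score` relies on when it is given a k-fold iterator.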

- Choose the model with the highest accuracy score and repeat *step 3* by iterating over the **epochs = [100, 200]** and **batches = [10, 20]** values and performing **5-fold cross-validation**:

"""

define a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# determine the number of folds for k-fold cross-validation

n_folds = 5

# define possible options for epochs and batch_size

epochs = [100, 200]

batches = [10, 20]

# define the list to store cross-validation scores

results_2 = []

# loop over all possible pairs of epochs, batch_size

for e in range(len(epochs)):

    for b in range(len(batches)):

        # build the scikit-learn interface for the keras model

        classifier = KerasClassifier(build_fn=build_model_2, \

                                     epochs=epochs[e], \

                                     batch_size=batches[b], \

                                     verbose=0)

        # define the cross-validation iterator

        kfold = StratifiedKFold(n_splits=n_folds, shuffle=True, \

                                random_state=seed)

        # perform the k-fold cross-validation.

        # store the scores in result

        result = cross_val_score(classifier, X, y, cv=kfold)

        # add the scores to the results list

        results_2.append(result)

"""

Print cross-validation score for each possible pair of epochs, batch_size

"""

c = 0

for e in range(len(epochs)):

    for b in range(len(batches)):

        print("batch_size =", batches[b],", epochs =", epochs[e], \

              ", Test Accuracy =", results_2[c].mean())

        c += 1

Here's an example output:

batch_size = 10 , epochs = 100 , Test Accuracy = 0.5010830342769623

batch_size = 20 , epochs = 100 , Test Accuracy = 0.5126353740692139

batch_size = 10 , epochs = 200 , Test Accuracy = 0.5176895320416497

batch_size = 20 , epochs = 200 , Test Accuracy = 0.5075812220573426

In this case, the **batch_size=10**, **epochs=200** pair has the best cross-validation test accuracy.

- Choose the batch size and epochs with the highest accuracy score and repeat *step 3* by iterating over the **optimizers = ['rmsprop', 'adam', 'sgd']** and **activations = ['relu', 'tanh']** values and performing **5-fold cross-validation**:

"""

define a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

"""

determine the number of folds for k-fold cross-validation, number of epochs and batch size

"""

n_folds = 5

batch_size = 10

epochs = 200

# define the list to store cross-validation scores

results_3 = []

# define possible options for optimizer and activation

optimizers = ['rmsprop', 'adam','sgd']

activations = ['relu', 'tanh']

# loop over all possible pairs of optimizer, activation

for o in range(len(optimizers)):

    for a in range(len(activations)):

        optimizer = optimizers[o]

        activation = activations[a]

        # build the scikit-learn interface for the keras model,

        # forwarding activation and optimizer to build_model_2

        classifier = KerasClassifier(build_fn=build_model_2, \

                                     activation=activation, \

                                     optimizer=optimizer, \

                                     epochs=epochs, \

                                     batch_size=batch_size, \

                                     verbose=0, shuffle=False)

        # define the cross-validation iterator

        kfold = StratifiedKFold(n_splits=n_folds, shuffle=True, \

                                random_state=seed)

        # perform the k-fold cross-validation.

        # store the scores in result

        result = cross_val_score(classifier, X, y, cv=kfold)

        # add the scores to the results list

        results_3.append(result)

"""

Print cross-validation score for each possible pair of optimizer, activation

"""

c = 0

for o in range(len(optimizers)):

    for a in range(len(activations)):

        print("activation = ", activations[a],", optimizer = ", \

              optimizers[o], ", Test accuracy = ", \

              results_3[c].mean())

        c += 1

Here's the expected output:

activation = relu , optimizer = rmsprop ,

Test accuracy = 0.5234657049179077

activation = tanh , optimizer = rmsprop ,

Test accuracy = 0.49602887630462644

activation = relu , optimizer = adam ,

Test accuracy = 0.5039711117744445

activation = tanh , optimizer = adam ,

Test accuracy = 0.4989169597625732

activation = relu , optimizer = sgd ,

Test accuracy = 0.48953068256378174

activation = tanh , optimizer = sgd ,

Test accuracy = 0.5191335678100586

Here, the **activation='relu'** and **optimizer='rmsprop'** pair has the best cross-validation test accuracy. Also, the **activation='tanh'** and **optimizer='sgd'** pair results in the second-best performance.

Note

To access the source code for this specific section, please refer to https://packt.live/2D3AIhD.

You can also run this example online at https://packt.live/2NUpiiC.
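The grid searches in this activity walk two lists with nested index loops and a running counter `c`. As a sketch of an alternative, `itertools.product` yields the same (epochs, batch size) pairs in the same order, so each pair can be zipped with its mean score. The scores below are copied from the example output of the epochs and batch size search; the variable names are illustrative.

```python
from itertools import product

epochs = [100, 200]
batches = [10, 20]

# Mean cross-validation accuracies from the example output, in loop order
mean_scores = [
    0.5010830342769623,  # epochs=100, batch_size=10
    0.5126353740692139,  # epochs=100, batch_size=20
    0.5176895320416497,  # epochs=200, batch_size=10
    0.5075812220573426,  # epochs=200, batch_size=20
]

# product varies the last list fastest, matching the nested loops
grid = list(product(epochs, batches))

# pair every configuration with its score and keep the best one
best_pair, best_score = max(zip(grid, mean_scores), key=lambda item: item[1])
best_epochs, best_batch_size = best_pair

print("epochs =", best_epochs, ", batch_size =", best_batch_size)
```

Keeping the configuration and its score together in one tuple avoids the separate counter and makes it harder for the printed labels to drift out of step with the stored results.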

## Activity 4.03: Model Selection Using Cross-Validation on a Traffic Volume Dataset

In this activity, you are going to practice model selection using cross-validation one more time. Here, we are going to use a simulated dataset with a target variable representing the volume of traffic in cars/hour across a city bridge and various normalized features related to traffic data, such as the time of day and the traffic volume on the previous day. Our goal is to build a model that predicts the traffic volume across the city bridge given the various features. Follow these steps to complete this activity:

- Import all the required packages and load the dataset:
# import the required packages

from keras.models import Sequential

from keras.layers import Dense

from keras.wrappers.scikit_learn import KerasRegressor

from sklearn.model_selection import KFold

from sklearn.model_selection import cross_val_score

from sklearn.preprocessing import StandardScaler

from sklearn.pipeline import make_pipeline

import numpy as np

import pandas as pd

from tensorflow import random

- Load the dataset, print the input and output sizes, and print the range of the target variable:

# Load the dataset

X = pd.read_csv('../data/traffic_volume_feats.csv')

y = pd.read_csv('../data/traffic_volume_target.csv')

# Print the sizes of input data and output data

print("Input data size = ", X.shape)

print("Output size = ", y.shape)

# Print the range for output

print(f"Output Range = ({y['Volume'].min()}, \

{ y['Volume'].max()})")

Here's the expected output:

Input data size = (10000, 10)

Output size = (10000, 1)

Output Range = (0.000000, 584.000000)

- Define three functions, each returning a different Keras model. The first model should have one hidden layer of **size 10**, the second model should have two hidden layers of **size 10**, and the third model should have three hidden layers of **size 10**. Use function parameters for the optimizers so that they can be passed through to the model. The goal is to find out which of these three models leads to the lowest cross-validation error rate:

# Create the function that returns the keras model 1

def build_model_1(optimizer='adam'):

    # create model 1

    model = Sequential()

    model.add(Dense(10, input_dim=X.shape[1], activation='relu'))

    model.add(Dense(1))

    # Compile model

    model.compile(loss='mean_squared_error', optimizer=optimizer)

    return model

# Create the function that returns the keras model 2

def build_model_2(optimizer='adam'):

    # create model 2

    model = Sequential()

    model.add(Dense(10, input_dim=X.shape[1], activation='relu'))

    model.add(Dense(10, activation='relu'))

    model.add(Dense(1))

    # Compile model

    model.compile(loss='mean_squared_error', optimizer=optimizer)

    return model

# Create the function that returns the keras model 3

def build_model_3(optimizer='adam'):

    # create model 3

    model = Sequential()

    model.add(Dense(10, input_dim=X.shape[1], activation='relu'))

    model.add(Dense(10, activation='relu'))

    model.add(Dense(10, activation='relu'))

    model.add(Dense(1))

    # Compile model

    model.compile(loss='mean_squared_error', optimizer=optimizer)

    return model
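One way to compare the capacity of the three architectures is to count their trainable parameters. The sketch below assumes fully connected layers with one bias per unit and the 10 input features reported in the expected output above; `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(input_dim, layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = input_dim
    for units in layer_sizes:
        # each unit has one weight per input plus one bias
        total += (fan_in + 1) * units
        fan_in = units
    return total


n_features = 10  # from "Input data size = (10000, 10)"

# hidden layer sizes followed by the single output unit
architectures = {
    "model 1": [10, 1],
    "model 2": [10, 10, 1],
    "model 3": [10, 10, 10, 1],
}

param_counts = {
    name: dense_param_count(n_features, sizes)
    for name, sizes in architectures.items()
}

for name, count in param_counts.items():
    print(name, "has", count, "trainable parameters")
```

Each additional hidden layer of size 10 adds 110 parameters, so the three models differ in capacity by a constant step, which helps put small differences in cross-validation error into context.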

- Write the code that will loop over the three models and perform **5-fold cross-validation**. Set the seed so that the models are reproducible and define the **n_folds** hyperparameter. Store the results from applying the **cross_val_score** function when training the models:

"""

define a seed for random number generator so the result will be reproducible

"""

seed = 1

np.random.seed(seed)

random.set_seed(seed)

# determine the number of folds for k-fold cross-validation

n_folds = 5

# define the list to store cross-validation scores

results_1 = []

# define the possible options for the model

models = [build_model_1, build_model_2, build_model_3]

# loop over models

for i in range(len(models)):

    # build the scikit-learn interface for the keras model

    regressor = KerasRegressor(build_fn=models[i], epochs=100, \

                               batch_size=50, verbose=0, \

                               shuffle=False)

    """

    build the pipeline of transformations so for each fold training

    set will be scaled and test set will be scaled accordingly.

    """

    model = make_pipeline(StandardScaler(), regressor)

    # define the cross-validation iterator

    kfold = KFold(n_splits=n_folds, shuffle=True, \

                  random_state=seed)

    # perform the k-fold cross-validation.

    # store the scores in result

    result = cross_val_score(model, X, y, cv=kfold)

    # add the scores to the results list

    results_1.append(result)

# Print cross-validation score for each model

for i in range(len(models)):

    print("Model ", i+1," test error rate = ", \

          abs(results_1[i].mean()))

The following is the expected output:

Model 1 test error rate = 25.48777518749237

Model 2 test error rate = 25.30460816860199

Model 3 test error rate = 25.390239462852474

**Model 2** (a two-layer neural network) has the lowest test error rate.

- Choose the model with the lowest test error rate and repeat *step 4* while iterating over **epochs = [80, 100]** and **batches = [50, 25]** and performing **5-fold cross-validation**:

"""

define a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# determine the number of folds for k-fold cross-validation

n_folds = 5

# define the list to store cross-validation scores

results_2 = []

# define possible options for epochs and batch_size

epochs = [80, 100]

batches = [50, 25]

# loop over all possible pairs of epochs, batch_size

for i in range(len(epochs)):

    for j in range(len(batches)):

        # build the scikit-learn interface for the keras model

        regressor = KerasRegressor(build_fn=build_model_2, \

                                   epochs=epochs[i], \

                                   batch_size=batches[j], \

                                   verbose=0, shuffle=False)

        """

        build the pipeline of transformations so for each fold

        training set will be scaled and test set will be scaled

        accordingly.

        """

        model = make_pipeline(StandardScaler(), regressor)

        # define the cross-validation iterator

        kfold = KFold(n_splits=n_folds, shuffle=True, \

                      random_state=seed)

        # perform the k-fold cross-validation.

        # store the scores in result

        result = cross_val_score(model, X, y, cv=kfold)

        # add the scores to the results list

        results_2.append(result)

"""

Print cross-validation score for each possible pair of epochs, batch_size

"""

c = 0

for i in range(len(epochs)):

    for j in range(len(batches)):

        print("batch_size = ", batches[j],\

              ", epochs = ", epochs[i], \

              ", Test error rate = ", abs(results_2[c].mean()))

        c += 1

Here's the expected output:

batch_size = 50 , epochs = 80 , Test error rate = 25.270704221725463

batch_size = 25 , epochs = 80 , Test error rate = 25.309741401672362

batch_size = 50 , epochs = 100 , Test error rate = 25.095393986701964

batch_size = 25 , epochs = 100 , Test error rate = 25.24592453837395

The **batch_size=50** and **epochs=100** pair has the lowest test error rate.

- Choose the batch size and epochs with the lowest test error rate and repeat *step 4* by iterating over **optimizers = ['adam', 'sgd', 'rmsprop']** and performing **5-fold cross-validation**:

"""

define a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# determine the number of folds for k-fold cross-validation

n_folds = 5

# define the list to store cross-validation scores

results_3 = []

# define the possible options for the optimizer

optimizers = ['adam', 'sgd', 'rmsprop']

# loop over optimizers

for i in range(len(optimizers)):

    optimizer = optimizers[i]

    # build the scikit-learn interface for the keras model,

    # forwarding the optimizer to build_model_2

    regressor = KerasRegressor(build_fn=build_model_2, \

                               optimizer=optimizer, \

                               epochs=100, batch_size=50, \

                               verbose=0, shuffle=False)

    """

    build the pipeline of transformations so for each fold training

    set will be scaled and test set will be scaled accordingly.

    """

    model = make_pipeline(StandardScaler(), regressor)

    # define the cross-validation iterator

    kfold = KFold(n_splits=n_folds, shuffle=True, \

                  random_state=seed)

    # perform the k-fold cross-validation.

    # store the scores in result

    result = cross_val_score(model, X, y, cv=kfold)

    # add the scores to the results list

    results_3.append(result)

# Print cross-validation score for each optimizer

for i in range(len(optimizers)):

    print("optimizer=", optimizers[i]," test error rate = ", \

          abs(results_3[i].mean()))

Here's the expected output:

optimizer= adam test error rate = 25.391812739372256

optimizer= sgd test error rate = 25.140230269432067

optimizer= rmsprop test error rate = 25.217947859764102

**optimizer='sgd'** has the lowest test error rate, so we should proceed with this particular model.

Note

To access the source code for this specific section, please refer to https://packt.live/31TcYaD.

You can also run this example online at https://packt.live/3iq6iqb.
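The test error rates in this activity are mean squared errors, because the models are compiled with `loss='mean_squared_error'`; the solution takes `abs()` of the scores, which suggests that the wrapper reports the loss with its sign flipped. Taking the square root gives a root mean squared error in the target's own units (cars/hour). The sketch below applies that to the three optimizer results from the expected output; the variable names are illustrative.

```python
import math

# Mean cross-validation MSE per optimizer, from the expected output
mse_by_optimizer = {
    "adam": 25.391812739372256,
    "sgd": 25.140230269432067,
    "rmsprop": 25.217947859764102,
}

# RMSE is the square root of MSE, expressed in cars/hour
rmse_by_optimizer = {
    name: math.sqrt(mse) for name, mse in mse_by_optimizer.items()
}

# the optimizer with the lowest error is the preferred choice
best_optimizer = min(mse_by_optimizer, key=mse_by_optimizer.get)

for name, rmse in rmse_by_optimizer.items():
    print(name, "RMSE =", round(rmse, 3), "cars/hour")
print("Lowest error:", best_optimizer)
```

Seen as an RMSE of about 5 cars/hour on a target that ranges up to 584, the differences between the three optimizers are small.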

# 5. Improving Model Accuracy

## Activity 5.01: Weight Regularization on an Avila Pattern Classifier

In this activity, you will build a Keras model to perform classification on the Avila pattern dataset according to given network architecture and hyperparameter values. The goal is to apply different types of weight regularization on the model, that is, **L1** and **L2**, and observe how each type changes the result. Follow these steps to complete this activity:

- Load the dataset and split it into a **training set** and a **test set**:

# Load the dataset

import pandas as pd

X = pd.read_csv('../data/avila-tr_feats.csv')

y = pd.read_csv('../data/avila-tr_target.csv')

"""

Split the dataset into training set and test set with a 0.8-0.2 ratio

"""

from sklearn.model_selection import train_test_split

seed = 1

X_train, X_test, y_train, y_test = \

train_test_split(X, y, test_size=0.2, random_state=seed)

- Define a Keras sequential model with three hidden layers, the first of **size 10**, the second of **size 6**, and the third of **size 4**. Finally, compile the model:

"""

define a seed for random number generator so the result will be reproducible

"""

import numpy as np

from tensorflow import random

np.random.seed(seed)

random.set_seed(seed)

# define the keras model

from keras.models import Sequential

from keras.layers import Dense

model_1 = Sequential()

model_1.add(Dense(10, input_dim=X_train.shape[1], \

activation='relu'))

model_1.add(Dense(6, activation='relu'))

model_1.add(Dense(4, activation='relu'))

model_1.add(Dense(1, activation='sigmoid'))

model_1.compile(loss='binary_crossentropy', optimizer='sgd', \

metrics=['accuracy'])

- Fit the model to the training data to perform the classification, saving the results of the training process:
history=model_1.fit(X_train, y_train, batch_size = 20, epochs = 100, \

validation_data=(X_test, y_test), \

verbose=0, shuffle=False)

- Import the necessary plotting libraries and plot the trends in training error and test error, using the loss and validation loss stored in the **history** variable that was created when the model was fit to the training data. Print out the maximum validation accuracy:
import matplotlib.pyplot as plt

import matplotlib

%matplotlib inline

# plot training error and test error

matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylim(0,1)

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

# print the best accuracy reached on the test set

print("Best Accuracy on Validation Set =", \

max(history.history['val_accuracy']))

The following is the expected output:

The validation loss keeps decreasing along with the training loss. Despite having no regularization, this is a fairly good example of the training process since the bias and variance are fairly low.
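The `history.history` attribute used for the plot is a dictionary of per-epoch lists. As an illustration of how to read it without a plot, the sketch below works on a small hand-made dictionary (the numbers are hypothetical, not results from this model) and reports the epoch with the lowest validation loss and the final gap between validation and training loss.

```python
# Hypothetical per-epoch values standing in for history.history
history_dict = {
    "loss": [0.69, 0.61, 0.55, 0.51, 0.49],
    "val_loss": [0.70, 0.63, 0.58, 0.56, 0.57],
    "val_accuracy": [0.52, 0.61, 0.66, 0.69, 0.68],
}

val_loss = history_dict["val_loss"]

# zero-based index of the epoch with the lowest validation loss
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

# a growing positive gap at the end of training hints at overfitting
final_gap = val_loss[-1] - history_dict["loss"][-1]

best_val_accuracy = max(history_dict["val_accuracy"])

print("Lowest validation loss at epoch", best_epoch + 1)
print("Final validation/training gap =", round(final_gap, 2))
print("Best validation accuracy =", best_val_accuracy)
```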

- Redefine the model, adding **L2 regularizers** with **lambda=0.01** to each hidden layer of the model. Repeat *steps 3* and *4* to train the model and plot the **training error** and **validation error**:

"""

set up a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# define the keras model with l2 regularization with lambda = 0.01

from keras.regularizers import l2

l2_param = 0.01

model_2 = Sequential()

model_2.add(Dense(10, input_dim=X_train.shape[1], \

activation='relu', \

kernel_regularizer=l2(l2_param)))

model_2.add(Dense(6, activation='relu', \

kernel_regularizer=l2(l2_param)))

model_2.add(Dense(4, activation='relu', \

kernel_regularizer=l2(l2_param)))

model_2.add(Dense(1, activation='sigmoid'))

model_2.compile(loss='binary_crossentropy', optimizer='sgd', \

metrics=['accuracy'])

# train the model using training set while evaluating on test set

history=model_2.fit(X_train, y_train, batch_size = 20, epochs = 100, \

validation_data=(X_test, y_test), \

verbose=0, shuffle=False)

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylim(0,1)

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

# print the best accuracy reached on the test set

print("Best Accuracy on Validation Set =", \

max(history.history['val_accuracy']))

The following is the expected output:

As shown in the preceding plots, the test error almost plateaus after decreasing to a certain level. The gap between the training error and the validation error at the end of the training process (the variance) is slightly smaller, which is indicative of reduced overfitting of the model to the training examples.
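As a reminder of what a `kernel_regularizer` adds to the loss, the sketch below computes an L2 penalty as lambda times the sum of squared weights for the three lambda values tried in this activity. The weight values are made up for illustration, and `l2_penalty` is a hypothetical helper rather than the Keras implementation.

```python
def l2_penalty(weights, lam):
    """Return lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


# Hypothetical weights of a single layer
weights = [0.5, -1.0, 2.0]

# lambda values used for L2 regularization in this activity
lambdas = [0.005, 0.01, 0.1]

penalties = {lam: l2_penalty(weights, lam) for lam in lambdas}

for lam, penalty in penalties.items():
    print("lambda =", lam, "-> penalty =", round(penalty, 5))
```

Because the penalty scales linearly with lambda, moving from 0.01 to 0.1 makes the same weights ten times more expensive, which is one way to see why a large lambda can stop the model from fitting the data.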

- Repeat the previous step with **lambda=0.1** for the **L2 parameter**. Redefine the model with the new lambda parameter, fit the model to the training data, and repeat *step 4* to plot the training error and validation error:

"""

set up a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

from keras.regularizers import l2

l2_param = 0.1

model_3 = Sequential()

model_3.add(Dense(10, input_dim=X_train.shape[1], \

activation='relu', \

kernel_regularizer=l2(l2_param)))

model_3.add(Dense(6, activation='relu', \

kernel_regularizer=l2(l2_param)))

model_3.add(Dense(4, activation='relu', \

kernel_regularizer=l2(l2_param)))

model_3.add(Dense(1, activation='sigmoid'))

model_3.compile(loss='binary_crossentropy', optimizer='sgd', \

metrics=['accuracy'])

# train the model using training set while evaluating on test set

history=model_3.fit(X_train, y_train, batch_size = 20, \

epochs = 100, validation_data=(X_test, y_test), \

verbose=0, shuffle=False)

# plot training error and test error

matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylim(0,1)

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

# print the best accuracy reached on the test set

print("Best Accuracy on Validation Set =", \

max(history.history['val_accuracy']))

The following is the expected output:

The training and validation error quickly plateau and are much higher than they were for the models we created with a lower **L2 parameter**, indicating that we have penalized the model so much that it has not had the flexibility to learn the underlying function of the training data. Following this, we will reduce the value of the regularization parameter to prevent it from penalizing the model as much.

- Repeat the previous step, this time with **lambda=0.005**. Repeat *step 4* to plot the training error and validation error:

"""

set up a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# define the keras model with l2 regularization with lambda = 0.005

from keras.regularizers import l2

l2_param = 0.005

model_4 = Sequential()

model_4.add(Dense(10, input_dim=X_train.shape[1], \

activation='relu', \

kernel_regularizer=l2(l2_param)))

model_4.add(Dense(6, activation='relu', \

kernel_regularizer=l2(l2_param)))

model_4.add(Dense(4, activation='relu', \

kernel_regularizer=l2(l2_param)))

model_4.add(Dense(1, activation='sigmoid'))

model_4.compile(loss='binary_crossentropy', optimizer='sgd', \

metrics=['accuracy'])

# train the model using training set while evaluating on test set

history=model_4.fit(X_train, y_train, batch_size = 20, \

epochs = 100, validation_data=(X_test, y_test), \

verbose=0, shuffle=False)

# plot training error and test error

matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylim(0,1)

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

# print the best accuracy reached on the test set

print("Best Accuracy on Validation Set =", \

max(history.history['val_accuracy']))

The following is the expected output:

This value for the **L2 weight** regularization achieves the highest validation accuracy of all the models with **L2 regularization**, but it is slightly lower than the accuracy without regularization. Again, the test error does not increase by a significant amount after decreasing to a certain value, which is indicative of the model not overfitting the training examples. It seems that **L2 weight regularization** with **lambda=0.005** achieves the lowest validation error while preventing the model from overfitting.

- Add **L1 regularizers** with **lambda=0.01** to the hidden layers of your model. Redefine the model with the new lambda parameter, fit the model to the training data, and repeat *step 4* to plot the training error and validation error:

"""

set up a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# define the keras model with l1 regularization with lambda = 0.01

from keras.regularizers import l1

l1_param = 0.01

model_5 = Sequential()

model_5.add(Dense(10, input_dim=X_train.shape[1], \

activation='relu', \

kernel_regularizer=l1(l1_param)))

model_5.add(Dense(6, activation='relu', \

kernel_regularizer=l1(l1_param)))

model_5.add(Dense(4, activation='relu', \

kernel_regularizer=l1(l1_param)))

model_5.add(Dense(1, activation='sigmoid'))

model_5.compile(loss='binary_crossentropy', optimizer='sgd', \

metrics=['accuracy'])

# train the model using training set while evaluating on test set

history=model_5.fit(X_train, y_train, batch_size = 20, \

epochs = 100, validation_data=(X_test, y_test), \

verbose=0, shuffle=True)

# plot training error and test error

matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylim(0,1)

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

# print the best accuracy reached on the test set

print("Best Accuracy on Validation Set =", \

max(history.history['val_accuracy']))

The following is the expected output:

- Repeat the previous step with **lambda=0.005** for the **L1 parameter**. Redefine the model with the new lambda parameter, fit the model to the training data, and repeat *step 4* to plot the **training error** and **validation error**:

"""

set up a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# define the keras model with l1 regularization with lambda = 0.005

from keras.regularizers import l1

l1_param = 0.005

model_6 = Sequential()

model_6.add(Dense(10, input_dim=X_train.shape[1], \

activation='relu', \

kernel_regularizer=l1(l1_param)))

model_6.add(Dense(6, activation='relu', \

kernel_regularizer=l1(l1_param)))

model_6.add(Dense(4, activation='relu', \

kernel_regularizer=l1(l1_param)))

model_6.add(Dense(1, activation='sigmoid'))

model_6.compile(loss='binary_crossentropy', optimizer='sgd', \

metrics=['accuracy'])

# train the model using training set while evaluating on test set

history=model_6.fit(X_train, y_train, batch_size = 20, \

epochs = 100, validation_data=(X_test, y_test), \

verbose=0, shuffle=False)

# plot training error and test error

matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylim(0,1)

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

# print the best accuracy reached on the test set

print("Best Accuracy on Validation Set =", \

max(history.history['val_accuracy']))

The following is the expected output:

It seems that **L1 weight regularization** with **lambda=0.005** achieves a better test error while preventing the model from overfitting, since the value of **lambda=0.01** is too restrictive and prevents the model from learning the underlying function of the training data.

- Add **L1** and **L2 regularizers** with an **L1** of **lambda=0.005** and an **L2** of **lambda=0.005** to the hidden layers of your model. Then, repeat *step 4* to plot the training error and validation error:

"""

set up a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

"""

define the keras model with l1_l2 regularization with l1_lambda = 0.005 and l2_lambda = 0.005

"""

from keras.regularizers import l1_l2

l1_param = 0.005

l2_param = 0.005

model_7 = Sequential()

model_7.add(Dense(10, input_dim=X_train.shape[1], \

activation='relu', \

kernel_regularizer=l1_l2(l1=l1_param, l2=l2_param)))

model_7.add(Dense(6, activation='relu', \

kernel_regularizer=l1_l2(l1=l1_param, \

l2=l2_param)))

model_7.add(Dense(4, activation='relu', \

kernel_regularizer=l1_l2(l1=l1_param, \

l2=l2_param)))

model_7.add(Dense(1, activation='sigmoid'))

model_7.compile(loss='binary_crossentropy', optimizer='sgd', \

metrics=['accuracy'])

# train the model using training set while evaluating on test set

history=model_7.fit(X_train, y_train, batch_size = 20, \

epochs = 100, validation_data=(X_test, y_test), \

verbose=0, shuffle=True)

# plot training error and test error

matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylim(0,1)

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

# print the best accuracy reached on the test set

print("Best Accuracy on Validation Set =", \

max(history.history['val_accuracy']))

The following is the expected output:

Combining **L1** and **L2 regularization** is successful in preventing the model from overfitting, and the variance in the model is very low. However, the accuracy that's obtained on the validation data is not as high as that of the model that was trained with no regularization, or of the models that were trained with **L2 regularization** (**lambda=0.005**) or **L1 regularization** (**lambda=0.005**) individually.

Note

To access the source code for this specific section, please refer to https://packt.live/31BUf34.

You can also run this example online at https://packt.live/38n291s.
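The penalties behind `l1` and `l1_l2` can be written as simple sums over a layer's weights: the L1 term is lambda times the sum of absolute values, and the combined penalty adds an L2 term (lambda times the sum of squares). The sketch below uses made-up weights and also compares the size of the two penalty gradients for a small weight; the helper names are hypothetical and this is not the Keras source.

```python
def l1_penalty(weights, lam):
    """Return lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Return lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def l1_l2_penalty(weights, l1, l2):
    """Return the sum of the L1 and L2 penalties."""
    return l1_penalty(weights, l1) + l2_penalty(weights, l2)


def sign(w):
    return (w > 0) - (w < 0)


weights = [0.5, -1.0, 2.0, 0.01]  # hypothetical layer weights
lam = 0.005

combined = l1_l2_penalty(weights, l1=lam, l2=lam)

# gradient of each penalty with respect to the smallest weight
small_w = weights[-1]
l1_grad = lam * sign(small_w)   # constant size, independent of |w|
l2_grad = 2 * lam * small_w     # shrinks as the weight shrinks

print("combined penalty =", round(combined, 7))
print("L1 gradient =", l1_grad, ", L2 gradient =", l2_grad)
```

The L1 gradient keeps the same magnitude however small the weight becomes, whereas the L2 gradient fades with it, which is the usual explanation for why L1 regularization tends to push weights all the way to zero.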

## Activity 5.02: Dropout Regularization on the Traffic Volume Dataset

In this activity, you will start with the model from *Activity 4.03*, *Model Selection Using Cross-Validation on a Traffic Volume Dataset*, of *Chapter 4*, *Evaluating Your Model with Cross-Validation Using Keras Wrappers*. You will use the training set/test set approach to train and evaluate the model, plot the trends in training error and the generalization error, and observe the model overfitting the data examples. Then, you will attempt to improve model performance by addressing the overfitting issue through the use of dropout regularization. In particular, you will try to find out which layers you should add dropout regularization to and what **rate** value will improve this specific model the most. Follow these steps to complete this activity:

- Load the dataset using the pandas **read_csv** function, split the dataset into a training set and test set in an **80-20** ratio using **train_test_split**, and scale the input data using **StandardScaler**:

# Load the dataset

import pandas as pd

X = pd.read_csv('../data/traffic_volume_feats.csv')

y = pd.read_csv('../data/traffic_volume_target.csv')

"""

Split the dataset into training set and test set with an 80-20 ratio

"""

from sklearn.model_selection import train_test_split

seed=1

X_train, X_test, y_train, y_test = \

train_test_split(X, y, test_size=0.2, random_state=seed)

- Set a seed so that the model can be reproduced. Next, define a Keras sequential model with two hidden layers of **size 10**, both with **ReLU activation** functions. Add an output layer with no activation function and compile the model with the given hyperparameters:

"""

define a seed for random number generator so the result will be reproducible

"""

import numpy as np

from tensorflow import random

np.random.seed(seed)

random.set_seed(seed)

from keras.models import Sequential

from keras.layers import Dense

# create model

model_1 = Sequential()

model_1.add(Dense(10, input_dim=X_train.shape[1], \

activation='relu'))

model_1.add(Dense(10, activation='relu'))

model_1.add(Dense(1))

# Compile model

model_1.compile(loss='mean_squared_error', optimizer='rmsprop')

- Train the model on the training data with the given hyperparameters:
# train the model using training set while evaluating on test set

history=model_1.fit(X_train, y_train, batch_size = 50, \

epochs = 200, validation_data=(X_test, y_test), \

verbose=0)

- Plot the trends for the **training error** and **test error**. Print the lowest error that was reached on the training and validation sets:

import matplotlib.pyplot as plt

import matplotlib

%matplotlib inline

matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

# plot training error and test error plots

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylim((0, 25000))

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

# print the lowest error reached on the training and validation sets

print("Lowest error on training set = ", \

min(history.history['loss']))

print("Lowest error on validation set = ", \

min(history.history['val_loss']))

The following is the expected output:

Lowest error on training set = 24.673954981565476

Lowest error on validation set = 25.11553382873535

The gap between the training error and the validation error is very small, which is indicative of a low-variance model. This is a good sign.

- Redefine the model by creating the same model architecture. However, this time, add dropout regularization with **rate=0.1** to the first hidden layer of your model. Repeat *step 3* to train the model on the training data and repeat *step 4* to plot the trends for the training and validation errors. Then, print the lowest error that was reached on the validation set:

"""

define a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

from keras.layers import Dropout

# create model

model_2 = Sequential()

model_2.add(Dense(10, input_dim=X_train.shape[1], \

activation='relu'))

model_2.add(Dropout(0.1))

model_2.add(Dense(10, activation='relu'))

model_2.add(Dense(1))

# Compile model

model_2.compile(loss='mean_squared_error', \

optimizer='rmsprop')

# train the model using training set while evaluating on test set

history=model_2.fit(X_train, y_train, batch_size = 50, \

epochs = 200, validation_data=(X_test, y_test), \

verbose=0, shuffle=False)

matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylim((0, 25000))

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

# print the lowest error reached on the training and validation sets

print("Lowest error on training set = ", \

min(history.history['loss']))

print("Lowest error on validation set = ", \

min(history.history['val_loss']))

The following is the expected output:

Lowest error on training set = 407.8203821182251

Lowest error on validation set = 54.58488750457764

There is a small gap between the training error and the validation error; however, the validation error is lower than the training error, indicating that the model is not overfitting the training data.

- Repeat the previous step, this time adding dropout regularization with **rate=0.1** to both hidden layers of your model. Repeat *step 3* to train the model on the training data and repeat *step 4* to plot the trends for the training and validation errors. Then, print the lowest error that was reached on the validation set:

"""

define a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# create model

model_3 = Sequential()

model_3.add(Dense(10, input_dim=X_train.shape[1], \

activation='relu'))

model_3.add(Dropout(0.1))

model_3.add(Dense(10, activation='relu'))

model_3.add(Dropout(0.1))

model_3.add(Dense(1))

# Compile model

model_3.compile(loss='mean_squared_error', \

optimizer='rmsprop')

# train the model using training set while evaluating on test set

history=model_3.fit(X_train, y_train, batch_size = 50, \

epochs = 200, validation_data=(X_test, y_test), \

verbose=0, shuffle=False)

matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylim((0, 25000))

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

# print the lowest error reached on the training and validation sets

print("Lowest error on training set = ", \

min(history.history['loss']))

print("Lowest error on validation set = ", \

min(history.history['val_loss']))

The following is the expected output:

Lowest error on training set = 475.9299939632416

Lowest error on validation set = 61.646054649353026

The gap between the training error and validation error is slightly higher here, mostly due to the increase in the training error as a result of the additional regularization on the second hidden layer of the model.

- Repeat the previous step, this time adding dropout regularization with **rate=0.2** in the first layer and **rate=0.1** in the second layer of your model. Repeat *step 3* to train the model on the training data and repeat *step 4* to plot the trends for the training and validation errors. Then, print the lowest error that was reached on the validation set:

"""

define a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# create model

model_4 = Sequential()

model_4.add(Dense(10, input_dim=X_train.shape[1], \

activation='relu'))

model_4.add(Dropout(0.2))

model_4.add(Dense(10, activation='relu'))

model_4.add(Dropout(0.1))

model_4.add(Dense(1))

# Compile model

model_4.compile(loss='mean_squared_error', optimizer='rmsprop')

# train the model using training set while evaluating on test set

history=model_4.fit(X_train, y_train, batch_size = 50, epochs = 200, \

validation_data=(X_test, y_test), verbose=0)

matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

plt.plot(history.history['loss'])

plt.plot(history.history['val_loss'])

plt.ylim((0, 25000))

plt.ylabel('loss')

plt.xlabel('epoch')

plt.legend(['train loss', 'validation loss'], loc='upper right')

# print the lowest error reached on the training and validation sets

print("Lowest error on training set = ", \

min(history.history['loss']))

print("Lowest error on validation set = ", \

min(history.history['val_loss']))

The following is the expected output:

Lowest error on training set = 935.1562484741211

Lowest error on validation set = 132.39965686798095

The gap between the training error and validation error is slightly larger due to the increase in regularization. In this case, there was no overfitting in the original model. As a result, regularization increased the error on both the training and validation datasets.
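The following is a minimal sketch of the dropout mechanism used in this activity, written in plain Python rather than Keras. It assumes the common "inverted dropout" formulation, in which each unit is zeroed with probability `rate` during training and the surviving units are scaled by `1 / (1 - rate)`, while at evaluation time the layer passes its input through unchanged. The function name `dropout` and the activation values are hypothetical.

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout: active only while training, identity otherwise."""
    if not training or rate == 0:
        return list(activations)
    keep = 1.0 - rate
    # Zero each unit with probability `rate`; rescale the survivors
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(1)          # fixed seed so the sketch is reproducible
hidden = [1.0] * 10             # hypothetical hidden-layer activations

train_out = dropout(hidden, rate=0.2, training=True, rng=rng)
eval_out = dropout(hidden, rate=0.2, training=False, rng=rng)

print("training pass:  ", train_out)
print("evaluation pass:", eval_out)
```

Since the training loss is computed with units dropped and the validation loss is computed with the full network, a training error that is higher than the validation error, as seen in the dropout models above, is not unusual.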

Note

To access the source code for this specific section, please refer to https://packt.live/38mtDo7.

You can also run this example online at https://packt.live/31Isdmu.

## Activity 5.03: Hyperparameter Tuning on the Avila Pattern Classifier

In this activity, you will build a Keras model similar to those in the previous activities, but this time, you will add regularization methods to your model as well. Then, you will use scikit-learn optimizers to perform tuning on the model hyperparameters, including the hyperparameters of the regularizers. Follow these steps to complete this activity:

- Load the dataset and import the libraries:
# Load The dataset

import pandas as pd

X = pd.read_csv('../data/avila-tr_feats.csv')

y = pd.read_csv('../data/avila-tr_target.csv')

- Define a function that returns a Keras model with three hidden layers, the first of **size 10**, the second of **size 6**, and the third of **size 4**, and apply **L2 weight regularization** and a **ReLU activation** function on each hidden layer. Compile the model with the given parameters and return it from the function:

# Create the function that returns the keras model

from keras.models import Sequential

from keras.layers import Dense

from keras.regularizers import l2

def build_model(lambda_parameter):

model = Sequential()

model.add(Dense(10, input_dim=X.shape[1], \

activation='relu', \

kernel_regularizer=l2(lambda_parameter)))

model.add(Dense(6, activation='relu', \

kernel_regularizer=l2(lambda_parameter)))

model.add(Dense(4, activation='relu', \

kernel_regularizer=l2(lambda_parameter)))

model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', \

optimizer='sgd', metrics=['accuracy'])

return model

- Set a seed, use a scikit-learn wrapper to wrap the model that we created in the previous step, and define the hyperparameters to scan. Finally, perform **GridSearchCV()** on the model using the hyperparameter grid and fit the model:

from keras.wrappers.scikit_learn import KerasClassifier

from sklearn.model_selection import GridSearchCV

"""

define a seed for random number generator so the result will be reproducible

"""

import numpy as np

from tensorflow import random

seed = 1

np.random.seed(seed)

random.set_seed(seed)

# create the Keras wrapper with scikit learn

model = KerasClassifier(build_fn=build_model, verbose=0, \

shuffle=False)

# define all the possible values for each hyperparameter

lambda_parameter = [0.01, 0.5, 1]

epochs = [50, 100]

batch_size = [20]

"""

create the dictionary containing all possible values of hyperparameters

"""

param_grid = dict(lambda_parameter=lambda_parameter, \

epochs=epochs, batch_size=batch_size)

# perform 5-fold cross-validation for each possible combination, store the results

grid_seach = GridSearchCV(estimator=model, \

param_grid=param_grid, cv=5)

results_1 = grid_seach.fit(X, y)

- Print the results for the best cross-validation score that's stored within the variable we created in the fit process. Iterate through all the parameters and print the mean of the accuracy across all the folds, the standard deviation of the accuracy, and the parameters themselves:
print("Best cross-validation score =", results_1.best_score_)

print("Parameters for Best cross-validation score=", \

results_1.best_params_)

# print the results for all evaluated hyperparameter combinations

accuracy_means = results_1.cv_results_['mean_test_score']

accuracy_stds = results_1.cv_results_['std_test_score']

parameters = results_1.cv_results_['params']

for p in range(len(parameters)):

print("Accuracy %f (std %f) for params %r" % \

(accuracy_means[p], accuracy_stds[p], parameters[p]))

The following is the expected output:

Best cross-validation score = 0.7673058390617371

Parameters for Best cross-validation score= {'batch_size': 20,

'epochs': 100, 'lambda_parameter': 0.01}

Accuracy 0.764621 (std 0.004330) for params {'batch_size': 20,

'epochs': 50, 'lambda_parameter': 0.01}

Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

'epochs': 50, 'lambda_parameter': 0.5}

Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

'epochs': 50, 'lambda_parameter': 1}

Accuracy 0.767306 (std 0.015872) for params {'batch_size': 20,

'epochs': 100, 'lambda_parameter': 0.01}

Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

'epochs': 100, 'lambda_parameter': 0.5}

Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

'epochs': 100, 'lambda_parameter': 1}

- Repeat *step 3* using **GridSearchCV()**, **lambda_parameter = [0.001, 0.01, 0.05, 0.1]**, **batch_size = [20]**, and **epochs = [100]**. Fit the model to the training data using **5-fold cross-validation** and print the results for the entire grid:

"""

define a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# create the Keras wrapper with scikit learn

model = KerasClassifier(build_fn=build_model, verbose=0, shuffle=False)

# define all the possible values for each hyperparameter

lambda_parameter = [0.001, 0.01, 0.05, 0.1]

epochs = [100]

batch_size = [20]

"""

create the dictionary containing all possible values of hyperparameters

"""

param_grid = dict(lambda_parameter=lambda_parameter, \

epochs=epochs, batch_size=batch_size)

"""

search the grid, perform 5-fold cross-validation for each possible combination, store the results

"""

grid_seach = GridSearchCV(estimator=model, \

param_grid=param_grid, cv=5)

results_2 = grid_seach.fit(X, y)

# print the results for best cross-validation score

print("Best cross-validation score =", results_2.best_score_)

print("Parameters for Best cross-validation score =", \

results_2.best_params_)

# print the results for the entire grid

accuracy_means = results_2.cv_results_['mean_test_score']

accuracy_stds = results_2.cv_results_['std_test_score']

parameters = results_2.cv_results_['params']

for p in range(len(parameters)):

print("Accuracy %f (std %f) for params %r" % \

(accuracy_means[p], accuracy_stds[p], parameters[p]))

The following is the expected output:

Best cross-validation score = 0.786385428905487

Parameters for Best cross-validation score = {'batch_size': 20,

'epochs': 100, 'lambda_parameter': 0.001}

Accuracy 0.786385 (std 0.010177) for params {'batch_size': 20,

'epochs': 100, 'lambda_parameter': 0.001}

Accuracy 0.693960 (std 0.084994) for params {'batch_size': 20,

'epochs': 100, 'lambda_parameter': 0.01}

Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

'epochs': 100, 'lambda_parameter': 0.05}

Accuracy 0.589070 (std 0.008244) for params {'batch_size': 20,

'epochs': 100, 'lambda_parameter': 0.1}

- Redefine a function that returns a Keras model with three hidden layers, the first of **size 10**, the second of **size 6**, and the third of **size 4**, and apply **dropout regularization** and a **ReLU activation** function on each hidden layer. Compile the model with the given parameters and return it from the function:

# Create the function that returns the keras model

from keras.layers import Dropout

def build_model(rate):

model = Sequential()

model.add(Dense(10, input_dim=X.shape[1], activation='relu'))

model.add(Dropout(rate))

model.add(Dense(6, activation='relu'))

model.add(Dropout(rate))

model.add(Dense(4, activation='relu'))

model.add(Dropout(rate))

model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', \

optimizer='sgd', metrics=['accuracy'])

return model

- Use **rate = [0, 0.1, 0.2]** and **epochs = [50, 100]** and perform **GridSearchCV()** on the model. Fit the model to the training data using **5-fold cross-validation** and print the results for the entire grid:

"""

define a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# create the Keras wrapper with scikit learn

model = KerasClassifier(build_fn=build_model, verbose=0,shuffle=False)

# define all the possible values for each hyperparameter

rate = [0, 0.1, 0.2]

epochs = [50, 100]

batch_size = [20]

"""

create the dictionary containing all possible values of hyperparameters

"""

param_grid = dict(rate=rate, epochs=epochs, batch_size=batch_size)

"""

search the grid, perform 5-fold cross-validation for each possible combination, store the results

"""

grid_seach = GridSearchCV(estimator=model, \

param_grid=param_grid, cv=5)

results_3 = grid_seach.fit(X, y)

# print the results for best cross-validation score

print("Best cross-validation score =", results_3.best_score_)

print("Parameters for Best cross-validation score =", \

results_3.best_params_)

# print the results for the entire grid

accuracy_means = results_3.cv_results_['mean_test_score']

accuracy_stds = results_3.cv_results_['std_test_score']

parameters = results_3.cv_results_['params']

for p in range(len(parameters)):

print("Accuracy %f (std %f) for params %r" % \

(accuracy_means[p], accuracy_stds[p], parameters[p]))

The following is the expected output:

Best cross-validation score= 0.7918504476547241

Parameters for Best cross-validation score= {'batch_size': 20,

'epochs': 100, 'rate': 0}

Accuracy 0.786769 (std 0.008255) for params {'batch_size': 20,

'epochs': 50, 'rate': 0}

Accuracy 0.764717 (std 0.007691) for params {'batch_size': 20,

'epochs': 50, 'rate': 0.1}

Accuracy 0.752637 (std 0.013546) for params {'batch_size': 20,

'epochs': 50, 'rate': 0.2}

Accuracy 0.791850 (std 0.008519) for params {'batch_size': 20,

'epochs': 100, 'rate': 0}

Accuracy 0.779291 (std 0.009504) for params {'batch_size': 20,

'epochs': 100, 'rate': 0.1}

Accuracy 0.767306 (std 0.005773) for params {'batch_size': 20,

'epochs': 100, 'rate': 0.2}

- Repeat the previous step using **rate = [0.0, 0.05, 0.1]** and **epochs = [100]**. Fit the model to the training data using **5-fold cross-validation** and print the results for the entire grid:

"""

define a seed for random number generator so the result will be reproducible

"""

np.random.seed(seed)

random.set_seed(seed)

# create the Keras wrapper with scikit learn

model = KerasClassifier(build_fn=build_model, verbose=0, shuffle=False)

# define all the possible values for each hyperparameter

rate = [0.0, 0.05, 0.1]

epochs = [100]

batch_size = [20]

"""

create the dictionary containing all possible values of hyperparameters

"""

param_grid = dict(rate=rate, epochs=epochs, batch_size=batch_size)

"""

search the grid, perform 5-fold cross-validation for each possible combination, store the results

"""

grid_seach = GridSearchCV(estimator=model, \

param_grid=param_grid, cv=5)

results_4 = grid_seach.fit(X, y)

# print the results for best cross-validation score

print("Best cross-validation score =", results_4.best_score_)

print("Parameters for Best cross-validation score =", \

results_4.best_params_)

# print the results for the entire grid

accuracy_means = results_4.cv_results_['mean_test_score']

accuracy_stds = results_4.cv_results_['std_test_score']

parameters = results_4.cv_results_['params']

for p in range(len(parameters)):

print("Accuracy %f (std %f) for params %r" % \

(accuracy_means[p], accuracy_stds[p], parameters[p]))

The following is the expected output:

Best cross-validation score= 0.7862895488739013

Parameters for Best cross-validation score= {'batch_size': 20,

'epochs': 100, 'rate': 0.0}

Accuracy 0.786290 (std 0.013557) for params {'batch_size': 20,

'epochs': 100, 'rate': 0.0}

Accuracy 0.786098 (std 0.005184) for params {'batch_size': 20,

'epochs': 100, 'rate': 0.05}

Accuracy 0.772004 (std 0.013733) for params {'batch_size': 20,

'epochs': 100, 'rate': 0.1}
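To see how much work a grid search implies, the sketch below enumerates the first grid used in this activity (three values of `lambda_parameter`, two values of `epochs`, one `batch_size`) with the standard library only. It is an illustration of the exhaustive enumeration, not the scikit-learn implementation; the variable names `combinations` and `total_fits` are hypothetical.

```python
from itertools import product

# The first hyperparameter grid from this activity
param_grid = {
    "lambda_parameter": [0.01, 0.5, 1],
    "epochs": [50, 100],
    "batch_size": [20],
}
cv = 5  # number of cross-validation folds

# Build every combination, iterating over the parameter names in sorted order
names = sorted(param_grid)
combinations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]
total_fits = len(combinations) * cv

for combo in combinations:
    print(combo)
print(f"{len(combinations)} combinations x {cv} folds = {total_fits} model fits")
```

Every combination is trained once per fold, so the cost grows multiplicatively with each value that is added to the grid. This is why the later steps narrow the search to a smaller range around the best value instead of widening the grid.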

Note

To access the source code for this specific section, please refer to https://packt.live/2D7HN0L.

This section does not currently have an online interactive example and will need to be run locally.

# 6. Model Evaluation

## Activity 6.01: Computing the Accuracy and Null Accuracy of a Neural Network When We Change the Train/Test Split

In this activity, we will see that our **null accuracy** and **accuracy** will be affected by changing the **train**/**test** split. To implement this, the part of the code where the train/test split was defined has to be changed. We will use the same dataset that we used in *Exercise 6.02*, *Computing Accuracy and Null Accuracy with APS Failure for Scania Trucks Data*. Follow these steps to complete this activity:

- Import the required libraries. Load the dataset using the pandas **read_csv** function and look at the first **five** rows of the dataset:

# Import the libraries

import numpy as np

import pandas as pd

# Load the Data

X = pd.read_csv("../data/aps_failure_training_feats.csv")

y = pd.read_csv("../data/aps_failure_training_target.csv")

# Use the head function to get a glimpse of the data

X.head()

The following table shows the output of the preceding code:

- Change the **test_size** from **0.20** to **0.3** and the **random_state** from **42** to **13**:

# Split the data into training and testing sets

from sklearn.model_selection import train_test_split

seed = 13

X_train, X_test, y_train, y_test = \

train_test_split(X, y, test_size=0.3, random_state=seed)

Note

If you use a different **random_state**, you may get a different **train**/**test** split, which may yield slightly different final results.

- Scale the data using the **StandardScaler** function and use the scaler to scale the test data. Convert both into pandas DataFrames:

# Initialize StandardScaler

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

# Transform the training data

X_train = sc.fit_transform(X_train)

X_train = pd.DataFrame(X_train, columns=X_test.columns)

# Transform the testing data

X_test = sc.transform(X_test)

X_test = pd.DataFrame(X_test, columns = X_train.columns)

Note

The **sc.fit_transform()** function transforms the data, and the data is also converted into a **NumPy** array. We may need the data later for analysis as a DataFrame object, so the **pd.DataFrame()** function reconverts the data into a DataFrame.

- Import the libraries that are required to build a neural network architecture:

# Import the relevant Keras libraries

from keras.models import Sequential

from keras.layers import Dense

from keras.layers import Dropout

from tensorflow import random

- Initiate the **Sequential** class:

# Initiate the Model with Sequential Class

np.random.seed(seed)

random.set_seed(seed)

model = Sequential()

- Add five **Dense** layers to the network with **Dropout**. Set the first hidden layer so that it has a size of **64** with a dropout rate of **0.5**, the second hidden layer so that it has a size of **32** with a dropout rate of **0.4**, the third hidden layer so that it has a size of **16** with a dropout rate of **0.3**, the fourth hidden layer so that it has a size of **8** with a dropout rate of **0.2**, and the final hidden layer so that it has a size of **4** with a dropout rate of **0.1**. Set all the activation functions to **ReLU**:

# Add the hidden dense layers, each followed by a dropout layer

model.add(Dense(units=64, activation='relu', \

kernel_initializer='uniform', \

input_dim=X_train.shape[1]))

model.add(Dropout(rate=0.5))

model.add(Dense(units=32, activation='relu', \

kernel_initializer='uniform', \

input_dim=X_train.shape[1]))

model.add(Dropout(rate=0.4))

model.add(Dense(units=16, activation='relu', \

kernel_initializer='uniform', \

input_dim=X_train.shape[1]))

model.add(Dropout(rate=0.3))

model.add(Dense(units=8, activation='relu', \

kernel_initializer='uniform', \

input_dim=X_train.shape[1]))

model.add(Dropout(rate=0.2))

model.add(Dense(units=4, activation='relu', \

kernel_initializer='uniform'))

model.add(Dropout(rate=0.1))

- Add an output **Dense** layer with a **sigmoid** activation function:

# Add Output Dense Layer

model.add(Dense(units=1, activation='sigmoid', \

kernel_initializer='uniform'))

Note

Since the output is binary, we are using the **sigmoid** function. If the output is multiclass (that is, more than two classes), then the **softmax** function should be used.

- Compile the network. The metric that's being used here is **accuracy**:

# Compile the Model

model.compile(optimizer='adam', loss='binary_crossentropy', \

metrics=['accuracy'])

Note

The metric name, which in our case is **accuracy**, is defined in the preceding code.

- Fit the model with **100** epochs, a batch size of **20**, and a validation split of **0.2**:

# Fit the Model

model.fit(X_train, y_train, epochs=100, batch_size=20, \

verbose=1, validation_split=0.2, shuffle=False)

- Evaluate the model on the test dataset and print out the values for the **loss** and **accuracy**:

test_loss, test_acc = model.evaluate(X_test, y_test)

print(f'The loss on the test set is {test_loss:.4f} and \

the accuracy is {test_acc*100:.4f}%')

The preceding code produces the following output:

18000/18000 [==============================] - 0s 19us/step

The loss on the test set is 0.0766 and the accuracy is 98.9833%

The model returns an accuracy of **98.9833%**. But is it good enough? We can only get the answer to this question by comparing it against the null accuracy.

- Now, compute the null accuracy. The **null accuracy** can be calculated using the **value_counts** function of the **pandas** library, which we used in *Exercise 6.01*, *Calculating Null Accuracy on a Pacific Hurricanes Dataset*, of this chapter:

# Use the value_counts function to count the distinct class values

y_test['class'].value_counts()

The preceding code produces the following output:

0 17700

1 300

Name: class, dtype: int64

- Calculate the **null accuracy**:

# Calculate the null accuracy

y_test['class'].value_counts(normalize=True).loc[0]

The preceding code produces the following output:

0.9833333333333333
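As a worked check of the numbers above, the null accuracy is simply the share of the most frequent class in the test set. The sketch below recomputes it from the class counts printed earlier and compares it with the test accuracy reported by the model; the variable names are illustrative only.

```python
# Class counts taken from the value_counts output above
class_counts = {0: 17700, 1: 300}

total = sum(class_counts.values())
null_accuracy = max(class_counts.values()) / total

# Test accuracy reported by model.evaluate above (98.9833%)
model_accuracy = 0.989833
improvement = model_accuracy - null_accuracy

print(f"Null accuracy:  {null_accuracy:.4f}")
print(f"Model accuracy: {model_accuracy:.4f}")
print(f"Improvement over always predicting class 0: {improvement:.4f}")
```

The model beats the trivial "always predict the majority class" baseline by well under one percentage point, which is why accuracy alone is a weak measure of performance on a dataset this imbalanced.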

Note

To access the source code for this specific section, please refer to https://packt.live/3eY7y1E.

You can also run this example online at https://packt.live/2BzBO4n.

## Activity 6.02: Calculating the ROC Curve and AUC Score

The **ROC curve** and **AUC score** are an effective way to easily evaluate the performance of a binary classifier. In this activity, we will plot the **ROC curve** and calculate the **AUC score** of a model. We will use the same dataset and train the same model that we used in *Exercise 6.03*, *Deriving and Computing Metrics Based on a Confusion Matrix*. Continue with the same APS failure data, plot the **ROC curve**, and compute the **AUC score** of the model. Follow these steps to complete this activity:

- Import the necessary libraries and load the data using the pandas **read_csv** function:

# Import the libraries

import numpy as np

import pandas as pd

# Load the Data

X = pd.read_csv("../data/aps_failure_training_feats.csv")

y = pd.read_csv("../data/aps_failure_training_target.csv")

- Split the data into training and test datasets using the **train_test_split** function:

from sklearn.model_selection import train_test_split

seed = 42

X_train, X_test, y_train, y_test = \

train_test_split(X, y, test_size=0.20, random_state=seed)

- Scale the feature data so that it has a **mean** of **0** and a **standard deviation** of **1** using the **StandardScaler** function. Fit the scaler on the **training data** and apply it to the **test data**:

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

# Transform the training data

X_train = sc.fit_transform(X_train)

X_train = pd.DataFrame(X_train,columns=X_test.columns)

# Transform the testing data

X_test = sc.transform(X_test)

X_test = pd.DataFrame(X_test,columns=X_train.columns)

- Import the Keras libraries that are required for creating the model. Instantiate a Keras model of the **Sequential** class and add five hidden layers to the model, including dropout for each layer. The first hidden layer should have a size of **64** and a dropout rate of **0.5**. The second hidden layer should have a size of **32** and a dropout rate of **0.4**. The third hidden layer should have a size of **16** and a dropout rate of **0.3**. The fourth hidden layer should have a size of **8** and a dropout rate of **0.2**. The final hidden layer should have a size of **4** and a dropout rate of **0.1**. All the hidden layers should have **ReLU activation** functions and set **kernel_initializer = 'uniform'**. Add a final output layer to the model with a sigmoid activation function. Compile the model by calculating the accuracy metric during the training process:

# Import the relevant Keras libraries

from keras.models import Sequential

from keras.layers import Dense

from keras.layers import Dropout

from tensorflow import random

np.random.seed(seed)

random.set_seed(seed)

model = Sequential()

# Add the hidden dense layers with dropout Layer

model.add(Dense(units=64, activation='relu', \

kernel_initializer='uniform', \

input_dim=X_train.shape[1]))

model.add(Dropout(rate=0.5))

model.add(Dense(units=32, activation='relu', \

kernel_initializer='uniform'))

model.add(Dropout(rate=0.4))

model.add(Dense(units=16, activation='relu', \

kernel_initializer='uniform'))

model.add(Dropout(rate=0.3))

model.add(Dense(units=8, activation='relu', \

kernel_initializer='uniform'))

model.add(Dropout(rate=0.2))

model.add(Dense(units=4, activation='relu', \

kernel_initializer='uniform'))

model.add(Dropout(rate=0.1))

# Add Output Dense Layer

model.add(Dense(units=1, activation='sigmoid', \

kernel_initializer='uniform'))

# Compile the Model

model.compile(optimizer='adam', loss='binary_crossentropy', \

metrics=['accuracy'])

- Fit the model to the training data by training for **100** epochs with **batch_size=20** and with **validation_split=0.2**:

model.fit(X_train, y_train, epochs=100, batch_size=20, \

verbose=1, validation_split=0.2, shuffle=False)

- Once the model has finished fitting to the training data, create a variable that is the result of the model's prediction on the test data using the model's **predict_proba** method:

y_pred_prob = model.predict_proba(X_test)

- Import **roc_curve** from scikit-learn and run the following code:

from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)

**fpr** = False positive rate (1 - specificity)

**tpr** = True positive rate (sensitivity)

**thresholds** = The threshold values of **y_pred_prob**

- Run the following code to plot the **ROC curve** using **matplotlib.pyplot**:

import matplotlib.pyplot as plt

plt.plot(fpr, tpr)

plt.title("ROC Curve for APS Failure")

plt.xlabel("False Positive rate (1-Specificity)")

plt.ylabel("True Positive rate (Sensitivity)")

plt.grid(True)

plt.show()

The following plot shows the output of the preceding code:

- Calculate the AUC score using the **roc_auc_score** function:

from sklearn.metrics import roc_auc_score

roc_auc_score(y_test,y_pred_prob)

The following is the output of the preceding code:

0.944787151628455

The AUC score of **94.4479%** suggests that our model is excellent, as per the generally accepted ranges for the **AUC score** discussed in this chapter.

Note

To access the source code for this specific section, please refer to https://packt.live/2NUOgyh.

You can also run this example online at https://packt.live/2As33NH.
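For intuition about what the AUC score in Activity 6.02 measures, here is a small, self-contained sketch that computes it by hand. It relies on the ranking interpretation of AUC: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `auc_by_ranking` and the four-sample toy data are hypothetical and are not taken from the APS failure dataset.

```python
def auc_by_ranking(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Toy labels and predicted probabilities, for illustration only
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = auc_by_ranking(y_true, y_score)
print(f"AUC = {auc}")
```

Three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75. Because the measure depends only on how the scores are ranked, it does not change with the classification threshold, which makes it more informative than accuracy on imbalanced data.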

# 7. Computer Vision with Convolutional Neural Networks

## Activity 7.01: Amending Our Model with Multiple Layers and the Use of softmax

Let's try and improve the performance of our image classification algorithm. There are many ways to improve its performance, and one of the most straightforward ways is by adding multiple ANN layers to the model, which we will learn about in this activity. We will also change the activation from sigmoid to softmax. Then, we can compare the result with that of the previous exercise. Follow these steps to complete this activity:

- Import the **numpy** library and the necessary Keras libraries and classes:

# Import the Libraries

from keras.models import Sequential

from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

import numpy as np

from tensorflow import random

- Now, initiate the model with the
**Sequential**class:# Initiate the classifier

seed = 1

np.random.seed(seed)

random.set_seed(seed)

classifier=Sequential()

- Add the first layer of the CNN, set the input shape to
**(64, 64, 3)**, the dimension of each image, and the activation function as a ReLU. Then, add**32**feature detectors of size**(3, 3)**. Add two additional convolutional layers with**32**feature detectors of size**(3, 3)**, also with**ReLU activation**functions:classifier.add(Conv2D(32,(3,3),input_shape=(64,64,3),\

activation='relu'))

classifier.add(Conv2D(32,(3,3),activation = 'relu'))

classifier.add(Conv2D(32,(3,3),activation = 'relu'))

**32, (3, 3)** means that there are **32** feature detectors of size **3x3**. As a good practice, always start with **32**; you can add **64** or **128** later.

- Now, add the pooling layer with a pool size of
**2x2**:classifier.add(MaxPool2D(pool_size=(2,2)))

- Flatten the output of the pooling layer by adding a flattening layer to the
**CNN model**:classifier.add(Flatten())

- Add the first dense layer of the ANN. Here, **128** is the number of output nodes, which is a good value to get started with. The **activation** is **relu**. As a good practice, a power of two is preferred for the number of nodes:

classifier.add(Dense(units=128,activation='relu'))

- Add three more layers to the ANN of the same size,
**128**, along with**ReLU activation**functions:classifier.add(Dense(128,activation='relu'))

classifier.add(Dense(128,activation='relu'))

classifier.add(Dense(128,activation='relu'))

- Add the output layer of the ANN. Replace the sigmoid function with
**softmax**:classifier.add(Dense(units=1,activation='softmax'))

- Compile the network with an
**Adam optimizer**and compute the accuracy during the training process:# Compile The network

classifier.compile(optimizer='adam', loss='binary_crossentropy', \

metrics=['accuracy'])

- Create training and test data generators. Rescale the training and test images by
**1/255**so that all the values are between**0**and**1**. Set these parameters for the training data generators only –**shear_range=0.2**,**zoom_range=0.2**, and**horizontal_flip=True**:from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255, \

shear_range = 0.2, \

zoom_range = 0.2, \

horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

- Create a training set from the
**training set**folder.**'../dataset/training_set'**is the folder where our data has been placed. Our CNN model has an image size of**64x64**, so the same size should be passed here too.**batch_size**is the number of images in a single batch, which is**32**.**class_mode**is set to**binary**since we are working on binary classifiers:training_set = \

train_datagen.flow_from_directory('../dataset/training_set', \

target_size = (64, 64), \

batch_size = 32, \

class_mode = 'binary')

- Repeat the previous step for the test set by setting the folder to the location of the test images, that is, **'../dataset/test_set'**:

test_set = \

test_datagen.flow_from_directory('../dataset/test_set', \

target_size = (64, 64), \

batch_size = 32, \

class_mode = 'binary')

- Finally, fit the data. Set the
**steps_per_epoch**to**10000**and the**validation_steps**to**2500**. The following step might take some time to execute:classifier.fit_generator(training_set, steps_per_epoch = 10000, \

epochs = 2, validation_data = test_set, \

validation_steps = 2500, shuffle=False)

The preceding code produces the following output:

Epoch 1/2

10000/10000 [==============================] - 2452s 245ms/step - loss: 8.1783 - accuracy: 0.4667 - val_loss: 11.4999 - val_accuracy: 0.4695

Epoch 2/2

10000/10000 [==============================] - 2496s 250ms/step - loss: 8.1726 - accuracy: 0.4671 - val_loss: 10.5416 - val_accuracy: 0.4691

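Note that the accuracy has decreased to **46.91%** after switching to the softmax activation function. A softmax over a single output unit always returns 1.0, so the network predicts the same class for every image regardless of its weights.

Note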

To access the source code for this specific section, please refer to https://packt.live/3gj0TiA.

You can also run this example online at https://packt.live/2VIDj7e.
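One likely reason for the flat accuracy in the preceding output is the combination of softmax with a single output unit. The following pure-Python sketch (the helper names are illustrative, not Keras functions) compares what softmax and sigmoid return for a one-unit output layer:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic sigmoid for a single logit."""
    return 1.0 / (1.0 + math.exp(-z))


example_logits = (-5.0, 0.0, 3.2)  # arbitrary pre-activation values

# With one unit, softmax normalizes a value by itself, so it is always 1.0
single_unit = [softmax([z])[0] for z in example_logits]

# Sigmoid on the same logits still varies with the input
sigmoid_out = [sigmoid(z) for z in example_logits]

# Softmax becomes informative once there is one unit per class
two_unit = softmax([0.0, 3.2])

print(single_unit)
print(sigmoid_out)
print(two_unit)
```

Under this reading, softmax is usually paired with one output unit per class and a categorical loss, while a single unit for a binary problem is usually paired with sigmoid and **binary_crossentropy**.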

## Activity 7.02: Classifying a New Image

In this activity, you will try to classify another new image, just like we did in the preceding exercise. The image hasn't been exposed to the algorithm, so we will use this activity to test our algorithm. You can run any of the algorithms in this chapter (although the one that gets the highest accuracy is preferred) and then use the model to classify your images. Follow these steps to complete this activity:

- Run one of the algorithms from this chapter.
- Load the image and process it.
**'test_image_2.jpg'**is the path of the test image. Change the path in the code where you have saved the dataset:from keras.preprocessing import image

new_image = \

image.load_img('../test_image_2.jpg', target_size = (64, 64))

new_image

- You can view the class labels using the following code:
training_set.class_indices

- Process the image by converting it into a
**numpy**array using the**img_to_array**function. Then, add an additional dimension along the 0th axis using numpy's**expand_dims**function:new_image = image.img_to_array(new_image)

new_image = np.expand_dims(new_image, axis = 0)

- Predict the new image by calling the
**predict**method of the classifier:result = classifier.predict(new_image)

- Using the mapping shown by the **class_indices** attribute, write an **if…else** statement to map the 0 or 1 output of the prediction to a class label:

if result[0][0] == 1:

    prediction = 'It is a flower'

else:

    prediction = 'It is a car'

print(prediction)

The preceding code produces the following output:

It is a flower

**test_image_2** is an image of a flower and was predicted to be a flower.

Note

To access the source code for this specific section, please refer to https://packt.live/38ny95E.

You can also run this example online at https://packt.live/2VIM4Ow.
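The hard-coded **if…else** mapping above can also be derived from the class dictionary itself. The following is a small sketch, assuming a `class_indices` dictionary of the form `{'cars': 0, 'flowers': 1}` (the folder names are hypothetical) and a model that outputs a single probability for class 1:

```python
# Hypothetical mapping, in the form returned by training_set.class_indices
class_indices = {"cars": 0, "flowers": 1}

# Invert the dictionary so that a predicted index can be looked up directly
index_to_label = {index: name for name, index in class_indices.items()}


def label_for(probability, threshold=0.5):
    """Map a predicted probability for class 1 to its class name."""
    predicted_index = int(probability >= threshold)
    return index_to_label[predicted_index]


print(label_for(0.93))
print(label_for(0.12))
```

Thresholding at 0.5 rather than testing for equality with 1 also handles models whose output is a probability strictly between 0 and 1.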

# 8. Transfer Learning and Pre-Trained Models

## Activity 8.01: Using the VGG16 Network to Train a Deep Learning Network to Identify Images

Use the **VGG16** network to predict the image given (**test_image_1**). Before you start, ensure that you have downloaded the image (**test_image_1**) to your working directory. Follow these steps to complete this activity:

- Import the
**numpy**library and the necessary**Keras**libraries:import numpy as np

from keras.applications.vgg16 import VGG16, preprocess_input

from keras.preprocessing import image

- Initiate the model (note that, at this point, you can also view the architecture of the network, as shown in the following code):
classifier = VGG16()

classifier.summary()

**classifier.summary()** shows us the architecture of the network. The following points should be noted: it has a four-dimensional input shape (**None, 224, 224, 3**), and it has 13 convolutional layers followed by three fully connected layers.

The last four layers of the output are as follows:

- Load the image.
**'../Data/Prediction/test_image_1.jpg'**is the path of the image on our system. It will be different on your system:new_image = \

image.load_img('../Data/Prediction/test_image_1.jpg', \

target_size=(224, 224))

new_image

The following figure shows the output of the preceding code:

The target size should be **224x224** since **VGG16** only accepts (**224,224**).

- Change the image into an array by using the
**img_to_array**function:transformed_image = image.img_to_array(new_image)

transformed_image.shape

The preceding code provides the following output:

(224, 224, 3)

- The image should be in a four-dimensional form for
**VGG16**to allow further processing. Expand the dimension of the image, as follows:transformed_image = np.expand_dims(transformed_image, axis=0)

transformed_image.shape

The preceding code provides the following output:

(1, 224, 224, 3)

- Preprocess the image:
transformed_image = preprocess_input(transformed_image)

transformed_image

The following figure shows the output of the preceding code:

- Create the
**predictor**variable:y_pred = classifier.predict(transformed_image)

y_pred

The following figure shows the output of the preceding code:

- Check the shape of the prediction. It should be (**1,1000**). It's **1000** because, as we mentioned previously, the ImageNet database has **1000** categories of images. The predictor variable holds the probability of our image belonging to each of those categories:

y_pred.shape

The preceding code provides the following output:

(1, 1000)

- Print the top five probabilities of what our image is using the **decode_predictions** function. Pass it the predictor variable, **y_pred**, and the number of predictions and corresponding labels to output:

from keras.applications.vgg16 import decode_predictions

decode_predictions(y_pred, top=5)

The preceding code provides the following output:

[[('n03785016', 'moped', 0.8433369),

('n03791053', 'motor_scooter', 0.14188054),

('n03127747', 'crash_helmet', 0.007004856),

('n03208938', 'disk_brake', 0.0022349996),

('n04482393', 'tricycle', 0.0007717237)]]

The first column of the array is an internal code number. The second is the label, while the third is the probability of the image being the label.

- Transform the predictions into a human-readable format. We need to extract the most probable label from the output, as follows:
label = decode_predictions(y_pred)

"""

Most likely result is retrieved, for example, the highest probability

"""

decoded_label = label[0][0]

# The classification is printed

print('%s (%.2f%%)' % (decoded_label[1], decoded_label[2]*100 ))

The preceding code provides the following output:

moped (84.33%)

Here, we can see that we have an **84.33%** probability that the picture is of a moped, which is close enough to a motorbike and probably represents the fact that motorbikes in the ImageNet dataset were labeled as mopeds.

Note

To access the source code for this specific section, please refer to https://packt.live/2C4nqRo.

You can also run this example online at https://packt.live/31JMPL4.

## Activity 8.02: Image Classification with ResNet

In this activity, we will use another pre-trained network, known as **ResNet**. We have an image of a television located at **../Data/Prediction/test_image_4**. We will use the **ResNet50** network to predict the image. Follow these steps to complete this activity:

- Import the
**numpy**library and the necessary**Keras**libraries:import numpy as np

from keras.applications.resnet50 import ResNet50, preprocess_input

from keras.preprocessing import image

- Initiate the ResNet50 model and print a summary of the model:
classifier = ResNet50()

classifier.summary()

**classifier.summary()** shows us the architecture of the network. The following point should be noted:

Note

The last layer predictions (**Dense**) have **1000** values. This means that **ResNet50** has a total of **1000** labels and that our image will be one of those **1000** labels.

- Load the image.
**'../Data/Prediction/test_image_4.jpg'**is the path of the image on our system. It will be different on your system:new_image = \

image.load_img('../Data/Prediction/test_image_4.jpg', \

target_size=(224, 224))

new_image

The following is the output of the preceding code:

The target size should be **224x224** since **ResNet50** only accepts (**224,224**).

- Change the image into an array by using the
**img_to_array**function:transformed_image = image.img_to_array(new_image)

transformed_image.shape

- The image has to be in a four-dimensional form for
**ResNet50**to allow further processing. Expand the dimensions of the image along the 0th axis using the**expand_dims**function:transformed_image = np.expand_dims(transformed_image, axis=0)

transformed_image.shape

- Preprocess the image using the
**preprocess_input**function:transformed_image = preprocess_input(transformed_image)

transformed_image

- Create the predictor variable by using the classifier's **predict** method to predict the image:

y_pred = classifier.predict(transformed_image)

y_pred

- Check the shape of the prediction. It should be (**1,1000**):

y_pred.shape

The preceding code provides the following output:

(1, 1000)

- Select the top five probabilities of what our image is using the **decode_predictions** function, passing the predictor variable, **y_pred**, and the number of top predictions to return as arguments:

from keras.applications.resnet50 import decode_predictions

decode_predictions(y_pred, top=5)

The preceding code provides the following output:

[[('n04404412', 'television', 0.99673873),

('n04372370', 'switch', 0.0009829825),

('n04152593', 'screen', 0.00095111143),

('n03782006', 'monitor', 0.0006477369),

('n04069434', 'reflex_camera', 8.5398955e-05)]]

The first column of the array is an internal code number. The second is the label, while the third is the probability of the image matching the label.

- Put the predictions in a human-readable format. Print the most probable label from the output from the result of the
**decode_predictions**function:label = decode_predictions(y_pred)

"""

Most likely result is retrieved, for example,

the highest probability

"""

decoded_label = label[0][0]

# The classification is printed

print('%s (%.2f%%)' % (decoded_label[1], decoded_label[2]*100 ))

The preceding code produces the following output:

television (99.67%)

Note

To access the source code for this specific section, please refer to https://packt.live/38rEe0M.

You can also run this example online at https://packt.live/2YV5xxo.
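The indexing `label[0][0]` used in this activity may be easier to follow with the structure written out. The sketch below hard-codes the first three entries of the decoded output shown earlier as a plain Python list (one inner list per input image) instead of calling the network:

```python
# One inner list per input image; each entry is (class id, label, probability).
# The values are copied from the output shown in this activity.
decoded = [[
    ("n04404412", "television", 0.99673873),
    ("n04372370", "switch", 0.0009829825),
    ("n04152593", "screen", 0.00095111143),
]]

# First index: which image in the batch; second index: rank of the prediction
class_id, name, probability = decoded[0][0]

summary = f"{name} ({probability * 100:.2f}%)"
print(summary)
```

Because we predicted a single image, the outer list has exactly one element, which is why the first index is always 0 here.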

# 9. Sequential Modeling with Recurrent Neural Networks

## Activity 9.01: Predicting the Trend of Amazon's Stock Price Using an LSTM with 50 Units (Neurons)

In this activity, we will examine the stock price of Amazon for the last 5 years—from January 1, 2014, to December 31, 2018. In doing so, we will try to predict and forecast the company's future trend for January 2019 using an **RNN** and **LSTM**. We have the actual values for January 2019, so we can compare our predictions to the actual values later. Follow these steps to complete this activity:

- Import the required libraries:
import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

from tensorflow import random

- Import the dataset using the pandas
**read_csv**function and look at the first five rows of the dataset using the**head**method:dataset_training = pd.read_csv('../AMZN_train.csv')

dataset_training.head()

The following figure shows the output of the preceding code:

- We are going to make our prediction using the
**Open**stock price; therefore, select the**Open**stock price column from the dataset and print the values:training_data = dataset_training[['Open']].values

training_data

The preceding code produces the following output:

array([[ 398.799988],

[ 398.290009],

[ 395.850006],

...,

[1454.199951],

[1473.349976],

[1510.800049]])

- Then, perform feature scaling by normalizing the data using
**MinMaxScaler**and setting the range of the features so that they have a minimum value of zero and a maximum value of one. Use the**fit_transform**method of the scaler on the training data:from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range = (0, 1))

training_data_scaled = sc.fit_transform(training_data)

training_data_scaled

The preceding code produces the following output:

array([[0.06523313],

[0.06494233],

[0.06355099],

...,

[0.66704299],

[0.67796271],

[0.69931748]])

- Create the data so that each sample holds the **60** time steps preceding the current instance. We chose **60** here as it gives us a sufficient number of previous instances to understand the trend; technically, this can be any number, and **60** is a reasonable choice for this dataset. Additionally, the upper bound value here is **1258**, which is the number of rows (or records) in the training set:

X_train = []

y_train = []

for i in range(60, 1258):

    X_train.append(training_data_scaled[i-60:i, 0])

    y_train.append(training_data_scaled[i, 0])

X_train, y_train = np.array(X_train), np.array(y_train)

- Reshape the data to add an extra dimension to the end of
**X_train**using NumPy's**reshape**function:X_train = np.reshape(X_train, (X_train.shape[0], \

X_train.shape[1], 1))

- Import the following libraries to build the RNN:
from keras.models import Sequential

from keras.layers import Dense, LSTM, Dropout

- Set the seed and initiate the sequential model, as follows:
seed = 1

np.random.seed(seed)

random.set_seed(seed)

model = Sequential()

- Add an
**LSTM**layer to the network with**50**units, set the**return_sequences**argument to**True**, and set the**input_shape**argument to**(X_train.shape[1], 1)**. Add three additional**LSTM**layers, each with**50**units, and set the**return_sequences**argument to**True**for the first two. Add a final output layer of size 1:model.add(LSTM(units = 50, return_sequences = True, \

input_shape = (X_train.shape[1], 1)))

# Adding a second LSTM layer

model.add(LSTM(units = 50, return_sequences = True))

# Adding a third LSTM layer

model.add(LSTM(units = 50, return_sequences = True))

# Adding a fourth LSTM layer

model.add(LSTM(units = 50))

# Adding the output layer

model.add(Dense(units = 1))

- Compile the network with an
**adam**optimizer and use**Mean Squared Error**for the loss. Fit the model to the training data for**100**epochs with a batch size of**32**:# Compiling the RNN

model.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fitting the RNN to the Training set

model.fit(X_train, y_train, epochs = 100, batch_size = 32)

- Load and process the test data (which is treated as actual data here) and select the column representing the value of
**Open**stock data:dataset_testing = pd.read_csv('../AMZN_test.csv')

actual_stock_price = dataset_testing[['Open']].values

actual_stock_price

- Concatenate the data since we will need
**60**previous instances to get the stock price for each day. Therefore, we will need both the training and test data:total_data = pd.concat((dataset_training['Open'], \

dataset_testing['Open']), axis = 0)

- Reshape and scale the input to prepare the test data. Note that we are predicting the January monthly trend, which has
**21**financial days, so in order to prepare the test set, we take the lower bound value as**60**and the upper bound value as**81**. This ensures that the difference of**21**is maintained:inputs = total_data[len(total_data) \

- len(dataset_testing) - 60:].values

inputs = inputs.reshape(-1,1)

inputs = sc.transform(inputs)

X_test = []

for i in range(60, 81):

    X_test.append(inputs[i-60:i, 0])

X_test = np.array(X_test)

X_test = np.reshape(X_test, (X_test.shape[0], \

X_test.shape[1], 1))

predicted_stock_price = model.predict(X_test)

predicted_stock_price = \

sc.inverse_transform(predicted_stock_price)

- Visualize the results by plotting the actual stock price and plotting the predicted stock price:
# Visualizing the results

plt.plot(actual_stock_price, color = 'green', \

label = 'Real Amazon Stock Price',ls='--')

plt.plot(predicted_stock_price, color = 'red', \

label = 'Predicted Amazon Stock Price',ls='-')

plt.title('Predicted Stock Price')

plt.xlabel('Time in days')

plt.ylabel('Real Stock Price')

plt.legend()

plt.show()

Please note that your results may differ slightly from the actual stock price of Amazon.

**Expected output**:

As shown in the preceding plot, the trends of the predicted and real prices are pretty much the same; the line has the same peaks and troughs. This is possible because of LSTM's ability to remember sequenced data. A traditional feedforward neural network would not have been able to forecast this result. This is the true power of **LSTM** and **RNNs**.

Note

To access the source code for this specific section, please refer to https://packt.live/3goQO3I.

You can also run this example online at https://packt.live/2VIMq7O.
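The windowing logic used for both the training and test sets in this activity can be checked on synthetic data. The following pure-Python sketch uses a hypothetical `make_windows` helper and a stand-in series of 1,258 values (not the real Amazon prices) to show how many samples the bounds **60**, **1258**, and **81** produce:

```python
def make_windows(series, window=60):
    """Pair each value with the `window` values that precede it."""
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i])  # the 60 previous time steps
        y.append(series[i])             # the value to predict
    return X, y


# Stand-in for the 1258 scaled training prices (synthetic values)
train_series = [float(v) for v in range(1258)]
X_train, y_train = make_windows(train_series, window=60)

# Test inputs: the last 60 training values followed by 21 test values
test_series = [float(v) for v in range(1258, 1279)]
test_inputs = train_series[-60:] + test_series
X_test, _ = make_windows(test_inputs, window=60)

print(len(X_train), len(X_train[0]), len(X_test))
```

The first 60 rows of the training data have no complete history, which is why 1,258 rows yield 1,198 training samples, while the 81 test inputs yield exactly one window per trading day in January.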

## Activity 9.02: Predicting Amazon's Stock Price with Added Regularization

In this activity, we will examine the stock price of Amazon over the last 5 years, from January 1, 2014, to December 31, 2018. In doing so, we will try to predict and forecast the company's future trend for January 2019 using RNNs and an LSTM. We have the actual values for January 2019, so we will be able to compare our predictions with the actual values later. Initially, we predicted the trend of Amazon's stock price using an LSTM with 50 units (or neurons). In this activity, we will also add dropout regularization and compare the results with *Activity 9.01*, *Predicting the Trend of Amazon's Stock Price Using an LSTM with 50 Units (Neurons)*. Follow these steps to complete this activity:

- Import the required libraries:
import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

from tensorflow import random

- Import the dataset using the pandas
**read_csv**function and look at the first five rows of the dataset using the**head**method:dataset_training = pd.read_csv('../AMZN_train.csv')

dataset_training.head()

- We are going to make our prediction using the
**Open**stock price; therefore, select the**Open**stock price column from the dataset and print the values:training_data = dataset_training[['Open']].values

training_data

The preceding code produces the following output:

array([[ 398.799988],

[ 398.290009],

[ 395.850006],

...,

[1454.199951],

[1473.349976],

[1510.800049]])

- Then, perform feature scaling by normalizing the data using
**MinMaxScaler**and setting the range of the features so that they have a minimum value of**0**and a maximum value of one. Use the**fit_transform**method of the scaler on the training data:from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range = (0, 1))

training_data_scaled = sc.fit_transform(training_data)

training_data_scaled

The preceding code produces the following output:

array([[0.06523313],

[0.06494233],

[0.06355099],

...,

[0.66704299],

[0.67796271],

[0.69931748]])

- Create the data so that each sample holds the **60** time steps preceding the current instance. We chose **60** here as it gives us a sufficient number of previous instances to understand the trend; technically, this can be any number, and **60** is a reasonable choice for this dataset. Additionally, the upper bound value here is **1258**, which is the number of rows (or records) in the training set:

X_train = []

y_train = []

for i in range(60, 1258):

    X_train.append(training_data_scaled[i-60:i, 0])

    y_train.append(training_data_scaled[i, 0])

X_train, y_train = np.array(X_train), np.array(y_train)

- Reshape the data to add an extra dimension to the end of
**X_train**using NumPy's**reshape**function:X_train = np.reshape(X_train, (X_train.shape[0], \

X_train.shape[1], 1))

- Import the following Keras libraries to build the RNN:
from keras.models import Sequential

from keras.layers import Dense, LSTM, Dropout

- Set the seed and initiate the sequential model, as follows:
seed = 1

np.random.seed(seed)

random.set_seed(seed)

model = Sequential()

- Add an LSTM layer to the network with 50 units, set the
**return_sequences**argument to**True**, and set the**input_shape**argument to**(X_train.shape[1], 1)**. Add dropout to the model with**rate=0.2**. Add three additional LSTM layers, each with**50**units, and set the**return_sequences**argument to**True**for the first two. After each**LSTM**layer, add a dropout with**rate=0.2**. Add a final output layer of size**1**:model.add(LSTM(units = 50, return_sequences = True, \

input_shape = (X_train.shape[1], 1)))

model.add(Dropout(0.2))

# Adding a second LSTM layer and some Dropout regularization

model.add(LSTM(units = 50, return_sequences = True))

model.add(Dropout(0.2))

# Adding a third LSTM layer and some Dropout regularization

model.add(LSTM(units = 50, return_sequences = True))

model.add(Dropout(0.2))

# Adding a fourth LSTM layer and some Dropout regularization

model.add(LSTM(units = 50))

model.add(Dropout(0.2))

# Adding the output layer

model.add(Dense(units = 1))

- Compile the network with an
**adam**optimizer and use**Mean Squared Error**for the loss. Fit the model to the training data for**100**epochs with a batch size of**32**:# Compiling the RNN

model.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fitting the RNN to the Training set

model.fit(X_train, y_train, epochs = 100, batch_size = 32)

- Load and process the test data (which is treated as actual data here) and select the column representing the value of
**Open**stock data:dataset_testing = pd.read_csv('../AMZN_test.csv')

actual_stock_price = dataset_testing[['Open']].values

actual_stock_price

- Concatenate the data since we will need
**60**previous instances to get the stock price for each day. Therefore, we will need both the training and test data:total_data = pd.concat((dataset_training['Open'], \

dataset_testing['Open']), axis = 0)

- Reshape and scale the input to prepare the test data. Note that we are predicting the January monthly trend, which has
**21**financial days, so in order to prepare the test set, we take the lower bound value as**60**and the upper bound value as**81**. This ensures that the difference of**21**is maintained:inputs = total_data[len(total_data) \

- len(dataset_testing) - 60:].values

inputs = inputs.reshape(-1,1)

inputs = sc.transform(inputs)

X_test = []

for i in range(60, 81):

    X_test.append(inputs[i-60:i, 0])

X_test = np.array(X_test)

X_test = np.reshape(X_test, (X_test.shape[0], \

X_test.shape[1], 1))

predicted_stock_price = model.predict(X_test)

predicted_stock_price = \

sc.inverse_transform(predicted_stock_price)

- Visualize the results by plotting the actual stock price and plotting the predicted stock price:
# Visualizing the results

plt.plot(actual_stock_price, color = 'green', \

label = 'Real Amazon Stock Price',ls='--')

plt.plot(predicted_stock_price, color = 'red', \

label = 'Predicted Amazon Stock Price',ls='-')

plt.title('Predicted Stock Price')

plt.xlabel('Time in days')

plt.ylabel('Real Stock Price')

plt.legend()

plt.show()

Please note that your results may differ slightly from the actual stock price.

**Expected output**:

In the following figure, the first plot displays the predicted output of the model with regularization from Activity 9.02, and the second displays the predicted output without regularization from Activity 9.01. As you can see, the model with dropout regularization does not fit the data as accurately. So, in this case, it is better not to use regularization, or to use dropout regularization with a lower dropout rate:

Note

To access the source code for this specific section, please refer to https://packt.live/2YTpxR7.

You can also run this example online at https://packt.live/3dY5Bku.
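To illustrate what a dropout layer with **rate=0.2** does to the values passing through it, here is a minimal pure-Python sketch of inverted dropout. The `dropout` function and the constant activations are assumptions for illustration; Keras applies the same idea to tensors, zeroing a fraction of inputs during training and scaling the rest by 1/(1 - rate):

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of values, rescale the rest."""
    if not training or rate == 0.0:
        # At inference time, dropout passes values through unchanged
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(1)       # seeded for repeatability
activations = [1.0] * 10000  # synthetic layer outputs

dropped = dropout(activations, rate=0.2, training=True, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)

inference = dropout(activations, rate=0.2, training=False, rng=rng)

print(zero_fraction, mean_activation)
```

The rescaling keeps the expected activation roughly the same between training and inference, but the noise injected during training can make it harder for a small model to fit the series closely, which is consistent with the comparison described in this activity.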

## Activity 9.03: Predicting the Trend of Amazon's Stock Price Using an LSTM with an Increasing Number of LSTM Neurons (100 Units)

In this activity, we will examine the stock price of Amazon over the last 5 years, from January 1, 2014, to December 31, 2018. We will try to predict and forecast the company's future trend for January 2019 using **RNNs** with four **LSTM** layers, each with **100** units. We have the actual values for January 2019, so we will be able to compare our predictions with the actual values later. You can also compare the output difference with *Activity 9.01*, *Predicting the Trend of Amazon's Stock Price Using an LSTM with 50 Units (Neurons)*. Follow these steps to complete this activity:

- Import the required libraries:
import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

from tensorflow import random

- Import the dataset using the pandas
**read_csv**function and look at the first five rows of the dataset using the**head**method:dataset_training = pd.read_csv('../AMZN_train.csv')

dataset_training.head()

- We are going to make our prediction using the
**Open**stock price; therefore, select the**Open**stock price column from the dataset and print the values:training_data = dataset_training[['Open']].values

training_data

- Then, perform feature scaling by normalizing the data using
**MinMaxScaler**and setting the range of the features so that they have a minimum value of zero and a maximum value of one. Use the**fit_transform**method of the scaler on the training data:from sklearn.preprocessing import MinMaxScaler

sc = MinMaxScaler(feature_range = (0, 1))

training_data_scaled = sc.fit_transform(training_data)

training_data_scaled

- Create the data so that each sample holds the **60** time steps preceding the current instance. We chose **60** here as it gives us a sufficient number of previous instances to understand the trend; technically, this can be any number, and **60** is a reasonable choice for this dataset. Additionally, the upper bound value here is **1258**, which is the number of rows (or records) in the training set:

X_train = []

y_train = []

for i in range(60, 1258):

    X_train.append(training_data_scaled[i-60:i, 0])

    y_train.append(training_data_scaled[i, 0])

X_train, y_train = np.array(X_train), np.array(y_train)

- Reshape the data to add an extra dimension to the end of
**X_train**using NumPy's**reshape**function:X_train = np.reshape(X_train, (X_train.shape[0], \

X_train.shape[1], 1))

- Import the following Keras libraries to build the RNN:
from keras.models import Sequential

from keras.layers import Dense, LSTM, Dropout

- Set the seed and initiate the sequential model:
seed = 1

np.random.seed(seed)

random.set_seed(seed)

model = Sequential()

- Add an LSTM layer to the network with
**100**units, set the**return_sequences**argument to**True**, and set the**input_shape**argument to**(X_train.shape[1], 1)**. Add three additional**LSTM**layers, each with**100**units, and set the**return_sequences**argument to**True**for the first two. Add a final output layer of size**1**:model.add(LSTM(units = 100, return_sequences = True, \

input_shape = (X_train.shape[1], 1)))

# Adding a second LSTM layer

model.add(LSTM(units = 100, return_sequences = True))

# Adding a third LSTM layer

model.add(LSTM(units = 100, return_sequences = True))

# Adding a fourth LSTM layer

model.add(LSTM(units = 100))

# Adding the output layer

model.add(Dense(units = 1))

- Compile the network with an
**adam**optimizer and use**Mean Squared Error**for the loss. Fit the model to the training data for**100**epochs with a batch size of**32**:# Compiling the RNN

model.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fitting the RNN to the Training set

model.fit(X_train, y_train, epochs = 100, batch_size = 32)

- Load and process the test data (which is treated as actual data here) and select the column representing the value of open stock data:
dataset_testing = pd.read_csv('../AMZN_test.csv')

actual_stock_price = dataset_testing[['Open']].values

actual_stock_price

- Concatenate the data since we will need
**60**previous instances to get the stock price for each day. Therefore, we will need both the training and test data:total_data = pd.concat((dataset_training['Open'], \

dataset_testing['Open']), axis = 0)

- Reshape and scale the input to prepare the test data. Note that we are predicting the January monthly trend, which has
**21**financial days, so in order to prepare the test set, we take the lower bound value as**60**and the upper bound value as**81**. This ensures that the difference of**21**is maintained:inputs = total_data[len(total_data) \

- len(dataset_testing) - 60:].values

inputs = inputs.reshape(-1,1)

inputs = sc.transform(inputs)

X_test = []

for i in range(60, 81):

    X_test.append(inputs[i-60:i, 0])

X_test = np.array(X_test)

X_test = np.reshape(X_test, (X_test.shape[0], \

X_test.shape[1], 1))

predicted_stock_price = model.predict(X_test)

predicted_stock_price = \

sc.inverse_transform(predicted_stock_price)

- Visualize the results by plotting the actual stock price and plotting the predicted stock price:
plt.plot(actual_stock_price, color = 'green', \

label = 'Actual Amazon Stock Price',ls='--')

plt.plot(predicted_stock_price, color = 'red', \

label = 'Predicted Amazon Stock Price',ls='-')

plt.title('Predicted Stock Price')

plt.xlabel('Time in days')

plt.ylabel('Real Stock Price')

plt.legend()

plt.show()

Please note that your results may differ slightly from the actual stock price.

**Expected output**:

So, if we compare the results of the **LSTM** with **50** units (from *Activity 9.01*, *Predicting the Trend of Amazon's Stock Price Using an LSTM with 50 Units (Neurons)*) and the **LSTM** with **100** units in this activity, we get trends with **100** units. Also, note that when we run the **LSTM** with **100** units, it takes more computational time than the **LSTM** with **50** units. A trade-off needs to be considered in such cases:

Note

To access the source code for this specific section, please refer to https://packt.live/31NQkQy.

You can also run this example online at https://packt.live/2ZCZ4GR.
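The extra computational time of the 100-unit model can be related to its number of trainable parameters. The sketch below counts them, assuming the standard LSTM parameterization with biases (four gates, each with input weights, recurrent weights, and a bias vector) and a single input feature; the helper names are illustrative:

```python
def lstm_params(units, input_dim):
    """Trainable parameters of one LSTM layer: 4 gates x (W, U, b)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    """Trainable parameters of a dense layer: weights plus biases."""
    return units * input_dim + units


def stacked_lstm_params(units, n_layers=4, n_features=1):
    """Parameters of n stacked LSTM layers followed by a Dense(1) output."""
    total = lstm_params(units, n_features)               # first layer sees the raw feature
    total += (n_layers - 1) * lstm_params(units, units)  # later layers see `units` inputs
    total += dense_params(1, units)                      # output layer
    return total


params_50 = stacked_lstm_params(50)
params_100 = stacked_lstm_params(100)
print(params_50, params_100, params_100 / params_50)
```

Doubling the number of units roughly quadruples the parameter count, because the recurrent weight matrices grow with the square of the layer width. You can compare these numbers with the totals reported by **model.summary()**.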