Matrix Factorization Model – Hands-On Machine Learning with ML.NET

Matrix Factorization Model

With anomaly detection models behind us, it is now time to turn to matrix factorization models. Matrix factorization is one of the newer additions to ML.NET, with a transform of the same name. In this chapter, we will dive into matrix factorization, as well as the applications best suited to it. In addition, we will build a new sample application to predict music recommendations based on the sample training data. Finally, we will explore how to evaluate a matrix factorization model with the properties that ML.NET exposes.

In this chapter, we will cover the following topics:

  • Breaking down matrix factorizations
  • Creating a matrix factorization application
  • Evaluating a matrix factorization model

Breaking down matrix factorizations

As mentioned in Chapter 1, Getting Started with Machine Learning and ML.NET, matrix factorization, by definition, is an unsupervised learning algorithm. This means that the algorithm will train on data and build a matrix of patterns in user ratings, and during a prediction call, will attempt to find like ratings based on the data provided. In this section, we will dive into use cases for matrix factorization and take a look at the matrix factorization trainer in ML.NET.

Use cases for matrix factorizations

Matrix factorization, as you might be starting to realize, has numerous applications wherever data on previous selections is available and the idea is to suggest other matches from data that has not yet been selected. Without needing to do manual spot-checking, matrix factorization algorithms train on the available data and determine patterns using a key-value pair combination. ML.NET provides various matrix factorization values to look at programmatically, inside of your application. We will review these values later on in this chapter, to better ensure the recommendation was not a false positive.

Some of the potential applications best suited for matrix factorization are:

  • Music recommendations
  • Product recommendations
  • Movie recommendations
  • Book recommendations

Effectively, anything where data can be traced back to a single user and then built upon as more data is entered can utilize matrix factorization. Take, for instance, a new music platform geared toward helping you to find new bands to listen to. When you first reach the site and create a profile, there is no prior data available. This is known as the cold start problem. You, as the end user, must tell the system what you like and don't like. Due to the nature of the algorithm, matrix factorization is better suited to this application than the straight regression or binary classification algorithms we explored in earlier chapters.

Diving into the matrix factorization trainer

The matrix factorization trainer is the only traditional recommendation trainer found in ML.NET as of this writing. The matrix factorization trainer requires both normalization of the values and caching. In addition, to utilize matrix factorization in ML.NET, the Microsoft.ML.Recommender NuGet package is required if you are creating the project from scratch. The sample in the GitHub repository already includes this package.

Similar to other algorithms, normalization is required, but matrix factorization is unique. Other algorithms, as we have seen with binary classification or regression algorithms, have multiple values that can be normalized. In matrix factorization, there are only three values involved: the Label, Row, and Column values. The output comprises two properties: Score and Label. The Score value is of type float and is unbounded.
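As a rough sketch of what the trainer learns from these three values, assume the standard matrix factorization formulation. The symbols below are illustrative and are not names from the ML.NET API. The sparse matrix of known labels, R, with m rows and n columns, is approximated by the product of two much smaller factor matrices, and the Score for row u and column v is the dot product of the two corresponding factor vectors:

```latex
\[
R \approx P\,Q^{\top}, \qquad P \in \mathbb{R}^{m \times k}, \quad Q \in \mathbb{R}^{n \times k}
\]
\[
\hat{r}_{uv} = \mathbf{p}_u^{\top}\mathbf{q}_v = \sum_{f=1}^{k} p_{uf}\, q_{vf}
\]
\[
\min_{P,\,Q} \; \sum_{(u,v) \in \Omega} \left( r_{uv} - \mathbf{p}_u^{\top}\mathbf{q}_v \right)^2
  + \lambda \left( \lVert P \rVert_F^2 + \lVert Q \rVert_F^2 \right)
\]
```

Here, Ω is the set of row and column pairs for which a label is known, k is the rank of the approximation, and λ is a regularization weight. With a squared loss, training amounts to finding the P and Q that minimize the last expression over the known labels only, which is why the model can later produce a score for pairs it has never seen together.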

It should be noted that in July 2018's ML.NET 0.3 update, field-aware factorization machines were added. However, this type of trainer offered only binary recommendations (such as either like or dislike), as opposed to matrix factorization, which supports floating-point values of any range. This provides considerably better flexibility in usage, such as getting more granular predictions. If, for instance, a matrix factorization recommendation on a scale from 0 to 100 returned 30, the recommendation engine would more than likely return a negative recommendation. With simply a binary response, the application—and thereby the end-user—is not shown how strong the recommendation is either way.

We will demonstrate this trainer in the sample application in the next section, by providing music recommendations.

Creating a matrix factorization application

As mentioned earlier, the application we will be creating provides music recommendations. Given a UserID, MusicID, and a rating, the algorithm will use that data to create recommendations. As with other applications, this is not meant to power the next Spotify-esque machine learning product; however, it will show you how to use matrix factorization in ML.NET.

As with previous chapters, the completed project code, sample dataset, and project files can be downloaded here: https://github.com/PacktPublishing/Hands-On-Machine-Learning-With-ML.NET/tree/master/chapter07.

Exploring the project architecture

Building on the project architecture and code we created in previous chapters, the bulk of the changes are in the training of the model, as matrix factorization requires a fairly significant paradigm shift from what we have reviewed in previous chapters.

In the following screenshot, you will find the Visual Studio Solution Explorer view of the project. The new additions to the solution are the MusicRating and MusicPrediction files, which we will review later in this section:

The sampledata.csv file contains 10 rows of random music ratings. Feel free to adjust the data to fit your own observations, or to adjust the trained model. Here is a snippet of the data:

1,1000,4
1,1001,3.5
1,1002,1
1,1003,2
2,1000,1.5
2,1001,2
2,1002,4
2,1003,4
3,1000,1
3,1001,3

Each of these rows contains the value for the properties in the newly created MusicRating class that we will review later on in this chapter.

In addition to this, we added the testdata.csv file that contains additional data points to test the newly trained model against and evaluate. Here is a snippet of the data inside of testdata.csv:

1,1000,4
1,1001,3.5
2,1002,1
2,1003,2
3,1000,1.5
3,1001,2
4,1002,4
4,1003,4

Diving into the code

For this application, as noted in the previous section, we are building on top of the work completed in Chapter 6, Anomaly Detection Model. For this deep dive, we are going to focus solely on the code that was changed for this application.

Classes that were changed or added are as follows:

  • MusicRating
  • MusicPrediction
  • Predictor
  • Trainer
  • Constants

The MusicRating class

The MusicRating class is the container class that holds the data used to both train our model and run predictions. As described in previous chapters, the number in the LoadColumn decorator maps to the index in the CSV files. As noted in the earlier section, matrix factorization in ML.NET requires the use of normalization, which is applied to these properties when the model is trained. The class is shown in the following code block:

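using Microsoft.ML.Data;

namespace chapter07.ML.Objects
{
    public class MusicRating
    {
        // Column 0 of the CSV file: the user who gave the rating
        [LoadColumn(0)]
        public float UserID { get; set; }

        // Column 1: the music item that was rated
        [LoadColumn(1)]
        public float MusicID { get; set; }

        // Column 2: the rating itself, used as the label
        [LoadColumn(2)]
        public float Label { get; set; }
    }
}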

The MusicPrediction class

The MusicPrediction class contains the properties mapped to our prediction output. The Score property contains the predicted rating for the given user and music combination; the higher the score, the stronger the recommendation. We will review these values later on in this section, but for now, they can be seen in the following code block:

namespace chapter07.ML.Objects
{
    public class MusicPrediction
    {
        public float Label { get; set; }

        public float Score { get; set; }
    }
}

The Predictor class

There are a couple of changes in this class to handle the music-prediction scenario, as follows:

  1. First, we create our prediction engine with the MusicRating and MusicPrediction types, like this:
var predictionEngine = MlContext.Model.CreatePredictionEngine<MusicRating, MusicPrediction>(mlModel);
  2. Next, we read the input file into a string object, like this:
var json = File.ReadAllText(inputDataFile);
  3. Next, we deserialize the string into an object of type MusicRating, like this:
var rating = JsonConvert.DeserializeObject<MusicRating>(json);
  4. Lastly, we need to run the prediction, and then output the results of the model run, as follows:
var prediction = predictionEngine.Predict(rating);

Console.WriteLine(
    $"Based on input:{System.Environment.NewLine}" +
    $"Label: {rating.Label} | MusicID: {rating.MusicID} | UserID: {rating.UserID}{System.Environment.NewLine}" +
    $"The music is {(prediction.Score > Constants.SCORE_THRESHOLD ? "recommended" : "not recommended")}");

Because the prediction only returns the Label and Score values, the original row data is output alongside the result to give context.

The Trainer class

Inside the Trainer class, several modifications need to be made to support matrix factorization. In many ways, a simplification is required due to the nature of only having three inputs:

  1. The first addition is the two constant variables for the variable encoding, shown in the following code block:
private const string UserIDEncoding = "UserIDEncoding";
private const string MovieIDEncoding = "MovieIDEncoding";
  2. We then build the MatrixFactorizationTrainer options. The MatrixColumnIndexColumnName and MatrixRowIndexColumnName properties are set to the column names previously defined. Setting the Quiet flag to false displays additional model building information on every iteration, as illustrated in the following code block:
var options = new MatrixFactorizationTrainer.Options
{
    MatrixColumnIndexColumnName = UserIDEncoding,
    MatrixRowIndexColumnName = MovieIDEncoding,
    LabelColumnName = "Label",
    NumberOfIterations = 20,
    ApproximationRank = 10,
    Quiet = false
};
  3. We can then create the matrix factorization trainer and append it to the transformer returned with the training data view, as follows:
var trainingPipeline = trainingDataView.Transformer.Append(MlContext.Recommendation().Trainers.MatrixFactorization(options));
  4. Now, we fit the model on the training data and save the model, as follows:
ITransformer trainedModel = trainingPipeline.Fit(trainingDataView.DataView);

MlContext.Model.Save(trainedModel, trainingDataView.DataView.Schema, ModelPath);

Console.WriteLine($"Model saved to {ModelPath}{Environment.NewLine}");
  5. Lastly, we load the testing data and pass the data to the matrix factorization evaluator, like this:
var testingDataView = GetDataView(testingFileName, true);

var testSetTransform = trainedModel.Transform(testingDataView.DataView);

var modelMetrics = MlContext.Recommendation().Evaluate(testSetTransform);

Console.WriteLine(
    $"matrix factorization Evaluation:{Environment.NewLine}{Environment.NewLine}" +
    $"Loss Function: {modelMetrics.LossFunction}{Environment.NewLine}" +
    $"Mean Absolute Error: {modelMetrics.MeanAbsoluteError}{Environment.NewLine}" +
    $"Mean Squared Error: {modelMetrics.MeanSquaredError}{Environment.NewLine}" +
    $"R Squared: {modelMetrics.RSquared}{Environment.NewLine}" +
    $"Root Mean Squared Error: {modelMetrics.RootMeanSquaredError}");
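The steps above rely on a data-loading helper that returns both a data view and a transformer, but that helper is not listed in this section. The following is a minimal, self-contained sketch of what the encoding step could look like. It assumes that ML.NET's MapValueToKey transform is used to turn the raw ID values into the key-typed columns that the trainer's row and column index properties expect. The RatingRow, RatingScore, and KeyEncodingSketch names are hypothetical stand-ins and are not taken from the sample project; the sketch also loads a few rows in memory rather than reading the CSV file.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Trainers;

// Hypothetical stand-ins for the chapter's MusicRating and MusicPrediction classes.
public class RatingRow
{
    public float UserID { get; set; }
    public float MusicID { get; set; }
    public float Label { get; set; }
}

public class RatingScore
{
    public float Score { get; set; }
}

public static class KeyEncodingSketch
{
    // Trains on a handful of in-memory ratings and scores one user/music pair.
    public static float TrainAndScore(float userId, float musicId)
    {
        var mlContext = new MLContext(seed: 1);

        // A few rows copied from the sampledata.csv snippet shown earlier.
        var rows = new List<RatingRow>
        {
            new RatingRow { UserID = 1, MusicID = 1000, Label = 4f },
            new RatingRow { UserID = 1, MusicID = 1001, Label = 3.5f },
            new RatingRow { UserID = 1, MusicID = 1002, Label = 1f },
            new RatingRow { UserID = 2, MusicID = 1000, Label = 1.5f },
            new RatingRow { UserID = 2, MusicID = 1002, Label = 4f },
            new RatingRow { UserID = 3, MusicID = 1001, Label = 3f }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Assumption: the raw float IDs are mapped to key-typed columns,
        // which the trainer uses as its matrix row and column indices.
        var encoding = mlContext.Transforms.Conversion
            .MapValueToKey(outputColumnName: "UserIDEncoding", inputColumnName: nameof(RatingRow.UserID))
            .Append(mlContext.Transforms.Conversion.MapValueToKey(
                outputColumnName: "MusicIDEncoding", inputColumnName: nameof(RatingRow.MusicID)));

        var options = new MatrixFactorizationTrainer.Options
        {
            MatrixColumnIndexColumnName = "UserIDEncoding",
            MatrixRowIndexColumnName = "MusicIDEncoding",
            LabelColumnName = nameof(RatingRow.Label),
            NumberOfIterations = 20,
            ApproximationRank = 10,
            Quiet = true // suppress the per-iteration training log
        };

        // Encoding first, then the trainer, mirroring the Append call in step 3.
        var pipeline = encoding.Append(
            mlContext.Recommendation().Trainers.MatrixFactorization(options));
        ITransformer model = pipeline.Fit(data);

        var engine = mlContext.Model.CreatePredictionEngine<RatingRow, RatingScore>(model);
        return engine.Predict(new RatingRow { UserID = userId, MusicID = musicId }).Score;
    }
}
```

Calling KeyEncodingSketch.TrainAndScore(1f, 1002f) returns the predicted score for a user and music pair taken from the training rows. Building this requires the Microsoft.ML and Microsoft.ML.Recommender NuGet packages mentioned earlier in the chapter.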

The Constants class

The Constants class now defines the SCORE_THRESHOLD value, which the Predictor class compares the predicted score against when deciding whether the music is recommended, as follows:

namespace chapter07.Common
{
    public class Constants
    {
        public const string MODEL_FILENAME = "chapter7.mdl";

        public const float SCORE_THRESHOLD = 3.0f;
    }
}

Running the application

To run the application, the process is nearly identical to Chapter 6's sample application, as follows:

  1. After preparing the data, we must then train the model by passing in the newly created sampledata.csv file, like this:
PS Debug\netcoreapp3.0> .\chapter07.exe train ..\..\..\Data\sampledata.csv ..\..\..\Data\testdata.csv
iter tr_rmse obj
0 2.4172 9.6129e+01
1 1.9634 6.6078e+01
2 1.5140 4.2233e+01
3 1.3417 3.5027e+01
4 1.2860 3.2934e+01
5 1.1818 2.9107e+01
6 1.1414 2.7737e+01
7 1.0669 2.4966e+01
8 0.9819 2.2615e+01
9 0.9055 2.0387e+01
10 0.8656 1.9472e+01
11 0.7534 1.6725e+01
12 0.6862 1.5413e+01
13 0.6240 1.4311e+01
14 0.5621 1.3231e+01
15 0.5241 1.2795e+01
16 0.4863 1.2281e+01
17 0.4571 1.1938e+01
18 0.4209 1.1532e+01
19 0.3975 1.1227e+01

Model saved to Debug\netcoreapp3.0\chapter7.mdl
  2. The train command also evaluates the newly trained model against the testdata.csv file mentioned earlier, which was passed in as the second file, and the output will show the following:
matrix factorization Evaluation:

Loss Function: 0.140
Mean Absolute Error: 0.279
Mean Squared Error: 0.140
R Squared: 0.922
Root Mean Squared Error: 0.375

Prior to running the prediction, create a JSON file in Notepad with the following text:

{ "UserID": 10, "MusicID": 4, "Label": 3 }

Then save the file as input.json in your output folder.

  3. Then, run the prediction, like this:
PS Debug\netcoreapp3.0> .\chapter07.exe predict input.json
Based on input:
Label: 3 | MusicID: 4 | UserID: 10
The music is not recommended

Feel free to modify the values, and see how the prediction changes based on the dataset that the model was trained on. A few areas of experimentation from this point might be to:

  • Change the hyperparameters mentioned in the Trainer class deep dive.
  • Add diversification and more data points to the training and test data.

Evaluating a matrix factorization model

As discussed in previous chapters, evaluating a model is a critical part of the overall model-building process. A poorly trained model will only provide inaccurate predictions. Fortunately, ML.NET provides many popular attributes to calculate model accuracy based on a test set at the time of training, to give you an idea of how well your model will perform in a production environment.

As noted earlier in the sample application, for matrix factorization model evaluation in ML.NET, there are five properties that comprise the RegressionMetrics class object. Let us dive into the properties exposed in the RegressionMetrics object here:

  • Loss function
  • Mean Squared Error (MSE)
  • Mean Absolute Error (MAE)
  • R-squared
  • Root Mean Squared Error (RMSE)

In the next sections, we will break down how these values are calculated, and detail the ideal values to look for.

Loss function

This property uses the loss function set when the matrix factorization trainer was initialized. In the case of our matrix factorization example application, we did not set a loss function in the trainer options, so the trainer uses its default, squared-loss regression.

The loss functions offered by the matrix factorization trainer in ML.NET are:

  • Squared-loss one class
  • Squared-loss regression

The idea behind this property is to allow some flexibility when it comes to evaluating your model compared to the other four properties, which use fixed algorithms for evaluation.

MSE

MSE is defined as the measure of the average of the squares of the errors. To put this simply, take the plot shown in the following screenshot:

The dots correlate to data points for our model, while the line going across is the prediction line. The distance between a dot and the prediction line is the error. For MSE, each of these distances is squared, and the mean of the squared values is then calculated. The smaller the value, the better the fit, and the more accurate the predictions you will have with your model.

Because squaring penalizes large errors more heavily than small ones, MSE is best used to evaluate models when outliers are critical to the prediction output.
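Written out as a formula, and using symbols chosen here for illustration (n test rows, with y_i as the actual label of row i and ŷ_i as the predicted score), the calculation described above is:

```latex
\[
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\]
```

For example, errors of 0.5, 0, and 1 give squared errors of 0.25, 0, and 1, so the MSE is 1.25 / 3 ≈ 0.417; the single error of 1 contributes four times as much as the error of 0.5.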

MAE

MAE is similar to MSE, with the critical difference being that it averages the absolute distances between the points and the prediction line, as opposed to the squared distances. It should be noted that MAE does not take direction into account. For instance, if you had two data points an equal distance from the line, one above and the other below, both would count toward the error in full. If direction were taken into account, the two would balance each other out as a positive and a negative value; in machine learning, that measure is referred to as Mean Bias Error (MBE). However, ML.NET does not provide this as part of the RegressionMetrics class at the time of this writing.

MAE is best used to evaluate models when outliers are considered simply anomalies, and shouldn't be counted in evaluating a model's performance.
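The difference between the two measures is easiest to see side by side. Using symbols chosen here for illustration (n test rows, y_i for the actual label of row i, and ŷ_i for the predicted score):

```latex
\[
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
\qquad
\mathrm{MBE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)
\]
```

With two predictions that miss by +1 and -1, the MAE is (1 + 1) / 2 = 1, while the MBE is (1 - 1) / 2 = 0, which shows how signed errors cancel each other out.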

R-squared

R-squared, also called the coefficient of determination, is another method of representing how well the prediction compares to the test set. R-squared is calculated by taking the difference between each predicted value and its corresponding actual value, squaring that difference, and summing the squares for each pair of points. That sum is then divided by the sum of the squared differences between each actual value and the mean of the actual values, and the result is subtracted from 1.
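As a formula, with symbols chosen here for illustration (y_i for the actual label of row i, ŷ_i for the predicted score, and ȳ for the mean of the actual labels), the standard definition of the coefficient of determination is:

```latex
\[
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2},
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
\]
```

A model that always predicts the mean ȳ makes the numerator equal to the denominator, giving a value of 0, and a model that predicts worse than the mean gives a negative value.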

R-squared values generally range between 0 and 1, represented as a floating-point value. A negative value can occur when the fitted model is evaluated to be worse than an average fit. However, a low number does not always reflect that the model is bad. Predictions, such as the one we looked at in this chapter, that are based on predicting human actions are often found to be under 50%.

Conversely, higher values aren't necessarily a sure sign of the model's performance, as this could be considered as overfitting of the model. This happens when a lot of features are fed to the model, making it more complex than the model we built in the Creating your first ML.NET application section of Chapter 2, Setting Up the ML.NET Environment, or when there is simply not enough diversity in the training and test sets. For example, if all of the training rows held roughly the same values, and the test set holdout comprised the same ranges of values, this would be considered overfitting.

RMSE

RMSE is arguably the easiest property to understand, given the previous methods. Take the plot shown in the following screenshot:

In the case of testing the model, as we did previously with the holdout set, the lighter dots are the actual values from the test set, while the darker dots are the predicted values. The distance depicted is the distance between the predicted and actual values. RMSE simply squares each of those distances, takes the mean of the squared values, and then takes the square root; in other words, it is the square root of the MSE.

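Because RMSE is expressed in the same units as the label, what counts as a good value depends on the range of the labels; the closer the value is to 0, the better the fit.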
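To tie the four fixed metrics together, here is a minimal sketch in plain C# that recomputes them by hand from a list of actual labels and predicted scores. The RegressionMetricsSketch class and its method names are hypothetical and are not part of ML.NET; in the sample application, ML.NET calculates these values for you and exposes them on the RegressionMetrics object. The sketch also assumes that both arrays have the same length and that the labels are not all identical.

```csharp
using System;
using System.Linq;

// Hypothetical helper that mirrors the definitions in this section.
public static class RegressionMetricsSketch
{
    // Mean of the squared differences between labels and scores.
    public static double MeanSquaredError(double[] labels, double[] scores) =>
        labels.Zip(scores, (y, p) => (y - p) * (y - p)).Average();

    // Mean of the absolute differences between labels and scores.
    public static double MeanAbsoluteError(double[] labels, double[] scores) =>
        labels.Zip(scores, (y, p) => Math.Abs(y - p)).Average();

    // Square root of the MSE, expressed in the same units as the label.
    public static double RootMeanSquaredError(double[] labels, double[] scores) =>
        Math.Sqrt(MeanSquaredError(labels, scores));

    // 1 minus the ratio of the residual sum of squares to the total sum of squares.
    public static double RSquared(double[] labels, double[] scores)
    {
        double mean = labels.Average();
        double residual = labels.Zip(scores, (y, p) => (y - p) * (y - p)).Sum();
        double total = labels.Sum(y => (y - mean) * (y - mean));
        return 1.0 - residual / total;
    }
}
```

For labels of 4, 3, 1, and 2 with predicted scores of 3.5, 3, 2, and 2, these methods return an MSE of 0.3125, an MAE of 0.375, an RMSE of roughly 0.559, and an R-squared of 0.75. The same relationship holds in the evaluation output shown earlier in this chapter: the square root of the reported MSE of 0.140 is roughly 0.374, which agrees with the reported RMSE of 0.375 up to rounding.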

Summary

Over the course of this chapter, we have deep-dived into ML.NET's matrix factorization support. We have also created and trained our first matrix factorization application to predict music recommendations. Lastly, we also dove into how to evaluate a matrix factorization model and looked at the various properties that ML.NET exposes to achieve a proper evaluation of a matrix factorization model.

With this chapter coming to a close, we have also completed our initial investigation of the various models ML.NET provides. In the next chapter, we will be creating full applications, building on the knowledge garnered over the last few chapters, with the first being a full .NET Core application providing stock forecasting.