
Predicting the Future of Investment Bankers

In the previous chapter, we covered the basic financial and capital market concepts. We looked at how AI can help us optimize the capital structure by running risk models and generating sales forecasts using macroeconomic data. We also looked at how useful AI is when planning an organization's financial internals and its communication with external investors. We then looked at two examples: the first on how to optimize the funding mix of debt and equity, and the second on performing a financial forecast that could help us plan capital demand.

The goal of this chapter is to introduce additional techniques that can be used for financial planning. You will learn how to perform auto syndication for new issues so that capital can be obtained from interested investors. Then, you will learn how to identify acquirers and targets, a process that requires a scientific approach so that you can pick the clients that need banking services the most.

In this chapter, we will cover the following topics:

  • Basics of investment banking
  • Understanding data technologies
  • Clustering models
  • Auto syndication for new issues
  • Identifying acquirers and targets

Let's get started!

Basics of investment banking

Investment banking is the focus of this chapter, so you will need to understand a few basic concepts surrounding it. We will begin by understanding the challenges of an Initial Public Offering, commonly known as an IPO. When a company decides to go to the stock market to raise money from the public, it launches an IPO for the public and institutions to subscribe to. We will also understand the concepts of M&A, as well as how to classify investors and apply AI to mergers and acquisitions.

The job of investment bankers in IPOs

The following are some of the core problems that investment bankers deal with:

  • Pricing: What is the right price for the new issuance (of equity)?
  • Syndication: Who should we distribute the shares to, and at what price?
  • Listing: How can we register these shares with the markets (such as stock exchanges) so that they pass all the requirements for an investment security in the market?

Let's answer each of these questions, one by one.

To answer the first question, recall that, in the previous chapter, we briefly illustrated how to correctly model the capital structure of the company, including its financial position. The core of this remains estimating the drivers, given the macro indicators that matter to the company concerned.

To answer the second question, it makes a difference if we have visibility of the market's investment preferences. When investors' investment decisions are automated by robo-advisors, it should be easy to test the demand of the investors represented by those robots. The robot needs parameters about the investment, and many of these projections are made by an investment bank's engine; the engine's past accuracy should therefore also be considered when assessing the information about the potential issue (also known as the prospectus). We will address this question in the first example we complete in this chapter.

The third question focuses largely on reporting and filing information regarding the legitimacy of the company's ownership and legal status, as well as its risk factors. If this process is executed by a robot, there will be different requirements from the regulator/stock exchange:

  • There should be a robot on the regulator/stock exchange side to validate the claims of the filing company. Here, a robot means an intelligent software program that can perform the specific tasks meant for it. It may even be possible for the listing company's CFO to upload their sales forecast, as per what we discussed in Chapter 4, Mechanizing Capital Market Decisions. For example, the material factor that impacts the sales of an electricity company is the weather, given that it is highly predictive of those sales.
  • Besides the factors related to sales, risk factors include other macroeconomic variables that impact the major items of the financial statements. Factors sensitive to the firm's strategies will be covered in the second example of Chapter 7, Sensing Market Sentiment for Algorithm Marketing at the Sell-Side, since the investor side also influences the important topics that need to be focused on.

Stock classification – style

There are two schools of thought when it comes to classifying stocks: one based on qualitative features and another based on quantitative features. We will be focusing on the qualitative approach, which is called style. An example of such a scheme is Morningstar Style Box (http://news.morningstar.com/pdfs/FactSheet_StyleBox_Final.pdf).

Here, we can look at the sector/industry, the size of the stocks, the riskiness of the stock, the potential of the stock, and so on. There are many ways to create features and classify stocks. We will use sector and size as the features for qualitative classification in this chapter.

The quantitative approach (for example, arbitrage pricing theory (APT)) groups stocks that contain similar factors together analytically.

Investor classification

Like stock classification, there are both quantitative and qualitative approaches. A qualitative approach could be based on the type of money (pension, sovereign wealth, insurance, and so on), strategies (long-short, global macro, and so on), underlying holdings (futures, commodities, equities, bonds, and private equity), riskiness, and so on. A quantitative approach could be based on the proximate factors that these investors' decisions are based on. In the first example of this chapter, we will use investment riskiness and return as the features for qualitative classification.

Mergers and acquisitions

Investment banking covers not just listing securities but also advisory services such as mergers and acquisitions (M&A), financial opinions such as company valuations, and other event-driven financing operations such as management buyouts. In short, all these activities deal with buying and selling companies and/or company assets and pricing them correctly. The easiest way to understand this is to think about the roles of property brokers, appraisers, and mortgage bankers when buying a house. M&A is like two people getting married: sometimes, one will be more dominant, while at other times, it is a marriage of two equal entities. The rationale behind this is that a firm exists because it is more efficient to operate as one, as theorized by Ronald Coase in 1937. As technologies, regulations, and consumer preferences change, the economic boundary of a firm changes too, which makes the case for M&A.

We are largely talking about the following types of transactions:

  • Acquisition (acquiring another firm)
  • Merger (two or more firms combine)
  • Divestiture (selling itself)
  • Spin-off (selling part of the firm), and so on

Another dimension of classifying M&A is the pre-deal relationship between the acquirer and the target: if they are both in the same industry, it is called horizontal integration; if they are in a supplier-customer relationship, it is called vertical integration; and when the two are not linked at all, it is called diversification.

As an investment banker, these are the key areas you need to look into:

  • Pre-deal matters: Ensure the commitment/willingness of the acquirer and the target to embark on a journey together to explore a deal.
  • Approvals: Obtain approval from regulators and/or existing shareholders.
  • Post-deal matters: Deliver 1 + 1 > 2. This is not bad math; it means that certain processes become more integrated and deliver better results, so when two entities (companies) are put together, the cost will be lower or the revenue will be higher.

According to the guide for dummies (https://www.dummies.com/business/corporate-finance/mergers-and-acquisitions/steps-of-the-ma-process/), an M&A deal can be summarized with the following steps:

  1. Contact the targets
  2. Exchange documents and pricing
  3. Conduct due diligence
  4. Close the deal
  5. Perform post-deal integration

Next, we'll look at the application of AI in M&A.

Application of AI in M&A

With regard to the application of AI for bankers, AI is used to identify the right target and to help quantify the pricing of post-deal synergies. Both of these steps (the first and the last steps of the process) are highly unreliable under the existing setting, where there is not much science involved. Firstly, bankers' time is very expensive, while the mortality rate of any prospective deal is very high (for example, 90%). Clients (buyers/sellers) have an incentive to maximize the banker's service hours, even though no deal may be closed. Given the banker's limited time and the clients' conflicting goal of maximizing the banker's time regardless of their actual intention to close any deal, the best approach is to establish the actual economics to be derived from the M&A deal. If the deal works fundamentally, there will be greater urgency to engage the investment bankers for deal execution and announcement.

The modeling approach actually exists today in credit risk modeling, which we mentioned in the previous chapters. Given the financials, we predict a binary outcome regarding whether an event occurs or not. In the case of the credit risk model, the event is bankruptcy within X years; for mergers, it could be an acquisition or divestment announcement within X years, given the financials. I personally do not see any difference between these modeling approaches if the probability of the event can be estimated in the same way as the probability of bankruptcy.
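To make the analogy concrete, here is a minimal sketch of such a binary event model. The file name, the column names, and the choice of logistic regression are illustrative assumptions, not the book's actual code:

#Minimal sketch: predict an M&A announcement within the next year from financial ratios
#All names here (file, columns) are hypothetical and used for illustration only
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv('company_year_financials.csv')   #one row per company-year (hypothetical file)
features = ['debt_to_equity', 'return_on_assets', 'revenue_growth']   #hypothetical ratios
X = df[features]
Y = df['ma_event_next_year']   #1 = acquisition/divestment announced within X years, 0 = no event

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, Y_train)

#check how well the estimated event probabilities rank the companies
print('AUC:', roc_auc_score(Y_test, model.predict_proba(X_test)[:, 1]))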

Secondly, when it comes to quantifying post-deal synergies, there is either cost-efficiency, revenue growth, or higher productivity with a better mix of knowledge transfer by the staff:

  • When it comes to cost-efficiency, we can easily run an analysis of the relationship between sales and costs in the industry to quantitatively validate whether the expected saving reflects the actual behavior of the industry or is just wishful thinking that suppliers will accept lower payments from the combined company.
  • With regard to revenue synergies, this is a massive data exchange exercise and can only be done with proper machine learning models. For example, if the synergy is about better market access (for example, competitor A, a buyer, buying competitor B in the same industry), the targeting model of competitor A shall be run on competitor B's customer database to derive how much revenue will likely be generated. This happens with joint database marketing programs; for example, insurance distributed via banks (Bancassurance). Here, the insurers provide the model to run on the bank's customer database.
  • For know-how-related human resources synergies, I see an equal possibility of applying HR analytics to measure and quantify knowledge, skill levels, cultural fit, and team performance. The hard and soft sides of the staff shall be measured, projected, and simulated in the pre-deal synergy analysis.
That said, I do not believe that many existing M&A bankers would be willing to change much, because doing this would take a rather long time given that, right now, the digitalization of customer and staff data is not yet mainstream, which means the features and models needed for this are not readily available. But I do believe that we should work on this future model of M&A, especially now that we are building the future of M&A and training the next generation.

Compared to a financial investment, M&A carries huge uncertainty in terms of its operational integration, which is exactly where AI should deliver value. Numerous studies have been conducted on the determinant factors of a successful M&A deal that delivers the promised synergies; these findings, or features, from academic research need to be collected and run through models in order to generate a quantifiable likelihood of success, which can then be priced into the offering price.

Filing obligations of listing companies

To ensure fair markets for the investors of publicly listed securities, the exchange requires listed companies to announce the occurrence of events such as the release of financial results, major company activities that affect the valuation of the security, and so on. For example, you can refer to the New York Stock Exchange's IPO guide (https://www.nyse.com/publicdocs/nyse/listing/nyse_ipo_guide.pdf).

Understanding data technologies

We are going to manage a large amount of data in the examples in this chapter, so it is critical to understand the underlying data technologies that we will use. These data technologies are related to storing varying types of data and information. There are two challenges related to information storage: the first is the physical medium that we use to store the information, and the second is the format in which the information is stored.

Hadoop is one such solution that allows stored files to be physically distributed. This helps us deal with various issues such as storing a large amount of data in one place, backup, recovery, and so on. In our case, we store the data on one computer, as its size does not justify using this technology, but the following NoSQL databases could support this storage option. In Python, we can also work with the HDF5 file format, which likewise supports distributed filesystems.
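As a quick illustration of the HDF5 option, the following sketch writes a pandas DataFrame to an HDF5 file and reads it back; the file name and key are made up, and the PyTables package must be installed for to_hdf to work:

import pandas as pd

#hypothetical price table; in practice this would be the downloaded market data
prices = pd.DataFrame({'ticker': ['DUK', 'SO'], 'close': [90.1, 70.5]})

#write to and read back from an HDF5 store
prices.to_hdf('market_data.h5', key='prices', mode='w')
restored = pd.read_hdf('market_data.h5', key='prices')
print(restored)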

While NoSQL databases can be used, the reason why I am not using them in this chapter can be explained with the following comparison of SQLite, Cassandra, and MongoDB:

SQLite

  • Pros: Structured data format, compatible with DataFrames.
  • Cons: Cannot save unstructured data.
  • Conclusion: We need this for simplicity.

Cassandra

  • Pros: Can run on distributed computing and can store structured data (with fields as items).
  • Cons: The syntax for inserting structured data is not straightforward.

MongoDB

  • Pros: Can handle huge data sizes and process different records in parallel at scale.
  • Cons: Not suitable for fully structured data such as trading records; the data still needs to be converted into a DataFrame before running any machine learning algorithm.

For both Cassandra and MongoDB, the conclusion is that we can't use them for our case, as we simply aim to cluster similar investors and predict who will buy our new issues in the IPO.
Through this analysis, we see that it may not be necessary to have a NoSQL database for the sake of being cutting-edge. In the case of capital markets, where data is quite structured, it could be more efficient to use a SQL database that fits this purpose.
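To show why SQLite fits structured, DataFrame-friendly data, here is a minimal sketch using Python's built-in sqlite3 module together with pandas; the database file, table, and column names are made up for this example:

import sqlite3
import pandas as pd

#hypothetical structured holdings data
holdings = pd.DataFrame({'investor': ['Fund A', 'Fund B'],
                         'ticker': ['DUK', 'DUK'],
                         'shares': [10000, 2500]})

#store the table in a local SQLite database file
conn = sqlite3.connect('capital_markets.db')
holdings.to_sql('holdings', conn, if_exists='replace', index=False)

#read it straight back into a DataFrame, ready for any machine learning step
df = pd.read_sql('SELECT * FROM holdings', conn)
conn.close()
print(df)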

Clustering models

Before we start looking at the programming content, let's take a look at clustering models, since we will be using one in our first example.

Clustering seeks to group similar data points together. As a simple example, suppose we have three data points, each with one column: [1], [2], and [6]. We pick a centroid to represent the nearby points; for example, with two centroids, [1.5] and [6], each represents a cluster: one cluster containing [1] and [2], and another containing [6]. These sample clusters can be seen in the following diagram:

When there are two columns for each data point, the distance between the actual data point and the centroid needs to take both columns into account. We use a measurement called the Euclidean distance for this.
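As a minimal illustration (not part of the book's code), the following snippet runs scikit-learn's KMeans on the three one-column points above and prints the resulting cluster assignments and centroids:

import numpy as np
from sklearn.cluster import KMeans

#the three one-column data points from the example
X = np.array([[1], [2], [6]])

#ask for two clusters; n_init=10 mirrors the setting used later in this chapter
kmeans = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           #[1] and [2] share one cluster, [6] sits in its own cluster
print(kmeans.cluster_centers_)  #centroids near [1.5] and [6.0]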

One of the key challenges of adopting clustering in banking is that it tends to produce clusters that are too large, which reduces the true positive rate if all the clusters are targeted. In my experience, I would use it for preliminary data analysis to understand the major dynamics of the target population, not necessarily to draw actionable insights that make economic sense in a wholesale banking setting. In our example, we will create lots of clusters, with the very stringent requirement that the distance of each data point from its centroid averages a 5% deviation.

Another key question regarding the clustering algorithm is determining how many features we feed it. We could bias the clustering by overweighting certain types of financial ratios (for example, using two different kinds of profitability ratios, such as return on equity and return on assets). The solution to this is to run principal component analysis (PCA), which removes similar features by merging them into the same feature.
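The following is a minimal sketch of that idea, applying scikit-learn's PCA to two highly correlated profitability ratios; the data is randomly generated for illustration only:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

#hypothetical data: return on equity and return on assets move together,
#so feeding both to a clustering model would double-count profitability
rng = np.random.default_rng(0)
roe = rng.normal(0.12, 0.03, size=100)
roa = roe * 0.6 + rng.normal(0, 0.005, size=100)
X = np.column_stack([roe, roa])

#standardize, then keep only the components that explain most of the variance
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)    #keep 95% of the variance
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                 #likely (100, 1): the two ratios collapse into one feature
print(pca.explained_variance_ratio_)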

For a non-finance/banking example, you can refer to Building Recommendation Engines by Suresh Kumar Gorakala (https://www.packtpub.com/big-data-and-business-intelligence/building-practical-recommendation-engines-part-1-video).

Auto syndication for new issues

Where there are new issues, there are investors behind them. Traditional investment banks hire a group of professionals, called the syndication desk, to handle the allocation of new security issues to investors who can buy these shares and bonds.

Taking on the role of the investment bank's syndication desk, our job is to identify the cornerstone investors for Duke Energy's upcoming new issue, as its CFO needs equity funding. To do so, we will use the institutional holding data of US stocks from SEC filings via Quandl/Sharadar, which will help us find the investment preferences of investors who share similar interests and match the new issue with investors who already hold similar stocks, such as Duke Energy.

With regard to who to sell to, we will take the largest investors in US stocks as our universe of investors. The syndication desk's job is to sell the major portion of any equity issue to these investors. Using unsupervised learning, we will recommend the relevant new issue to the right investors. This can be done using security similarity (called holding similarity) and investment style (called investor similarity).

Solving the problem

The following diagram shows the steps involved in solving the problem at hand:

We will cover each step in detail in the following sections.

Building similarity models

Here, we will build two similarity models – one on stock similarity and another on finding similar investors. Both models are clustering models, and they belong to the last type of machine learning approach – unsupervised learning. We have picked 21 financial ratios to build the clustering model at the stock level, while for the investor model, we have a maximum of 60 features (six capitalization sizes * five investment decisions * two types of indicators):

  • Six capitalization scales: Nano, Micro, Small, Medium, Large, and Mega
  • Five investment decisions: Two for Buy (New, or Partial), one for Hold, and two for Sell (All or Partial)
  • Seven indicators: Quarterly return (total return, realized, unrealized), new money changing rate's mean and standard deviation, and current value

Import all the relevant libraries, and then load the ticker universe by reading the CSV files, together with the scale field that describes the stocks. To reduce the processing time, load a list of selected investors instead of all the investors. For each investor, calculate the trading direction per stock in each market segment (here, we use scale as the only market segment, but in reality, we should use country × industry × scale).

Building the investor clustering model

To build an investor clustering model, loop through the investors and calculate the movement and the profit (realized and unrealized profit), as follows:

  1. Import the required libraries and data:
'''************************
Load Data
'''
#import relevant libraries
import quandl
from datetime import date,timedelta
import pandas as pd
import os

#load tickers universe and description field (scale)
...

#loop through investors
...

for investor in investorNameList:
    ...
    #calculate the change in position by ticker on a quarter-to-quarter basis
    ...

    #qualify investor's activities
    print('classify investor decision')
    ...
    #output the ticker's activities of the investor
  2. Prepare the investor profiles:
## Prepare investor Profile
#load relevant libraries
import os
import pandas as pd
import numpy as np
from time import time
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import pickle

...

#Summarize quarterly performance of investors per quarter
...
for file in file_list:
    ...
    for index, row in tmp_pd.iterrows():
        ...

#calculate return (realized, unrealized and new money)
...
  3. Cluster the investors and output the clusters and the results:
## Cluster investors
#cleansed and transform data for clustering
...

sc_X = StandardScaler()
X = sc_X.fit_transform(investor_pd)

#define the k means function
def bench_k_means(estimator, name, data):
...

#try out different K means parameters and find out the best parameters
...

for num_cluster in range(5, 500):
    KMeans_model = KMeans(init='k-means++', \
                          n_clusters=num_cluster, n_init=10)
...

## Output the results
#Output clusters

Here, we run a clustering analysis on the features that list the realized and unrealized return by market. We set a threshold of 0.05, which means that the clusters we build must show a 5% variation across the feature variables. Finally, we output the results; that is, the cluster assignments, the clustering model, and the scaler.
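The elided bench_k_means helper and the stopping rule could look something like the following sketch, assuming X is the standardized feature matrix produced above; the exact threshold logic is an assumption based on this description, not the book's code:

from sklearn import metrics
from sklearn.cluster import KMeans

def bench_k_means(estimator, name, data):
    #fit the model and report the silhouette score for this number of clusters
    estimator.fit(data)
    score = metrics.silhouette_score(data, estimator.labels_, metric='euclidean',
                                     sample_size=min(len(data), 1000))
    print(name, estimator.inertia_, score)
    return score

#try an increasing number of clusters and stop once the 0.05 target is met
threshold = 0.05
best_model = None
for num_cluster in range(5, 500):
    KMeans_model = KMeans(init='k-means++', n_clusters=num_cluster, n_init=10)
    score = bench_k_means(KMeans_model, str(num_cluster), X)
    if score >= threshold:
        best_model = KMeans_model
        break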

Building the stock-clustering model

To build the stock-clustering model, we will load the data, prepare the stock's profile, cluster the stock, and output the clusters and results:

  1. Load the industry, tickers, and functions, import the libraries, and set the API key for Quandl:
'''*************************************
i. load industry, tickers and functions
'''
#import libraries
import quandl
import pandas as pd
import numpy as np
import os
from time import time
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import pickle

#API key
...

...
  2. Use sklearn to run the training models. Use pickle to load up the results and models. Then, download the fundamental data of the ticker for the latest annual financials:
#define important functions
#download fundamental data of the ticker
def download_tkr(tkr):
...
  3. Define the required k-means clustering function. Filter for industries whose number of tickers exceeds a cutoff; here, we will use 100 tickers as the cutoff. For each industry that passes this threshold, collect its tickers and download their financial data. For each of the tickers in the industry cluster, clean the data types:
#kmean clustering function
def bench_k_means(estimator, name, data):
...


'''*************************************
#2a. load data
'''
#parameters
...

'''*************************************
#i. filter the industry in scope
'''
...

#collect tkr in each industry
for index, row in df_tkr.iterrows():
...
  4. Then, build the clustering model for each industry. The maximum number of clusters should be half the total number of tickers in the industry. The clustering loop stops when it reaches the target silhouette score of 5% or when it reaches N/2 clusters (N = number of tickers in the industry):
'''*************************************
#ii. create a dataframe for each industry to do clustering
'''
...
#loop through the industry
for ind, list_tkr in dict_ind_tkr.items():
    ...
    #Go through the ticker list to download data from source
    #loop through tickers from that industry
    for tkr in list_tkr:
        ...

'''*************************************
2b. prepare features for clustering for the industry
'''
#convert to float and calc the difference across rows
...
'''*************************************
2C. Perform K means clustering for the industry
'''
#clustering
sc_X = StandardScaler()
X = sc_X.fit_transform(df_fs_filter)

...
for num_cluster in range(5, max_cluster):
    KMeans_model = KMeans(init='k-means++', \
                          n_clusters=num_cluster, n_init=10)
...
  5. Output the scaler, the clustering model, and the result of the clustering:
'''*************************************
2D. Output the clustering model and scaler for the industry
'''
#Output clusters
...

By adopting the methodology we developed in the previous chapter on financial projection, we can derive the financial statements and hence the financial ratios used for classifying the stock later.

In the example we looked at in the previous chapter, we projected the capital structure after issuing debt and equity. To begin with, however, we did not assume any movement in the stock's pricing multiples, such as the P/E ratio; we only assumed movement in profitability, scale, and so on.

To forecast the financials of the new stock, perform the following steps:

  1. Import all the relevant libraries and use pickle to load the results and models:
#import relevant libraries
import os
import pickle
import math
import numpy as np
import pandas as pd
import quandl


...
  2. Leverage the program we built in the previous chapter and run the financial projection defined in the preceding section. Then, calculate the metrics of the projected financials for the company to be listed:
#perform financial projection
#reuse the function developed for WACC optimization
def cal_F_financials(record_db_f, logreg, logreg_sc, new_debt_pct, price_offering, levered_beta, sales_growth, coefs, r_free):
...


'''*****************************
Step 2: Simulate financial of the new stock
'''
...

#load credit model built previously
...

#reuse the parameters developed from WACC example
...

#assume that we are raising equity for the same client
...

#run simulation / projection of financial data
...

As we can see, the clustering model finds the stock cluster that looks like the new stock we are working on; that is, it tells us which other stocks in the same cluster this new stock is associated with.

There is a shortcut we can use when building the model on stocks, which is also a practical consideration: for an industry that has too few stocks (fewer than 100, for example), there is no need to build a clustering model to find subgroups within the industry. Instead, we should simply check every single stock, since there aren't many of them.
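A minimal sketch of this practical check is shown below; it reuses the industry_tickers_list.csv file that appears later in the chapter, while the industry column name is an assumption for illustration:

import pandas as pd

#load the tickers universe; the industry column name here is an assumption
df_tkr = pd.read_csv('industry_tickers_list.csv')
cutoff = 100

industry_sizes = df_tkr.groupby('sicindustry')['ticker'].nunique()

for industry, size in industry_sizes.items():
    if size < cutoff:
        #too few stocks: inspect every ticker directly instead of clustering
        print(industry, 'has only', size, 'tickers - check each stock individually')
    else:
        #enough stocks: worth building a clustering model to find subgroups
        print(industry, 'qualifies for a clustering model with', size, 'tickers')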

Given the complete member list of the stock clusters, we can look up the existing holders of these stocks to obtain the current owners (investor list A). If we still need more names to approach, we can run another investor-level clustering model to find out who else (investor list B) shares similar traits with investor list A and might therefore be interested in this stock.

Follow these steps to perform clustering:

  1. Find the stocks that have similar financials to the one we are looking to list/syndicate and which share the same industry.
  2. Based on the stocks we found, we find out who the existing holders of the stocks are.
  3. We find the list of investors who hold the stocks we checked; that is, the selected investors.
  4. We find the clustering IDs of all the investors.
  5. Given the selected investors, find their clusters and the other investors that share the same cluster IDs. These are the target investors we will sell the new issue to.

The following is the pseudocode we can use to perform clustering:

#Steps 2 and 3. Perform clustering to find the similar investors who hold the similar stocks


'''*****************************
Step 3: Run the similarity models to find out holders of the similar stocks
'''
#check if we need any model - if industry has too few stocks, no model needed to find out the similar stocks
...

#retrieve the list of tickers that are similar
...

#find list of investors looking at the similar size and more
#check which investors have it...
...

#loop through investor holdings name by name to find the investors that are holding the similar stocks
for filename in investorNameList:
...

#Load the investor clustering model
...
#extract the investors' cluster id
...

#find out who else share the same cluster id
...

#print out the investor list
...

The preceding code shows how to list the clustered investors who hold similar portfolio stocks. Here, we have built a clustering model for investors and used it. In the next section, we will build an understanding of acquirers and targets.
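Putting steps 2 to 5 together, a concrete (if simplified) version of this lookup could look as follows; the file names, column names, and pickled model names are assumptions for illustration rather than the book's exact artifacts:

import pandas as pd
import pickle

#hypothetical inputs: holdings per investor and a numeric feature profile per investor
holdings = pd.read_csv('investor_holdings.csv')                       #columns: investor, ticker
investor_profile = pd.read_csv('investor_profile.csv', index_col='investor')

#output of the stock similarity step (illustrative tickers)
similar_tickers = ['DUK', 'SO', 'D']

#steps 2 and 3: existing holders of the similar stocks (investor list A)
list_a = holdings.loc[holdings['ticker'].isin(similar_tickers), 'investor'].unique()

#step 4: assign a cluster ID to every investor using the saved scaler and clustering model
scaler = pickle.load(open('investor_scaler.pkl', 'rb'))
kmeans = pickle.load(open('investor_kmeans.pkl', 'rb'))
investor_profile['cluster_id'] = kmeans.predict(scaler.transform(investor_profile))

#step 5: other investors sharing a cluster with list A become the target list (investor list B)
target_clusters = investor_profile.loc[investor_profile.index.isin(list_a), 'cluster_id'].unique()
list_b = investor_profile[investor_profile['cluster_id'].isin(target_clusters)].index.difference(list_a)
print(sorted(list_b))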

Identifying acquirers and targets

There is a long history of corporate finance research in the field of acquirers and targets, and our challenge is to apply this rich body of research to the real world. Hedge funds have been applying these research findings in merger arbitrage, and M&A bankers have always kept their eyes on the market, scoring and assessing it on a regular basis (for example, by reading the morning news).

In this chapter, we will assume that you are an M&A banker looking for origination opportunities. To optimize our time, we will focus on clients that are likely to close a deal. Therefore, we will use a model to predict the probability of a company being an acquirer or a target in an M&A deal.

The new generation of investment bankers should use automated financial modeling tools. Over time, data can be captured, and prediction capabilities can then be added to assist bankers in financial modeling. The current world, which runs on Excel, definitely needs more NLP research into how to train a machine to parse and understand an Excel-based financial model, which is easily understood by humans but barely understood by the machine at all!

Secondly, an M&A prediction model should be part of the investment committee/mandate acceptance committee, where the likelihood of announcing the deal shall be presented – just like how credit ratings are presented in credit committees today.

So, let's see how we can apply a similar approach to credit rating in M&A prediction to spot a deal.

Follow these steps to solve this problem. We will start by loading the necessary Python libraries:

  1. Import all the required libraries and define the key variables:
'''*************************************
#1. Import libraries and define key variables
'''
import pandas as pd
import numpy as np
import quandl
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report,roc_curve, auc,confusion_matrix,f1_score
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
import pickle
import graphviz

#API key
quandl.ApiConfig.api_key = '[API Key for Quandl]'
  2. Download the financials of a given ticker (leverage them from the previous example) and define the function you will use to train the tree and the neural network, including grid search (all of these can be leveraged from Chapter 3, Using Features and Reinforcement Learning to Automate Bank Financing):
'''*************************************
#2. Definition of functions
'''
#2a.Download tickers
def download_tkr(tkr):
...
#2b.Train tree
def train_tree(X,Y,ind):
...
##2C Neural Network
#2Ci. Grid search that simulate the performance of different neural network design
def grid_search(X_train,X_test, Y_train,Y_test,num_training_sample):
...
#2Cii. Train Neural Network
def train_NN(X,Y,ind):
...
  3. Filter the industries that have a sizable number of tickers and run through the industry and its respective tickers to build the decision tree and the neural network:
def filterIndustriesByTickets(ind):
  4. Output the results by the ROC curve per industry:
def displayCurveChart(type, ind):
  5. Load the list of companies from the file alongside their sectors, just like we did for auto syndication. Select the sectors that have at least 30 companies to ensure a sufficient sample size. Load the tickers of the same sector into one entry in a dictionary, with the sector as the key and the tickers as the values:
'''*************************************
3. Execute the program
#3a. filter the industry in scope
'''
groupby_fld = 'sicsector'
min_size = 30
df_tkr = pd.read_csv('industry_tickers_list.csv')
...
#collect ticker in each industry
for index, row in df_tkr.iterrows():
    ind = row[groupby_fld]
    tkr = row['ticker']
    if ind in list_scope:
        if ind in dict_ind_tkr:
            dict_ind_tkr[ind].append(tkr)
        else:
            dict_ind_tkr[ind] = [tkr]
  6. Loop through the selected sectors one by one and load the companies' historical financials. For each company, we will load 10 years' worth of annual financial records:
#loop through the dictionary - one industry at a time
for ind, list_tkr in dict_ind_tkr.items():
    df_X = pd.DataFrame({})
    df_Y = pd.DataFrame({})
    print(ind)
    #Go through the ticker list to download data from source
    #loop through tickers from that industry
    for tkr in list_tkr:
        print(tkr)
        try:
            df_tmp,X_tmp,Y_tmp = download_tkr(tkr)
        ...

Here, we loaded the events of the company. After loading the events, we filtered out only those related to M&A and turned them into a binary column that denotes whether the company completed any M&A within one calendar year, where 1 means yes. Then, we joined the company financials and events together; that is, we joined the financials at year t-1 with the binary event indicator at year t. We converted null events into 0. Most of this logic, which prepares the financials and events, is implemented in download_tkr(tkr).
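The joining logic described above could be sketched as follows; the DataFrames and column names are hypothetical, and in the book's code this preparation happens inside download_tkr(tkr):

import pandas as pd

#hypothetical inputs: annual financials and corporate events per ticker
financials = pd.read_csv('financials.csv')   #columns: ticker, year, plus the financial ratios
events = pd.read_csv('events.csv')           #columns: ticker, year, eventcode

#keep only M&A-related events and turn them into a binary flag
ma_events = events[events['eventcode'].isin(['acquisition', 'merger'])].copy()
ma_events['ma_event'] = 1
ma_events = ma_events[['ticker', 'year', 'ma_event']].drop_duplicates()

#join the year t-1 financials with the binary event indicator at year t
financials['event_year'] = financials['year'] + 1
dataset = financials.merge(ma_events, left_on=['ticker', 'event_year'],
                           right_on=['ticker', 'year'], how='left',
                           suffixes=('', '_event'))

#companies with no M&A announcement in the following year get a 0 label
dataset['ma_event'] = dataset['ma_event'].fillna(0).astype(int)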

  7. Split the data from the industry in order to train the models:
#neural network
nn_clf, f1_score_temp = train_NN(df_X, df_Y, ind)
f1_list.append(f1_score_temp)
nn_clf.get_params()

#decision tree
try:
    tree_clf, f1_score_temp = train_tree(df_X, df_Y, ind)
except Exception:
    continue

f1_list.append(f1_score_temp)
tree_clf.get_params()

Here, we leveraged what we built in Chapter 2, Time Series Analysis. However, for the sake of illustration, we only used a decision tree and neural network code.

This brings us to the end of this chapter.

Summary

In this chapter, you understood the basics of investment banking. Now, you should be able to understand the concepts of IPO and M&A. Based on the data technologies you learned about in this chapter, you should be able to model the domain requirements. With the use of the clustering model technique, you can now create high-performance artificial intelligence systems.

After this, we completed an exercise where we solved the problem of auto syndication for new issues. We also looked at an example regarding how to identify acquirers and targets.

In the next chapter, we will focus on portfolio management, asset management, and a few artificial intelligence techniques that are suitable in the domain of portfolio management.