Measuring the ROI of Investment Prediction Models

by Chris Conlan

The following is a draft from a chapter of the upcoming book: The Financial Data Playbook

Data scientists want to build prediction models with a high level of accuracy. Executives want to invest in projects with a high return on investment. This chapter will attempt to establish a relationship between the accuracy of machine learning models and the rate of return on investing operations. Ultimately, it will help you understand when and how it can make sense to invest in financial data projects.

The following discussion is mathematically general and applies to any financial services company that invests or loans money, including, for example, mutual funds, banks, and real estate developers.


The following section sets up a mathematical model for discussing the circumstances under which it is appropriate to invest in financial data projects based on their ability to improve investment performance.

Equity Curve and Internal Rate of Return

Suppose that a financial services business holds both investments and cash. The investments are in a portfolio with a total value represented by $P_t$. The amount of the business's cash is represented by $C_t$. We will measure the performance of the business's investments in financial data projects by measuring the change in the equity curve $E_t$ for $t \in 0,...,T$.

$$ E_t = C_t + P_t $$

We define $\Delta{E_t}$ as follows, and similarly for $C_t$ and $P_t$.

$$ \Delta{E_t} = E_t - E_{t-1} $$

Our primary measure of performance will be the Internal Rate of Return (IRR), represented by $r_I$, which is calculated by numerically solving the following equation.

$$ \sum_{t=1}^{T} \dfrac{ \Delta{E_t} }{ (1 + r_I )^{t-1} } = 0 $$

In practice, one unit of $t$ can represent any amount of time based on the investment scenario being studied.

When we refer to cash as being either expensed or gained at time $t$, we assume that the change in the cash balance occurred at some time between $t-1$ and $t$; thus, it would be represented in $\Delta{E_t}$ and $\Delta{C_t}$.
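As a minimal sketch of what "numerically solving" the IRR equation can look like, the following uses plain bisection. The function names `npv` and `irr`, the search bracket, and the sample equity changes are assumptions made for illustration; any root-finding method would serve.

```python
from typing import Sequence


def npv(rate: float, delta_e: Sequence[float]) -> float:
    """Left-hand side of the IRR equation: sum of dE_t / (1 + rate)^(t-1).

    delta_e[0] is dE_1, so the list index k plays the role of t - 1.
    """
    return sum(d / (1.0 + rate) ** k for k, d in enumerate(delta_e))


def irr(delta_e: Sequence[float], lo: float = -0.99, hi: float = 10.0,
        tol: float = 1e-10, max_iter: int = 200) -> float:
    """Solve npv(rate) = 0 by bisection on the bracket [lo, hi].

    The default bracket is an assumption; it must contain a sign change.
    """
    f_lo, f_hi = npv(lo, delta_e), npv(hi, delta_e)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the bracket; widen lo/hi")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = npv(mid, delta_e)
        if abs(f_mid) < tol or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return 0.5 * (lo + hi)


# Hypothetical equity changes: one upfront cost, then three equal gains.
flows = [-100.0, 40.0, 40.0, 40.0]
rate = irr(flows)
print(f"IRR per unit of t: {rate:.4f}")
```

Bisection is slow but robust. It only requires that the discounted sum changes sign somewhere inside the bracket, which holds whenever a single upfront cost is followed by gains that exceed it in total.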

For analytic simplicity, we will frequently consider the following scenario.

  • $T$ goes to $\infty$.
  • $\Delta{E_t}$ is a constant positive number, $\Delta{E_c}$, for $t > 1$.
  • $\Delta{E_1}$ is a negative number representing the amount of a single upfront investment.

In this scenario, $r_I$ has a closed-form solution provided by the known properties of infinite geometric series.

$$ r_I = - \frac{\Delta{E_c}}{\Delta{E_1}} $$
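For readers who want the intermediate step, the closed form can be recovered by splitting off the first term of the IRR equation and summing the remaining geometric series. This sketch assumes $r_I > 0$, so that the series converges.

$$ \Delta{E_1} + \Delta{E_c} \sum_{t=2}^{\infty} \frac{1}{(1 + r_I)^{t-1}} = \Delta{E_1} + \frac{\Delta{E_c}}{r_I} = 0 \quad \Longrightarrow \quad r_I = - \frac{\Delta{E_c}}{\Delta{E_1}} $$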

If we define $\Delta{E_1}' = -\Delta{E_1}$ in order to treat the upfront cost as a positive number, the IRR formula reduces to something resembling the return on investment (ROI) formula, where $\Delta{E_1}'$ represents the invested capital and $\Delta{E_c}$ represents the profit on the investment.

$$ r_I = \frac{\Delta{E_c}}{\Delta{E_1}'} $$

We will also consider the following trivial scenario occasionally.

  • $t \in 1, 2$. $T=2$.
  • $\Delta{E_1}$ is a negative number representing the amount of a single upfront investment.
  • $\Delta{E_2}$ is a positive number representing the one-time return from the investment.

In this scenario, $r_I$ reduces to the formula for return on investment (ROI).

$$ r_I = \frac{\Delta{E_2}}{\Delta{E_1}'} - 1 $$
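The step behind this result is short. With $T=2$, the IRR equation has only two terms and can be solved directly.

$$ \Delta{E_1} + \frac{\Delta{E_2}}{1 + r_I} = 0 \quad \Longrightarrow \quad 1 + r_I = - \frac{\Delta{E_2}}{\Delta{E_1}} = \frac{\Delta{E_2}}{\Delta{E_1}'} $$

As a hypothetical example, an upfront investment of $\Delta{E_1}' = 100$ followed by a one-time return of $\Delta{E_2} = 120$ gives $r_I = 120 / 100 - 1 = 0.2$.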

With these definitions settled, we will proceed to integrate the notion of predictive accuracy into our mathematical model.

Financial Agents, Prediction Models, and Accuracy

Assume that the business employs some number of financial agents that are responsible for making and managing investments on behalf of the business. The business incurs a cost for employing these agents, in the form of salaries, with the aim of profiting from their investment expertise. They predict the outcomes of potential investment opportunities with a certain accuracy, $R_A^2$, which represents the R-squared of the agents' predictions, that is, one minus their standardized mean squared error. The R-squared accuracy, or just the accuracy, for the purposes of our discussion, is a number bounded between 0 and 1, where 0 represents a set of uninformed predictions, and 1 represents a set of flawless predictions.

The R-squared accuracy is not typically used to evaluate investment management performance, because that would require that we evaluate the accuracy of the managers' predictions on investments that were not made. In certain industries, and for a number of asset types, this is an impossible task. Nonetheless, for our discussion, we will model the accuracy of predictions made by financial agents, for investments they both made and did not make, for the purpose of comparing them to the predictions made by machine learning models. Machine learning models typically use R-squared as a measure of accuracy, so it is necessary to measure the agents' accuracy in these terms to establish a common performance metric. We will proceed with a definition of R-squared.

Given a set of $n$ investment opportunities, $i \in 1,...,n$, each has an outcome $y_i$ and a predicted outcome of $\hat{y}_i$. The error of the outcome is represented as $y_i - \hat{y}_i$ and the squared error is its square. The R-squared is the proportion of the variance of the random variable $Y$ accounted for by the predictions. Given the sample mean, $\bar{y}$, of the random variable $Y$, the following is the most expository way of calculating R-squared.

$$ R^2 = 1 - \frac{ \sum_{i=1}^{n}(y_i - \hat{y} _ i)^2 } { \sum_{i=1}^{n}(y_i - \bar{y})^2} $$

In other words, the R-squared represents, typically on a scale of 0 to 1, how much better the model's predictions are than a naive prediction using the sample mean. In practice, there is nothing preventing the R-squared from being negative. If the predictions are, overall, worse than a naive prediction using the sample mean, then the R-squared will be negative.
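The following sketch computes the regression R-squared directly from its definition and shows the three cases just described: flawless predictions, the naive sample-mean prediction, and predictions worse than the naive one. The function name `r_squared` and the sample data are hypothetical.

```python
from typing import Sequence


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """1 - (sum of squared errors) / (sum of squared deviations from the mean)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical outcomes and three sets of predictions.
outcomes = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(outcomes, outcomes)                  # flawless predictions
naive = r_squared(outcomes, [2.5, 2.5, 2.5, 2.5])        # always predict the sample mean
reversed_ = r_squared(outcomes, [4.0, 3.0, 2.0, 1.0])    # worse than the naive prediction

print(perfect, naive, reversed_)  # 1.0 0.0 -3.0
```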

The above R-squared formula applies to what are called regression problems in machine learning. Regression problems involve predicting the exact value of a random variable, like the price of a stock tomorrow. We will still use the term R-squared and the variable $R^2$ to refer to the prediction accuracy in classification problems, but the formula is different. It is as follows, where $\mathbb{1}[ f(\cdot) ]$ represents the indicator function, which returns the value $1$ if the condition $f(\cdot)$ is true and $0$ if it is false. The formulation is also referred to as the classification accuracy.

$$ R^2 = \frac{1}{n} \sum_{i=1}^{n}\mathbb{1}[y_{i} \equiv \hat{y}_i] $$
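A minimal sketch of the classification version follows. The indicator function becomes a simple equality check. The function name and the sample signs are hypothetical.

```python
from typing import Sequence


def classification_accuracy(y: Sequence[int], y_hat: Sequence[int]) -> float:
    """Fraction of predictions that match the outcome exactly."""
    return sum(1 for yi, pi in zip(y, y_hat) if yi == pi) / len(y)


# Hypothetical signs of five trade returns (+1 up, -1 down) and their predictions.
signs = [1, -1, 1, 1, -1]
predicted = [1, 1, 1, -1, -1]
acc = classification_accuracy(signs, predicted)  # 3 of the 5 predictions match

print(acc)  # 0.6
```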

We will move on to bridge these concepts in the context of financial machine learning projects within different types of organizations.

Market Makers

We will use a simple model of the business of a market maker for our first example. Market makers make frequent short-term trades. If an asset has a bid and an ask price, the market maker typically endeavors to buy at the bid and sell at the ask, potentially with a brief holding period, in order to make a small profit. They aim to do this many times per day over many assets to produce a consistent stream of mostly profitable trades.

For simplicity, we will assume that all the trades made by the market maker are of equal initial dollar value. We will model the returns on these trades as a normally distributed random variable $X \sim \mathcal{N}(\mu,\sigma^2)$ with $\mu=0$. The market makers, acting as financial agents of the business, will be responsible for predicting whether $X$ is positive or negative, and they will take the appropriate position (long or short) based on this prediction. In other words, the market makers will predict $Y$, which is the sign of $X$, with accuracy $R_A^2$, and trade the assets accordingly. The percentage return on trade $i$ can be represented as $x_i * \hat{y}_i$.

In order to calculate how this trading behavior affects the equity of the financial services business, $E_t$, we need to make some assumptions about how money is managed within the market making business. We will assume that the firm makes $m$ investments per period $t$, each with $C_m = C_0 / m$ dollars. We assume this is the invested amount per trade regardless of the prior period's performance or cash balance, $C_t$. All gains or losses from trading will be represented as changes to the cash balance $\Delta{C_t}$. The portfolio will be liquidated before the end of each period, meaning that $P_t = 0$ and $E_t = C_t$ throughout.

Thus, the component of $\Delta{E_t}$ attributable to trading activities is as follows, given that $m_t$ is the set of all trades made between $t-1$ and $t$.

$$ C_m \sum_{i \in m_t} x_i * \hat{y}_i $$

By separating the winning trades from the losing trades, we can start to see how predictive accuracy affects the performance of the business and the IRR.

Given that $X$ is normally distributed with $\mu=0$, the mean of $|X|$ is $\sqrt{\frac{2}{\pi}} \sigma = \gamma \sigma \approx 0.798 * \sigma$. We can use this fact to start to separate out the effects of correct and incorrect predictions. The above equation can be expressed as follows, given the set of $m_t^+$ correct predictions and $m_t^-$ incorrect predictions.

$$ C_m \left( \sum_{i \in m_t^+} |x_i| - \sum_{i \in m_t^-} |x_i| \right) $$
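The value of $\gamma$ comes from a standard integral. By the symmetry of the zero-mean normal density, the mean of $|X|$ is twice the integral over the positive half-line.

$$ \mathbb{E}|X| = 2 \int_{0}^{\infty} \frac{x}{\sigma \sqrt{2\pi}} e^{-x^2 / (2\sigma^2)} \, dx = \frac{2 \sigma^2}{\sigma \sqrt{2\pi}} = \sqrt{\frac{2}{\pi}} \sigma = \gamma \sigma $$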

Given that $m_t^+$ has $R^2_A m$ elements, $m_t^-$ has $(1 - R^2_A) m$ elements, and the mean of $|X|$ is $\gamma \sigma$, the expected impact on $\Delta{E_t}$ from trading activities is as follows.

$$ C_m \left( R^2_A m \gamma \sigma - (1 - R^2_A) m \gamma \sigma \right) $$

Reducing the expression further, we get this very brief expression for the expected gains or losses from trading activities.

$$ C_0 \gamma \sigma \left( 2R^2_A - 1 \right) $$

The above expression tells us that the profit or loss of the market making business is a function of the invested cash, the volatility of the traded assets, and the prediction accuracy. The maximum expected profit or loss is $C_0 \gamma \sigma$ with perfectly accurate or perfectly inaccurate predictions, respectively.
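One way to sanity-check the closed-form expression is a small Monte Carlo simulation of a single period. The sketch below assumes that each sign prediction is correct with probability equal to the accuracy, independently of the size of the move. That assumption is implicit in using $\gamma \sigma$ for both winning and losing trades. The function names and parameter values are hypothetical.

```python
import math
import random


def expected_trading_gain(c0: float, sigma: float, accuracy: float) -> float:
    """Closed-form expected gain per period: C_0 * gamma * sigma * (2 R^2 - 1)."""
    gamma = math.sqrt(2.0 / math.pi)
    return c0 * gamma * sigma * (2.0 * accuracy - 1.0)


def simulated_trading_gain(c0: float, sigma: float, accuracy: float,
                           m: int, rng: random.Random) -> float:
    """Simulate one period of m equal-sized trades driven by sign predictions."""
    c_m = c0 / m
    total = 0.0
    for _ in range(m):
        x = rng.gauss(0.0, sigma)        # trade return, X ~ N(0, sigma^2)
        y = 1.0 if x >= 0 else -1.0      # true sign of the return
        # Assumption: the prediction is right with probability `accuracy`,
        # independently of how large the move is.
        y_hat = y if rng.random() < accuracy else -y
        total += c_m * x * y_hat
    return total


rng = random.Random(0)  # fixed seed so the run is repeatable
closed_form = expected_trading_gain(1_000_000.0, 0.01, 0.55)
simulated = simulated_trading_gain(1_000_000.0, 0.01, 0.55, 200_000, rng)
print(f"closed form: {closed_form:.2f}, simulated: {simulated:.2f}")
```

With many trades per period, the simulated gain should land close to the closed-form value. The gap shrinks in proportion to $1 / \sqrt{m}$.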

If we treat $\Delta{E_t} / C_0$ as the return on invested capital and $\sigma$ as proportional to the volatility of the overall portfolio, then we can assert that the Sharpe Ratio is proportional to $2R^2_A - 1$, the zero-centered accuracy of predictions.

Interpretation and Discussion

Financial services professionals often discuss the meaning of $51\%$ accuracy as a way of expressing the simple fact that making a profit involves placing more winning bets than losing bets. In practice, they may be unexcited to invest in projects that only marginally improve the accuracy of their predictions. This analysis offers a justification for such projects.

Say, for example, that a market making firm employs a cabal of discretionary traders that place $52\%$ winning bets. The firm makes money year after year and is comfortable with its operating procedures. Then, a hotshot analyst comes along and claims he can make a predictive model that has $53\%$ accuracy. Should the firm invest time and money into the analyst's idea? This mathematical model provides a method for answering that question.

Consider, in this example, that the firm has \$1 billion in capital, and a unit increment in $t$ represents the time it takes them to turn over \$1 billion in trading volume. For this firm, that happens once per hour during the NYSE trading day. The volatility of their trades is $0.2\%$ on average, and there are approximately 1,620 trading hours in a year. With this business model and financial agents with $R^2_A = 0.52$, they make a healthy \$103 million or so in trading gains each year. If their financial agents were aided by a machine learning model with $R^2_B = 0.53$, their yearly trading gains would become roughly \$155 million. In other words, each percentage point gained in overall accuracy yields roughly an additional \$52 million in trading gains and an additional 5% return on assets. In practice, market makers understand this, and they are in constant competition with each other to build the most accurate models and the most efficient execution pipelines.
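The arithmetic in this example can be reproduced by applying the per-period expression $C_0 \gamma \sigma (2R^2 - 1)$ once per trading hour. The sketch below uses the inputs stated above with an unrounded $\gamma$. The function and variable names are hypothetical.

```python
import math

GAMMA = math.sqrt(2.0 / math.pi)  # mean of |X| divided by sigma for X ~ N(0, sigma^2)


def yearly_trading_gain(c0: float, sigma: float, accuracy: float,
                        periods_per_year: int) -> float:
    """Expected yearly gain: periods * C_0 * gamma * sigma * (2 R^2 - 1)."""
    return periods_per_year * c0 * GAMMA * sigma * (2.0 * accuracy - 1.0)


c0 = 1_000_000_000.0   # $1 billion in capital, turned over once per period
sigma = 0.002          # 0.2% volatility per trade
periods = 1_620        # approximate trading hours per year

gain_agents = yearly_trading_gain(c0, sigma, 0.52, periods)
gain_model = yearly_trading_gain(c0, sigma, 0.53, periods)
gain_per_point = gain_model - gain_agents
return_on_assets_per_point = gain_per_point / c0

print(f"agents: ${gain_agents / 1e6:.1f}M, model: ${gain_model / 1e6:.1f}M")
print(f"per point: ${gain_per_point / 1e6:.1f}M "
      f"({return_on_assets_per_point:.1%} of assets)")
```

With these inputs, the calculation gives about \$103.4 million and \$155.1 million per year, or about \$51.7 million per percentage point of accuracy, in line with the figures above up to rounding.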

Because there is such intense competition in the market making industry, models are rarely useful for more than 18 months. As such, when computing the IRR for market makers, we should model a single upfront investment, representing the yearly expenses on model development, and a single payout, representing the gains in accuracy accrued from the model developed in the previous year. So, when budgeting for machine learning development at a high-frequency market maker, a simple return on invested capital (ROIC) analysis will do.
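To make the budgeting logic concrete, the two-period IRR formula can be applied to a model development budget. In the sketch below, the \$52 million payout is the approximate gain per percentage point of accuracy from the example above. The \$20 million development cost is a purely hypothetical figure chosen for illustration.

```python
def two_period_irr(upfront_cost: float, payout: float) -> float:
    """r_I = dE_2 / dE_1' - 1, with the upfront cost given as a positive number."""
    return payout / upfront_cost - 1.0


# Hypothetical input: assumed yearly spend on model development.
development_cost = 20_000_000.0
# Approximate gain from one percentage point of accuracy, from the example above.
accuracy_gain_payout = 52_000_000.0

r_i = two_period_irr(development_cost, accuracy_gain_payout)
# The project breaks even (r_I = 0) when the cost equals the payout.
break_even_cost = accuracy_gain_payout

print(f"r_I = {r_i:.2f}, break-even cost = ${break_even_cost / 1e6:.0f}M")
```

Under these assumed numbers, the project is worth funding as long as the expected accuracy gain materializes and the cost stays below the break-even level.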

The mathematical model we have proposed here can also be extended to work for any equities investment business that allocates capital across a wide variety of assets. The only necessary adjustments are the definitions of a unit of $t$ and the volatility $\sigma$. We will extend this model further to different types of assets and different styles of investing.