# UseR! 2013 in Albacete, Spain

The UseR! 2013 Conference was held in Albacete, Spain, July 9-12, 2013. I attended the conference and presented a summary of the conference highlights to the DFW R User Group at the August 24th meeting.

Since it takes about 90 minutes to deliver the presentation, I reworked it in Beamer so that it would have a progress outline and easier bullet point/remote control clicking.

The slides from the presentation are available in two forms:

- Including photos of Albacete and the venue. This download is 37 megabytes.
- A version without the photos. This download is 2.6 megabytes.

In converting from PowerPoint to Beamer, there are some issues with quotes. If you see a typo, let me know.

- Details
- Written by Bruce Moore
- Hits: 3974

# Calculating the Marginal Cost of Funds for Deposit Pricing

Interest expense is the largest item on most banks' income statement, but this expense frequently receives significantly less attention than expense items that are far less important to the profitability of the institution. The discussion that follows will show you how to do some simple calculations to help you set prices for deposit products in ways that can significantly improve the profitability. The discussion will be focused on how to determine the marginal cost of funds for depository products.

### Collecting the Necessary Data

"What is our marginal cost of funds?" is a question that is infrequently asked and less frequently answered at many institutions. For some products, the calculation can be relatively simple, while for others it can be fiendishly complex and require statistical analysis. The discussion that follows will help you calculate the simple, straightforward costs and understand what is involved in the difficult calculations. To calculate marginal costs, you must first gather the necessary data. For checking, savings, and money market (MM) products, this is fairly simple--you just need the aggregate balance for the product and the weighted average interest rate over the period--probably a month. If your products have different interest rates depending upon the account balance, you should treat each tier separately in this analysis. You should go back as far as possible; three years of history is probably the minimum for a good analysis.

For certificate of deposit (CD) products, the data collection is somewhat more difficult, as you will need the total balance, the amount that matured during the period, the amount that rolled over, the maturing deposits that were withdrawn, the non-maturing deposits that were withdrawn, and the new deposits added during the period. This may not be available in a master customer information file (MCIF) extract and may require a special extract from Information Systems.

Getting data on a monthly basis is probably the easiest, as this is frequently captured for an MCIF extract. Unfortunately, monthly isn't necessarily the best choice as the number of days varies; for some types of analysis, this will introduce unwanted variability, but if monthly data is the only choice available, you can generally use it without too much difficulty.

Once you have the data from your internal systems, you will need to enhance it with some publicly available data on market interest rates. Many institutions use Treasury rates for transfer pricing, but for marginal cost analysis, swap rates are generally a better choice, as swap rates closely follow the top of the market for bank CD offerings and provide a very good proxy for the competitive market. Federal Home Loan Bank (FHLB) and other outside borrowing rates are perhaps the best proxy for the competitive CD market, but using FHLB rates requires more code to download the rates and requires that you do your own averaging over the period. The best data would be competitors' actual rates, but gathering history for competitors' rates would be very difficult. Swap rates are readily available from the St. Louis Federal Reserve's FRED2 system for several different time intervals, making swap rates the most practical choice for marginal cost analysis.
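As a sketch of the download step, the following uses the fImport package (mentioned in the Notes) to pull a swap rate series from FRED and average it by month. The series name "DSWP1" (a 1-year swap rate series) and the date range are illustrative assumptions; substitute whatever benchmark series your institution prefers.

```r
# Sketch: download a benchmark rate series from FRED and average by month.
# "DSWP1" (1-year swap rate) is an illustrative series choice.
library(fImport)

swp <- fredSeries("DSWP1", from = "2003-01-01")      # daily quotes
swpDf <- data.frame(dateVal = as.Date(time(swp)),
                    rate    = as.numeric(swp))
swpDf <- swpDf[!is.na(swpDf$rate), ]

# Collapse the daily quotes to a monthly average
swpDf$month <- as.Date(format(swpDf$dateVal, "%Y-%m-01"))
benchmark <- aggregate(rate ~ month, data = swpDf, FUN = mean)
names(benchmark) <- c("dateVal", "benchmark_int_rate.MM")
```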

Once you have both in-house data and swap rate data, merge them so that you have a table with period (date), balance, nominal interest rate, and swap (benchmark) interest rate. For checking, savings and money market accounts, use the one year swap rate. For CD products, use the swap rate term that most closely matches the CD term. For a money market product, the table would have the following structure:

```
##       dateVal balance_change.MM balance.MM nominal_int_rate.MM benchmark_int_rate.MM delta_to_benchmark.MM
## 1  2003-02-01           -550270   99449730              1.3311                  1.40              -0.06894
## 2  2003-03-01          -1526234   98473766              1.2163                  1.32              -0.10373
## 3  2003-04-01          -1435229   98564771              1.2628                  1.36              -0.09719
## 4  2003-05-01           -421703   99578297              1.1218                  1.26              -0.13816
## 5  2003-06-01           -837693   99162307              0.9622                  1.09              -0.12783
## 6  2003-07-01           -799519   99200481              1.0929                  1.21              -0.11706
## 7  2003-08-01          -1211073   98788927              1.2915                  1.42              -0.12849
## 8  2003-09-01          -1490083   98509917              1.1667                  1.34              -0.17328
## 9  2003-10-01          -1069795   98930205              1.3054                  1.42              -0.11457
## 10 2003-11-01           -694666   99305334              1.4484                  1.52              -0.07157
```
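A minimal sketch of the merge step, using two months of made-up in-house data and benchmark rates (all numbers illustrative):

```r
# Illustrative in-house extract: balances and weighted average rates
inhouse <- data.frame(
  dateVal             = as.Date(c("2003-02-01", "2003-03-01")),
  balance.MM          = c(99449730, 98473766),
  nominal_int_rate.MM = c(1.3311, 1.2163))

# Monthly averaged benchmark (swap) rates for the same periods
benchmark <- data.frame(
  dateVal               = as.Date(c("2003-02-01", "2003-03-01")),
  benchmark_int_rate.MM = c(1.40, 1.32))

# Merge on the period date and compute the spread to the benchmark
lbtDf2 <- merge(inhouse, benchmark, by = "dateVal")
lbtDf2$delta_to_benchmark.MM <- lbtDf2$nominal_int_rate.MM -
                                lbtDf2$benchmark_int_rate.MM
```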

### Calculating the Marginal Cost of Funds for Money Market, Savings and Checking Products

When you plot the monthly balance change against the nominal interest rate, you will probably get a plot that shows no discernible pattern, similar to the plot in Figure 1, but if you plot the balance change against the *delta to benchmark* as shown in Figure 2, you will probably get a reasonably clear pattern in which the balance change increases as the delta to benchmark increases. For many institutions, the pricing decisions based upon Figure 2 can be worth hundreds of thousands or millions of dollars.
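A Figure 2 style plot can be drawn with ggplot2 (the package used for the charts in this article, per the Notes). The data below is synthetic, generated only to show the described pattern; the slope of 1.1e7 and the noise level are assumptions:

```r
library(ggplot2)

# Synthetic data showing the described pattern: balances grow as the
# product rate closes the gap to the benchmark (slope is illustrative)
set.seed(42)
delta  <- runif(36, -0.25, 0.05)
change <- 1.1e7 * delta + rnorm(36, sd = 5e5)
df <- data.frame(delta_to_benchmark.MM = delta,
                 balance_change.MM     = change)

ggplot(df, aes(delta_to_benchmark.MM, balance_change.MM)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Delta to benchmark (%)",
       y = "Monthly balance change ($)")
```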

To calculate the marginal cost, we want the slope of the line that best fits the data in Figure 2. An eyeball calculation would be to take a point at each end and calculate the slope--(0%, $5e5) and (-0.2%, -$2.5e6)--for a slope of

```r
(5e+05 - (-2500000))/(0 - (-0.2))
## [1] 1.5e+07
```

1.5e7, or $15 million per month balance increase per 1% increase relative to the benchmark. A more precise way to calculate the slope is to do a linear regression:

```r
## 
## Call:
## lm(formula = amt_balance_change.MM ~ delta_to_benchmark.MM, data = lbtDf2)
## 
## Coefficients:
##           (Intercept)  delta_to_benchmark.MM
##                 41253               11038442

## Analysis of Variance Table
## 
## Response: amt_balance_change.MM
##                       Df   Sum Sq  Mean Sq F value Pr(>F)
## delta_to_benchmark.MM  1 3.82e+13 3.82e+13     152
```

which gives a slope of 11038442, or $11 million per month balance increase per 1% increase relative to the benchmark. To convert this to a marginal cost, we have to calculate the cost of adding $11 million to an existing $99 million MM product by increasing the interest rate by 1% (0.083% per month), assuming a benchmark rate of 0.5% (0.0416% per month):

\[
\begin{aligned}
\text{marginal cost} &= \frac{0.00083 \times 99\text{e}6 + (0.000416 + 0.00083) \times 11\text{e}6}{11\text{e}6} \\
&= 0.008718 \text{ per month} \\
&= 10.5\% \text{ per year}
\end{aligned}
\]

Why is the marginal cost so high? In order to get customers to add $11 million, the bank must pay a premium (the 0.00083*99e6 in the formula) to existing depositors who really don't care about getting the additional interest. If product balances are high, it may make sense to open a new product and allow rate-sensitive depositors to switch into the new high-rate product until the marginal cost of the original product is reasonable.
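The formula above is easy to wrap in a small helper; the inputs below are the numbers from the worked example:

```r
# Marginal cost of growing a deposit product by raising its rate.
#   premium : monthly rate increase paid to everyone (e.g. 1%/12)
#   bench   : monthly benchmark rate (e.g. 0.5%/12)
#   base_bal: existing balance that also receives the premium
#   add_bal : new balance attracted by the increase
marginal_cost <- function(premium, bench, base_bal, add_bal) {
  monthly <- (premium * base_bal + (bench + premium) * add_bal) / add_bal
  c(monthly = monthly, annual = 12 * monthly)
}

marginal_cost(premium = 0.00083, bench = 0.000416,
              base_bal = 99e6, add_bal = 11e6)
# monthly ~ 0.0087, annual ~ 0.105 (10.5% per year)
```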

The marginal cost of funds for savings accounts can be calculated similarly. For interest bearing checking products, the marginal cost can be calculated if there are no fees for going below a minimum balance. If there are minimum balance fees, the interest rate would need to be adjusted to include the fee as an implied negative interest rate. Checking deposits tend to be very seasonal, with declines in December through February for Christmas spending and a spike in April for tax refunds. There is also a lot of noise introduced by weekly and bi-weekly payroll cycles superimposed on a monthly data extract cycle. These two characteristics make getting a good regression model difficult for checking account data.

Doing the modelling in the format described is quite straightforward, but it can't really be used for large changes in the balance, especially in a declining environment. $11 million leaving a $100 million balance is 11% of the balance--but would be about 5.5% of a $200 million balance. A future article will address how to improve the model to account for this difference.

### Calculating the Marginal Cost of Funds for Certificate of Deposit Products

Calculating the marginal cost of Certificate of Deposit (CD) products can be significantly more difficult than the similar calculation for money market, savings and checking accounts. The example that follows will illustrate some of these difficulties. First we need to address a terminology problem. For CDs, we will look at the amount maturing, and call that the product balance to make things work similarly to the way that they did for Money Market products. For these plots, we will assume that the amount maturing each month is the same--a completely invalid assumption in real life, but it makes the illustration much easier to follow. If you plot the balance against the nominal rate as in Figure 3, you will probably get a plot with no discernible pattern. If you plot the balance against the delta to benchmark as in Figure 4, you still probably won't get much of a pattern; this is very unlike the MM product.

Why is there no pattern? Many customers looking at CDs will compare prices with MM funds, with CDs of slightly shorter terms, and with CDs of slightly longer terms; if the other products are more competitive with the customer's perception of the yield curve, the customer will put money into the MM or the shorter/longer CD. They will *substitute* a more attractive CD if the originally desired CD isn't priced to their liking. To illustrate this in a data plot, do a contour plot of the CD Balance Change vs. the CD delta to benchmark AND the MM delta to benchmark. In this plot you will probably see a clear pattern as shown in Figure 5. CD products typically have a great deal of product substitution, so that the price of the product relative to its benchmark is important, but so is the price relative to other CD products and MM products. It is impossible to estimate this graphically; a multi-variate regression model is the only approach that will work for this situation.
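A sketch of the multi-variate regression on synthetic data; the coefficient magnitudes and noise level are illustrative assumptions, chosen only so that the own-price effect is positive and the substitution effect is negative:

```r
# Synthetic CD demand with substitution: inflows rise with the CD's own
# spread to benchmark and fall when the MM product is priced attractively
set.seed(7)
cd_delta  <- runif(48, -0.3, 0.1)   # CD delta to benchmark
mm_delta  <- runif(48, -0.3, 0.1)   # MM delta to benchmark
cd_change <- 6e6 * cd_delta - 4e6 * mm_delta + rnorm(48, sd = 4e5)
cdDf <- data.frame(cd_change, cd_delta, mm_delta)

# Demand for the CD as a function of its own price AND the substitute's
fit <- lm(cd_change ~ cd_delta + mm_delta, data = cdDf)
coef(fit)   # own-price coefficient positive, substitute negative
```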

To calculate the marginal cost of the CD product, you have to assume the prices of the substitute CD terms--but if there is a large amount maturing in one substitute CD term, that would imply a high marginal cost for that CD term. Since the bank doesn't want to set prices so that multiple products have a high marginal cost, **both** prices have to be calculated simultaneously. For this reason, it doesn't make much sense to try to calculate the marginal cost for CD products in isolation. Setting rates is the ultimate goal, and the marginal cost is just an input to that goal.

The best way to set CD prices is to model demand for each CD product including substitution effects to get a formula that represents the demand for each CD as a function of its price and the price of other CDs. Then all of the formulas can be solved simultaneously to obtain a price sheet that will reduce the cost of funds for the deposit portfolio. Although Excel has the statistical tools to do this, automating a large number of regression models in Excel would be difficult.
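As a toy illustration of solving such a system, suppose the fitted demand models come out linear; the coefficient matrix, intercepts, and targets below are made-up numbers. Real systems are larger and often nonlinear, which is where dedicated modeling languages earn their keep:

```r
# Two linear demand curves (coefficients illustrative):
#   cd_change = a1 + 6e6*r_cd - 4e6*r_mm
#   mm_change = a2 - 3e6*r_cd + 5e6*r_mm
A <- matrix(c( 6e6, -4e6,
              -3e6,  5e6), nrow = 2, byrow = TRUE)
intercepts <- c(2e5, 1e5)
targets    <- c(1e6, 5e5)   # desired balance changes

# Solve for the rate deltas that hit both targets simultaneously
rates <- solve(A, targets - intercepts)
rates
```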

Fortunately, there are inexpensive or free statistical packages that can easily handle the task of developing the demand models. Solving the system is straightforward with commercial mathematical modeling languages like AMPL or GAMS.

### Notes

The data in this article was synthesized to illustrate common characteristics of money market and certificate of deposit demand data. Data manipulation and chart preparation was done using the R statistical language. Historical swap rates from the St. Louis Federal Reserve FRED2 system were downloaded and incorporated using the fImport package. The charts were prepared using ggplot2. The display of mathematical formulas in the HTML web page uses MathJax, which allows formulas to be written using LaTeX syntax.

The discussion in this article follows portions of the *Estimating Deposit Volume Price Sensitivity* presentation by the author at the 2005 Institute for Operations Research and Management Science (INFORMS) Annual Meeting. The presentation received an Honorable Mention in the Financial Services Section contest for Best Presentation. The abstract is available on the INFORMS web site.


# Open Source Solver Resources

Solvers are the basis for all optimization modeling. There are five major types, according to the type of problem that the particular software can solve:

- Linear
- Linear/quadratic
- Linear integer, otherwise known as "mixed integer"
- Non-linear
- Non-linear integer

Most pricing problems in finance are either quadratic or nonlinear, so the non-linear solvers will be listed first.

### Non-linear

- Ipopt is an open source solver designed for constrained non-linear problems. It was originally written by Andreas Wächter while at IBM, and is currently released under the Eclipse Public License. It is probably the most robust of the available open source non-linear solvers.
- Excel has a built-in non-linear integer solver, but it is limited to 300 variables; since some of these are used for internal variables that are not present in the spreadsheet as cells, the problems that it can handle are really pretty small, but it can be useful if your problem fits into 300 variables.

### Linear, Integer and Quadratic

- Clp is a linear solver that was written by John Forrest while at IBM, and is currently released under the Eclipse Public License. It has a reputation as the most robust of the available open source linear solvers, and is considered to be competitive with commercial solvers for all but very large problems. It is unusual among open source linear solvers in that it can handle quadratic objective functions.
- Cbc is a mixed integer solver that was written by John Forrest while at IBM, and is currently released under the Eclipse Public License. It has a reputation as the most robust of the available open source mixed integer solvers, and is considered to be competitive with commercial solvers for all but very large problems.
- GLPK is a mixed integer solver written by Andrew O. Makhorin (Андрей Олегович Махорин) while at the Moscow Aviation Institute. It is widely used due to the inclusion of the GNU MathProg modeling language (GMPL), which is essentially a subset of the commercially available AMPL language. The modeling language is much slower than AMPL and does not allow ordered sets (among many other things), but it can be useful when you don't have an AMPL license available. It is significantly slower than commercial solvers, especially on integer programs.
- lp_solve is a linear solver that was probably the first widely used open source solver and which has a reputation as being useful for problems larger than GLPK can handle, but smaller than Clp can handle.


# Auto Loan Severity of Loss and the CPI for Used Cars and Trucks

Institutions typically use their own experience to estimate recoveries and thus the severity of loss for auto loan defaults. The Consumer Price Index for Used Cars and Trucks (CPI-UCT) can be helpful in determining whether changes in recoveries and severity of loss are due to declining credit quality or changes in the used car and truck market. When used car prices fall and thus the CPI-UCT falls, auction prices are lower and the recoveries on repossessed vehicles are lower, leading to a higher severity of loss. When the CPI rises, auction prices are higher, recoveries are higher and the severity of loss is lower. The discussion that follows will illustrate how changes in the CPI-UCT and the averaging period used for pricing-related loss rate forecasting can help in understanding changes in severity of loss for auto loan portfolios.

The CPI-UCT has stabilized since mid-2011 as shown in Figure 1. This will change the severity of loss behavior of most auto loan portfolios, causing losses to be higher than expected for institutions that estimate severity of loss with a rolling average of more than two years when pricing loans.

Figure 1 shows the dramatic drop in the CPI-UCT during the early part of the recession in 2008-2009, followed by a dramatic rebound due to the reduced new-car manufacturing capacity that resulted from bankruptcies and restructuring in the auto industry. The rebound largely ended in the summer of 2011. The CPI-UCT is currently dominated by seasonal used-auto purchase demand; it is high in the Spring and low in the Winter.

For institutions that use a rolling average to estimate future severity of loss for new and used auto loans, the changes in the used car market reflected in the CPI-UCT will affect the actual severity of loss compared to the severity of loss estimate used to price loans at the time of origination. Institutions that use a rolling average of more than 5 years have experienced lower than expected severity of loss, and thus higher than expected net interest income. This better than expected performance is about to end for these institutions, not because loans are performing better or worse, but because the 2008-2009 crash in the used car market is about to roll out of the data set used to calculate the rolling average that was used in pricing the loans.

Figure 2 shows the 5-year rolling average for CPI-UCT and has a clearly rising value that will cause losses to be lower than originally estimated. For example, loans that originated in January of 2012 (rolling average of 138 used for pricing) and defaulted in January of 2013 (actual value of 147 at time of default) saw a 6% higher recovery than expected. The better than expected performance will decline as the 2008-2009 crash numbers continue to drop out of a 5-year rolling average in the coming months.

This effect would be greatest for loans originated in October, November or December of 2009, when the 5-year rolling average hit a minimum of slightly more than 135. Loans originated at the bottom would experience an approximately 8.5% better than expected recovery in the event of default.

Institutions that use shorter rolling averages have already begun to see the 2008-2009 crash values and the subsequent rebound drop out. For example, using a 3-year rolling average for pricing purposes as shown in Figure 3, the same January 2012 loan would have been priced with an estimated recovery value of 140, with an actual recovery value of 147 at time of default, for a 5% higher than expected recovery.

The three-year rolling average is more volatile than the 5-year; the minimum would have been 132, for an approximately 11.4% better than expected recovery in January, 2013. This benefit will rapidly go away as the values from the crash continue to drop out of the rolling average used for pricing.

Using the 2-year rolling average shown in Figure 4, the recovery estimate used for pricing would be about 146, for a less than 1% better than expected recovery in January of 2013; institutions using this shorter rolling average have already seen the effects of the crash period dropping out of the average.
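The rolling-average mechanics can be sketched with a stylized index shaped like Figure 1 (crash, rebound, plateau); the values below are made up, not the actual CUSR0000SETA02 series:

```r
# Simple trailing moving average
roll_mean <- function(x, months) {
  stats::filter(x, rep(1 / months, months), sides = 1)
}

# Stylized monthly index: 2008-2009 crash, rebound, plateau at 147
cpi <- c(seq(145, 125, length.out = 12),
         seq(125, 147, length.out = 24),
         rep(147, 36))

avg2 <- roll_mean(cpi, 24)   # 2-year window
avg5 <- roll_mean(cpi, 60)   # 5-year window

# The longer window still carries the crash, so it sits further below
# the actual value -- a bigger gap between priced and realized recovery
c(two_year = tail(avg2, 1), five_year = tail(avg5, 1))
```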

Although a 6% better than expected recovery may not sound like much, it can translate into a disproportionately large change in the severity of loss. 6% on a $10,000 vehicle would be $600. If the outstanding loan is $12,000, when the 6% better than expected recovery goes away, the severity of loss goes from $1,400 ($12,000 - $10,600), to $2,000--a change of 43%.

The CPI-UCT is helpful in determining whether changes in recoveries are due to credit problems, or used car market problems. The crash in used car and truck prices that occurred in 2008-2009 was a very unusual event, and will cause some unusual and perhaps negative changes in the severity of loss in car loan portfolios.

Data are from the CUSR0000SETA02 series available from the St. Louis Federal Reserve FRED2 system.


# R - Open Source Statistical Software

While working on my praxis during my Doctor of Engineering program, I tried to do a multiple linear regression in Excel--and discovered Excel's limitations. Since I didn't have budget money for SAS and couldn't move the data to the university's computers for confidentiality reasons, I started frantically surfing the web: "open source" "free" "statistics" "linear" "regression".

I downloaded several programs, most of which had even more problems than Excel. I finally downloaded a program called "R" and was able to install it, load my data, and run my regression in less than an hour. Although the language was very different from all of the languages that I knew (or for that matter know), it worked reliably, and did what I needed.

More than 12 years later, I am still an R user and have attended the UseR! conferences in Ames (2007), Dortmund (2008), Rennes (2009), Coventry (2011) and Albacete (2013). The links that follow provide a few resources for this great statistical package. R is now used in a number of Fortune 500 firms, including Google (which does its own internal distribution) and a number of pharmaceutical companies.

- R project web site Provides links to mirrors for downloading.
- Revolution Analytics is a commercial vendor that distributes a supported version of R and its related packages. The Revolution Analytics interpreter has some proprietary features that are of use for parallel computing and large datasets.
- RStudio offers a wonderful free integrated development environment for R, along with various educational offerings; its team includes Hadley Wickham, one of the most prolific and respected package writers.
- RMetrics is a project to develop R packages for financial engineering. This user group has its own conference, and most of the finance crowd goes to this conference rather than the general UseR! conference.
- RGeo is a project to develop geospatial analysis packages for R.
- BioConductor is a very large project to develop software for genomic and pharmacological research. Pharmaceutical firms were among the first to adopt R.


### R User Groups

- Dallas Ft. Worth R User Group
- Houston R User Group
- Austin R User Group
- San Antonio R User Group
- Atlanta R User Group
- Burlington R User Group
- List of world-wide R user groups
