\documentclass[12pt]{article}
\usepackage{abstract} % Allows abstract customization
\usepackage{microtype}
\usepackage{fancyhdr} % Headers and footers
%% \pagestyle{fancy} % All pages have headers and footers
\usepackage{graphicx}
\usepackage{natbib}
\usepackage{parskip}
\usepackage{booktabs}
\usepackage{float}
\usepackage{amssymb,amsmath}

\renewcommand{\abstractnamefont}{\normalfont\bfseries} % Set the "Abstract" text to bold
\renewcommand{\abstracttextfont}{\normalfont\small\itshape} % Set the abstract itself to small italic text

%% \title{Value Factors Do Not Forecast Returns for the S\&P 500 Stocks}
\title{\vspace{-15mm}\fontsize{16pt}{16pt}\selectfont\textbf{Value Factors Do Not Forecast Returns for S\&P 500 Stocks}}
%% \author{Ian L. Kaplan}
\author{
\large
\textsc{Ian L. Kaplan}
\thanks{This paper grew out of my Masters presentation in the Computational Finance and Risk Management program at the University of Washington. I would like to thank Professor Doug Martin and Professor Guy Yollen for their comments and suggestions. I am also grateful to the University of Washington for providing access to the Wharton Research Data Service and the CRSP/Compustat data. I would like to thank Brian Ertley who has been kind enough to discuss this work as it progressed. Ernie Chan read an early draft of the paper and suggested a number of changes which improved the paper. My wife, Kate Butler, read various drafts of the paper, caught several errors and helped to improve the final result.}
\\[2mm]
%% \normalsize University of Washington \\ % Your institution
\normalsize{iank@bearcave.com} \\[1mm]% Your email address
\normalsize{http://www.bearcave.com}
\vspace{-5mm}
}
\date{March 10, 2014}
\begin{document}

\maketitle

\graphicspath{ {../diagrams/} }
\bibliographystyle{plain}
\raggedright

\begin{abstract}
This paper investigates how effectively value factors can forecast future returns for stocks in the S\&P 500. 

Ranked portfolios and linear models are constructed from a set of quarterly value factor from 1998 to 2013. Portfolios are drawn from the quarterly S\&P 500 stock universe to avoid survivor bias.

Over this time period, with a set of over 400 or more stocks per quarter, the returns from ranked portfolios or forecast by linear models produce at best weak performance compared to the S\&P 500 index returns.

These results suggest that for this fifteen year time period, for the large capitalization S\&P 500 stocks, the value factors examined here are not useful for constructing portfolios.
\end{abstract}

\newpage

\section*{Introduction}

Investors will choose to buy stocks that they believe will increase in value. In doing this, the investor is making a forecast for the future return of the stock.

Value investing is based on the idea that a set of one or more value factors can be used to forecast future return. The literature on value investing is so large that it could easily consume a book length bibliography. In 1951 Benjamin Graham and David Dodd published their classic \emph{Securities Analysis}, which described a value driven approach to investing. Benjamin Graham's most famous student, Warren Buffet, used value based portfolio analysis in his early days as a stock portfolio manager\cite{Lowenstein1995}.

The actual process followed by a value investor may include human judgement about the future prospects of a company
combined with a quantitative estimation of corporate value factors\cite{Scott1997}. This paper focuses on 
computationally driven quantitative analysis of portfolios using value factors. These factors, like book value to price (B2P), are calculated from the information in quarterly annual reports.

<<echo=FALSE, message=FALSE>>=
#
# WARNING: Running this Knitr document will take hours, the first time it is run. The code will save RData objects
# so that subsequent runs are faster. Knitr's cache funtion is also used.
#
library(tseries)
library(timeDate)
library(timeSeries)
library(xts)
library(zoo)
library(leaps)
library(bit)
library(robustbase)
library(robust)
library(PerformanceAnalytics)
library(TTR)
library(corrplot)
library(Matrix)
library(xtable)

dataDir = "../data"
r_dataDir = "../r_data"
# The synth_factors.csv values are generated by:
# 1. run fix_compustat_data.r on the CRSP/Compustat data. For information on the this data
#    (assuming that you have access to the Warton Research Data Service) see
#    http://www.bearcave.com/finance/wrds_data.html
# 2. run factor_calc.r to generate the synth_factors.csv file.
#
sp500Mon = "sp500ByMonths_all.csv"
sp500 = "sp500ByQtr_all.csv"
factorFile = "synth_factors.csv"
monthlyFactorFile = "monthly_synth_factors.csv"
compustat = "sp500_compustat_fixed.csv"
closeFile = "sp500_close_qtr.csv"
closeMonthFile = "sp500_close_mon.csv"
interestOut = "quarterly_interest_rates.csv"
factorPath = paste(dataDir, factorFile, sep="/")
monthFactorPath = paste(dataDir, monthlyFactorFile, sep="/")
sp500Path = paste(dataDir, sp500, sep="/")
sp500MonPath = paste(dataDir, sp500Mon, sep="/")
compustatPath = paste(dataDir, compustat, sep="/")
closePath = paste(dataDir, closeFile, sep="/")
closeMonthPath = paste(dataDir, closeMonthFile, sep="/")
interestOutPath = paste(dataDir, interestOut, sep="/")
@

<<Functions, echo=FALSE>>=

# Standardize a vector. The library scale function is not used because some stocks have factors were all
# more most of the values are zero, so the standard deviation is close to zero. This results in an NA.
#
myScale = function(v)
{
  rslt = (v - mean(v))
  s = sd(v)
  if (! identical(s, 0)) {
    rslt = rslt / s
  }
  return(rslt)
}


scaleFactors = function(factors.df, stocks, factorCols)
{
  scaleFactors.df = factors.df
  for (i in 1:length(stocks)) {
    s = stocks[i]
    ix = which(scaleFactors.df[,"sym"] == s)
    block = scaleFactors.df[ix, factorCols]
    blockScale = apply(block, 2, FUN=myScale)
    scaleFactors.df[ix, factorCols] = blockScale
  }
  return(scaleFactors.df)
} # scaleFactors

#
#  * Perform a linear regression of the factors and the n-quarter head return
#  * Perform a robust regression.
#  * Predict the future values using the linear and robust models
#
# Sometimes the robust regression seems to produce a much better fit (e.g., coefficients
# with a lower p-value (e.g., pr(>|t|))), so the robust regression is thrown in too.
#
# Return:
# A data frame row consisting of:
#    * stock symbol
#    * 1-quarter ahead predicted return
#    * R^2 for OLS
#    * the robust prediction
#
alphaCalc = function(factors.df, ret )
{
  rslt.df = NULL
  len = nrow(factors.df)
  win = len - 1
  # remove columns that are largely made up of zeros to avoid screwing up the linear regression
  fracZero = apply(factors.df, 2, FUN=function(v) { sum(v == 0)}) / nrow(factors.df)
  goodCols = fracZero <= 0.25
  if (sum(goodCols) >= 2) {
    factorsAdj = factors.df[,goodCols]
    retBlock = ret[-length(ret)]
    facBlock = factorsAdj[1:win,]
    l = lm(retBlock ~ ., data=facBlock)
    s = summary(l)
    R2 = s$r.squared
    lr = lmrob(retBlock ~ ., data=facBlock)
    # use the linear model to predict the next quarter return
    predVal = predict(l, type="response", newdata = factorsAdj[len,])
    rPredVal = predict(lr, type="response", newdata = factorsAdj[len,])
    lastRet = ret[length(ret)]
    rslt.df = data.frame(predict=predVal, actual=lastRet, R2 = R2, 
                         robPredict = rPredVal)
  }
  return(rslt.df)
} # alphaCalc


buildFormula = function( names )
{
  xExpr = names[1]
  n = length(names)
  if (n >= 2) {
    for (i in 2:n) {
      xExpr = paste(xExpr, "+", names[i])
    }
  }
  return( xExpr )
} # buildFormula


#
# For each linear model, use leaps() model selection to find the optimal linear model.
# Return the predicted value from that model, the actual value, the R-squared and a bit 
# vector which indicates which of the factors was used in the model.
#
linearModelSelect = function(factors.df, ret )
{
  rslt.df = NULL
  len = nrow(factors.df)
  win = len - 1
  # remove columns that are largely made up of zeros to avoid screwing up the linear regression
  fracZero = apply(factors.df, 2, FUN=function(v) { sum(v == 0)}) / nrow(factors.df)
  goodCols = fracZero <= 0.25
  if (sum(goodCols) >= 2) {
    factorsAdj = factors.df[,goodCols]
    retBlock = ret[-length(ret)]
    facBlock = factorsAdj[1:win,]
    jump = leaps(x = facBlock, y = retBlock, method="Cp", nbest=1, names=colnames(factorsAdj))
    ix = which.min(jump$Cp)
    minRow = jump$which[ix,]
    b = as.bit(minRow)
    length(b) = 8
    bits = packBits(as.logical(b))[1]
    names = colnames(factorsAdj)[minRow]
    exprStr = paste("retBlock ~", buildFormula(names))
    exprForm = as.formula(exprStr)
    l = lm(exprForm, data=facBlock)
    s = summary(l)
    R2 = s$r.squared
    # use the linear model to predict the next quarter return
    factors = unlist(c(1, factorsAdj[len, minRow]))
    beta = l$coef
    predVal = as.numeric(factors %*% beta)
    lastRet = ret[length(ret)]
    rslt.df = data.frame(predict=predVal, actual=lastRet, R2 = R2, bitVec = bits)
  }
  return(rslt.df)
}

#
# Calculate a linear model, using leaps() which gives the optimal linear model. Leaps only works
# on full rank matrices.  Columns with a high percentage of zeros are removed.  After this, the
# rank of the matrix is checked. The rank should be the same as the number of columns.
#
blockAlphaLeaps = function(factors.df, close.df, stocksQtr, endQtr, startQtr, currentQtr, stepAhead)
{
  alphaStock.df = data.frame()
  metaCols = which(colnames(factors.df) %in% c("date", "sym", "ggroup", "gsector"))
  for (sym in stocksQtr) {
    factorsStock.df = factors.df[factors.df[,"sym"] == sym,]
    endIx = which(as.vector(factorsStock.df[,"date"]) == endQtr)
    startIx = which(as.vector(factorsStock.df[,"date"]) == currentQtr)
    factorBlock.df = factorsStock.df[endIx:startIx,]
    factorData.df = factorBlock.df[,-metaCols]
    rownames(factorData.df) = factorsStock.df[endIx:startIx,"date"]
    closeStock = as.zoo(as.vector(close.df[close.df[,"sym"] == sym,"close"]))
    closeDates = as.Date(as.vector(close.df[close.df[,"sym"] == sym,"date"]))
    index(closeStock) = closeDates
    returns = Return.calculate(closeStock, method="log")[-1]
    endReturnIx = which(closeDates == as.Date(endQtr)) + (stepAhead-1)
    startReturnIx = which(closeDates == as.Date(currentQtr)) + (stepAhead-1)
    returnBlock = returns[endReturnIx:startReturnIx]
    fracZero = apply(factorData.df, 2, FUN=function(v) { sum(v == 0)}) / nrow(factorData.df)
    goodCols = fracZero <= 0.25
    if (sum(goodCols) > 1) {
      factorDataAdj.df = factorData.df[,goodCols]
      m = as.matrix(factorDataAdj.df)
      rank = round(as.numeric(rankMatrix(m)), 6)
      if (rank == ncol(m)) {
        alpha = linearModelSelect(factorDataAdj.df, returnBlock)
        periodRet = returnBlock[-length(returnBlock)]
        retSd = sd(periodRet) # calculate the sd of the returns without the predictor check
        len = nrow(factorDataAdj.df)
        blockDate = rownames(factorDataAdj.df)[len]
        ggroup = factorBlock.df[len, "ggroup"]
        gsector = factorBlock.df[len, "gsector"]
        df = data.frame(blockDate, sym, alpha$predict, alpha$actual, alpha$bitVec, alpha$R2, retSd, 
                        ggroup, gsector)
        colnames(df) = c("date", "sym", "predict", "actual", "model", "R2", "retSigma", 
                         "ggroup", "gsector")
        alphaStock.df = rbind(alphaStock.df, df)
      }
      else {
        msg = paste("%% The factor matrix for", sym, "is not full rank\n")
        cat(msg)
      }
    }
  }
  return(alphaStock.df)
}


startEndDates = function(factors.df, sp500.df, close.df, stepAhead)
{
   # The factors and the S&P 500 should have overlapping dates.
  factorDates = as.Date(as.vector(factors.df[,"date"]))
  sp500Dates = as.Date(as.vector(sp500.df[,"qtr"]))
  closeDates = close.df[,"date"]
  factorDateMin = min(factorDates)
  sp500DateMin = min(sp500Dates)
  factorDateMax = max(factorDates)
  sp500DateMax = max(sp500Dates)
  closeDates = sort(unique(as.vector(close.df[,"date"])))
  # there must be enough close dates in future quarters to check the back test prediction
  closeDateMin = as.Date(closeDates[1])
  closeDateMax = as.Date(closeDates[length(closeDates) - stepAhead])
  startDate = max(factorDateMin, sp500DateMin, closeDateMin)
  endDate   = min(factorDateMax, sp500DateMax, closeDateMax)
  return(list(startDate = startDate, endDate = endDate))
}

#
# Make sure that there are winSize of factor data for the S&P 500 stocks
#
indexUnivCalc = function(index.df, factor.df, close.df, startDate, endDate, winSize, stepAhead, dateIndex)
{
  univList = list()
  listIx = 1
  uniqueDates = sort(as.Date(unique(as.vector(index.df[,dateIndex]))))
#   datesYM = format(uniqueDates, format="%Y-%m")
#   startDateYM = format(as.Date(startDate), format="%Y-%m")
#   endDateYM = format(as.Date(startDate), format="%Y-%m")
  endIx = which(uniqueDates == startDate)
  lastIx = which(uniqueDates == endDate)
  indexIx = seq(from=(endIx+winSize), to=(lastIx-stepAhead))
  for (j in indexIx) {
    curDate = uniqueDates[j]
    indexStocks = as.vector(index.df[index.df[,dateIndex] == as.character(curDate), "co_tic"])
    factorIx = seq(from=endIx, to=(j+stepAhead))
    for (i in factorIx) {
      facQtr = uniqueDates[i]
      if (i <= j) {
        factorStocks = as.vector(factor.df[factor.df[,"date"] == as.character(facQtr), "sym"])
        if (length(factorStocks) > 0) {
          ix = which(indexStocks %in% factorStocks)
          indexStocks = indexStocks[ix]
        }
      }
      closeStocks = as.vector(close.df[close.df[,"date"]== as.character(facQtr), "sym"])
      if (length(closeStocks) > 0) {
        closeIx = which(indexStocks %in% closeStocks)
        indexStocks = indexStocks[closeIx]
      }
    }
    univList[[listIx]] = list(date = curDate, stocks = indexStocks)
    endIx = endIx + 1
    listIx = listIx + 1
  }
  return(univList)
}

#
# use a linear model to predict the t+1 value based on the linear 
# regression mean (e.g., the line through the returns)
#
predictLM = function(returns)
{
  len = length(returns)
  xvals = seq(from=1, to=len)
  l = lm(returns ~ xvals)
  predPt = l$coef[1] + l$coef[2] * (len + 1)
  return(predPt)
}

# 
# For all of the stocks in the current quarterly stock universe, calculate forecasted values on the basis
# of a multiple linear regression.
#
# factors.df (the predicton factors):
#
#       date   sym     CFO2EV         RONA  EBITDA2EV      E2PFY0       BB2P      BB2EV       B2P ggroup gsector
# 1998-03-31 0033A -0.9004220 -0.369872599 -0.2010031 -0.06315872 -0.3035219 -0.1664513 1.2050654   5510      55
# 1998-06-30 0033A -0.8731191 -0.502276001 -0.3602984 -0.34823329 -0.3035219 -0.1664513 1.0076856   5510      55
# 1998-09-30 0033A -0.7510997  0.001508416  1.0049890  0.37221205 -0.3035219 -0.1664513 0.8624639   5510      55
# 1998-12-31 0033A -0.8071428 -0.189229919  0.4266131  0.16651864 -0.3035219 -0.1664513 0.6945868   5510      55
# 1999-03-31 0033A -0.7730149 -0.285340844 -0.1457944  0.04227895 -0.3035219 -0.1664513 0.6916150   5510      55
# 1999-06-30 0033A -0.7711558 -0.423606844 -0.2253379 -0.28301650 -0.3035219 -0.1664513 0.6552650   5510      55
#
# close.df (close prices)
#
#        date  sym   close
#  1998-03-31 ADCT 19.0630
#  1998-06-30 ADCT 29.9375
#  1998-09-30 ADCT 33.5000
#  1998-12-31 ADCT 23.0000
#  1999-03-31 ADCT 39.8125
#  1999-06-30 ADCT 47.8125
#
# stocksQtr
#
# "ABT"  "HON"  "AA"   "BEAM" "AEP"  "BHI" 
#
# endDate : the t-n quarter in the back-test block (example: 1998-03-31)
#
# startDate : the t-1 quarter in the current back-test block (example: 2002-12-31)
#
# currentDate : the quarter at time t (example: 2003-03-31)
#
# lastDate : the t+1 quarter (example: 2003-06-30)
#
# Return:
#    date sym predRet actRet pval stdDevRet ggroup gsector
blockAlphaModel = function(factors.df, close.df, stocksDate, endDate, startDate, currentDate, DateAhead)
{
  alphaStock.df = data.frame()
  metaCols = which(colnames(factors.df) %in% c("cusip", "close", "date", "sym", "ggroup", "gsector"))
  for (sym in stocksDate) {
    factorsStock.df = factors.df[factors.df[,"sym"] == sym,]
    endIx = which(as.vector(factorsStock.df[,"date"]) == endDate)
    startIx = which(as.vector(factorsStock.df[,"date"]) == currentDate)
    if ((length(endIx) == 1) && (length(startIx) == 1) && (endIx != 0) && (startIx != 0)) {
      factorBlock.df = factorsStock.df[endIx:startIx,]
      factorData.df = factorBlock.df[,-metaCols]
      rownames(factorData.df) = as.Date(as.vector(factorsStock.df[endIx:startIx,"date"]))
      closeStock.z = as.zoo(as.vector(close.df[close.df[,"sym"] == sym,"close"]))
      closeDates = as.Date(as.vector(close.df[close.df[,"sym"] == sym,"date"]))
      index(closeStock.z) = closeDates
      returns.z = diff(log(closeStock.z))
      endReturnIx = which(closeDates == as.Date(endDate)) + (DateAhead-1)
      startReturnIx = which(closeDates == as.Date(currentDate)) + (DateAhead-1)
      returnBlock.z = returns.z[endReturnIx:startReturnIx]
      alpha = alphaCalc(factorData.df, returnBlock.z)
      periodRet.z = returnBlock.z[-length(returnBlock.z)]
      len = length(periodRet.z)
      # momentum really only works for monthly data.
      momentum = sum(periodRet.z[(len-11):len])
      retSd = sd(periodRet.z) # calculate the sd of the returns.z without the predictor check
      retMu = mean(periodRet.z)
      retEMA = EMA(x = periodRet.z, n = 4)[length(periodRet.z)]
      # use a linear model to calculate the predicted value at t+1
      retLinMu = predictLM(periodRet.z)
      len = nrow(factorData.df)
      blockDate = rownames(factorData.df)[len]
      ggroup = factorBlock.df[len, "ggroup"]
      gsector = factorBlock.df[len, "gsector"]
      df = data.frame(blockDate, sym, alpha$predict, alpha$actual, alpha$R2, alpha$robPredict, 
                      retMu, retEMA, retLinMu, momentum, retSd, 
                      ggroup, gsector)
      colnames(df) = c("date", "sym", "predict", "actual", "R2", "robPredict", 
                       "retMu", "retEMA", "retLinMu", "momentum", "retSigma", 
                       "ggroup", "gsector")
      alphaStock.df = rbind(alphaStock.df, df)
    }
  }
  return(alphaStock.df)
}

#
# Calculate the factor momentum for quarterly factors, over the 5-year retrospective period.
#
factorMomentumModel = function(factors.df, close.df, stocksDate, endDate, startDate, currentDate, DateAhead)
{
  simpleReturn = function(prices)
  {
    len = length(prices)
#     R = (prices[2:len] - prices[1:(len-1)])/prices[1:(len-1)]
    R = ifelse(prices[1:(len-1)] == 0, 0, (prices[2:len] - prices[1:(len-1)])/prices[1:(len-1)])
    return(R)
  } # simpleReturn
  
  factorMo.df = data.frame()
  metaCols = which(colnames(factors.df) %in% c("cusip", "close", "date", "sym", "ggroup", "gsector"))
  for (sym in stocksDate) {
    factorsStock.df = factors.df[factors.df[,"sym"] == sym,]
#     endIx = which(as.vector(factorsStock.df[,"date"]) == endDate)
    startIx = which(as.vector(factorsStock.df[,"date"]) == currentDate)
    endIx = startIx - 4
    if ((length(endIx) == 1) && (length(startIx) == 1) && (endIx != 0) && (startIx != 0)) {
      factorBlock.df = factorsStock.df[endIx:startIx,]
      factorData.df = factorBlock.df[,-metaCols]
      rownames(factorData.df) = as.Date(as.vector(factorsStock.df[endIx:startIx,"date"]))
      
      closeStock.z = as.zoo(as.vector(close.df[close.df[,"sym"] == sym,"close"]))
      closeDates = as.Date(as.vector(close.df[close.df[,"sym"] == sym,"date"]))
      index(closeStock.z) = closeDates
      returns.z = diff(log(closeStock.z))
      endReturnIx = which(closeDates == as.Date(endDate)) + (DateAhead-1)
      startReturnIx = which(closeDates == as.Date(currentDate)) + (DateAhead-1)
      returnBlock.z = returns.z[endReturnIx:startReturnIx]  
      len = length(returnBlock.z)
      threeQAheadRet.z = returnBlock.z[len]
      threeQAheadRet = as.numeric(threeQAheadRet.z)
      # calculate the value factor "returns" (e.g., differences)
      facReturn.m = apply(factorData.df, 2, FUN=simpleReturn)
      # calculate the factor momentum
      factorMo = matrix(apply(facReturn.m, 2, sum), nrow=1)
      colNames = colnames(factorData.df)
      
      len = nrow(factorData.df)
      blockDate = rownames(factorData.df)[len]
      df = data.frame(blockDate, sym, factorMo, threeQAheadRet)
      colnames(df) = c("date", "sym", colNames, "actual")
      factorMo.df = rbind(factorMo.df, df)
    }
  }
  return(factorMo.df)
}


#
# For a given factor (facCol), calculate a linear regression through the past five years of quarterly factor
# values. The function returns the last value on the regression line, for all symbols in the current
# quarterly stock universe.
#
linearFactorMean = function(factors.df, facCol, close.df, stocksDate, endDate, startDate, currentDate, DateAhead)
{
  linearMean = function(v) {
    linMu = 0
    end = length(v)
    numZeros = sum(v == 0)
    fracZero = numZeros / end
    if (fracZero <= 0.25) {
      x = seq(from=1, to=end)
      l = lm(v ~ x)
      linMu = l$fitted.values[ end ]
    } else {
      if (v[end] != 0) {
        linMu = v[end]
      }
    }
    return(linMu)
  } # function linearMean
  
  first = TRUE
  metaCols = which(colnames(factors.df) %in% c("cusip", "close", "date", "sym", "ggroup", "gsector"))
  for (sym in stocksDate) {
    factorsStock.df = factors.df[factors.df[,"sym"] == sym,]
    facDates = as.Date(as.vector(factorsStock.df[,"date"]))
    endIx = which(facDates == endDate)
    startIx = which(facDates == currentDate)

    if ((length(endIx) == 1) && (length(startIx) == 1) && (endIx != 0) && (startIx != 0)) {
      endIx = endIx + 1
      facVec = as.vector(factorsStock.df[endIx:startIx, facCol])
      facVec.m = matrix(facVec, nrow=length(facVec), ncol=1)
      colnames(facVec.m) = c(sym)
      if (first) {
        factorData.m = facVec.m
        first = FALSE
      } else {
        factorData.m = cbind(factorData.m, facVec.m)
      }
    }
  }
  facMu = apply(factorData.m, 2, FUN=linearMean)
  return(facMu)
}


#
# Build a top and bottom decile portfolios from stocks that are chosen by ranking against the
# value factors.
#
# An ordinary least squares regression is calculated for five years of quarterly factor values. The
# last value on the regression line is used as the factor value in ranking.
#
# The function returns the top and bottom decile portfolio returns, for equally weighted portfolios, for
# each quarter.
#
blockPortfolioModel = function(factors.df, close.df, stocksDate, endDate, startDate, currentDate, DateAhead)
{
  names = colnames(factors.df)
  metaIx = which(names %in% c("cusip", "close", "date", "sym", "ggroup", "gsector"))
  factorNames = names[-metaIx]
  # Calculate the quarter to quarter return
  closeSyms = as.vector(close.df[,"sym"])
  symIx = which(closeSyms %in% stocksDate )
  closeFilt.df = close.df[symIx,]
  
  closeDates = as.Date(as.vector(closeFilt.df[,"date"]))
  buyIx = which(closeDates == currentDate) + (DateAhead - 1)
  sellIx = buyIx + 1
  buy.df = closeFilt.df[buyIx,c("date", "sym", "close")]
  sell.df = closeFilt.df[sellIx,c("date", "sym", "close")]
  rtrn = log(as.vector(sell.df[,"close"])) - log(as.vector(buy.df[,"close"]))
  return.df = data.frame(sell.df[,c("date", "sym")], rtrn)
  colnames(return.df) = c("date", "sym", "return")
  
  facVals = rep(0, (length(factorNames) * 2))
  facVals = matrix(facVals, nrow=1, ncol=length(facVals))
  names(facVals)[seq(from=1, to=(length(factorNames) * 2), by=2)] = paste("top", factorNames, sep="")
  names(facVals)[seq(from=2, to=(length(factorNames) * 2), by=2)] = paste("bot", factorNames, sep="")
  facValsNames = names(facVals)
  ix = 1
  for (factor in factorNames) {
    # Use regression to calculate a line through the factors from endDate to startDate. The last value on 
    # the line is used as the factor value
    factorMu = linearFactorMean(factors.df, factor, close.df, stocksDate, endDate, startDate, currentDate, DateAhead)
    numDecile = round((length(factorMu) * 0.1),-1)
    w = rep(1/numDecile, numDecile)
    stocks = names(factorMu)
    ordIx = order(factorMu, decreasing=TRUE)
    topDecile = ordIx[1:numDecile]
    end = length(factorMu)
    botDecile = ordIx[(end-(numDecile-1)):end]
    topSyms = stocks[topDecile]
    botSyms = stocks[botDecile]
    topRet = return.df[which(return.df[,"sym"] %in% topSyms),"return"]
    botRet = return.df[which(return.df[,"sym"] %in% botSyms),"return"]
    topPortRet = t(w) %*% topRet
    botPortRet = t(w) %*% botRet
    facVals[1,ix] = topPortRet
    ix = ix + 1
    facVals[1,ix] = botPortRet
    ix = ix + 1
  }
  date = as.Date(unique(as.vector(return.df[,"date"])))
  df = data.frame(facVals)
  colnames(df) = facValsNames
  valuePort.df = data.frame(date, df)
  return(valuePort.df)
} # blockPortfolioModel
  
  
#
# stockUniv : the universe of stocks for each quarter, taking into account the window size
# factors.df : factor values
# close.df : the quarterly close prices for the stocks in the S&P 500
# winSize : the number of quarters to use in the back test
# stepAhead : the number of quarters head to predict for the return
# linearModel : the function to apply
#
modelReturn = function(stockUniv, factors.df, close.df, winSize, stepAhead, linearModel )
{
  retModel.df = data.frame()
  # we need winSize in the past, plus the current quarter, plus the future quarter. Build a date
  # vector so that the proper dates can be obtained.
  qtrs = sort(as.Date(unique(as.vector(factors.df[,"date"]))))
  for (q in 1:length(stockUniv)) {
    univ = stockUniv[[q]]
    currentQtr = univ$date  # the quarter at time t+1
    stocksQtr = univ$stocks # stock universe
    currentIx = which(qtrs == currentQtr)  
    startQtr = qtrs[currentIx-1]
    endIx = currentIx - winSize
    endQtr = qtrs[endIx]
    msg = paste("%% ", paste(endQtr, startQtr, sep=":"), "\n", sep="")
    cat(msg)

    quarterInfo.df = linearModel(factors.df, close.df, stocksQtr, endQtr, startQtr, currentQtr, stepAhead)
    retModel.df = rbind(retModel.df, quarterInfo.df)
  }
  return(retModel.df)
} # modelReturn

#
# Build momentum portfolios.
#
momentumPortfolios = function(monthStockUniverse, monthClose.df )
{
  # all of the monthly dates associated with the close prices for each stock
  monthDates = as.Date(as.vector(monthClose.df[,"date"]))
  # the unique months
  uniqueMonths = sort(as.Date(unique(monthDates)))
  # The blocks of symbols from the close price regions
  monthSyms = as.vector(monthClose.df[,"sym"])
  # The unique symbols
  closeStocks = unique(monthSyms)
  # blocks of close prices
  monthClose = as.vector(monthClose.df[,"close"])
  # start on a quarter boundary
  listDates = as.Date(sapply(monthStockUniverse, FUN=function(e) {e$date}))
  listMonths = format(listDates, format="%m")
  quarterMonths = c("03", "06", "09", "12")
  quarterBounds = which(listMonths %in% quarterMonths)
  startMonth = quarterBounds[1]
  rslt.df = data.frame()
  monthSeq = seq(from=startMonth, to=length(monthStockUniverse), by=3)
  for (m in monthSeq) {
    univ = monthStockUniverse[[m]]
    currentMonth = univ$date  # the quarter at time t+1
    stocksMonth = univ$stocks  
    currentIx = which(uniqueMonths == currentMonth)
    # the time period for the close prices is 16 months. When the returns are calculated this will
    # be 15 months of return. 12 for the momentum period. Then buy at month 12 and sell at month
    # 16
    endIx = currentIx - 13
    startIx = currentIx + 2
    dateRange = uniqueMonths[endIx:startIx]
    # which stocks in the S&P 500 universe do we have close prices for
    univStocksIx = which(closeStocks %in% stocksMonth)
    univStock = closeStocks[univStocksIx]
    # For the blocks of close prices mark the ones that are in the stock universe.
    blockBySym.df = monthClose.df[which(monthSyms %in% univStock),]
    blockByDate.df = blockBySym.df[which(as.Date(as.vector(blockBySym.df[,"date"])) %in% dateRange),]
    # Make sure that all symbols have at least length(dateRange) values
    syms = as.vector(blockByDate.df[,"sym"])
    runLen = rle(syms)
    okSyms = runLen$values[runLen$lengths >= length(dateRange)]
    blockBySyms2.df = blockByDate.df[syms %in% okSyms,]
    step = length(dateRange)
    portfolioVals.df = data.frame()
    startRange = seq(from=1, to=nrow(blockBySyms2.df), by=step)
    for (start in startRange) {
      sym = as.vector(blockBySyms2.df[start, "sym"])
      close = as.vector(blockBySyms2.df[start:(start + (step-1)),"close"])
      # 1 year = 12 months of returns
      ret = diff(log(close[1:13])) 
      # momentum is the sum of 11 months of returns, skipping the last return
      momentum = sum(ret[1:11])
      # quarterly return: buy stock on month 13 on the basis of past momentum
      # sell on month 16
      qtrRet = log(close[16]) - log(close[13])
      stock.df = data.frame(sym, momentum, qtrRet)
      portfolioVals.df = rbind(portfolioVals.df, stock.df)
    }
    len = nrow(portfolioVals.df)
    ordIx = order(as.vector(portfolioVals.df[,"momentum"]), decreasing=T)
    portfolioSort.df = portfolioVals.df[ordIx,]
    decile = round(len * 0.1, -1)
    topRet.v = as.vector(portfolioSort.df[1:decile,"qtrRet"])
    botRet.v = as.vector(portfolioSort.df[(len-(decile-1)):len,"qtrRet"])
    w = rep(1/decile, decile)
    topDecile = as.numeric(w %*% topRet.v)
    botDecile = as.numeric(w %*% botRet.v)
    longShort.v = c((-1 * topRet.v), botRet.v)
    w = rep(1/(2 * decile), (2 * decile))
    longShort = w %*% longShort.v
    row.df = data.frame(currentMonth, topDecile, botDecile, longShort)
    colnames(row.df) = c("date", "topDecile", "botDecile", "longShort")
    rslt.df = rbind(rslt.df, row.df)
  }
  return(rslt.df)
} # momentumPortfolios


# fuse close prices from the CRSP/Compustat factor data.
#
# close.df : this is the data that has been downloaded from the 
# Compustat monthly update (and then fixed by fix_compustat_data.r)
#
# t.df:
#
#   t.df = compustat.df[,c("datadate", "tic", "PRCCQ")]
#   colnames(t.df) = colnames(close.df)
#
# winSize : the number of quarters in the back test region.
#
fuseClosePrices = function(close.df, t.df, winSize)
{
  closeSym = unique(as.vector(close.df[,"sym"]))
  compuSym = unique(as.vector(t.df[,"sym"]))
  missingIx = which(! compuSym %in% closeSym)
  missingSym = compuSym[missingIx]
  block = data.frame()
  for (sym in missingSym) {
    symRows = t.df[t.df[,"sym"] == sym,]
    if (nrow(symRows) >= winSize) {
      block = rbind(block, symRows)
    }
  }
  close.df = rbind(close.df, block)
  return(close.df)
}

#
# Calculate the symbol density, per quarter
symbolDensity = function(data.df)
{
  dateRange = sort(unique(as.Date(as.vector(data.df[,"date"]))))
  symCounts = rep(0, length(dateRange))
  i = 0
  allDates = as.Date(as.vector(data.df[,"date"]))
  for (date in dateRange) {
    symAtDate = length(close.df[allDates == date, "sym"])
    symCounts[i] = symAtDate
    i = i + 1
  }
  symCounts.z = as.zoo(symCounts)
  index(symCounts.z) = dateRange
  return(symCounts.z)
}


# Calculate the stock by stock correlation between the predicted return (via OLS), the predicted
# return via robust regression, the predicted return as the mean, the EMA mean and the mean
# calculated via the linear regression of the return.
#
corData = function( predicted.df )
{
  corRslt = data.frame()
  omittCols = c("R2", "retSigma", "ggroup", "gsector")
  colNames = colnames(predicted.df)
  omitColsIx = which(colNames %in% omittCols)
  corData.df = predicted.df
  if (length(omitColsIx) > 0) {
    corrCols = colNames[-omitColsIx]
    corData.df = predicted.df[,corrCols]
  }
  stocks = unique(as.vector(corData.df[,"sym"]))
  for (sym in stocks) {
    stockBlock = corData.df[corData.df[,"sym"] == sym,-which(colnames(corData.df) %in% c("date", "sym"))]
    # Apparently correlation stabalizes around 250 samples, which are many fewer than are present here.
    # So I've chosen an arbitrary boundary.
    if (nrow(stockBlock) >= 10) {
      actual = as.vector(stockBlock[,"actual"])
      sigma = sd(actual)
      mu = mean(actual)
      rand = rnorm(n = length(actual), mean = mu, sd = sigma )
      stockBlock = cbind(stockBlock, rand)
      rho = cor(stockBlock)
      rhoVec = round(rho["actual",-which(colnames(rho) == "actual")],4)
      rho.m = matrix(rhoVec, nrow=1, ncol=length(rhoVec))
      colnames(rho.m) = names(rhoVec)
      rho.df = as.data.frame(rho.m)
      rownames(rho.df) = sym
      corRslt = rbind(corRslt, rho.df)
    }
  }
  return(corRslt)
}


#
# Given a matrix, calculate densities for the matrix columns
# (this can be in preparation for plotting)
#
densityPlot = function(mat, mainTitle, xTitle)
{
  lineWidth = 4
  mat[is.na(mat)] = 0
  dens = apply(mat, 2, density)
  maxDens.l = lapply(dens, FUN=function(d) {return(max(d$y))})
  maxDens = max(unlist(maxDens.l))
  colors = c("black", "red", "slateblue1", "green4", "magenta", "gold2")
  par(mfrow=c(1,1))
  for (i in 1:length(dens)) {
    d = dens[[i]]
    if (i == 1) {
      plot(d, xlim=c(-0.8, 0.8), ylim=c(0, maxDens), col=colors[i], lwd=lineWidth, xlab=xTitle, main=mainTitle)
    } else {
      lines(d, col=colors[i], lwd=lineWidth)
    }
  }
  abline(v=0, col="black", lwd=2, lty = 2)
  legend("topright", legend=colnames(mat), col=colors, lwd=8)
  par(mfrow=c(1,1))
}


#
# The data.df data frame should have four columns:
#    date, sym, predict, actual
# Note that the predict column is not necessarily named predict.
#
# Calculate the quarterly time series of portfolio returns where the the portfolio
# is constructed by sorting on the predict column and taking the top numDecile
# stocks. The portfolio return is calculated for equally weighted stocks, using
# the actual return column
#
# The data.df data frame has four columns: c("date", "sym", <data>, "actual") where
# the <data> column is the column to use to build teh portfolio. This will vary depending
# on the call.
#
calcPortTopDecile = function(data.df, numDecile)
{
  nonDataCols = c("date", "sym", "actual")
  colIx = which(colnames(data.df) %in% nonDataCols)
  dates = as.Date(unique(as.vector(data.df[,"date"])))
  ret = rep(0, length(dates))
  w = t(rep(1/numDecile, numDecile))
  for (i in 1:length(dates)) {
    d = dates[i]
    ix = which(as.Date(as.vector(data.df[,"date"])) == d)
    block.df = data.df[ix,]
    # sort on the value colum
    values = as.vector(block.df[,-colIx])
    ordIx = order(values, decreasing=TRUE)
    ordered.df = block.df[ordIx,]
    # The actual return is the actual return for that quarter
    portRet = as.vector(ordered.df[1:numDecile,"actual"])
    r = as.numeric(w %*% portRet)
    ret[i] = r
  }
  ret.z = zoo(ret)
  index(ret.z) = dates
  return( ret.z )
} # calcPortTopDecile

#
# Calculate the quarterly time seriers of portfolio returns where the portfolio is
# constructed from randomly chosen stocks (uniform distribution). The portfolio is
# equally weighted.
#
# The columns of data.df are: 
#                   date  sym     actual
# 2003-09-30  2003-03-31  ABT 0.09092069
# 2003-09-301 2003-03-31  HON 0.23798543
calcPortRandDecile = function(data.df, numDecile)
{
  w = t(rep(1/numDecile, numDecile))
  dates = as.Date(unique(as.vector(data.df[,"date"])))
  ret = rep(0, length(dates))
  for (i in 1:length(dates)) {
    d = dates[i]
    ix = which(as.Date(as.vector(data.df[,"date"])) == d)
    block.df = data.df[ix,]
    randVals = unique(round(runif(numDecile * 2, min=1, max=nrow(block.df))))
    randIx = randVals[1:numDecile]
    portRet = as.vector(block.df[randIx,"actual"])
    r = as.numeric(w %*% portRet)
    ret[i] = r
  }
  ret.z = zoo(ret)
  index(ret.z) = dates
  return( ret.z )
} # calcPortRandDecile

# Calculate the quarterly time series of portfolio returns where the the portfolio
# is constructed by sorting on the predict column and taking a long position in the 
# top numDecile stocks and a short position in the bottom numDecile stocks. The
# weights of the top and bottom are equally weighted.
#
calcLongShortDecile = function(data.df, numDecile)
{
  nonDataCols = c("date", "sym", "actual")
  colIx = which(colnames(data.df) %in% nonDataCols)
  dates = as.Date(unique(as.vector(data.df[,"date"])))
  ret = rep(0, length(dates))
  # the top 10% is long, the bottom 10% is short
  numStocks = numDecile * 2
  w = t(rep(1/numStocks, numStocks))
  for (i in 1:length(dates)) {
    d = dates[i]
    ix = which(as.Date(as.vector(data.df[,"date"])) == d)
    block.df = data.df[ix,]
    # sort on the value column
    values = as.vector(block.df[,-colIx])
    ordIx = order(values, decreasing=TRUE)
    ordered.df = block.df[ordIx,]
    # The actual return is the actual return for that quarter
    # The top return is a long position
    topRet = as.vector(ordered.df[1:numDecile,"actual"])
    len = nrow(block.df)
    # this is a short position, so the sign of the return is inverted
    bottomRet = -as.vector(ordered.df[(len - (numDecile-1)):len,"actual"])
    portRet = c(topRet, bottomRet)
    r = as.numeric(w %*% portRet)
    ret[i] = r
  }
  ret.z = zoo(ret)
  index(ret.z) = dates
  return( ret.z )
}


calcRandLongShort =function(data.df, numDecile)
{
  nonDataCols = c("date", "sym", "actual")
  colIx = which(colnames(data.df) %in% nonDataCols)
  dates = as.Date(unique(as.vector(data.df[,"date"])))
  ret = rep(0, length(dates))
  # the top 10% is long, the bottom 10% is short
  numStocks = numDecile * 2
  w = t(rep(1/numStocks, numStocks))
  for (i in 1:length(dates)) {
    d = dates[i]
    ix = which(as.Date(as.vector(data.df[,"date"])) == d)
    block.df = data.df[ix,]
    len = nrow(block.df)
    randVals = unique(round(runif(n = (numStocks * 2), min=1, max=len)))
    topIx = randVals[1:numDecile]
    bottomIx = randVals[(numDecile+1):numStocks]
    # The actual return is the actual return for that quarter
    # The top return is a long position
    topRet = as.vector(block.df[topIx,"actual"])
    # this is a short position, so the sign of the return is inverted
    bottomRet = -as.vector(block.df[bottomIx,"actual"])
    portRet = c(topRet, bottomRet)
    r = as.numeric(w %*% portRet)
    ret[i] = r
  }
  ret.z = zoo(ret)
  index(ret.z) = dates
  return( ret.z )  
}

infoRatio = function(assetRet.z, bench.z) 
{
  alpha = (assetRet.z - bench.z)
  IR = mean(alpha)/sd(alpha)
  return(IR)
}

#
# Calculate the yearly Sharpe ratio from quarterly returns and
# risk free rate.
#
sharpeRatio = function(assetRet.z, Rf.z)
{
  alpha = assetRet.z - Rf.z
  sharpe = (4 * mean(alpha))/(2 * sd(alpha))
  return(sharpe)
}

#
# ====================================< end functions >============================================
#
@

<<echo=FALSE, cache=TRUE>>=
#
# Three data sets are loaded:
#  1. The S&P Constituents, per quarter
#  2. The synthesized corporate factors (see factor_calc.r)
#  3. The close prices which are downloaded from the Compustat monthly update data
#
factors.df = read.csv(file=factorPath)
sp500.df = read.csv(file=sp500Path)
compustat.df = read.csv(file=compustatPath)
closeRaw.df = read.csv(file=closePath)
interest.df = read.csv(file=interestOutPath)

removeCols = c("COMNAM", "CUSIP")
removeIx = which(colnames(closeRaw.df) %in% removeCols)
close.df = closeRaw.df[,-removeIx]
colnames(close.df) = c("date", "sym", "close")

# Number of quarters in the back-test region
numYears = 5
winSize = (numYears * 4)

t.df = compustat.df[,c("datadate", "tic", "PRCCQ")]
colnames(t.df) = colnames(close.df)

# Add the close prices that exist in the factors but are not in the
# compustat data.
closeFuse.df = fuseClosePrices(close.df, t.df, winSize)

@


<<echo=FALSE>>=
stocks = unique(as.vector(factors.df[,"sym"]))
numStocks = length(stocks)
nonFactorCols = c("date", "sym", "ggroup", "gsector")
colNames = colnames(factors.df)
nonFactorIx = which(colNames %in% nonFactorCols)
factorCols = colNames[-nonFactorIx]

@

<<echo=FALSE, cache=TRUE>>=
# standardize the factors (avoiding divide by zero errors)
factorsScaled.df = scaleFactors(factors.df, stocks, factorCols)
@

\section*{Corporate Value Factors}

A value investors looks for companies whose stock is under-priced relative to a set of value factors. 
There are a wide variety of factors that may drive stock prices and returns. Some of these factors may be macro-economic factors like the interest rate, the employment or construction indexes. 
The most commonly referenced value factor is the ratio of the stock price to the earnings per share (which is generally referred to as the price earnings ratio or P/E ratio). A low price earnings ratio may indicate that a 
company's stock is under-priced. The value investor hopes that when the "market" recognizes this mispricing, 
the stock price will rise.

Technology stocks and other growth companies can be a notable exception to corporate value metrics. At the time this paper was written Google had a price earnings ratio of about 30. Facebook had a P/E ratio of 141 and Amazon had a price earnings ratio of over 1,400. Twitter had no earnings, so the company had no P/E ratio.
 
The value factors that are investigated in this paper are calculated for the historical S\&P 500 stocks, from 1998 through 2013. The S\&P 500 is composed of stocks with large market capitalization.

The data to calculate these factors was obtained from the CRSP/Compustat Merged Database available from Wharton Research Data Service (WRDS).\footnote{Wharton Research Data Services (WRDS) was used in preparing this paper. This service and the data available thereon constitute valuable intellectual property and trade secrets of WRDS and/or its third-party suppliers.} 
For a detailed discussion of the Fundamentals Quarterly data set in WRDS see \cite{Kaplan2013a}. The value factors discussed in this paper are based on those discussed in Chapter 5 of \textit{Quantitative Equity Portfolio Management}\cite{Qian2007}. A description of how these value factors are calculated from the WRDS data set can be found in \cite{Kaplan2013a}.

The value factors are summarized in Table \ref{valueDesc}:

<<echo=FALSE, results='asis'>>=
valueFactorDesc = c("MV", "Market Value",
                    "EV", "Enterprise Value",
                    "CF02EV", "Cash Flow from Operations to Enterprise Value",
                    "RONA", "Return On Net operating Assets",
                    "EBITDA2EV", "Earnings Before Interest, Taxes, Depreciation and Amortization to Enterprise Value", 
                    "E2PFY0", "Trailing 12-month earnings to market capitalization",
                    "BB2P", "Net buyback to market capitalization",
                    "BB2EV", "Net external financing to enterprise value",
                    "B2P", "Book to market capitalization",
                    "S2EV", "Sales to Enterprise Value"
                    )
valueFactorDesc.m = matrix(valueFactorDesc, ncol=2, byrow=T)
tbl = xtable(valueFactorDesc.m, label="valueDesc", caption="Value Factor Description", align=c("l","l", "l"))
print.xtable(tbl, include.rownames=F, include.colnames=F, size="\\small", floating=T)
@

These value factors were chosen because they are frequently cited as indicators of future return (see Chapter 5 of \cite{Qian2007}). For example, one popular factor is the the earnings to enterprise value factor (EBITDA2EV) factor, which is a ratio of earnings to the purchase price of the company. Larger values (closer to 1) suggest that a company is inexpensive relative to earnings, which in theory means that the stock is more likely to rise in value.

The stock universe used in this paper is drawn from the S\&P 500 index for a given quarter. The constituents of the S\&P 500 index change over time as stocks are added and removed from the index. When a portfolio from the S\&P 500 universe is constructed, it is built from the S\&P 500 constituents at that time to avoid survivor bias.

The value factors are constructed from the information obtained from the corporate quarterly reports that companies are required to file with the US Securities and Exchange Commission (SEC.) To match the quarterly factors, quarterly returns are constructed from the quarterly close price for each stock in the portfolio.

Corporate quarterly data is not immediately available. In many cases there is a lag of six months (two quarters). When constructing a linear model, the value factors are lagged by two quarters relative to the future return (constructed from the close price). This should reflect market reality, since the value factors for a given quarter are not available to "the market" for six months.

\section*{Ranked Portfolios and Linear Models}

In this paper, the effectiveness of the value factors in Table \ref{valueDesc} to forecast returns are investigated in two ways: 

\begin{enumerate}
\item Portfolios are constructed by ranking the stocks on the basis of the value factors (one at a time).
\item Portfolios are constructed by ranking the stocks on the return predicted by a linear model constructed from the value factors.
\end{enumerate}

In constructing a ranked portfolio, a long only, equally weighted portfolio is constructed from the top ten percent of the sorted stocks (on the basis of either a value factor or a linear model prediction.) 

In some cases 50/50 long/short portfolios are also constructed. These portfolios consist of an equal number of stocks in a long position in the sorted top ten percent and a short position in the bottom ten percent. The positions in the portfolio are equally weighted.

\section*{Factor Correlation}

In this section the correlations between the value factors are examined for all \Sexpr{numStocks} stocks in the S\&P 500 universe throughout the back test period.

Factor correlation is an important statistic because it shows the relationship between paired factors. If a pair of factors is highly correlated (correlation close to 1) two portfolios ranked on the factors will have similar performance. 

In linear models, a factor may be omitted from the linear regression if it is highly correlated with another factor.  Avoiding highly correlated factors may also help avoid multi-colinearity, which results in inaccurate ordinary least squares regression results. 

Figure \ref{fig:corrplot} shows the pairwise factor correlations for the \Sexpr{length(factorCols)} factors.

<<corrplot, fig.lp="fig:", echo=FALSE, fig.cap = 'Factor Correlations', fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
#
# Correlation plot
#
M = round(cor(factors.df[, factorCols]), 4)
par(mfrow=c(1,1))
corrplot(M, type="lower", diag=F, addgrid.col=T, addCoef.col=T)
par(mfrow=c(1,1))
@

As Figure \ref{fig:corrplot2} shows, EBITDA2EV and S2EV are highly correlated with each other. Their correlations with the other factors are also similar. As a result, S2EV is omitted from this analysis in favor of the EBITDA2EV factor.

<<corrplot2, fig.lp="fig:", echo=FALSE, fig.cap="Correlation between EBITDA2EV and S2EV", fig.pos='H', fig.width=8, fig.height=5>>=
twoFac = M[,c("EBITDA2EV", "S2EV")]
rowOrder = c("EBITDA2EV", "S2EV", "CFO2EV", "RONA", "E2PFY0", "BB2P", "BB2EV", "B2P")
twoFacOrder = twoFac[rowOrder,]
par(mfrow=c(1,1))
corrplot(twoFacOrder, type="lower", diag=F, addgrid.col=T, addCoef.col=T)
par(mfrow=c(1,1))
@

\section*{Portfolio Back-testing}

\begin{quote}
\emph{Whereof what's past is prologue} \\
The Tempest \\
William Shakespeare, 1611 \\
\end{quote}

<<echo=FALSE>>=
# Calculate the date as a month, day year string (Aug 17, 2013)
dateRange.ch = unique(as.vector(sp500.df[,"qtr"]))
dateRange = as.Date(dateRange.ch)
minSP500Date = min(dateRange)
maxSP500Date = max(dateRange)
minSP500Date.ch = format(minSP500Date, format="%B %d, %Y")
maxSP500Date.ch = format(maxSP500Date, format="%B %d, %Y")
@


The future outcome for a stock is unknown and unknowable in an exact sense. The only way to predict what \emph{may} happen in the future is to use information from the past. By using past data to back-test portfolio models an imperfect estimate of future portfolio performance can be calculated.

In the back tests described here, the S\&P 500 portfolio date range is from \Sexpr{minSP500Date.ch} to 
\Sexpr{maxSP500Date.ch}. This time period is chosen because many of the characteristics that are present in the 
current market, like low cost electronic trading, were present in this time period.

<<univCalSetup, echo=FALSE>>=
# 
# Delete the S2EV factor since it is closely mirrored by the EBITDA2EV factor.
#
ix = which(colnames(factors.df) == "S2EV")
factors2.df = factors.df[,-ix]
stepAhead = 3
dates = startEndDates(factors2.df, sp500.df, closeFuse.df, stepAhead)
startDate = dates$startDate
endDate = dates$endDate
@


<<getSP500Close, echo=FALSE, cache=TRUE>>=
#
# Calculate the S&P 500 return from the finance.yahoo.com monthly prices.
#
# Get the quarterly close prices.
closeDates = as.Date(unique(as.vector(close.df[,"date"])))
adjDates = as.Date(rep(0, length(closeDates)))
# adjust the dates so that the quarter date falls on a day when the market is open
for (i in 1:length(closeDates)) {
  date = closeDates[i]
  while(! isBizday(x = as.timeDate(date), holidays=holidayNYSE(as.numeric(format(date, "%Y"))))) {
    date = date - 1
  }
  adjDates[i] = date
}
sp500Close = zoo(rep(0, length(closeDates)))
# Get the close prices from Yahoo
for (i in 1:length(adjDates)) {
  date = adjDates[i]
  sp500Close[i] = get.hist.quote(instrument="^GSPC", start=date, end=date, quote="Close",
                                 provider="yahoo", compression="m", retclass="zoo", quiet=T)
}
# Use the quarterly dates which line up with the other calculations
index(sp500Close) = closeDates
sp500Ret = diff(log(sp500Close))
@


<<univCalc, echo=FALSE, cache=TRUE>>=
indexUnivName = "indexUniv.RData"
indexUnivFile = paste(r_dataDir, indexUnivName, sep="/")
if (file.exists(indexUnivFile)) {
  load(indexUnivFile)
} else {
  stockUniverse = indexUnivCalc(sp500.df, factors2.df, closeFuse.df, startDate, endDate, winSize, stepAhead, "qtr")
  save(stockUniverse, file=indexUnivFile)
}
@


<<valuePortfolio, echo=FALSE, cache=TRUE>>=
valPortfolioName = "valuePortfolio.RData"
valPortfolioFile = paste(r_dataDir, valPortfolioName, sep="/")
if (file.exists(valPortfolioFile)) {
  load(valPortfolioFile)
} else {
  portfolioRslt = modelReturn(stockUniverse, factors2.df, closeFuse.df, winSize, stepAhead, blockPortfolioModel)
  save(portfolioRslt, file=valPortfolioFile)
}
@

\subsection*{The Information Ratio}

Portfolios are frequently evaluated relative to a benchmark like the S\&P 500 (for a long only portfolio) or the "risk free rate", in the case of a long/short portfolio. A metric that can be used to compare the performance of a portfolio with the benchmark is the information ratio, IR:

\begin{equation}
  IR = \frac{E[R_p - R_b]}{\sigma} = \frac{E[R_p - R_b]}{\sqrt{var(R_p - R_b)}} = \frac{\alpha}{\omega}
\end{equation}

Here $R_p$ is the quarterly portfolio return and $R_b$ is the quarterly benchmark return (in this case the S\&P 500).

To provide perspective on the information ratio (IR) measure, the table below shows the distribution of annual IR values for active portfolio managers \cite{Grinold1999} (e.g., a portfolio manager with a portfolio IR of 1.0 would be in the $90^{th}$ percentile of portfolio managers).

<<IRRankTable, echo=FALSE, results='asis'>>=
IRRanking = matrix(cbind(c(90, 75, 50, 25, 10), c(1.0, 0.5, 0.0, -0.5, -1.0)), nrow=5, ncol=2)
colnames(IRRanking) = c("Percentile", "Information Ratio")

t = xtable(IRRanking, label="ir_ranking", caption="IR Ranking", xtable=4, 
           display=c("d", "d", "f"))
print.xtable(t, include.rownames=F, size="\\small", floating=T, caption.placement="bottom")
@

The information ratios quoted in this paper are quarterly information ratios. Information ratios are commonly quoted in annualized form. However, annualizing the quarterly information ratio adds inaccuracy, since this makes the assumption that the returns and variance are the same in each quarter.

\subsection*{Ranked Portfolios}

One way to evaluate a factors effectiveness in selecting stocks for a portfolio is to rank the stocks on the basis of the factor \cite{Chan2004, Fama1992}. An equally weighted portfolio is constructed from the top ranked stocks (generally the top ten percent) in the case of a long only portfolio. For a long/short, market neutral portfolio, a long position is taken in the top ranked stocks and a short position is taken in the bottom ranked stocks.

To investigate the value factors, quarterly portfolios are constructed on the basis of each factor. Except for the RONA (return on assets) factor, all factors are divided by the market or enterprise value, which is calculated from the stock price. To adjust for the volatility of the factor values due to the volatility in the stock price, the value used in ranking is calculated by linear regression over the past 20 factor values (5 years). This regression is a time series regression over the factor values for each stock. An example is shown below in Figure \ref{fig:applePlot}. 

<<applePlot, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Time Series Regression for AAPL B2P", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
apple = as.vector(factors2.df[factors2.df[,"sym"]  == "AAPL", "B2P"][1:20])
dates = as.Date(as.vector(factors2.df[factors2.df[,"sym"]  == "AAPL", "date"][1:20]))
l = lm(apple ~ dates)
par(mfrow=c(1,1))
plot(x = dates, y=apple, type="p", pch=16, col="blue", ylim=c(min(l$fitted.values),max(apple)),
     main="Regression for AAPL Book to Price (B2P)", xlab="Dates", ylab="Book to Price")
lines(x = dates, y = l$fitted.values, type="o", col=c("red"), pch=24, lwd=2)
par(mfrow=c(1,1))
@

The factor value used in the ranking is the last value on the regression line.

The value calculated via regression lags the quarter in which the stock would be bought by six months to account for the reporting lag. The portfolio is rebalanced every quarter, so the stock is held one quarter and then sold. Transaction costs are ignored.

Once the factor value for each stock in the quarterly S\&P 500 universe has been estimated by linear regression, the stocks are ranked by their associated factor value. An equally weighted portfolio is constructed from the top ten percent of the stocks.

Table \ref{rankedPortIR} shows the information ratio for portfolios ranked on the basis of each of the seven factors. The quarterly S\&P 500 close prices are obtained from \texttt{finance.yahoo.com} (ticker symbol $\wedge GSPC$.) Table \ref{rankedPortIR} shows both the top and bottom deciles. All of these portfolios have a lower risk adjusted return than the S\&P 500 index.

<<sortPortfolioIR, echo=FALSE, results='asis'>>=
dateIx = as.Date(as.vector(portfolioRslt[,"date"]))
portfolioRslt.z = zoo(portfolioRslt[,-which(colnames(portfolioRslt) == "date")], order.by=dateIx)
sortedIR = rep(0, ncol(portfolioRslt.z))
colNames = colnames(portfolioRslt.z)
names(sortedIR) = colNames
for (col in colNames) {
  col.z = portfolioRslt.z[,col]
  sortedIR[col] = infoRatio(col.z, sp500Ret)
}

colNames = colnames(factors2.df)
nonFactorIx = which(colNames %in% nonFactorCols)
factorCols = colNames[-nonFactorIx]

topIx = seq(from=1, by=2, length.out=length(factorCols))
botIx = seq(from=2, by=2, length.out=length(factorCols))
topDecile = sortedIR[topIx]
botDecile = sortedIR[botIx]
tableData = rbind(topDecile, botDecile)
colnames(tableData) = factorCols
rownames(tableData) = c("Top Decile", "Bottom Decile")
tableData = round(tableData, 4)

tbl = xtable(tableData, label="rankedPortIR", caption="IR for Portfolios Selected by Sorted Factors",
             xtable=4, digits=4 )
print.xtable(tbl, include.rownames=T, size="\\small", floating=T)
@

The factor that results in the portfolio with the best performance relative to the S\&P 500 benchmark is the book to price (B2P) factor. The plot in Figure \ref{fig:rankPortCumReturn} shows the performance of the B2P ranked portfolio vs. the S\&P 500 benchmark.

<<rankPortCumReturn, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="one dollar invested in the B2P and SP 500 Portfolios", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=

dateIx = index(portfolioRslt.z[,"topB2P"])
r = cbind(portfolioRslt.z[,"topB2P"], sp500Ret[dateIx])
colnames(r) = c("B2P", "S&P 500")
par(mfrow=c(1,1))
chart.CumReturns(r, wealth.index=T, main="B2P Ranked Portfolio and the S&P 500", 
                 col=c("blue", "black"), 
                 lwd=4, type="l",
                 legend.loc="topleft", grid.color="slategrey")
par(mfrow=c(1,1))
@

<<echo=FALSE>>=
# for data verification
endDates = as.Date(unlist(lapply(stockUniverse, FUN=function(l) {l$date})))
stockCount = unlist(lapply(stockUniverse, FUN=function(l) {length(l$stocks)}))
@

\subsection*{Linear Value Factor Models}

The financial performance of a company may depend on a wide variety of factors. This may be lost when a single factor is used to select portfolio assets. Ranking stocks on the basis of a linear model allows multiple factors to be combined to forecast future return.

Linear models, constructed from the time series of the factors associated with a stock, allows the forecasting strength of factors to be examined in isolation. Assessing factor performance can be more difficult when forecasting is aggregated by constructing a ranked portfolio of equally weighted stocks since there may be other factors, outside the model, that affect portfolio return.

The linear models that are constructed take into account the time lag between the quarterly date and the time the corporate information becomes available (e.g., the data from March 2013 may not be available until September 2013.)

A linear model at time $t$ is used to estimate for the return at time $t+1$. The linear model regresses the seven value factors against future returns. Since the factor data has a two quarter lag in availability, the factor data at time $t-2$, is used to predict the return at $t+1$, which is three quarters ahead. The model is constructed from factor data starting at time $t-3$ back to $t-3-n$.  These factors are regressed against returns from time $t-n \dots t$.

There are seven factors (CFO2EV, RONA, EBITDA2EV, E2PFY0, BB2P, BB2EV, B2P) over 20 past quarters used to build the linear model:

\begin{equation}
r_{t+1} = \beta_{0} + \beta_{1} f_{t,1} + \beta_{2} f_{t,2} + ... + \beta_{7} f_{t,7} + \epsilon_t
\end{equation}

\begin{equation}
  \begin{bmatrix}
  r_{t_2} \\
  r_{t_3} \\
  \vdots \\
  r_{t_{21}} \\
  \end{bmatrix} = \begin{bmatrix}
                   1 & f_{t_1,1} & f_{t_1,2} & \dots & f_{t_1,7} \\
                   \vdots & \vdots & \vdots & & \vdots \\
                   1 &  f_{t_{20},1} & f_{t_{20},2} & \dots & f_{t_{20},7} \\
                  \end{bmatrix}
                  \cdot
                  \begin{bmatrix}
                  \beta_0 \\
                  \beta_1 \\
                  \vdots \\
                  \beta_{7} \\
                  \end{bmatrix} + \begin{bmatrix}
                                  \epsilon_1 \\
                                  \epsilon_2 \\
                                  \vdots \\
                                  \epsilon_{20} \\
                                  \end{bmatrix}
\end{equation}

From the past data we know $\boldsymbol{r}$ (the returns) and $\boldsymbol{f}$ (the factor values). The values of $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\epsilon}}$ are estimated via multiple linear regression.

\begin{equation}
\boldsymbol{r} = \boldsymbol{f} \hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\epsilon}}
\end{equation}

The data layout for the 20 quarters (5 year) of factor data regressed against 20 quarters of returns (where the returns are three quarters ahead) is shown below in figure \ref{fig:backtestData}. 

\begin{figure}[h]
\includegraphics[scale=0.6]{backtest_data.jpg}
\caption{Factors and 3-quarter ahead returns}
\label{fig:backtestData}
\end{figure}

Once the linear model is constructed, the linear model can be used to forecast the future return at time $t+1$ from the factor values available at time $t$:

\begin{equation}
r_{t+1} = \beta_0 + \beta_1 f_{t,1} + \dots + \beta_7 f_{t,7}
\end{equation}

The accuracy of the linear model can be checked in back-testing, since the actual return at time $t+1$ is known.

Five past years (20 quarters) are used to construct the linear model and a total of 21 quarters are needed (20 for the model, plus the "current" factor value used to forecast the future return).  These quarters start two quarters back from time $t$ (due to the lag in factor data availability). Twenty (three-quarter ahead) returns are needed to build the linear model. An extra return (the "actual" return) is used to check the prediction (and calculate the correlation between the predicted return and the actual return.)

The CRSP/Compustat data is downloaded for the entire set of stocks over the entire back test period. Not all stocks have 21 quarters of factor data and 23 quarters of return data available in the CRSP/Compustat database. These stocks are filtered out of the backtest set, leaving on average about \Sexpr{round(mean(stockCount),-1)} stocks per quarter (rather than the 500 S\&P index stocks.)

In this study both ordinary least squares and robust regression are used. Robust regression does not produce results that are clearly better and sometimes produces results that seem to be worse.

Some authors \cite{Guerard2006,Saxena2012} have used weighted least squares regression. Weighted least squares relies on estimates of sample variance (see section 6.2 of \cite{Faraway2005}). Since there are only 20 quarterly values in the backtest data set, the standard error in the variance estimate will be relatively high, so weighted least squares was not used.

The time window used to build the linear model is "rolled" forward each quarter to estimate the forecasted return for that quarter.

<<calcPredRet, echo=FALSE, cache=TRUE>>=

# the predictedRet.df data frame contains:
# - the date
# - the CRSP/Compustat symbol
# - the predicted return at time t+1
# - the actual return at time t+1.
# - the GIC group
# - the GIC sector
#
predReturnName = "predRet.RData"
predReturnFile = paste(r_dataDir, predReturnName, sep="/")
if (file.exists(predReturnFile)) {
  load(predReturnFile)
} else {
  predictedRet.df = modelReturn(stockUniverse, factors2.df, closeFuse.df, winSize, stepAhead, blockAlphaModel)
  save(predictedRet.df, file=predReturnFile)
}
@

The distribution of the ordinary least squares $R^2$ is shown below in Figure \ref{fig:plotR2}.

<<plotR2,  fig.lp="fig:", echo=FALSE, fig.cap="$R^2$ for OLS of Factors vs. 3-Quarter Ahead Returns", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
regressionR2 = predictedRet.df[,"R2"]
par(mfrow=c(1,1), mar=c(5, 4, 3, 3))
chart.Histogram(regressionR2, col="grey40", xlab=expression(paste("Distribution of OLS ", R^2)))
par(mfrow=c(1,1))
@

For each stock, the forecasting accuracy of the Ordinary Least Squares 7-factor model is compared to five other models. The models are listed below:

\begin{enumerate}
\item predict: the return at time $t+1$ is forecast via an OLS model using the seven value factors, over a 20 quarter period.
\item robPredict: the same model as the OLS model, but using robust linear regression
\item retMu: the mean of the past 20 returns ($t-n \dots t$) is used as the value to forecast the return at time $t+1$.
\item retEMA: the return at time $t+1$ is forecasted by the Exponential Moving Average function with a window of 4, over 20 past returns.
\item retLinMu: OLS regression through the past 20 returns is used to forecast the return at time $t+1$.
\item random values with the same mean and standard deviation are correlated against the actual future return.
\end{enumerate}

<<echo=FALSE>>=
# sym corPredict, corRobPredict, corRetMu, corRetEMA, corRetLinMu
retCorName = "retCor.RData"
retCorFile = paste(r_dataDir, retCorName, sep="/")
if (file.exists(retCorFile)) {
  load(retCorFile)
} else {
  # for the quarterly data, omit the momentum factor since momentum is supposed to be a monthly
  # factor.
  omittCols = c("R2", "momentum", "retSigma", "ggroup", "gsector")
  colNames = colnames(predictedRet.df)
  omitColsIx = which(colNames %in% omittCols)
  corrCols = colNames[-omitColsIx]
  stockCor = corData(predictedRet.df[,corrCols])
  save(stockCor, file=retCorFile)
}
@

The box plot below in figure \ref{fig:plotCorWithRet} shows distribution of the correlations between the predicted values for each stock at a given quarter $t$ and the actual return at time $t+1$. For comparison, normal random values, with the same mean and standard deviation as the actual returns have been added (the yellow line).  With the exception of the correlation of the random values with the actual return, all of the correlations are negative.

<<plotCorWithRet,  fig.lp="fig:", echo=FALSE, fig.cap="Predicted Return vs. Actual Return", fig.align='left', fig.pos='H', fig.width=10, fig.height=6>>=
# par(mfrow=c(1,1), mar=c(5, 4, 3, 3))
# densityPlot(stockCor, "Predicted (and random) values vs. Actual Return", "Correlation")
# par(mfrow=c(1,1))
plotData = stockCor
colnames(plotData) = c("OLS Predict vs. Actual",
                          "Rob. OLS Predict vs. Actual",
                          "Mean Return vs. Actual",
                          "EMA return vs. Actual",
                          "OLS Return vs. Actual",
                          "Random val. vs. Actual")
par(mfrow=c(1,1), mar=c(5, 4, 3, 3))
chart.Boxplot(plotData, xlab="Correlation", main="Correlation of Predicted vs. Actual",
              colorset=c("black", "red", "slateblue1", "green4", "magenta", "gold3"), lwd=4)
par(mfrow=c(1,1))
@

The plot in figure \ref{fig:plotCorWithRet} gives the following information:

\begin{enumerate}
\item Robust regression (red line) of the value factors with the future return is not much better than the random correlation.
\item The value factor based linear model (black line) has a higher correlation with future returns than either the random factor or the robust regression. But the differences is relatively small.
\item In strength of correlation between the actual returns:
   \begin{enumerate}
   \item The mean return (over five years, or 20 quarters, of past data) has the strongest correlation
   \item followed by the mean calculated via the exponential moving average
   \item followed by the mean calculated by ordinary least squares regression through the past returns
   \end{enumerate}
\end{enumerate}

The mean return, although negatively correlated, is a remarkably strong at forecasting future returns.

\subsection*{Model Selection for Value Factors}

The linear models used to forecast future returns are multi-variate linear models that make use of all seven of the corporate value factors. In some cases adding more factors to a linear model can increase the model error. To investigate this possibility R's \texttt{leaps} package is used to choose optimal linear models (on the basis of Mallow's Cp information criteria).

<<calcLinerModelPredRet, echo=FALSE, cache=TRUE>>=

# the predictedRet.df data frame contains:
# - the date
# - the CRSP/Compustat symbol
# - the predicted return at time t+1
# - the actual return at time t+1.
# - the GIC group
# - the GIC sector
#
predModelSelectReturnName = "predModelSelectRet.RData"
predModelSelectReturnFile = paste(r_dataDir, predModelSelectReturnName, sep="/")
if (file.exists(predModelSelectReturnFile)) {
  load(predModelSelectReturnFile)
} else {
  predictedLeapsRet.df = modelReturn(stockUniverse, factors2.df, closeFuse.df, winSize, stepAhead, 
                                blockAlphaLeaps)
  save(predictedLeapsRet.df, file=predModelSelectReturnFile)
}
@

To keep track of which factors are used in the linear models, the result from each \texttt{leaps} model selection includes a bit vector with one bit set for each factor.  Examples of these bit vectors are shown in Table \ref{model_bitvec}.

<<echo=FALSE, results='asis'>>=
colNames = colnames(factors2.df)
nonFactorIx = which(colNames %in% nonFactorCols)
factorCols = colNames[-nonFactorIx]

modelVal = as.vector(predictedLeapsRet.df[,"model"])
modelBits = t(sapply(modelVal, FUN=function(v) {intToBits(v)[1:7]}))
colnames(modelBits) = factorCols
t = matrix(as.integer(modelBits), ncol=ncol(modelBits), nrow=nrow(modelBits))
colnames(t) = factorCols
freq = apply(t, 2, sum)
freqPercnt = round((freq/sum(freq)) * 100,2)
names(freqPercnt) = factorCols
tbl = xtable(t[1:6,], label="model_bitvec", caption="Linear Model Represented as a Bit Vector")
print.xtable(tbl, include.rownames=T, size="\\scriptsize", floating=T)
@

Figure \ref{fig:factorDistribution} shows the factor distributions in the models selected by \texttt{leaps}. Most models have more than one factor and the distribution of the factors in the models is relatively uniform.

<<factorDistribution, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Factor Distribution in the Linear Models", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
#  mar = c(bottom, left, top, right)
par(mfrow=c(1,2), mar=c(6, 4, 3, 3))
barplot(freqPercnt, las=2, ylim=c(0, (max(freqPercnt) + 5)), ylab="Percent", xlab="", main="Percent that a factor occurs in a model")
abline(h=0, col="black")
hist(log2(as.integer(modelVal)), breaks=44, xlab=expression(paste(log[2], " model bit vector value")),
     main="Model Bit Vector Distribution")
par(mfrow=c(1,1))
@

The \texttt{leaps} optimized linear models do not result in better return forecasts. Figure \ref{fig:leapsModel} shows the correlation between the predicted and actual return for both the \texttt{leaps} optimized models and the full 7-factor model. The two distributions are almost exactly the same.

<<leapsModel, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Correlation of predicted and actual values for leaps and 7-factor model", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
ix = which(colnames(predictedLeapsRet.df) %in% "model")
leapsModel = corData(predictedLeapsRet.df[,-ix])
# par(mfrow=c(1,1), mar=c(5, 4, 3, 3))
# plot(density(leapsModel[,"predict"]), col="blue", lwd=4, main="Optimized and 7-factor models", xlab="Correlation")
# lines(density(stockCor[,"predict"]), col="green", lwd=4)
# legend("topright", legend=c("leaps", "seven factors"), col=c("blue", "green"), lwd=8)
# par(mfrow=c(1,1))
modelCor = cbind(stockCor[,"predict"], leapsModel)
colnames(modelCor) = c("7-factor model vs. actual", "leaps model vs. actual", "random vs. actual")
par(mfrow=c(1,1), mar=c(5, 4, 3, 3))
chart.Boxplot(modelCor, xlab="Correlation", main="Correlation of Predicted vs. Actual",
              colorset=c("blue", "green", "black"), lwd=2)
par(mfrow=c(1,1))
@

\subsection*{Monthly Factors}

The linear models discussed above are constructed from five years of quarterly data. With relatively few quarterly data points, the linear model may have a high error (which is reflected in the low $R^2$ values for many models.) In this section the number of values in the linear model is increased by using monthly values (e.g., 60 values, rather than 20).

Value factors are reported in quarterly corporate reports submitted to the US Securities and Exchange Commission, so more frequent corporate data values are not available.  Except for the return on assets (RONA) factor, all factors are scaled by either the enterprise value or the market value.  The enterprise value and market value are derived from the number of shares outstanding and the current share price. The market and enterprise values change as rapidly as the stock values change in the market.

<<readMonthlyFactors, echo=FALSE, cache=TRUE>>=
monthFactors.df = read.csv(monthFactorPath)
sp500Mon.df = read.csv(sp500MonPath)
monthCloseRaw.df = read.csv(closeMonthPath)
numMonths = 62
monthsAhead = 9
@

<<monthlyUniverseCalc, echo=FALSE, cache=TRUE>>=
normalizeDate = function( date ) {
  newDate = as.Date(date)
  monthDay = format(date, "%m-%d")
  if (monthDay == "03-30") {
    newDate = paste(format(date, "%Y"), "03-31", sep="-")
  }
  if (monthDay == "12-30") {
    newDate = paste(format(date, "%Y"), "12-31", sep="-")
  }
  return(newDate)
}

monthIndexUnivName = "monthIndexUniv.RData"
monthIndexUnivFile = paste(r_dataDir, monthIndexUnivName, sep="/")
if (file.exists(monthIndexUnivFile)) {
  load(monthIndexUnivFile)
} else {
  startDateMonth = normalizeDate( startDate)
  endDateMonth = normalizeDate(endDate)
  removeCols = c("COMNAM", "CUSIP", "sprtrn")
  removeIx = which(colnames(monthCloseRaw.df) %in% removeCols)
  monthClose.df = monthCloseRaw.df[,-removeIx]
  colnames(monthClose.df) = c("date", "sym", "close")
  monthStockUniverse = indexUnivCalc(sp500Mon.df, monthFactors.df, monthClose.df, startDateMonth, endDateMonth, 
                                     numMonths, monthsAhead, "month")
  save(monthStockUniverse, file=monthIndexUnivFile)
}
@


<<calcMonthPredRet, echo=FALSE, cache=TRUE>>=

# the predictedRet.df data frame contains:
# - the date
# - the CRSP/Compustat symbol
# - the predicted return at time t+1
# - the actual return at time t+1.
# - the GIC group
# - the GIC sector
#
monthPredReturnName = "monthPredRet.RData"
monthPredReturnFile = paste(r_dataDir, monthPredReturnName, sep="/")
if (file.exists(monthPredReturnFile)) {
  load(monthPredReturnFile)
} else {
  # in this case we keep the momentum factor
  monthPredictedRet.df = modelReturn(monthStockUniverse, monthFactors.df, monthClose.df, 
                                     numMonths, monthsAhead, blockAlphaModel)
  save(monthPredictedRet.df, file=monthPredReturnFile)
}
@

The quarterly data set used above is expanded to monthly values by using market and enterprise values that are calculated from the monthly close prices. This expands the linear model from twenty quarterly values to sixty two monthly values (two additional value are available by expanding the last corporate value). This is shown below in Figure \ref{fig:backtestMonthlyData}.

\begin{figure}[h]
\includegraphics[scale=0.5]{backtest_monthly_data.jpg}
\caption{Monthly Factors and 9-month Ahead Return}
\label{fig:backtestMonthlyData}
\end{figure}

The Figure \ref{fig:monthlyR2} shows the density plots for the quarterly and monthly $R^2$ error for the ordinary least squares (OLS) linear regression models used to predict future returns. The $R^2$ error is much higher (e.g., a lower $R^2$ error value) for the monthly linear regressions.

<<monthlyR2, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="$R^2$ for Quarterly and Monthly Regression", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
qtrDensity = density(regressionR2)
monthDensity = density(as.vector(monthPredictedRet.df[,"R2"]))
par(mfrow=c(1,1), mar=c(5, 4, 3, 3))
plot(monthDensity, xlab=expression(paste("linear regression ", R^2)), main=expression(paste(R^2, " for quarterly and monthly regression models")), col="green", lwd=4)
lines(qtrDensity, col="blue", lwd=4)
legend("topright", legend=c(expression(paste("Quarterly OLS ", R^2)), expression(paste("Monthly OLS ", R^2))), col=c("blue", "green"), lwd=8)
par(mfrow=c(1,1))
@

Extending the factor data by using monthly close price information is probably adding volatility, which accounts for the increase in the linear model error (shown by the lower $R^2$ distribution).

Figure \ref{fig:plotMonthlyCorr} shows the distributions for the monthly and quarterly correlations of the predicted returns vs. the actual returns. The correlation of the monthly values are slightly worse than the quarterly values, so no predictive advantage was gained by using monthly factor values.

<<monthlyCorr, echo=FALSE, cache=TRUE>>=
# Calculate the correlation between the actual return value and the:
# - value predicted by linear regression
# - value preedicted by robust linear regression
# - the value predicted by taking the mean
# - the value predicted by the exponentially weighted moving average
# - the value predicted by a linear regression plot through the past return.
monthlyCorr = corData( monthPredictedRet.df)
@

<<plotMonthlyCorr, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Correlation of Monthly Return Prediction", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
colnames(monthlyCorr) = c("OLS Predict vs. Actual",
                          "Rob. OLS Predict vs. Actual",
                          "Mean Return vs. Actual",
                          "EMA return vs. Actual",
                          "OLS Return vs. Actual",
                          "Momentum vs. Actual",
                          "Random val. vs. Actual")
#  mar = c(bottom, left, top, right)
par(mfrow=c(1,1), mar=c(4, 8, 2, 1))
chart.Boxplot(monthlyCorr, xlab="Correlation", main="Correlation of Monthly Predicted vs. Actual",
              col=c("grey40", "red", "slateblue4", "green4", "magenta", "gold3", "blue"), lwd=2, horizontal=TRUE, las=1,
        cex.axis = 0.65)
par(mfrow=c(1,1))
@

The correlation between a monthly return momentum factor and the actual return is included as well. This factor is discussed later in the paper. Note that the correlation of the momentum factor with the actual return is lower than the mean return factor.

\section*{Multi-Factor Portfolios}

<<calcDecileSize,echo=FALSE,cache=TRUE>>=
qtrDates = as.vector(predictedRet.df[,"date"])
runLen = rle(qtrDates)
# calculate the mean portfolio length and round to the nearest ten
muLen = round(mean(runLen$lengths),-1)
numDecile = round(muLen * 0.1)
@


<<portfolioDecileCalc, echo=FALSE, cache=TRUE>>=
metaIx = which(colnames(predictedRet.df) %in% c("date", "sym"))

predictData = cbind(predictedRet.df[,metaIx], -predictedRet.df[,"predict"], predictedRet.df[,"actual"])
colnames(predictData) = c("date", "sym", "predict", "actual")

meanData = cbind(predictedRet.df[,metaIx], -predictedRet.df[,"retMu"], predictedRet.df[,"actual"])
colnames(meanData) = c("date", "sym", "mean", "actual")

predictTop10.z = calcPortTopDecile(predictData, numDecile)
perfectTop10.z = calcPortTopDecile(predictedRet.df[,c("date", "sym", "actual", "actual")], numDecile)
# The mean is negatively correlated with the future return, so change the sign of the predicted
# mean return.

meanTop10.z = calcPortTopDecile(meanData, numDecile)
set.seed(2677)
randTop10.z = calcPortRandDecile(predictedRet.df[,c("date", "sym", "actual")], numDecile)

predictIR = infoRatio(predictTop10.z, sp500Ret)
perfectIR = infoRatio(perfectTop10.z, sp500Ret)
meanIR = infoRatio(meanTop10.z, sp500Ret)
randIR = infoRatio(randTop10.z, sp500Ret)

IR.v = c(perfectIR, predictIR, meanIR, randIR)
ix = order(IR.v, decreasing=T)
IR = matrix(IR.v[ix], ncol=1)
rownames(IR) = c("Perfect IR", "OLS Predict IR", "Mean Predict IR", "Random IR")[ix]
colnames(IR) = c("IR")
@

The results so far suggest that a linear model of seven value factors predicts future returns only slightly better than a random predictor. These results are derived from time series statistics.

The performance of portfolios based on the linear models described above is another view of the same information. Perhaps because portfolios are investment models, the results can be more stark that the statistics.

To investigate portfolio performance, a portfolio is constructed each quarter, from \Sexpr{min(qtrDates)} to \Sexpr{max(qtrDates)}. The portfolio consists of the S\&P 500 stocks that have sufficient history to construct the linear models. The history requirement reduced the number of stocks from 500 to an average of about \Sexpr{muLen}. 

\subsection*{Long Only Portfolios}

Long only portfolios are created by ranking the stocks on the basis of a forecasting factor. The top ten percent of the stocks (\Sexpr{numDecile}.) are chosen to create an equally weighted portfolio. The four forecasting factors are listed below:

\begin{enumerate}
\item Perfect portfolio: the stocks are sorted on actual return and the top 10 percent (\Sexpr{numDecile}) are used to construct the long portfolio. The is a portfolio constructed with perfect foresight.
\item Portfolio based on the 7-factor least squares predictor: the least squares predictor is multiplied by -1 (since it is negatively correlated with actual returns - see Figure \ref{fig:plotCorWithRet}) and the stocks are sorted on this predictor. A equally weighted portfolio is formed from the top 10 percent (\Sexpr{numDecile} stocks).
\item Mean predictor portfolio: the mean return is multiplied by -1 and the stock are sorted on this predictor. An equally weighted portfolio is formed from the top 10 percent (\Sexpr{numDecile} stocks).
\item Random portfolio: \Sexpr{numDecile} stocks are selected using a uniform random distribution over the number of stocks in the portfolio.
\end{enumerate}

The portfolios results do not account for transaction costs.

The quarterly information ratio for the four portfolios is shown in table \ref{irtable}:

<<IRTable, echo=FALSE, results='asis'>>=
t = xtable(IR, label='irtable', caption='Portfolio IR', digits=4, 
           display=c("s", "f") )
print.xtable(t, include.rownames=T, size="\\small", floating=T, type="latex")
@

As suggested by Figure \ref{fig:plotCorWithRet}, the mean return is a stronger predictor than the value factor linear model.

A cumulative return plot is shown in \ref{fig:decilePortCumReturn} below. This plot shows the fate of one dollar invested in one of three portfolios: the S\&P 500, the linear predictor and mean predictor portfolios.

<<decilePortCumReturn, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Return for one dollar invested in three portfolios", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
sp500Region = sp500Ret[index(predictTop10.z)]
r = cbind(sp500Region, predictTop10.z, meanTop10.z)
colnames(r) = c("S&P 500", "OLS Predict", "Mean Predict")
par(mfrow=c(1,1))
chart.CumReturns(r, wealth.index=T, main="Long Only Cumulative Portfolio Return", col=c("black", "blue", "magenta"), 
                 lwd=4, type="l",
                 legend.loc="topleft", grid.color="slategrey")
par(mfrow=c(1,1))
@

\subsection*{Market Neutral Portfolios}

Three 50/50, long/short, market neutral, portfolios are constructed with an equal number of stocks in the long and short positions (this is not strictly a market neutral portfolio since the dollar values of the positions are not equalized.) The methods for constructing the portfolios are listed below:

\begin{enumerate}
\item Sort the stocks by the return predicted by the linear model, times -1. Construct a long position from the top 10 percent (\Sexpr{numDecile}) of the stocks and a short position from the bottom 10 percent.
\item Sort the stocks by the return predicted by the mean return times -1. Construct a long position from the top 10 percent (\Sexpr{numDecile}) of the stocks and a short position from the bottom 10 percent.
\item Using a uniform random distribution, randomly select \Sexpr{numDecile} stocks for a long position and \Sexpr{numDecile} stocks for a short position.
\end{enumerate}

The portfolios results do not account for transaction costs.

<<portfolioLongShortCalc, echo=FALSE, cache=TRUE>>=
predictLongShort.z = calcLongShortDecile(predictedRet.df[,c("date", "sym", "predict", "actual")], numDecile)
# The mean is negatively correlated with the future return, so change the sign of the predicted
# mean return.
meanLongShort.z = calcLongShortDecile(meanData, numDecile)
randLongShort.z = calcRandLongShort(predictedRet.df[,c("date", "sym", "actual")], numDecile)   
interest.z = zoo(interest.df[,"FedFunds"])
index(interest.z) = as.Date(strptime(as.vector(interest.df[,"date"]), format="%Y-%m-%d", tz="EST"))
interestRegion = interest.z[index(predictLongShort.z)]/4
sp500Region = sp500Ret[index(predictLongShort.z)]
@

Market neutral portfolios are frequently compared to "cash" (the risk free rate, $R_f$). The calculation of the Sharpe ratio is the same as the information ratio, but the risk free rate is used as the benchmark.

\begin{equation}
  \text{Sharpe Ratio}_{year} = \frac{4 \times E[R_p - R_f]}{\sqrt{4} \times \sigma} = \frac{4 \times E[R_p - R_f]}{\sqrt{4 \times var(R_p - R_f)}}
\end{equation}

<<echo=FALSE, results='asis'>>=
sp500Sharpe = sharpeRatio(sp500Region, interestRegion)
predictSharpe = sharpeRatio(predictLongShort.z, interestRegion)
meanSharpe = sharpeRatio(meanLongShort.z, interestRegion)
randSharpe = sharpeRatio(randLongShort.z, interestRegion)
sharpe.v = c(sp500Sharpe, predictSharpe, meanSharpe, randSharpe)
ix = order(sharpe.v, decreasing=T)
sharpe.m = matrix(sharpe.v[ix], nrow=4, ncol=1)
rownames(sharpe.m) = c("S&P 500 SR", "OLS Predict SR", "Mean Predict SR", "Random SR")[ix]
colnames(sharpe.m) = c("Long/Short Annual SR")

t = xtable(sharpe.m, label="sharpe_table", caption="Annual Sharpe Ratio for 50/50 long/short portfolio", xtable=4, digits=4,
           display=c("s", "f"))
print.xtable(t, include.rownames=T, size="\\small", floating=T, caption.placement="bottom")
@

The plot in Figure \ref{fig:LongShortPortCumReturn} shows the cumulative return of a dollar invested in long/short portfolios constructed from the linear predictor and the mean predictor. The cumulative return for the S\&P 500 and the risk free rate is also shown.

The market neutral long/short portfolio constructed using the mean as a predictor for future return has a slightly higher return than the risk free rate, but the return is more volatile.

<<LongShortPortCumReturn, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Return for one dollar invested in three Long/Short portfolios", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=

r = cbind(sp500Region, predictLongShort.z, meanLongShort.z, interestRegion)
colnames(r) = c("S&P 500", "OLS Predict", "Mean Predict", "Risk Free Rate")
par(mfrow=c(1,1))
chart.CumReturns(r, wealth.index=T, main="Long/Short Cumulative Portfolio Return", 
                 col=c("black", "blue", "magenta", "red"), 
                 lwd=4, type="l",
                 legend.loc="topleft", grid.color="slategrey")
par(mfrow=c(1,1))
@

\section*{Momentum}

Momentum is a strategy that attempts to "buy the winners" (the stocks whose price is increasing) and sell the losers 
(the stocks whose price is decreasing) \cite{Jegadeesh1993, Asness2013}. In this section, value factor and return momentum are investigated.

The S\&P 500 index tends to mirror the overall stock market, which in turn reflects the US (and world) economy. The back test period includes a number of significant events that may drive the stock market: the "dot-com" bubble and crash, the 9/11 terrorist attacks, wars in Afghanistan and Iraq, the financial crash of 2008 and the 
"great recession". The power of these events in effecting the stock market may drive momentum factors and suppress value factors. This section investigates value factor and return momentum factors.

\subsection*{Value Factor Momentum}

The results above show that value factors, by themselves, are weak predictors of future return in the S\&P 500.
One value factor value would probably not be used as the basis for a buy or sell decision. Instead, a portfolio analyst might look at the trend in a value factor or set of value factors. Such a value factor trend can be quantified as value factor momentum. Value factor momentum has been used to forecast future returns in \cite{Guerard2006}.

To calculate the quarterly value factor momentum, the quarterly "return" for the value factors must be calculated:

\begin{equation}
   R_{t} = \left\{
    \begin{array}{l l}
    \frac{f_{t} - f_{t-1}}{f_{t-1}} & \quad \text{if $f_{t-1} \neq 0$}\\
    0 & \quad \text{if $f_{t-1} = 0$}
    \end{array} \right.
\end{equation}

To calculate the quarterly factor momentum, the quarterly factor "returns" are summed over the past year (e.g., four value factor returns).

\subsubsection*{Factor Momentum Correlation with Future Return}

<<factorMomentum, echo=FALSE>>=
factorMoName = "factorMo.RData"
factorMoFile = paste(r_dataDir, factorMoName, sep="/")
if (file.exists(factorMoFile)) {
  load(factorMoFile)
} else {
  factorMo.df = modelReturn(stockUniverse, factors2.df, closeFuse.df, winSize, stepAhead, factorMomentumModel )
  save(factorMo.df, file=factorMoFile)
}
@

The distributions of the correlation between the quarterly factor momentum with the 3-quarter ahead return is shown below. The 3-quarter ahead return is used because of the lag in the availability of the corporate quarterly balance sheet data.

<<calcFactorMoCorr, echo=FALSE, cache=TRUE>>=
factorMoCorr = corData(factorMo.df)
@


<<plotFactorMo,  fig.lp="fig:", echo=FALSE, fig.cap="Correlation of Factor Momentum vs. 3-Quarter Ahead Return", fig.align='left', fig.pos='H', fig.width=10, fig.height=6>>=
plotData = factorMoCorr
colNames = colnames(factorMoCorr)
plotNames = paste(colNames, "Momentum vs. return")
colnames(plotData) = plotNames
par(mfrow=c(1,1), mar=c(5, 4, 3, 3))
chart.Boxplot(plotData, xlab="Correlation", main="Correlation of Factor Momentum vs. 3-Quarter Ahead Return",
              colorset=c("black", "red", "slateblue3", "green4", "magenta", "gold3", "blue", "darkorange4"), lwd=4)
par(mfrow=c(1,1))
@

\subsubsection*{Value Factor Momentum Portfolios}

<<factorMomentumIRCalc, echo=FALSE, cache=TRUE>>=
metaCols = c("date", "sym", "actual")
allCols = colnames(factorMo.df)
metaIx = which(allCols %in% metaCols)
dataCols = allCols[-metaIx]
metaData = factorMo.df[,metaCols]
moIR = rep(0, length(dataCols))
moSharpe = rep(0, length(dataCols))
names(moIR) = dataCols
names(moSharpe) = dataCols
moLongShortPort = c()
for (col in dataCols) {
  block.df = data.frame(metaData, factorMo.df[,col])
  colnames(block.df) = c(metaCols, col)
  moLong.z = calcPortTopDecile(block.df, numDecile)
  moLongShort.z = calcLongShortDecile(block.df, numDecile)
  moIR[col] = infoRatio(moLong.z, sp500Region)
  moSharpe[col] = sharpeRatio(moLongShort.z, interestRegion)
  moLongShortPort = cbind(moLongShortPort, coredata(moLongShort.z))
}
colnames(moLongShortPort) = dataCols
rownames(moLongShortPort) = as.character(as.Date(index(moLongShort.z)))
@

Long only portfolios are formed by sorting the stocks on the basis of the 1-year value factor momentum, for each of the seven value factors. The portfolio is constructed from the top decile of these sorted stocks. The quarterly portfolio IR (relative to the S\&P 500 benchmark) for each of these portfolios is shown below in Table \ref{moIRTable}

<<echo=FALSE, results='asis'>>=
moIRTbl = moIR
ordIx = order(moIR, decreasing=TRUE)
ordMoIRTbl = matrix(moIRTbl[ordIx], ncol=1)
colnames(ordMoIRTbl) = c("Quarterly IR")
rownames(ordMoIRTbl) = names(moIRTbl)[ordIx]
t = xtable(ordMoIRTbl, label='moIRTable', caption='Value Factor Momentum Portfolio IR', digits=4, 
           display=c("s", "f") )
print.xtable(t, include.rownames=T, size="\\small", floating=T, type="latex")
@

Table \ref{moSharpeTable} shows the Sharpe ratio for 50/50 equally weighted long/short factor momentum portfolios, where there is a long position in the top decile and a short position in the bottom decile.

<<echo=FALSE, results='asis'>>=
moSharpeTbl = moSharpe
ordIx = order(moSharpeTbl, decreasing=TRUE)
ordMoSharpeTbl = matrix(moSharpeTbl[ordIx], ncol=1)
colnames(ordMoSharpeTbl) = c("Annual Sharpe Ratio")
rownames(ordMoSharpeTbl) = names(moSharpeTbl)[ordIx]
t = xtable(ordMoSharpeTbl, label='moSharpeTable', 
           caption='50/50 Long/Short Value Factor Momentum Portfolio Sharpe Ratio', digits=4, 
           display=c("s", "f") )
print.xtable(t, include.rownames=T, size="\\small", floating=T, type="latex")
@

As the quarterly IR and the annual Sharpe ratios indicate, the best performing market neutral portfolio uses the EBITDA2EV momentum factor. This factor indicates that the company has earnings momentum over the last year. 

A cumulative return plot of the EBITDA2EV momentum portfolio, the S\&P 500 and the risk free rate is shown  in Figure \ref{fig:momentumReturnSharpe}.

<<momentumReturnSharpe, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Return for one dollar invested in Long/Short momentum portfolios and SP 500", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
sp500Region = sp500Ret[as.Date(rownames(moLongShortPort))]
moEBITDA = cbind(coredata(sp500Region), 
          as.vector(moLongShortPort[,"EBITDA2EV"]), 
          coredata(interestRegion))
rownames(moEBITDA) = rownames(moLongShortPort)
colnames(moEBITDA) = c("S&P 500", "EBITDA2EV Long/Sort", "Rf")
par(mfrow=c(1,1))
chart.CumReturns(moEBITDA, wealth.index=T, main="S&P 500 and EBITDA Long/Short Momentum Portfolio", 
                 col=c("black", "blue", "red"), 
                 lwd=4, type="l",
                 legend.loc="topleft", grid.color="slategrey")
par(mfrow=c(1,1))
@

The market neutral EBITDA2EV momentum factor portfolio (shown in blue) has significantly better return than the risk free rate (shown in red), with very little draw down. However, transaction costs are ignored. The cost of shorting half of the portfolio and the bid/ask spread would reduce the portfolio return, in practice.

Of all portfolios examined in this paper, the market neutral EBITDA2EV momentum factor portfolio appears to have the best return and the lowest volatility. Figure \ref{fig:ebitdaMomentumDrawdown} shows the drawdown plot for the market neutral EBITDA2EV momentum factor portfolio.

<<ebitdaMomentumDrawdown, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Drawdown for the market neutral EBITDA2EV momentum factor and the SP 500", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
par(mfrow=c(1,1))
moDrawdown = moEBITDA[,-which(colnames(moEBITDA) == "Rf")]
chart.Drawdown(moDrawdown, main="Drawdown S&P 500 and EBITDA Momentum Market Neutral Portfolio", 
               ylab = "Drawdown",
                 colorset=c("black", "blue"), 
                 lwd=4, type="l",
                 legend.loc="bottomleft", grid.color="slategrey")
par(mfrow=c(1,1))
@

The low volatility and drawdown for the market neutral EBITDA2EV portfolio may make it an attractive canidate for a leveraged portfolio (e.g., a portfolio that uses margin borrowing) to boost yield.

\subsection*{Return Momentum}

To provide perspective on the factor momentum portfolios, return momentum is examined in this section. Return momentum has been reported to be negatively correlated with value factors \cite{Israel2012}. In this section a 1-year monthly return momentum factor is examined.

The basic technique for calculating a 1-year return momentum factor is to sum the monthly returns within a 1-year period, from months $1 \dots 11$ skipping the return for month 12. Figure \ref{fig:momentumCalc} shows this calculation. In this case the stock is bought at the end of month 13 and sold at the end of month 16, yielding a quarterly return.

\begin{figure}[h]
\includegraphics[scale=0.5]{momentum_calc.jpg}
\caption{Momentum Calculation Based on Monthly Values}
\label{fig:momentumCalc}
\end{figure}

As Figure \ref{fig:momentumWin} shows, the quarterly return period is non-overlapping.

\begin{figure}[h]
\includegraphics[scale=0.5]{momentum_windows.jpg}
\caption{Sliding Windows Used to Calculate Momentum}
\label{fig:momentumWin}
\end{figure}

To build a portfolio, the stock universe (in this case the S\&P 500 stocks for that quarter) is sorted by the associated momentum factor. The top 10 percent decile portfolio consists of an equally weighted portfolio of the top 10 percent of the sorted stocks. The bottom 10 percent decile consists of the equally weighted portfolio of the bottom 10 percent of the stocks.  

Table \ref{momentumIR} shows that the bottom decile has a higher IR than the top decile, suggesting that there is a negative correlation between the momentum and the future return.  This negative correlation can also be seen in Figure \ref{fig:plotMonthlyCorr}. As a result, the 50/50 long/short portfolio consists of a short position in the top ten percent of the momentum ordered stocks and a long position in the bottom ten percent.

<<momentumCalc, echo=FALSE, cache=TRUE>>=
momentumName = "momentum.RData"
momentumFile = paste(r_dataDir, momentumName, sep="/")
if (file.exists(momentumFile)) {
  load(momentumFile)
} else {
  momentum.df = momentumPortfolios(monthStockUniverse, monthClose.df)
  save(momentum.df, file=momentumFile)
}
@

Table \ref{momentumIR} gives the quarterly information ratio for the top and bottom deciles of long only portfolios and the long/short portfolio.

<<echo=FALSE, results='asis'>>=
moRetLongShort.z = zoo(as.vector(momentum.df[,"longShort"]))
dates = as.Date(as.vector(momentum.df[,"date"]))
index(moRetLongShort.z) = dates
moRetInterestRegion = interest.z[index(moRetLongShort.z)]/4
sp500Region = sp500Ret[index(moRetLongShort.z)]
moRetSharpe = sharpeRatio(moRetLongShort.z, moRetInterestRegion)
moRetTopDecile.z = zoo(as.vector(momentum.df[,"topDecile"]))
index(moRetTopDecile.z) = dates
moRetBotDecile.z = zoo(as.vector(momentum.df[,"botDecile"]))
index(moRetBotDecile.z) = dates
moRetTopDecileIR = infoRatio(moRetTopDecile.z, sp500Region)
moRetBotDecileIR = infoRatio(moRetBotDecile.z, sp500Region)
table = round(matrix(c(moRetTopDecileIR, moRetBotDecileIR, moRetSharpe), ncol=1), 4)
rownames(table) = c("IR Top Decile (quarterly)", 
                    "IR Bottom Decile (quarterly)", 
                    "Long/Short Sharpe Ratio (annual)")
colnames(table) = c("Momentum Portfolio")
t = xtable(table, label="momentumIR", caption="Quarterly IR and Annual Sharpe Ratio for Momentum Portfolios", 
           xtable=4, digits=4)
print.xtable(t, include.rownames=T, size="\\small", floating=T, caption.placement="bottom")
@

The plot in figure \ref{fig:momentumReturn} shows the fate of one dollar invested in the S\&P 500, the top and bottom decile momentum portfolio and the market neutral long/short momentum portfolio.

<<momentumReturn, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Return for one dollar invested in momentum portfolios and SP 500", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
moRet = cbind(sp500Region, moRetTopDecile.z, moRetBotDecile.z, moRetLongShort.z, coredata(moRetInterestRegion))
colnames(moRet) = c("S&P 500", "Momentum (Top 10%)", "Momentum (bottom 10%)", "Long/short", "Rf")
par(mfrow=c(1,1))
chart.CumReturns(moRet, wealth.index=T, main="S&P 500 and Momentum Portfolios", 
                 col=c("black", "green3", "blue", "magenta", "red"), 
                 lwd=4, type="l",
                 legend.loc="bottomleft", grid.color="slategrey")
par(mfrow=c(1,1))
@

Over the back test period, none of the return momentum portfolios out performs the S\&P 500 and in most cases the portfolios do not out perform the risk free rate. In contrast, one of the value factor momentum portfolios, based on the EBITDA2EV factor, out performs both benchmarks (not taking into account transaction costs.)

\section*{Discussion}

The evidence strongly suggests that value factors are not effective for forecasting future returns for the large market capitalization S\&P 500 stocks. The only exception to this is the EBITDA2EV momentum factor in a market neutral portfolio (discussed above).  The value factor results seem to contradict a large literature that reports that portfolios constructed on the basis of value factors do show return in excess of the benchmark.

There may be several reasons for these contradictory results. Most of the studies that report excess returns from value factors use larger stock universes that are not confined to large market capitalization stocks. For example, the work by Fama and French \cite{Fama2004} on the value factor premium uses the entire investible stock universe. These larger stock universes may include stocks where value factors are stronger in forecasting returns. The value factor performance reported in Chapter 5 of cite{Qian2007} is based on the Russell 3000 stock universe.

The effectiveness of value investing relies on recognizing stocks that are under-priced relative to their value characteristics. Value factors may not show forecasting strength in the case of the S\&P 500 stocks because these stocks are closely followed by a larger number of market analysts. A stock that is closely followed in the market may already have the value factors accounted for in the stock price.

The results that are reported here mirror those reported by others. Israel and Moskowitz\cite{Israel2012} investigate the relationship between firm size and the effectiveness of value factors in forecasting returns. They find that as the firm size increases, the strength of value factors diminishes. The negative relation between firm size and return has been widely reported. In \cite{Fama1992}, Fama and French write  
\begin{quotation}
In multivariate tests, the negative relation between size and average return is robust to the inclusion of other variables.
\end{quotation}

<<smallCapValue, echo=FALSE, cache=TRUE>>=
# Vanguard Small-Cap Value ETF (VBR):
# 
#   Vanguard Small-Cap Value ETF seeks to track the performance of a benchmark
#   index that measures the investment return of small-capitalization value
#   stocks.
# 
# Inception: 01/26/2004
VBRStartDate =  as.Date("2004-03-31")
startIx = which(adjDates == VBRStartDate)
VBRDates = adjDates[startIx:length(adjDates)]
VBRCloseDates = closeDates[(which(closeDates == VBRStartDate)):length(closeDates)]
VBRClose = zoo(rep(0, length(VBRDates)))
# Get the close prices from Yahoo
for (i in 1:length(VBRDates)) {
  date = VBRDates[i]
  VBRClose[i] = get.hist.quote(instrument="VBR", start=date, end=date, quote="Close",
                                 provider="yahoo", compression="m", retclass="zoo", quiet=T)
}
# Use the quarterly dates which line up with the other calculations
index(VBRClose) = VBRCloseDates
VBRRet = diff(log(VBRClose))
VBRSP500RetRegion = sp500Ret[VBRCloseDates]
VBR_IR = infoRatio(VBRRet, VBRSP500RetRegion)

# Vanguard Mid-Cap Value Index (VOE) - for Mid-Cap Value stocks
# inception date: 08/17/2006
VOEStartDate = as.Date("2006-09-29")
startIx = which(adjDates == VOEStartDate)
VOEDates = adjDates[startIx:length(adjDates)]
VOECloseDates = closeDates[startIx:length(closeDates)]
VOEClose = zoo(rep(0, length(VOEDates)))
# Get the close prices from Yahoo
for (i in 1:length(VOEDates)) {
  date = VOEDates[i]
  VOEClose[i] = get.hist.quote(instrument="VOE", start=date, end=date, quote="Close",
                                 provider="yahoo", compression="m", retclass="zoo", quiet=T)
}
# Use the quarterly dates which line up with the other calculations
index(VOEClose) = VOECloseDates
VOERet = diff(log(VOEClose))
VOESP500RetRegion = sp500Ret[VOECloseDates]
VOE_IR = infoRatio(VOERet, VOESP500RetRegion)
@

Value factors may be better predictors of future return for small and mid-cap stocks. An in-depth exploration of this hypothesis is beyond the scope of this paper. A brief examination of two Exchange Traded Funds (ETF) value funds does raise the question of whether value factors do actually yield excess return for small and mid-capitalization stocks.

These two funds are the Vanguard Small-Cap Value Fund and the Vanguard Mid-Cap Value Fund.  Relative to the S\&P 500, the quarterly IR for the Vanguard funds is shown in Table \ref{vanguardIR}.

<<echo=FALSE, results='asis'>>=
m = matrix(c(VBR_IR, VOE_IR), ncol=1)
rownames(m) = c("Vanguard Small-Cap Value (VBR)", "Vanguard Mid-Cap Value (VOE)")
colnames(m) = c("Quarterly IR relative to the S&P 500")
t = xtable(m, label="vanguardIR", caption="Quarterly IR for Vanguard Small and Mid-Cap Value Funds", 
           xtable=4, digits=4)
print.xtable(t, include.rownames=T, size="\\small", floating=T, caption.placement="bottom")
@

As the IR values show, the value factors do provide some excess return over the S\&P 500 benchmark. Cumulative investment plots are shown below in Figures \ref{fig:smallCapValuePlot} and \ref{fig:midCapValuePlot}. While there is a slight excess return over the S\&P 500, transaction costs and ETF fees would reduce this advantage to the point where it might disappear. (Note that these funds have different inception dates.)

<<smallCapValuePlot, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Return for one dollar invested in VBR and the SP 500", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
r = cbind(VBRSP500RetRegion, VBRRet)
colnames(r) = c("S&P 500", "VBR")
par(mfrow=c(1,1))
chart.CumReturns(r, wealth.index=T, main="S&P 500 and Vanguard Small-Cap Value (VBR)", 
                 col=c("black", "blue"), 
                 lwd=4, type="l",
                 legend.loc="topleft", grid.color="slategrey")
par(mfrow=c(1,1))
@

<<midCapValuePlot, echo=FALSE, fig.lp="fig:", echo=FALSE, fig.cap="Return for one dollar invested in VOE and the SP 500", fig.align='left', fig.pos='H', fig.width=8, fig.height=5>>=
r = cbind(VOESP500RetRegion, VOERet)
colnames(r) = c("S&P 500", "VOE")
par(mfrow=c(1,1))
chart.CumReturns(r, wealth.index=T, main="S&P 500 and Vanguard Mid-Cap Value (VOE)", 
                 col=c("black", "blue"), 
                 lwd=4, type="l",
                 legend.loc="bottomleft", grid.color="slategrey")
par(mfrow=c(1,1))
@

\subsection*{The Practice of Value Investing}

A company's future prospects are not easily summarized in a few value factors.
A portfolio manager will not only look at the current value factors, but how these factors are changing over time. 
For example, the EBITDA2EV momentum factor in a market neutral portfolio was the only factor that appeared to show excess return over risk free rate and the S\&P 500 benchmark.

Successful value investing frequently relies on human judgement, in addition to the type of quantitative value factor analysis used here. When analyzing a company, a value portfolio manager will take into account factors like the competitive position of a company relative to other companies in the same market.  The power of informed human judgement means that the results presented here do not necessarily condemn value investing. But these results may suggest one reason that the majority of value portfolio managers fail to beat their reference index.

\section*{Reproducible Research}

This paper was written using Knitr\cite{Xie2013}. Knitr combines text, formatted using \LaTeX, with the R code that generates the values, tables and plots in the document. In combination with the data, this paper can be completely regenerated from the Knitr source. The paper and the supporting R code are freely available without limitation.

\newpage

\bibliography{references}{}

\end{document}