This web page is associated with a paper titled Value Factors Do Not Forecast Returns for S&P 500 Stocks. The work described in this paper investigates how effective corporate value factors can be in selecting S&P 500 stocks for an investment portfolio. As the title of this web page suggests, the answer is "not very effective".
The PDF for the paper can be found here.
This paper grew out of the thesis presentation for my Master's degree in Computational Finance and Risk Management at the University of Washington. I gave my thesis presentation on November 22, 2013 and was awarded the degree in December 2013. I continued to refine the work that I presented, which resulted in this paper.
This paper is reproducible research. The paper is written using Knitr, which combines R and the typesetting language LaTeX. Using R and RStudio, the PDF for the paper can be regenerated from the document source and data. The code to generate every table and diagram in the document is included in the Knitr source code.
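As a rough sketch of the regeneration step, the PDF could be rebuilt from the R console along the following lines. This assumes the knitr package and a LaTeX distribution are installed and that the data files read by the document are available locally. The helper name `build_paper` is hypothetical and is not part of the paper's code.

```r
# Hypothetical helper (not part of the paper's code): regenerate the PDF
# from the Knitr source. Assumes knitr and a LaTeX distribution are
# installed and that the data the document reads is available locally.
build_paper <- function(rnw_file = "factor_analysis.Rnw") {
  if (!requireNamespace("knitr", quietly = TRUE)) {
    stop("The knitr package is required: install.packages('knitr')")
  }
  if (!file.exists(rnw_file)) {
    stop("Cannot find Knitr source file: ", rnw_file)
  }
  # knit2pdf() runs the R chunks, writes a .tex file and compiles it to PDF.
  knitr::knit2pdf(rnw_file)
}
```

Calling `build_paper()` from the directory that holds the source file would then produce the PDF. In RStudio, the Compile PDF button performs a similar knit-and-compile sequence.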
The data used in this paper consists of approximately fifteen years of corporate quarterly report data. Through my Master's program I had access to the Wharton Research Data Service (WRDS). The CRSP/Compustat data sets, which I used in this work, are available from WRDS. Unfortunately, redistribution of this data is prohibited, so I cannot include the data here.
Working with WRDS and the CRSP/Compustat data, and cleaning the data up so that it can be used in historical back tests, is a very time-consuming process. I have tried to document my work with this data so that the data set can be reproduced by anyone with access to it. See The Wharton Research Data Service (WRDS) data set and Factor Model Factors.
Open Source Corporate Value Factor Data |
---|
The Quandl site publishes corporate factor (fundamental) data for approximately 15,000 stocks. The fundamental data can be accessed via a Web API. Quandl does not have as much history as WRDS, but for the period it does cover it is an attractive alternative to the CRSP/Compustat data, which is costly and has missing values. |
Diagrams for the paper (OpenOffice, with JPEGs generated via screen capture): /finance/thesis_project/diagrams
Root source directory: http://www.bearcave.com/finance/thesis_project/r_code
factor_analysis.Rnw
This is the Knitr source code for the paper. The document consists of
executable R code and LaTeX formatted text.
references.bib
These are the BibTeX formatted
references that are used in the paper (factor_analysis.Rnw).
s_and_p.r
This R code computes the quarterly S&P 500 constituents from the
Compustat data. See Building the S&P 500 Constituents.
s_and_p_monthly.r
The S&P 500 constituent data is available quarterly. This R code is
similar to s_and_p.r, but it fills in the months between the
quarterly boundaries. This code supports the paper's section on monthly
linear models.
fix_compustat_data.r
In most cases the Compustat data must be preprocessed to deal with
missing values and other issues. This R code does this preprocessing
so that the data can be used in the paper. See The Wharton Research
Data Service (WRDS) data set and Factor Model Factors, which
discusses the CRSP/Compustat data set and how it must be preprocessed
in order to calculate the value factors used in the paper.
factor_calc.r
This R code calculates the value factors from the preprocessed
CRSP/Compustat data.
monthly_factor_calc.r
This R code is similar to factor_calc.r, but it fills in
factor values using monthly close prices.
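To illustrate what filling in factor values from monthly close prices could look like, the sketch below carries the most recent quarterly value forward and divides it by each month's close. This is one plausible reading, not code from monthly_factor_calc.r. The data values, the column names and the choice of a book-to-price ratio are all assumptions made for the example.

```r
# Illustrative data only: quarterly book value per share and monthly closes.
quarterly <- data.frame(
  date = as.Date(c("2012-03-31", "2012-06-30")),
  book_per_share = c(10, 12)
)
monthly <- data.frame(
  date = as.Date(c("2012-03-31", "2012-04-30", "2012-05-31",
                   "2012-06-30", "2012-07-31")),
  close = c(20, 25, 40, 24, 30)
)

# For each month, find the index of the last quarterly date on or before it.
# The quarterly dates must be sorted in ascending order.
idx <- findInterval(as.numeric(monthly$date), as.numeric(quarterly$date))
idx[idx == 0] <- NA  # months before the first quarterly report get NA

# Carry the quarterly value forward and recompute the ratio each month.
monthly$book_per_share <- quarterly$book_per_share[idx]
monthly$book_to_price <- monthly$book_per_share / monthly$close
```

The sketch matches on quarter-end dates for simplicity. A back test would more likely match on the date the report became public, so that a value is not used before it was known.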
Miscellaneous support code
fix_interest_rate.r
This R code cleans up the "risk free" interest rate data downloaded
from CRSP/Compustat.
factor_distribution.r
The WRDS quarterly Corporate Factor data has hundreds of values that
can be selected for download. Many of these factors are either
unpopulated or sparsely populated. By downloading all of the factors
and running this code on the result, an analysis of the factor density
can be performed. This was critical in understanding how to calculate
the value factors used in the paper. The results are displayed on
The Wharton Research
Data Service (WRDS) data set and Factor Model Factors.
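A density analysis of this kind can be sketched in a few lines of base R. The example below is not taken from factor_distribution.r. The column names are placeholders rather than WRDS item names, the data is made up, and the 50% cut-off is arbitrary.

```r
# Illustrative data only: placeholder column names, not WRDS item names.
factors <- data.frame(
  factor_a = c(1.2, NA, 3.4, 5.6),
  factor_b = c(NA, NA, NA, 2.0),
  factor_c = c(NA, NA, NA, NA)
)

# Density = fraction of rows with a non-missing value in each column.
density <- colMeans(!is.na(factors))

# List the most densely populated factors first.
density <- sort(density, decreasing = TRUE)

# Keep factors populated in at least half of the rows (arbitrary threshold).
usable <- names(density)[density >= 0.5]
```

Here `density` ranks the columns by how well they are populated, and `usable` names the ones that pass the threshold.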
Ian Kaplan
March 2014
Last revised: