Random R Hacks

R is an increasingly popular language for mathematics. Among the reasons that R is popular are:

R is free and open source. R and the popular R development environment can be downloaded without fee. This software runs on Windows, Linux and the Mac.
There are a huge number of mathematics libraries available in R. These include statistics libraries, signal processing libraries and increasingly powerful support for computational finance.
The plotting libraries available in R are very good, especially for 2-D plots. R is not as good (currently) for 3-D plots as Mathematica or MATLAB.

I have used R to analyze data and generate plots at work, but mostly I use R for computational finance in the University of Washington Master's degree program. Some of the work that I have done in R can be found on this page's parent page, Topics in Quantitative Finance.

This page publishes various random R hacks, mainly so I'll have a place to look up the code. But I hope that it is useful to you as well.

An R Code for Time Series Smoothing Using a Kalman Filter

Read a page of stock ticker symbols, company names, industry sectors and industries from a URL.

This little snippet of code demonstrates something very cool about R: you can substitute a URL for a file name and R will read from that URL.


# 
# Fetch stock symbols, company names, industry sectors and industries from NASDAQ.
#
# From a Stack Overflow discussion: http://stackoverflow.com/a/6391810
#
# This relies on running a query on the NASDAQ web page. Although this worked in October 2012,
# NASDAQ may change their web page and this may not work in the future.
#

library(utils)
market = "nyse"  # or nasdaq
query = paste("http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=", 
               market, "&render=download", sep="");

data = read.csv(file=query)
col.names = c("Symbol", "Name", "Sector", "industry")
data = data[, col.names]

The result is an R data frame object.
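
Here is a small sketch for exploring that data frame without depending on the NASDAQ query. It reads the same four columns from an inline string instead of a URL. The rows (symbols, names, prices) are made-up placeholders, not real NASDAQ output; only the four column names come from the code above.

```r
# Stand-in for the downloaded CSV. These rows are made-up placeholders,
# not real NASDAQ output. Only the four selected column names come from
# the snippet above.
csv.text = 'Symbol,Name,LastSale,Sector,industry
AAA,Example Bank Corp,10.5,Finance,Major Banks
BBB,Example Software Inc,20.25,Technology,Computer Software
CCC,Example Trust Co,30,Finance,Investment Managers'

# read.csv() accepts a file name, a URL, or (via text=) a string
data = read.csv(text = csv.text, stringsAsFactors = FALSE)
col.names = c("Symbol", "Name", "Sector", "industry")
data = data[, col.names]

str(data)                                   # column types and a preview
finance = data[data$Sector == "Finance", ]  # rows for one sector
table(data$Sector)                          # number of companies per sector
```

Swapping the text= argument back to file=query should give the same kind of data frame, assuming the NASDAQ query still works.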

Factors and contrasts

R supports something called a factor, which is a bit like an enumeration type.

firm.size.names = c("small", "med", "large", "TBF")
firm.size = factor( x = firm.size.names, levels=firm.size.names)
(cont = contrasts(firm.size))

Which will print:

      med large TBF
small   0     0   0
med     1     0   0
large   0     1   0
TBF     0     0   1
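
To see where this contrast matrix ends up being used, here is a minimal sketch that builds a regression design matrix from the same factor. It assumes R's default contrast setting (contr.treatment for unordered factors); the name "design" is just for illustration.

```r
firm.size.names = c("small", "med", "large", "TBF")
firm.size = factor(firm.size.names, levels = firm.size.names)

# model.matrix() expands the factor into indicator (dummy) columns using
# the factor's contrasts. The first level, "small", is the baseline and is
# absorbed into the intercept column.
design = model.matrix(~ firm.size)

colnames(design)
print(design)
```

Each row of the contrast matrix shows up as the non-intercept part of the corresponding row in the design matrix, which is how lm() and friends encode a factor.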

More on R factor objects

> small = c("AVD", "DEEP", "PZG")
> mid = c("HRS", "WPO", "NCR", "DFS")
> large = c("NOC", "UNH", "CI")
> tbf = c("BAC", "JPM", "GS")
>
> frame = data.frame(list(company=c(small, mid, large, tbf)),
+                    list(cat=c(rep("small", length(small)), 
+                               rep("mid", length(mid)), 
+                               rep("large", length(large)),
+                               rep("tbf", length(tbf)))))
> 
> companyFactor = factor(frame$cat, levels=c("small", "mid", "large", "tbf"))
> names(companyFactor) = frame$company
> companyContrast = contrasts(companyFactor)
> companyTreatment = contr.treatment( companyFactor )
> companyFactor
  AVD  DEEP   PZG   HRS   WPO   NCR   DFS   NOC   UNH    CI   BAC   JPM    GS 
small small small   mid   mid   mid   mid large large large   tbf   tbf   tbf 
Levels: small mid large tbf
> companyContrast
      mid large tbf
small   0     0   0
mid     1     0   0
large   0     1   0
tbf     0     0   1
> companyTreatment
      small small mid mid mid mid large large large tbf tbf tbf
small     0     0   0   0   0   0     0     0     0   0   0   0
small     1     0   0   0   0   0     0     0     0   0   0   0
small     0     1   0   0   0   0     0     0     0   0   0   0
mid       0     0   1   0   0   0     0     0     0   0   0   0
mid       0     0   0   1   0   0     0     0     0   0   0   0
mid       0     0   0   0   1   0     0     0     0   0   0   0
mid       0     0   0   0   0   1     0     0     0   0   0   0
large     0     0   0   0   0   0     1     0     0   0   0   0
large     0     0   0   0   0   0     0     1     0   0   0   0
large     0     0   0   0   0   0     0     0     1   0   0   0
tbf       0     0   0   0   0   0     0     0     0   1   0   0
tbf       0     0   0   0   0   0     0     0     0   0   1   0
tbf       0     0   0   0   0   0     0     0     0   0   0   1
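
The companyTreatment matrix above is 13 by 12 because contr.treatment() was handed the whole factor, so it treated each of the 13 elements as a level. A short sketch of the difference, rebuilding the factor without the data frame (the names byLevels and byCount are mine):

```r
cat.names = c(rep("small", 3), rep("mid", 4), rep("large", 3), rep("tbf", 3))
companyFactor = factor(cat.names, levels = c("small", "mid", "large", "tbf"))

# Passing the level names gives the same 4 x 3 matrix as contrasts()
byLevels = contr.treatment(levels(companyFactor))

# Passing the number of levels gives the same values with numeric labels
byCount = contr.treatment(nlevels(companyFactor))

# Passing the factor itself uses all 13 elements as "levels"
dim(contr.treatment(companyFactor))

# How many companies fall in each category
table(companyFactor)
```

If the goal is one contrast row per level, contrasts(companyFactor) or contr.treatment(levels(companyFactor)) is probably what you want.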

Get the Dividend for a Stock

One of my classmates (in the UW Computational Finance program) found this cool hack (thanks Frank!)

The getDividends() function in the quantmod package fetches the dividend history for a stock. Combined with the closing prices, this gives the trailing dividend yield.

require("quantmod") 
s <- "SPY"
# get Closing prices
p <- Cl(getSymbols(s, src='yahoo', auto.assign=FALSE,
                   from='2010-01-01',
                   to='2012-06-20'))
# get dividends
div <- getDividends(s, from=start(p), to=end(p))
ydiv <- runSum(div, n=4) # rolling sum of last 4 quarters
# merge and fill in NAs with previous values
out <- na.locf(merge(p, ydiv, all=TRUE)) 
out$yld <- out[, 2] / out[, 1] # "current yield"
tail(out)

From http://stackoverflow.com/questions/10321103/r-is-there-a-package-to-calculate-daily-dividend-yield
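
The two steps that do the real work in that snippet are the rolling four-quarter sum (runSum) and carrying the last known value forward (na.locf). Here is a rough base R sketch of those two ideas, so it runs without quantmod or a network connection. All of the numbers are made up for illustration (they are not SPY data), and the locf() helper is my own stand-in, not the zoo implementation.

```r
# Made-up quarterly dividends (illustrative only, not SPY data)
div = c(0.50, 0.52, 0.55, 0.58, 0.60)

# Trailing four-quarter sum. The first three values are NA because
# fewer than four quarters are available.
ydiv = as.numeric(stats::filter(div, rep(1, 4), sides = 1))

# Last observation carried forward. Leading NAs stay NA.
locf = function(x) {
  idx = cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[idx + 1]
}

# A "daily" series where the dividend sum is only known on pay dates
sparse.div = c(NA, 2.15, NA, NA, 2.25, NA)
filled = locf(sparse.div)

# Made-up closing prices, one per day
price = c(100, 101, 102, 103, 104, 105)
yld = filled / price   # "current yield" for each day
```

In the quantmod version the alignment by date is handled by merge() on the xts objects; here the vectors are simply assumed to line up position by position.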

Get a vector of trading days

This came up for an application where I needed to calculate the volatility (standard deviation) from day t-20 to t (i.e., a period of 20 trading days). So I needed a list of trading days, where I could just go back 20 elements to find the right trading day.

I searched around and didn't find a way to do this with a simple function call. This is what I came up with.

#
# Get a vector of trading days. These are week days, with holidays removed. The trading days
# are relative to the New York Stock Exchange (NYSE).
#
# startDate and endDate are Date objects. The function returns a date ordered vector of Date
# objects which are the trading days.
#
# This function requires the timeDate package
#
tradingCalendar = function( startDate, endDate)
{
  require(timeDate)
  timeSeq = timeSequence(from=as.character(startDate),
                         to=as.character(endDate), 
                         by="day", format = "%Y-%m-%d",
                         zone = "NewYork", FinCenter = "America/New_York")
  ix = as.logical(isWeekday(timeSeq, wday=1:5))
  tradingDays = timeSeq[ix]
  startYear = as.POSIXlt(startDate)$year + 1900
  endYear = as.POSIXlt(endDate)$year + 1900
  tradingDays.dt = as.Date(tradingDays)
  hol = as.Date(holidayNYSE(startYear:endYear))
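  # Drop holidays with a logical mask. Negative indexing with which()
  # would return an empty vector when no holiday falls in the range.
  tradingDays.dt = tradingDays.dt[!(tradingDays.dt %in% hol)]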
  return(tradingDays.dt)
} # tradingCalendar
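
To show the "go back 20 elements" idea, here is a small base R sketch. It does not use timeDate: simpleTradingDays() is a hypothetical stand-in that takes the holidays as an argument, and lookback() is a hypothetical helper that steps back n positions in the vector. With timeDate installed, the vector returned by tradingCalendar() could be passed to lookback() in the same way.

```r
# Week days between two Date objects, minus a caller-supplied vector of
# holidays. A stand-in for tradingCalendar() that needs no packages.
simpleTradingDays = function(startDate, endDate,
                             holidays = as.Date(character(0)))
{
  days = seq(startDate, endDate, by = "day")
  wday = as.POSIXlt(days)$wday      # 0 = Sunday ... 6 = Saturday
  days = days[wday %in% 1:5]        # keep Monday through Friday
  days[!(days %in% holidays)]
}

# The trading day n positions before day t, or NA if t is not a trading
# day or there are fewer than n earlier trading days in the vector.
lookback = function(days, t, n = 20)
{
  i = match(t, days)
  if (is.na(i) || i <= n) return(as.Date(NA))
  days[i - n]
}

# Two weeks in December 2012, with Christmas Day supplied as the holiday
days = simpleTradingDays(as.Date("2012-12-17"), as.Date("2012-12-28"),
                         holidays = as.Date("2012-12-25"))
lookback(days, as.Date("2012-12-28"), n = 3)
```

Going back three trading days from Friday, December 28 lands on Monday, December 24, because Christmas Day is skipped.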

Ian Kaplan
Last Modified: May, 2013
