
yahooTS Class Reference

Process historical equity (stock) data downloaded from finance.yahoo.com.

#include <yahooTS.h>


Public Types

enum  dataKind {
  badEnum, Open, High, Low,
  Close, Volume, lastEnum
}

Public Methods

 yahooTS ()
 yahooTS (const char *p)
const double* getTS (const char *fileName, double *a, size_t &N, dataKind kind) const
 Read a Yahoo equity time series from a file.

void path (const char *p)
const char* path ()

Private Methods

const char* getStr_ (char *&line, char *buf, size_t bufSize) const
 Copy from the input string until either the end of the string (i.e., the terminating null) is reached or a comma is found.

void parseVals_ (char *line, double *vals, const size_t n) const
 Parse a comma separated line of values into a vector of doubles.

const double getValue_ (char *line, const yahooTS::dataKind kind) const
 A data line from a Yahoo historical data file consists of a set of comma separated values.


Private Attributes

const char* path_


Detailed Description

Process historical equity (stock) data downloaded from finance.yahoo.com.

The data is downloaded in "spread sheet" format from the historical data page. There is probably some limitation on using this data (e.g., no commercial use and no resale) so use at your own risk.

The format of the file is ASCII. The first line lists the title for each of the fields in the file. The titles and the fields are comma separated.

This class is specific to the data format that was downloaded from Yahoo at the time. More general code could be written to easily account for changing formats. However, I just wanted to extract the data.

The Yahoo data is given to two decimal places, presumably reflecting decimalization. The equity time series are adjusted for splits and dividends, working backward in time from the most recent entry in the series. This can cause problems over long periods: at some point a stock that pays dividends will have paid out all of its worth in dividends and the adjusted value will become negative (for this reason, a series based on dividend reinvestment is a better choice).

The format for the data is:

<title line> <time series line>+

(i.e., a title line followed by one or more time series lines).

The title line consists of six comma separated strings (e.g., "Date,Open,High,Low,Close,Volume"). The time series lines hold the values named in the title. For my current purposes I am not interested in the date values, so these are ignored. All values are returned as vectors of doubles, although volume is an unsigned integer value.
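
As an illustration of the layout described above, the following minimal sketch splits one time series line into its six fields. It does not use the yahooTS class: the names YahooRow and parseRow are hypothetical, and, unlike yahooTS, the sketch keeps the date as text rather than discarding it.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one time series line (not part of yahooTS).
struct YahooRow {
  std::string date;                       // kept as text in this sketch
  double open, high, low, close, volume;  // volume held as a double, as in yahooTS
};

// Split a "Date,Open,High,Low,Close,Volume" line on commas.
// Returns false if the line does not contain exactly six fields.
bool parseRow(const std::string &line, YahooRow &row)
{
  std::vector<std::string> fields;
  std::istringstream in(line);
  std::string field;
  while (std::getline(in, field, ',')) {
    fields.push_back(field);
  }
  if (fields.size() != 6) {
    return false;
  }
  row.date   = fields[0];
  row.open   = std::strtod(fields[1].c_str(), 0);
  row.high   = std::strtod(fields[2].c_str(), 0);
  row.low    = std::strtod(fields[3].c_str(), 0);
  row.close  = std::strtod(fields[4].c_str(), 0);
  row.volume = std::strtod(fields[5].c_str(), 0);
  return true;
}
```

A caller would read the file line by line, skip the title line, and pass each remaining line to parseRow. The numeric fields are not validated here; a field that is not a number is simply converted to 0 by strtod.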

Definition at line 79 of file yahooTS.h.


Member Enumeration Documentation

enum yahooTS::dataKind
 

Enumeration values:
badEnum  
Open  
High  
Low  
Close  
Volume  
lastEnum  

Definition at line 86 of file yahooTS.h.

typedef enum { badEnum,
               Open,
               High,
               Low,
               Close,
               Volume,
               lastEnum } dataKind;


Constructor & Destructor Documentation

yahooTS::yahooTS ( ) [inline]
 

Definition at line 94 of file yahooTS.h.

{
  path_ = 0;
};

yahooTS::yahooTS ( const char * p ) [inline]
 

Definition at line 98 of file yahooTS.h.

: path_(p) {}


Member Function Documentation

const char * yahooTS::getStr_ ( char *& line,
char * buf,
size_t bufSize ) const [private]
 

Copy from the input string until either the end of the string (i.e., the terminating null) is reached or a comma is found.

Parameters:
line   A reference to a pointer to the input string. The pointer is incremented until either the end of the string or a comma is encountered. When this function returns, line points either to the end of the string or to the character following the comma (or, if buf fills first, to the first character that was not copied).
buf   A buffer into which the string will be copied.
bufSize   The size of buf.
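
The reference-to-pointer convention described in the parameter list can be seen in the following standalone sketch. copyField is a hypothetical free function written to match the description above; it is not the class's private getStr_, and it takes a const input string, which is an assumption made here for convenience.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for getStr_: copy characters from 'line' into 'buf'
// until a comma, the terminating null, or a full buffer is reached.
// 'line' is advanced past the copied characters and past the comma that
// ended the field, if one was found.
// Returns buf if at least one character was copied, otherwise a null pointer.
const char *copyField(const char *&line, char *buf, std::size_t bufSize)
{
  assert(buf != 0 && bufSize > 0);
  if (line == 0) {
    return 0;
  }
  std::size_t n = 0;
  while (n < bufSize - 1 && *line != '\0') {
    if (*line == ',') {
      line++;            // step over the comma
      break;
    }
    buf[n++] = *line++;  // copy one character and advance the caller's pointer
  }
  buf[n] = '\0';
  return (n > 0) ? buf : 0;
}
```

Calling copyField repeatedly with the same pointer walks through the fields of a line one at a time, because each call leaves the pointer at the start of the next field. A null return signals that no characters were copied, which happens at the end of the string and also for an empty field.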

Definition at line 56 of file yahooTS.cpp.

Referenced by parseVals_().

{
  const char *rtnPtr = 0;
  if (line != 0) {
    // charCnt is declared outside the for statement because it is used after the loop
    size_t charCnt = 0;
    for (; charCnt < bufSize-1 && *line != '\0'; charCnt++) {
      if (*line == ',') {
        line++;
        break;
      }
      else {
        buf[charCnt] = *line++;
      }
    }

    buf[charCnt] = '\0';
    if (charCnt > 0)
    {
      rtnPtr = buf;
    }
  }
  return rtnPtr;
} // getStr_

const double * yahooTS::getTS ( const char * fileName,
double * a,
size_t & N,
const yahooTS::dataKind kind ) const
 

Read a Yahoo equity time series from a file.

Yahoo allows historical equity data to be downloaded in "spread sheet" format. In this format there is a title line, listing the data columns (e.g., date, open, high, low, close and volume). Following the title line are comma separated values. In reading this Yahoo data file, the first line is skipped.

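The Yahoo data values are listed from most recent to oldest. In the data vector returned, a[0] will be the oldest and a[N-1] will be the most recent. The array is filled downward from the last element, using the value of N on entry, so this holds only when the file supplies at least that many data lines; with fewer lines, the values read occupy the last elements of a and the leading elements are left unset.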

Parameters:
fileName   Name of the file containing the time series. The file name is prefixed by the path in the class variable path_ (no separator is inserted, so path_ should end with one).
a   A pointer to a vector of doubles that will be initialized with values from fileName.
N   Number of doubles that will fit in a. N is an input/output variable: the value returned in N is the actual number of values read.
kind   The kind of time series to fetch from fileName (e.g., open, high, low, close, volume).

Returns:
If there was no error reading data from fileName, the function returns a pointer to the initialized array (i.e., the argument a). If the data could not be read, a null pointer (0) is returned.
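
The in/out use of N and the oldest-first ordering can be sketched independently of the class. The hypothetical function below, readCloseOldestFirst, is not getTS: it reads from any std::istream, always extracts the Close column, and reaches the oldest-first order by reading forward and then reversing, so the values always start at a[0]. These are choices made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: extract the fifth comma separated field (Close).
// Returns false if the line has fewer than five fields.
static bool closeOf(const std::string &line, double &value)
{
  std::istringstream in(line);
  std::string field;
  for (int i = 0; i < 5; i++) {  // Date, Open, High, Low, Close
    if (!std::getline(in, field, ',')) {
      return false;
    }
  }
  value = std::strtod(field.c_str(), 0);
  return true;
}

// Reads a title line followed by data lines (newest first) from 'in'.
// On entry N is the capacity of 'a'; on return it is the number of values stored.
// Returns 'a' on success, or a null pointer if the title line is missing.
const double *readCloseOldestFirst(std::istream &in, double *a, std::size_t &N)
{
  const std::size_t capacity = N;
  std::string line;
  N = 0;
  if (!std::getline(in, line)) {  // the title line is read and discarded
    return 0;
  }
  double v;
  // Stop at the capacity, at end of input, or at the first malformed line.
  while (N < capacity && std::getline(in, line) && closeOf(line, v)) {
    a[N++] = v;                   // stored newest first for now
  }
  std::reverse(a, a + N);         // a[0] is now the oldest value read
  return a;
}
```

Because N is passed by reference, the caller sets it to the array capacity before the call and reads the count back afterwards, which is the same calling pattern that getTS uses.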

Definition at line 208 of file yahooTS.cpp.

Referenced by main().

{
  const double *rtnPtr = 0;
  char fullPath[512];
  size_t freePath = sizeof( fullPath );
  FILE *fptr;

  // start with an empty string so that strncat is safe when path_ is null
  fullPath[0] = '\0';
  if (path_ != 0) {
    strncpy( fullPath, path_, freePath-1 );
    // strncpy does not null terminate a truncated copy
    fullPath[freePath-1] = '\0';
    freePath = freePath - strlen( fullPath );
  }
  strncat( fullPath, fileName, freePath-1 );
  fptr = fopen( fullPath, "r" );
  if (fptr != 0) {
    char line[512];
    size_t lineSize = sizeof( line );
    int ix = N-1;

    // the first line is the title line, which is skipped
    if (fgets( line, lineSize, fptr ) != 0) {
      rtnPtr = a;
      // the file lists the most recent value first, so fill a[] from the end
      while (fgets( line, lineSize, fptr ) != 0) {
        if (ix >= 0) {
          a[ix] = getValue_( line, kind );
          ix--;
        }
        else {
          break;
        }
      } // while
    }
    else {
      fprintf(stderr, "getTS: title line expected\n");
    }
    ix++;
    N = N - ix;
    fclose( fptr );
  }
  else {
    const char *error = strerror( errno );
    fprintf(stderr, "getTS: Error opening %s: %s\n", fullPath, error );
  }

  return rtnPtr;
} // getTS

const double yahooTS::getValue_ ( char * line,
const yahooTS::dataKind kind ) const [private]
 

A data line from a Yahoo historical data file consists of a set of comma separated values:

    date,open,high,low,close,volume

This function is passed a Yahoo data line and a kind value that indicates which value to return. The date is ignored, so kind should be one of: Open, High, Low, Close, Volume. For any other kind the function returns 0.
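
The mapping from a kind value to a position in the parsed line can be shown in isolation. In the sketch below the enumeration is repeated so that the example stands alone, and selectValue is a hypothetical helper, not a member of yahooTS. It assumes the five values are stored in the order Open, High, Low, Close, Volume.

```cpp
#include <cassert>
#include <cstddef>

// Copy of the enumeration documented above, repeated so the sketch is standalone.
enum dataKind { badEnum, Open, High, Low, Close, Volume, lastEnum };

// Hypothetical helper: pick the value for 'kind' from an array holding
// Open, High, Low, Close, Volume in that order.
// Returns 0 for a kind outside the range Open..Volume.
double selectValue(const double vals[5], dataKind kind)
{
  if (kind > badEnum && kind < lastEnum) {
    // The date is not stored, so Open (enum value 1) maps to index 0.
    return vals[static_cast<std::size_t>(kind) - 1];
  }
  return 0.0;
}
```

The sentinels badEnum and lastEnum bracket the valid values, so a single range check rejects anything that is not a real column.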

Definition at line 151 of file yahooTS.cpp.

Referenced by getTS().

{
  double retval = 0;

  if (kind > badEnum && kind < lastEnum) {
    const size_t NUM_VALS = 5;
    // zero the array so that a failed parse yields 0 rather than an indeterminate value
    double vals[ NUM_VALS ] = { 0 };

    parseVals_( line, vals, NUM_VALS );

    // the date is not stored, so Open (enum value 1) is at index 0
    size_t ix = (size_t)kind - 1;
    if (ix < NUM_VALS) {
      retval = vals[ix];
    }
  }

  return retval;
} // getValue

void yahooTS::parseVals_ ( char * line,
double * vals,
const size_t n ) const [private]
 

Parse a comma separated line of values into a vector of doubles.

The comma separated values are:

Date,Open,High,Low,Close,Volume

The date value is skipped.

Parameters:
line   A pointer to a line of Yahoo historical data
vals   A vector of doubles in which the values from the historical data line will be stored.
n   The number of elements in vals

Definition at line 101 of file yahooTS.cpp.

Referenced by getValue_().

{
  char buf[128];
  const char *ptr;

  // skip the date
  ptr = getStr_( line, buf, sizeof( buf ) );
  if (ptr == 0) {
    fprintf(stderr, "parseVals: date expected\n" );
    return;
  }

  // get the Open, High, Low, Close and Volume values
  size_t cnt = 0;
  for (dataKind kind = Open; 
       kind <= Volume && cnt < n; 
       kind = (dataKind)((size_t)kind + 1)) {

    ptr = getStr_( line, buf, sizeof( buf ) );
    if (ptr == 0) {
      fprintf(stderr, "parseVals: value expected\n");
      return;
    }

    // v stays 0 if the field cannot be converted to a number
    double v = 0;

    sscanf( buf, "%lf", &v );
    vals[cnt] = v;
    cnt++;
  }

} // parseVals_

const char * yahooTS::path ( ) [inline]
 

Definition at line 106 of file yahooTS.h.

{ return path_; }

void yahooTS::path ( const char * p ) [inline]
 

Definition at line 105 of file yahooTS.h.

{ path_ = p; }


Member Data Documentation

const char * yahooTS::path_ [private]
 

Definition at line 82 of file yahooTS.h.


The documentation for this class was generated from the following files:
yahooTS.h
yahooTS.cpp
Generated at Tue May 27 21:56:17 2003 for Wavelet compression, determinism and time series forecasting by doxygen1.2.8.1 written by Dimitri van Heesch, © 1997-2001