Main Page   Namespace List   Class Hierarchy   Compound List   File List   Compound Members   File Members  

Wavelet compression, determinism and time series forecasting Documentation

Introduction

I originally developed the wavelet packet transform code published here (and on other www.bearcave.com web pages) to support wavelet frequency analysis of varying signals. As I learned about the wavelet packet transform I theorized that it would be useful as a lossless compression algorithm that could be used as an estimator for predictability in a time series.

This the main page for the doxygen documentation for the following C++ source code that implements:

The C++ wavelet algorithms published here are related to a set of Java algorithms also published on bearcave.com (see Wavelets in Java).

C++ was used to implement the wavelet packet algorithms because C++ provides features like operator overloading and templates which proved useful in creating reusable wavelet packet code.

The associated Web pages can be found on Bearcave.com at:

The web pages above provide an overview of the wavelet packet transform. These web pages are meant to be read along with this source code, which is extensively documented.

The wavelet packet transform has a variety of applications, but one of the most common involves data compression. A cost function is associated with the wavelet packet transform which will find the best wavelet "basis" for various regions of the data set. The "best basis" is the wavelet scale that best fits that region. The determination of "best fit" is made relative to a given wavelet and cost function pair.

Assuming that the wavelet function can produce an approxiation of the data set, the end result of both the floating point and integer versions of the wavelet packet transform will be the data set mapped into a set of smaller values. In the case of lossy compression small values can be set to zero, allowing coding compression (e.g., Huffman or "run length" encoding).

The integer version of the wavelet packet algorithm (which uses lossless integer to integer wavelet transforms) is useful for lossless compression. In this case, compression is achieved by the fact that the data set can be mapped to smaller values which can be represented in fewer bits.

The source code included here also includes a modified wavelet packet algorithm, which is used for time/requency analysis.

Given the complexity of wavelets, the web pages above (and the commented source code) fall short in completeness of explanation. The material I've written should be read along with other sources (e.g., books and articles on wavelets). No single reference will cover all the facets of wavelet mathematics and application. There is a vast literature on wavelets. Many articles and books are written by mathematicians, either for other mathematicians or for students of mathematics. One of the few books I've found that leans less toward theory and more toward application is Ripples in Mathematics by Jensen and la-Cour Harbo, Springer Verlag, 2001. The wavelet packet algorithms implemented here are based on Chapter 8 and 9 of this book.

Download

You can download the source code, in tar format, for the wavelet packet transform here (a UNIX tar file, compressed with gzip). The source include Makefiles for both Windows NT (nmake) and UNIX.

This source code contains both the wavelet packet transform code, the modified wavelet packet transform for wavelet time frequency analysis, lossless wavelet compression code and support to read Yahoo equity time series files.

Trees vs. Tables

I have chosen to represent the result of the wavelet packet transform as a dynamically allocated tree. The same data can be represented as a table, where each table row corresponds to a horizontal slice through the tree (e.g., the level basis). I think that the tree representation is more flexible and natural. Dynamic data structures are easily implemented in languages like C++ and Java. This may not be true of MATLAB code.

Overview of the Source Code

The source code published here can be divided into three catagories:

  1. Floating point wavelet packet and modified wavelet packet code. The wavelet packet code uses floating point wavelet algorithms and floating point cost functions (e.g., Shannon and threshold cost functions).

  2. Lossless wavelet packet compression code. This code is supported by integer to integer wavelet transforms. Some test code is included to calculate moving compression windows.

  3. Memory management. The wavelet packet algorithm uses dynamic allocation (e.g., new). To make deallocation fast, memory is allocated from a memory pool. This allows all allocated memory to be deallocated at once. Pool allocation also simplifies the code, since a single deallocation call is made when the data structures are no longer needed.

  4. Data input support for the lossless wavelet compression algorithm. My motivation for developing the lossless wavelet compression code is the application of this algorithm to financial time series (e.g., stock close prices). In this case these are historical data sets downloaded from Yahoo.

Data structure templates

The wavelet packet algorithm makes use of a several general data structure templates:

  1. FIFO_LIST template. This template supports a list with both a list head and a tail tail pointer. Items are added to the end of the list. When read from the front, the items will be read in first-in, first-out order. This is used to build a "queue" which is used in calculating the inverse wavelet packet transform and in constructing the wavelet packet data list that results from the wavelet packet transform.
  2. LIST template. This supports a simple linked list with a single list head pointer.
  3. GrowableArray template. This is a generic growable array that is by the modified wavelet packet algorithm for time frequency analysis.

Wavelet Algorithm Templates

The wavelet algorithms used by the wavelet packet algorithms are based on the wavelet lifting scheme, where thare are one or more predict and update stages. The liftbase template provides a common base class and can be instantiated for the array type and the array element type. The wavelet algorithms themselfs are also templatized to allow them to be used with simple data structures and with an indexible data structure like packcontainer.

Wavelet packet tree code

The packtree and packfreq classes (derived from the common base class packtree_base) build wavelet packet trees. In the case of the packfreq class, the nodes at a given level (a so called level basis) of the tree are ordered by frequency, (moving from left to right).

The wavelet packet algorithm is independent of the actual wavelet algorithm that is used while building the tree. The following wavelet algorithms are included:

The lossless wavelet packet tree code uses integer to integer versions of the Haar and line wavelet algorithms. Another algorithm, sometimes called the TS transform, which is popular in image processing is also included.

Once the wavelet packet tree is constructed a "best basis" algorithm can be applied. The "best basis" is the minimal representation of the data relative to a particular cost function. This code provides cost functions derived from a common base class, costbase.

The packcontainer class allows a sequence of N numbers to be treated as both a contiguous array or as a left half and a right half (each consisting of N/2 data elements).

The wavelet packet and modified wavelet packet algorithms were developed for floating point (e.g., double) arrays (or containers with double elements). The lossless wavelet packet compression code uses integer to integer transforms. In writing this code I have attempted to share as much as possible with the original floating point wavelet packet code. However, in some cases the lossless compression code consists of integer specific versions of the original wavelet packet code. For example, costbase_int is the integer specific cost function base class. The packtree_int and invpacktree_int classes provide integer specific wavelet packet tree support.

In most cases memory is allocated from a memory pool. This approach to dynamic allocation allows memory to be allocated without worrying about deallocation. At a point in the program when the memory is no longer needed, it is all deallocated at once by deallocating the memory pool. The only exception to pool allocation is the GrowableArray class, which uses the standard new and delete operators.

Copyright and Use

You may use this source code without limitation and without fee as long as you include: This software was written and is copyrighted by Ian Kaplan, Bear Products International, www.bearcave.com, 2002.

This software is provided "as is", without any warranty or claim as to its usefulness. Anyone who uses this source code uses it at their own risk. Nor is any support provided by Ian Kaplan and Bear Products International.

Please send any bug fixes or suggested source changes to:

     iank@bearcave.com


Generated at Tue May 27 21:56:16 2003 for Wavelet compression, determinism and time series forecasting by doxygen1.2.8.1 written by Dimitri van Heesch, © 1997-2001