wavelet_util
Class noise_filter

java.lang.Object
  |
  +--wavelet_util.plot
        |
        +--wavelet_util.noise_filter

public class noise_filter
extends plot

The objective in filtering is to remove noise while keeping the features that are interesting.

Wavelets allow a time series to be examined at various resolutions. This can be a powerful tool in filtering out noise. This class supports the subtraction of gaussian noise from the time series.

The identification of noise is complex, and I have not found any material I could understand that discusses noise identification in the context of wavelets. The material I did find has been difficult and frustrating, in particular Image Processing and Data Analysis: The Multiscale Approach by Starck, Murtagh and Bijaoui.

If the price of a stock follows a random walk, its price changes will be distributed in a bell (gaussian) curve. This is one way of stating the concept from financial theory that the daily return is normally distributed (here daily return is defined as the difference between yesterday's close price and today's close price). Movement outside the bounds of the curve may represent something other than a random walk and so, in theory, might be interesting.

At least in the single test case used in developing this code (Applied Materials, symbol: AMAT), the coefficient distribution in the highest frequency is almost a perfect normal curve. That is, the mean is close to zero and the standard deviation is close to one. The area under this curve is very close to one. This resolution approximates the daily return. At lower frequencies the mean moves away from zero and the standard deviation increases. This results in a flattened curve, whose area in the coefficient range is increasingly less than one.

The code in this class subtracts the normal curve from the coefficients at each frequency up to some minimum. This leaves only the coefficients above the curve which are used to regenerate the time series (without the noise, in theory). This filter removes 50 to 60 percent of the coefficients.

It's probably worth mentioning that there are other kinds of noise, most notably Poisson noise. In theory daily data tends to show gaussian noise, while intraday data would show Poisson noise. Intraday Poisson noise would result from the random arrival and size of orders.

This class has two public methods:

  1. filter_time_series, which is passed a file name and a time series

  2. gaussian_filter, which is passed a set of Haar coefficient spectra and an array allocated for the noise values. The noise array will be the same size as the coefficient array.


      Inner Class Summary
      private  class noise_filter.bell_info
                Bell curve info: mean, sigma (the standard deviation)
      private  class noise_filter.bin
                 A histogram bin
      private  class noise_filter.point
                 The point class represents a coefficient value so that it can be sorted for histogramming and then resorted back into the original ordering (i.e., sorted by value and then sorted by index)
      private  class noise_filter.sort_by_index
                Sort an array of point objects by the index field.
      private  class noise_filter.sort_by_val
                Sort an array of point objects by the val field.
       
      Constructor Summary
      noise_filter()
                 
       
      Method Summary
      private  noise_filter.bin[] alloc_bins(int num_bins, double low, double high)
                Allocate an array of histogram bins that is num_bins in length.
      private  noise_filter.point[] alloc_points(double[] coef, int start, int end, noise_filter.bell_info info)
                 Allocate and initialize an array of point objects.
      private  noise_filter.bin[] calc_histo(noise_filter.point[] pointz, int num_bins)
                 Calculate the histogram of the coefficients using num_bins histogram bins.
      (package private)  java.lang.String class_name()
                 
      private  int filter_spectrum(double[] coef, int start, int end, double[] noise)
                 This function is passed the section of the Haar coefficients that correspond to a single spectrum.
       void filter_time_series(java.lang.String file_name, double[] ts)
                Calculate the Haar transform on the time series (whose length must be a power of two) and filter it.
       void gaussian_filter(double[] coef, double[] noise)
                 This function is passed a set of Haar wavelet coefficients that result from the Haar wavelet transform.
      private  void histogram(noise_filter.bin[] binz, noise_filter.point[] pointz)
                 Build a histogram from the sorted data in the pointz array.
      private  double normal_interval(noise_filter.bell_info info, double low, double high, int num_points)
                 normal_interval
      private  void normalize_to_zero(double[] noise)
                Normalize the noise array to zero by subtracting the smallest value from all points.
      private  int subtract_gauss_curve(noise_filter.bin[] binz, noise_filter.bell_info info, int total_points, double[] noise)
                 Subtract the gaussian (or normal) curve from the histogram of the coefficients.
      private  void zero_points(noise_filter.bin b, int num_zero, double[] noise)
                 Set num_zero values in the histogram bin b to zero.
       
      Methods inherited from class wavelet_util.plot
      OpenFile
       
      Methods inherited from class java.lang.Object
      clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
       

      Constructor Detail

      noise_filter

      public noise_filter()
      Method Detail

      class_name

      java.lang.String class_name()
      Overrides:
      class_name in class plot

      histogram

      private void histogram(noise_filter.bin[] binz,
                             noise_filter.point[] pointz)

      Build a histogram from the sorted data in the pointz array. The histogram is constructed by appending a point object to the bin's vals Vector if the value of the point is between b[i].start and b[i].start + step.
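
The binning rule above can be sketched as follows. This is a minimal illustration, not the class's actual code: it counts values per bin instead of appending point objects to a Vector, and the class name, method name, and the choice to put a value equal to the upper bound into the last bin are all assumptions.

```java
// Hypothetical sketch of binning sorted values; names are not from wavelet_util.
class HistogramSketch {
    // Count how many sorted values fall into each bin [start, start + step).
    // Bin i starts at low + i * step. Values at or above the last bin's upper
    // edge are placed in the last bin (an assumption made for this sketch).
    static int[] binCounts(double[] sortedVals, double low, double step, int numBins) {
        int[] counts = new int[numBins];
        int bin = 0;
        for (double v : sortedVals) {
            // Because the input is sorted, the bin index only ever moves forward.
            while (bin < numBins - 1 && v >= low + (bin + 1) * step) {
                bin++;
            }
            counts[bin]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] vals = { -1.0, -0.6, -0.1, 0.0, 0.4, 1.0 };
        int[] counts = binCounts(vals, -1.0, 0.5, 4);
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

Sorting first means the whole histogram is built in a single forward sweep over the data.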


      alloc_bins

      private noise_filter.bin[] alloc_bins(int num_bins,
                                            double low,
                                            double high)
      Allocate an array of histogram bins that is num_bins in length. Initialize the start value of each bin with a start value calculated from low and high.
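
One plausible way to derive the bin start values from low and high is to divide the range into num_bins equal steps. The sketch below assumes equal-width bins and returns the start values as a plain double array; the real method returns noise_filter.bin objects, and the names used here are hypothetical.

```java
// Hypothetical sketch of equal-width bin allocation; names are not from wavelet_util.
class BinStartSketch {
    // Returns the start value of each of numBins equal-width bins covering [low, high].
    static double[] binStarts(int numBins, double low, double high) {
        double step = (high - low) / numBins;
        double[] starts = new double[numBins];
        for (int i = 0; i < numBins; i++) {
            starts[i] = low + i * step;
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(binStarts(4, -1.0, 1.0)));
    }
}
```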

      calc_histo

      private noise_filter.bin[] calc_histo(noise_filter.point[] pointz,
                                            int num_bins)

      Calculate the histogram of the coefficients using num_bins histogram bins.

      The Haar coefficients are stored in point objects which consist of the coefficient value and the index in the point array.

      To calculate the histogram, the pointz array is sorted by value. After it is histogrammed it is resorted by index to return the original ordering.
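
The sort-by-value, then sort-by-index round trip can be illustrated with a small sketch. The Point class below is a hypothetical stand-in for noise_filter.point, and java.util.Comparator is used here in place of the sort_by_val and sort_by_index inner classes, whose implementation is not shown in this documentation.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch; Point stands in for noise_filter.point.
class PointSortSketch {
    static class Point {
        final int index;   // position in the original coefficient ordering
        double val;        // coefficient value
        Point(int index, double val) {
            this.index = index;
            this.val = val;
        }
    }

    // Wrap each coefficient together with its original index.
    static Point[] fromCoef(double[] coef) {
        Point[] pts = new Point[coef.length];
        for (int i = 0; i < coef.length; i++) {
            pts[i] = new Point(i, coef[i]);
        }
        return pts;
    }

    // Order by value, as needed for histogramming.
    static void sortByVal(Point[] pts) {
        Arrays.sort(pts, Comparator.comparingDouble((Point p) -> p.val));
    }

    // Restore the original ordering.
    static void sortByIndex(Point[] pts) {
        Arrays.sort(pts, Comparator.comparingInt((Point p) -> p.index));
    }

    public static void main(String[] args) {
        Point[] pts = fromCoef(new double[] { 0.3, -1.2, 0.7 });
        sortByVal(pts);
        sortByIndex(pts);
        for (Point p : pts) {
            System.out.println(p.index + " " + p.val);
        }
    }
}
```

Keeping the index alongside the value is what lets a point be modified while sorted by value and still land back in its original slot.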


      alloc_points

      private noise_filter.point[] alloc_points(double[] coef,
                                                int start,
                                                int end,
                                                noise_filter.bell_info info)

      Allocate and initialize an array of point objects. The size of the array is end - start. Each point object in the array is initialized with its index and a Haar coefficient (from the coef array).

      Since the allocation code has to iterate through the coefficient spectrum the mean and standard deviation are also calculated to avoid an extra iteration. These values are returned in the bell_info object.
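
A single pass can yield both statistics if the loop accumulates the sum and the sum of squares. The sketch below is one way to do that, not necessarily the method used here: it assumes the population form of the standard deviation (dividing by n), and the class and method names are hypothetical.

```java
// Hypothetical sketch of a one-pass mean and standard deviation; names are not from wavelet_util.
class BellInfoSketch {
    // Returns { mean, sigma } for coef[start] .. coef[end - 1].
    // Assumes end > start and uses the population standard deviation (divide by n).
    static double[] meanAndSigma(double[] coef, int start, int end) {
        int n = end - start;
        double sum = 0.0;
        double sumSq = 0.0;
        for (int i = start; i < end; i++) {
            sum += coef[i];
            sumSq += coef[i] * coef[i];
        }
        double mean = sum / n;
        // variance = E[x^2] - (E[x])^2, clamped at zero to guard against rounding error
        double variance = Math.max(0.0, sumSq / n - mean * mean);
        return new double[] { mean, Math.sqrt(variance) };
    }

    public static void main(String[] args) {
        double[] r = meanAndSigma(new double[] { 2, 4, 4, 4, 5, 5, 7, 9 }, 0, 8);
        System.out.println("mean = " + r[0] + ", sigma = " + r[1]);
    }
}
```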


      normal_interval

      private double normal_interval(noise_filter.bell_info info,
                                     double low,
                                     double high,
                                     int num_points)

      normal_interval

      Numerically integrate the normal curve with mean info.mean and standard deviation info.sigma over the range low to high.

      The normal curve equation that is integrated is:

      f(y) = (1 / (s * sqrt(2 * pi))) * e^(-(y - u)^2 / (2 * s^2))
      

      Where u is the mean and s is the standard deviation.

      The area under the section of this curve from low to high is returned as the function result.

      The normal curve equation results in a curve expressed as a probability distribution, where probabilities are expressed as values greater than zero and less than one. The total area under a normal curve with a mean of zero and a standard deviation of one is one.

      The integral is calculated in a dumb fashion (i.e., we're not using anything fancy like Simpson's rule). The area in the interval x[i] to x[i+1] is

      area = (x[i+1] - x[i]) * g(x[i])
      

      where g(x[i]) is the value of the normal curve probability distribution at x[i].

      Parameters:
      info - This object encapsulates the mean and standard deviation
      low - Start of the integral
      high - End of the integral
      num_points - Number of points to calculate (should be even)
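
The rectangle-rule integration described above can be sketched as follows. This is a minimal illustration under stated assumptions: the mean and standard deviation are passed as plain doubles instead of a bell_info object, the interval is split into num_points equal steps, and the class and method names are hypothetical.

```java
// Hypothetical sketch of the rectangle-rule integration; names are not from wavelet_util.
class NormalIntervalSketch {
    // Normal probability density with mean u and standard deviation s.
    static double normalPdf(double y, double u, double s) {
        double d = y - u;
        return Math.exp(-(d * d) / (2.0 * s * s)) / (s * Math.sqrt(2.0 * Math.PI));
    }

    // Sum (x[i+1] - x[i]) * g(x[i]) over numPoints equal steps from low to high.
    static double normalInterval(double u, double s, double low, double high, int numPoints) {
        double step = (high - low) / numPoints;
        double area = 0.0;
        for (int i = 0; i < numPoints; i++) {
            double x = low + i * step;
            area += step * normalPdf(x, u, s);
        }
        return area;
    }

    public static void main(String[] args) {
        // Area within one standard deviation of the mean, roughly 0.68.
        System.out.println(normalInterval(0.0, 1.0, -1.0, 1.0, 1000));
    }
}
```

The accuracy depends on the step size, so a bin that is wide relative to sigma may need more points.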

      zero_points

      private void zero_points(noise_filter.bin b,
                               int num_zero,
                               double[] noise)

      Set num_zero values in the histogram bin b to zero. Or, if the number of values in the bin is less than num_zero, set all values in the bin to zero.

      The num_zero argument is derived from the area under the normal curve in the histogram bin interval. This area is a fraction of the total curve area. When multiplied by the total number of coefficient points we get num_zero.

      The noise coefficients are preserved (returned) in the noise array argument.
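
The arithmetic behind num_zero, and the cap at the bin's size, can be sketched as below. How the real code converts the fractional count to an integer is not documented, so the rounding used here is an assumption, as are the class and method names.

```java
// Hypothetical sketch of the num_zero calculation; names are not from wavelet_util.
class NumZeroSketch {
    // areaFraction: area under the normal curve over the bin's interval (0..1)
    // totalPoints:  total number of coefficients in the spectrum
    // binCount:     number of coefficients that actually fell in the bin
    static int numZero(double areaFraction, int totalPoints, int binCount) {
        // Rounding to the nearest integer is an assumption of this sketch.
        int expected = (int) Math.round(areaFraction * totalPoints);
        // Never zero more points than the bin holds.
        return Math.min(expected, binCount);
    }

    public static void main(String[] args) {
        System.out.println(numZero(0.25, 100, 40)); // the curve accounts for 25 of the 40 points
        System.out.println(numZero(0.25, 100, 10)); // the bin is smaller than the curve's share
    }
}
```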


      subtract_gauss_curve

      private int subtract_gauss_curve(noise_filter.bin[] binz,
                                       noise_filter.bell_info info,
                                       int total_points,
                                       double[] noise)

      Subtract the gaussian (or normal) curve from the histogram of the coefficients. This is done by integrating the gaussian curve over the range of a bin. If the number of items in the bin is less than or equal to the area under the curve in that interval, all items in the bin are set to zero. If the number of items in the bin is greater than the area under the curve, then a number of bin items equal to the curve area is set to zero.

      The area under a normal curve is always less than or equal to one. So the area returned by normal_interval is the fraction of the total area. This is multiplied by the total number of coefficients.

      The function returns the number of coefficients that are set to zero (i.e., the number of coefficients that fell within the gaussian curve). These coefficients are the noise coefficients. The noise coefficients are returned in the noise argument.


      filter_spectrum

      private int filter_spectrum(double[] coef,
                                  int start,
                                  int end,
                                  double[] noise)

      This function is passed the section of the Haar coefficients that corresponds to a single spectrum. It compares this spectrum to a gaussian curve and zeros out the coefficients within the gaussian curve.

      The function returns the number of points filtered out as the function result. The noise spectrum is also returned in the noise argument.


      normalize_to_zero

      private void normalize_to_zero(double[] noise)
      Normalize the noise array to zero by subtracting the smallest value from all points.
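
A minimal sketch of this normalization, assuming the array is modified in place; the class and method names are hypothetical.

```java
// Hypothetical sketch of normalizing an array so its minimum becomes zero.
class NormalizeSketch {
    static void normalizeToZero(double[] noise) {
        if (noise.length == 0) {
            return;
        }
        // Find the smallest value.
        double min = noise[0];
        for (double v : noise) {
            min = Math.min(min, v);
        }
        // Shift every point so that the smallest value is zero.
        for (int i = 0; i < noise.length; i++) {
            noise[i] -= min;
        }
    }

    public static void main(String[] args) {
        double[] noise = { 3.0, -2.0, 5.0 };
        normalizeToZero(noise);
        System.out.println(java.util.Arrays.toString(noise));
    }
}
```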

      gaussian_filter

      public void gaussian_filter(double[] coef,
                                  double[] noise)

      This function is passed a set of Haar wavelet coefficients that result from the Haar wavelet transform. It applies a gaussian noise filter to each frequency spectrum. This filter zeros out coefficients that fall within a gaussian curve. This alters the input data (the coef array).

      The coef argument is the input argument and contains the coefficients. The noise argument is an output argument and contains the coefficients that have been filtered out. This allows a noise spectrum to be rebuilt.
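
To apply a filter to each frequency spectrum, the coefficient array has to be split into bands. The sketch below assumes the common ordered Haar layout, where index 0 holds the overall average and the bands of length 1, 2, 4, ... follow from coarsest to finest; the layout actually used by wavelet_util is not stated in this documentation, and the names here are hypothetical.

```java
// Hypothetical sketch of splitting an ordered Haar coefficient array into bands.
class BandRangeSketch {
    // Returns { start, end } pairs (end exclusive) for each band, coarsest first.
    // Assumes n is a power of two and index 0 holds the overall average.
    static int[][] bandRanges(int n) {
        int bands = 0;
        for (int len = 1; len < n; len *= 2) {
            bands++;
        }
        int[][] ranges = new int[bands][];
        int b = 0;
        for (int start = 1; start < n; start *= 2) {
            ranges[b++] = new int[] { start, 2 * start };
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] r : bandRanges(8)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Each pair could then be passed as the start and end arguments of a per-spectrum filter such as filter_spectrum.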


      filter_time_series

      public void filter_time_series(java.lang.String file_name,
                                     double[] ts)
      Calculate the Haar transform on the time series (whose length must be a power of two) and filter it. Then calculate the inverse transform and write the result to a file whose name is file_name. A noise spectrum is written to file_name_noise.
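
The transform and inverse steps can be sketched as follows. This is one common averaging-and-differencing form of the Haar transform, shown only to illustrate the round trip; the scaling and coefficient ordering used by wavelet_util may differ, the filtering and file output are omitted, and all names are hypothetical.

```java
// Hypothetical sketch of a Haar transform round trip; not the wavelet_util implementation.
class HaarSketch {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Forward transform: repeatedly replace pairs by their average and half-difference.
    // Output layout: [overall average, coarsest detail, ..., finest details].
    static double[] forward(double[] ts) {
        if (!isPowerOfTwo(ts.length)) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        double[] out = ts.clone();
        double[] tmp = new double[ts.length];
        for (int len = ts.length; len > 1; len /= 2) {
            int half = len / 2;
            for (int i = 0; i < half; i++) {
                tmp[i] = (out[2 * i] + out[2 * i + 1]) / 2.0;
                tmp[half + i] = (out[2 * i] - out[2 * i + 1]) / 2.0;
            }
            System.arraycopy(tmp, 0, out, 0, len);
        }
        return out;
    }

    // Inverse transform: rebuild each pair from its average and half-difference.
    static double[] inverse(double[] coef) {
        double[] out = coef.clone();
        double[] tmp = new double[coef.length];
        for (int len = 2; len <= coef.length; len *= 2) {
            int half = len / 2;
            for (int i = 0; i < half; i++) {
                tmp[2 * i] = out[i] + out[half + i];
                tmp[2 * i + 1] = out[i] - out[half + i];
            }
            System.arraycopy(tmp, 0, out, 0, len);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] coef = forward(new double[] { 4, 2, 6, 8 });
        System.out.println(java.util.Arrays.toString(coef));
        System.out.println(java.util.Arrays.toString(inverse(coef)));
    }
}
```

Zeroing selected detail coefficients between forward and inverse is what removes the corresponding features from the rebuilt series.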