distancematrix.generator.filter_generator

Module Contents

Classes

FilterGenerator

Generator that wraps another generator and filters out distances calculated from invalid data.

BoundFilterGenerator

Wrapper around other generators that replaces values in the distance matrix marked as invalid by positive infinity.

BoundStreamingFilterGenerator

Streaming variant of BoundFilterGenerator: replaces values in the distance matrix marked as invalid by positive infinity.

Functions

is_not_finite(data, subseq_length)

Marks infinite or NaN values as invalid.

distancematrix.generator.filter_generator.is_not_finite(data, subseq_length)

Marks infinite or NaN values as invalid.
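A minimal pure-Python sketch of the documented behaviour, assuming the function returns one boolean per data point (True meaning invalid). The library version most likely operates on NumPy arrays; the list return type and the unused subseq_length argument here are assumptions made to keep the sketch dependency-free.

```python
import math


def is_not_finite(data, subseq_length):
    """Hypothetical sketch: flag each value that is NaN, +inf or -inf.

    subseq_length is accepted only to match the documented signature;
    a per-value finiteness check does not need it.
    """
    return [not math.isfinite(x) for x in data]


flags = is_not_finite([1.0, float("nan"), 3.0, float("inf")], subseq_length=2)
print(flags)  # [False, True, False, True]
```

Any function with the same (data, subseq_length) signature could presumably be passed as invalid_data_function, for example one that flags a sentinel value used for missing sensor readings.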

class distancematrix.generator.filter_generator.FilterGenerator(generator, invalid_data_function=is_not_finite, rb_scale_factor=2.0)

Bases: distancematrix.generator.abstract_generator.AbstractGenerator

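Wraps another generator so that distances calculated from data marked as invalid by invalid_data_function are replaced by positive infinity. Values marked as invalid can also be set to zero before they reach the wrapped generator. Binding it with prepare or prepare_streaming yields a bound filter generator.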

prepare_streaming(self, m, series_window, query_window=None)

Create a bound generator that supports streaming data. The generator will need to receive data before any distances can be calculated.

Parameters
  • m – the size of the subsequences used to calculate distances between series and query

  • series_window – number of values to keep in memory for series, the length of the horizontal axis of the distance matrix will be equal to (series_window - m + 1)

  • query_window – number of values to keep in memory for query, the length of the vertical axis of the distance matrix will be equal to (query_window - m + 1), or None to indicate a self-join.
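The axis lengths above are simple arithmetic on the window sizes. The helper below is hypothetical (not part of the library) and only restates those formulas, treating query_window=None as a self-join in which the query is the series itself.

```python
def streaming_matrix_shape(m, series_window, query_window=None):
    """Return (rows, columns) of the distance matrix for the given windows.

    Rows follow the query (vertical axis), columns the series (horizontal axis).
    query_window=None means a self-join, so the query window equals the series window.
    """
    if query_window is None:
        query_window = series_window
    return query_window - m + 1, series_window - m + 1


print(streaming_matrix_shape(100, 1000, 500))  # (401, 901)
print(streaming_matrix_shape(100, 1000))       # (901, 901), self-join
```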

Returns

a bound generator that supports streaming

prepare(self, m, series, query=None)

Create a bound non-streaming generator for the given series and query sequences.

Parameters
  • m – the size of the subsequences used to calculate distances between series and query

  • series – 1D array, used as the horizontal axis of a distance matrix

  • query – 1D array, used as the vertical axis of a distance matrix, or None to indicate a self-join
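To make the axis convention concrete, here is a brute-force sketch of the matrix a bound generator describes: entry [row][col] compares query subsequence row with series subsequence col. Plain Euclidean distance is an illustrative assumption; the actual measure is decided by the wrapped generator.

```python
import math


def naive_distance_matrix(m, series, query=None):
    """Hypothetical brute-force distance matrix using plain Euclidean distance.

    Columns index series subsequences, rows index query subsequences.
    query=None means a self-join: the series is compared with itself.
    """
    if query is None:
        query = series
    n_cols = len(series) - m + 1
    n_rows = len(query) - m + 1
    matrix = []
    for row in range(n_rows):
        q = query[row:row + m]
        matrix.append([math.dist(q, series[col:col + m]) for col in range(n_cols)])
    return matrix


dm = naive_distance_matrix(2, [0.0, 1.0, 2.0, 3.0])
print(len(dm), len(dm[0]))  # 3 3 (self-join: zeros on the main diagonal)
dm_q = naive_distance_matrix(2, [0.0, 1.0, 2.0, 3.0], query=[5.0, 6.0, 7.0])
print(len(dm_q), len(dm_q[0]))  # 2 3
```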

Returns

a bound generator

class distancematrix.generator.filter_generator.BoundFilterGenerator(generator, m, num_q_subseq, invalid_series_subseq, invalid_query_subseq)

Bases: distancematrix.generator.abstract_generator.AbstractBoundGenerator

Wrapper around other generators that replaces values in the distance matrix marked as invalid by positive infinity. It can also perform a data pre-processing step before the data reaches the wrapped generator: values marked as invalid are set to zero. This is useful, for example, to remove NaN values for a generator that does not support them.
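The constructor receives per-subsequence masks (invalid_series_subseq, invalid_query_subseq), so invalid data points must somehow be turned into invalid subsequences. The sketch below assumes a subsequence of length m counts as invalid when any of its points is invalid, and also shows the zero-substitution step. Both helpers are hypothetical illustrations, not library functions.

```python
def invalid_subsequences(invalid_points, m):
    """Return one flag per subsequence of length m: True if any point in it is invalid.

    Keeps a running count of invalid points in a sliding window, so it runs in O(n).
    """
    if len(invalid_points) < m:
        return []
    count = sum(invalid_points[:m])  # invalid points in the first window
    flags = [count > 0]
    for start in range(1, len(invalid_points) - m + 1):
        # Slide one step: add the point entering the window, drop the one leaving it.
        count += invalid_points[start + m - 1] - invalid_points[start - 1]
        flags.append(count > 0)
    return flags


def zero_invalid(data, invalid_points):
    """Pre-processing step: replace invalid values by 0.0 before the wrapped generator sees them."""
    return [0.0 if bad else x for x, bad in zip(data, invalid_points)]


mask = [False, False, True, False, False, False]
print(invalid_subsequences(mask, 2))  # [False, True, True, False, False]
print(zero_invalid([1.0, float("nan"), 3.0], [False, True, False]))  # [1.0, 0.0, 3.0]
```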

calc_diagonal(self, diag)

Calculates all distances of the distance matrix diagonal with the given index for the available data.

If diag is zero, this calculates the main diagonal, running from the top left to the bottom right. Any positive value represents a diagonal above the main diagonal, and a negative value represents a diagonal below the main diagonal.
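The diagonal indexing rule can be spelled out as (row, column) cells. This hypothetical helper lists the cells covered by a given diag index in a matrix with n_rows rows (query axis) and n_cols columns (series axis): cells above the main diagonal have column > row.

```python
def diagonal_cells(diag, n_rows, n_cols):
    """List the (row, col) cells on diagonal `diag` of an n_rows x n_cols matrix.

    diag == 0 is the main diagonal; diag > 0 lies above it (col = row + diag);
    diag < 0 lies below it (row = col - diag).
    """
    if diag >= 0:
        length = max(0, min(n_rows, n_cols - diag))
        return [(i, i + diag) for i in range(length)]
    length = max(0, min(n_rows + diag, n_cols))
    return [(i - diag, i) for i in range(length)]


print(diagonal_cells(0, 3, 4))   # [(0, 0), (1, 1), (2, 2)]
print(diagonal_cells(1, 3, 4))   # [(0, 1), (1, 2), (2, 3)]
print(diagonal_cells(-1, 3, 4))  # [(1, 0), (2, 1)]
```

The length of the returned list matches the length of the 1D array a diagonal calculation would be expected to return for a fully available matrix.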

Parameters

diag – the diagonal index

Returns

1D array containing all calculated distances on the diagonal

calc_column(self, column)

Calculates all distances of the distance matrix on the specified column for the available data.

Parameters

column – the column index (starting at 0)

Returns

1D array containing all calculated distances in the column
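A sketch of the filtering rule applied to one column, assuming the masks hold one flag per subsequence, like the invalid_series_subseq and invalid_query_subseq constructor arguments. A column corresponds to a single series subsequence, so if that subsequence is invalid the whole column becomes infinite; otherwise only rows with an invalid query subsequence do. The function name is hypothetical.

```python
import math


def filter_column(distances, column, invalid_series_subseq, invalid_query_subseq):
    """Replace distances computed from invalid subsequences by +inf.

    distances: the column as returned by the wrapped generator (one value per query subsequence).
    """
    if invalid_series_subseq[column]:
        # The series subsequence for this column is invalid: every distance is invalid.
        return [math.inf] * len(distances)
    return [math.inf if bad else d for d, bad in zip(distances, invalid_query_subseq)]


col0 = filter_column([0.5, 1.0, 2.0], 0, [False, True], [False, True, False])
col1 = filter_column([0.5, 1.0, 2.0], 1, [False, True], [False, True, False])
print(col0)  # [0.5, inf, 2.0]
print(col1)  # [inf, inf, inf]
```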

class distancematrix.generator.filter_generator.BoundStreamingFilterGenerator(generator, m, num_s_subseq, num_q_subseq, invalid_data_function, rb_scale_factor)

Bases: distancematrix.generator.filter_generator.BoundFilterGenerator, distancematrix.generator.abstract_generator.AbstractBoundStreamingGenerator

Wrapper around other generators that replaces values in the distance matrix marked as invalid by positive infinity. It can also perform a data pre-processing step before the data reaches the wrapped generator: values marked as invalid are set to zero. This is useful, for example, to remove NaN values for a generator that does not support them.

append_series(self, values)

Adds more data points to the series sequence (and to the query in case of a self-join). The oldest data points are dropped once the series exceeds its capacity (series_window).

Parameters

values – 1D array, the new values to append to the series

Returns

None
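The "drop the oldest values" behaviour is that of a fixed-size sliding window. The class below is a hypothetical stand-in built on collections.deque with maxlen, not the library's internal buffer (the constructor's rb_scale_factor hints at a ring buffer whose details are not documented here). In a self-join the query shares the series buffer, so appending to the series also updates the query.

```python
from collections import deque


class StreamingWindows:
    """Hypothetical sketch of the windowing a streaming generator performs.

    Each deque keeps only the most recent `window` values; extending it beyond
    that capacity silently discards the oldest values.
    """

    def __init__(self, series_window, query_window=None):
        self.self_join = query_window is None
        self.series = deque(maxlen=series_window)
        # In a self-join the query *is* the series, so both names share one buffer.
        self.query = self.series if self.self_join else deque(maxlen=query_window)

    def append_series(self, values):
        self.series.extend(values)

    def append_query(self, values):
        # Choice made by this sketch only: the library's behaviour for a self-join
        # is not documented here.
        if self.self_join:
            raise ValueError("self-join: append to the series instead")
        self.query.extend(values)


w = StreamingWindows(series_window=4)
w.append_series([1, 2, 3])
w.append_series([4, 5, 6])
print(list(w.series))  # [3, 4, 5, 6]
print(list(w.query))   # [3, 4, 5, 6], same buffer in a self-join
```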

append_query(self, values)

Adds more data points to the query sequence. The oldest data points are dropped once the query exceeds its capacity (query_window).

Parameters

values – 1D array, the new values to append to the query

Returns

None

calc_column(self, column)

Calculates all distances of the distance matrix on the specified column for the available data.

Parameters

column – the column index (starting at 0)

Returns

1D array containing all calculated distances in the column
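In streaming mode the matrix covers only the data currently held in memory. Assuming column 0 refers to the oldest series subsequence still in the window (an assumption consistent with older points being dropped, not something stated above), this hypothetical helper maps a column index back to the start position of that subsequence in the full stream.

```python
def column_to_stream_start(column, total_appended, series_window, m):
    """Map a column index in the current window to a start index in the full stream.

    total_appended: number of series values appended so far (stream index 0 = first value).
    Assumes column 0 is the oldest subsequence still held in the window.
    """
    held = min(total_appended, series_window)  # values currently kept in memory
    n_cols = held - m + 1                      # subsequences available right now
    if not 0 <= column < n_cols:
        raise IndexError("column outside the available data")
    first_held = total_appended - held         # stream index of the oldest kept value
    return first_held + column


print(column_to_stream_start(0, total_appended=10, series_window=6, m=3))  # 4
print(column_to_stream_start(3, total_appended=10, series_window=6, m=3))  # 7
```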