Code:
Write YOUR OWN, INDIVIDUAL program that has
the following functionalily. Taking code available online will
be considered cheating.
You can write your code in any high level programming language
(C, C++, Java, ...).
Given
a collection of time series CS,
a collection of time series CT to be used as templates,
a non-negative integer value r,
and
a non-negative real value epsilon,
- Given a time series S (in the time domain), it transforms the time series into a
sequence FS (in the frequency domain) using the Discrete Fourier Transformation
and truncates this resulting sequence by keeping just the first r
coefficients.
- Given two sequences, it computes the Euclidean distance between the sequences.
- For each template T in CT, it finds all the sequences in CS that are within
an epsilon distance from T. This should be done following this procedure:
- All time series in CS and all the templates in CT are transformed and
truncated as described above.
- For each transformed template FT:
- For each transformed time series FS:
- Compute the Euclidean distance between FT and FS.
- If this distance is less than or equal to epsilon
- Compute the Euclidean distance between T (the template
before the transformation) and S (the time series before the
transformation).
- If this distance is less than or equal to epsilon
then output the fact that T and S are "similar".
- For each time series in CS, it finds all the other time series in CS that are
similar to it (that is that are within an epsilon distance form it)
using the procedure sketched above.
This is what the paper calls "All-pairs query (or spatial join): Given N sequences,
find the pairs of sequences that are within epsilon of each other").
You DO NOT need to use any sophisticated data structures to store sequences
before or after they are transformed. Simple data structures are acceptable for
the scope of this project.