Correlation Analysis in Time Series

Tech First
7 min read · Aug 21, 2019

1. Introduction

1.1 Concepts

Correlation means that a pair of time series, viewed as two variables, are related to each other. The relationship can be one of the following:

  • causal: one variable is the result of the other
  • relevant but not causal: the two variables are related, but neither causes the other

Causality is easy to understand: one variable leads to the other. However, you may feel confused about "relevant but not causal". How could that happen?

See the following example:

Researchers found that the sales volume of Coke and the number of drownings usually increase or decrease at the same pace.

Take the sales volume of Coke as Variable A, and the number of drownings as Variable B. The questions are:

  1. Are A and B relevant?

> Obviously, yes they are relevant.

  2. Are A and B causal?

> No, they are not causal. It's not reasonable to think that drinking too much Coke leads to drowning. After deeper investigation, you will find that every time the temperature rises, Coke sales increase, and so does the number of drownings.

Whether two variables are relevant is easier to judge than whether they are causal. Generally, when we talk about correlation in this article, we mean the two variables are relevant; whether they are causal, we don't know.

1.2 Applications

1.2.1 Similarity Detection

Similarity is a special case of correlation.

Suppose this situation:

Given many time series and one target time series, we need to find the series most similar to the target.

2. Algorithms

2.1 DTW

2.1.1 Introduction

DTW[1] stands for Dynamic Time Warping. DTW measures the similarity of two sequences, and it can successfully handle the following situations:

  • Different scales of the two series
  • Time shifting between the two series
  • Different lengths of the two input series

DTW's time complexity is O(nm), where n and m are the lengths of the two series. The most popular applications of DTW are in voice recognition and music recognition.

2.1.2 Principles

Preconditions:

Suppose there are two sequences S1 and S2, and:

  • S1 = (a1, a2, a3, a4, …, an)
  • S2 = (b1, b2, b3, b4, …, bm)

Which means:

  • The length of S1 is n, and the length of S2 is m
  • S1 consists of points a1, a2, …, and S2 consists of points b1, b2, …
  • Generally, ai/bi could be a number or a vector

Details:

Generally, the easiest way to compare two series is to make them the same length and compare them point by point. There are several ways to make them the same length, such as cutting the longer one or scaling the shorter one up. However, these methods lose some features and cannot handle shifting or scaling well, so they perform badly in practice.

The way DTW works is to calculate a matrix over the two series. As the lengths of the series are n and m respectively, the matrix's size is n*m. In the matrix, every element Value(i, j) means the distance between ai (in S1) and bj (in S2). Mathematically:

Value(i, j) = Dist(ai, bj)

In it, the distance could be the Euclidean distance, which for scalar points is:

Dist(ai, bj) = sqrt((ai - bj)^2) = |ai - bj|

As we can see from the expression, the smaller the distance, the more similar the two values are.

From the view of the whole matrix, we can find a path that goes from (1,1) to (n,m); if this path has the minimized cumulative distance, we take that minimized cumulative distance as the similarity level. In applications, if we have several series, we can use DTW to find the most similar one accordingly.
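The matrix construction can be sketched in a few lines of Python (a hypothetical helper, assuming scalar points so that the Euclidean distance reduces to the absolute difference):

```python
def distance_matrix(s1, s2):
    """Build the n*m matrix where Value(i, j) = Dist(ai, bj).

    Assumes scalar points, so the Euclidean distance reduces
    to the absolute difference.
    """
    return [[abs(a - b) for b in s2] for a in s1]
```

For example, `distance_matrix([1, 3], [1, 2, 4])` gives `[[0, 1, 3], [2, 1, 1]]`; the path search below runs over exactly this matrix.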

As you can see in the following graph, the left and top lines are the two series, and our target is to find a path like the one at the bottom-right.

So, how to find out the path?

Let's give the path a definition like this:

W = w1, w2, …, wK

In it,

wi = v(p, q)

And the cumulative distance D of the path could be denoted as:

D(W) = w1 + w2 + … + wK

v(p, q) is an element of the matrix; i, p and q do not have any particular relation, we just pick these symbols for convenience.

Next, let’s continue to talk about the path, it’s noted that the path should satisfy the following conditions:

  • Boundary: the path should start at (1,1) and end at (n,m); that is, w1 = v(1,1) and wK = v(n,m)
  • Continuity & Monotonicity: for any wp−1 = v(a1,b1) and wp = v(a2,b2), wp should be the up/right/up-right neighbour of wp−1; that is, 0 ≤ a2 − a1 ≤ 1 and 0 ≤ b2 − b1 ≤ 1

There are still many paths that satisfy the above conditions. The minimized cumulative distance, the one we want, could be denoted as:

DTW(S1, S2) = min over all valid paths W of D(W)

To make up for longer paths, the above equation could be improved by normalizing with the path length K:

DTW(S1, S2) = min over all valid paths W of D(W) / K

This equation can be solved with dynamic programming.
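The dynamic programming solution can be sketched as follows (a minimal Python version for scalar points, without the length normalization; names are illustrative):

```python
import math

def dtw(s1, s2):
    """DTW distance of two 1-D sequences via dynamic programming, O(n*m).

    D[i][j] holds the minimal cumulative distance of aligning the first
    i points of s1 with the first j points of s2.
    """
    n, m = len(s1), len(s2)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(s1[i - 1] - s2[j - 1])  # Euclidean distance for scalars
            # the path may arrive from the left, from below, or diagonally,
            # which enforces the continuity and monotonicity conditions
            D[i][j] = dist + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` is `0.0`: the repeated point is absorbed by the warping even though the lengths differ, which a point-by-point comparison could not achieve.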

Brief Summary:

As we can see, DTW aims to find the path whose cumulative distance is smallest. In fact, different paths stand for different "scale up/down" operations on parts of the series, which makes the comparison more flexible. Moreover, once we find the smallest distance, we get the "similarity level" of the two series, which allows us to decide whether they are similar or not.

2.2 Fast DTW

2.2.1 Introduction

DTW is an excellent algorithm for validating similarity, but it is time-consuming (O(mn)), which makes it a poor fit for business applications.

Usually, in business applications, we compare not just two series but many pairs. Suppose we are given one target series and want to find the most similar one among the remaining k series; the time complexity would be O(mnk).
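Such a brute-force search could look like this (a sketch; `dtw` is the plain quadratic implementation from Section 2.1, repeated so the snippet is self-contained):

```python
import math

def dtw(s1, s2):
    """Plain O(n*m) DTW distance of two 1-D sequences."""
    n, m = len(s1), len(s2)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(s1[i - 1] - s2[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def most_similar(target, candidates):
    """Brute-force search: O(m*n) per candidate, O(m*n*k) overall."""
    return min(candidates, key=lambda s: dtw(target, s))
```

With k in the thousands this cost adds up quickly, which motivates the speed-ups below.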

There are plenty of methods to speed up the original DTW algorithm; I will list some popular ones in this part. These improved DTW algorithms take different strategies and viewpoints, which may inspire you.

2.2.2 Speed-Up: Decrease the search space

As we know, there is a cost matrix in the original DTW, and there are already some constraints on it that limit the search space (see Section 2.1). The following graph illustrates the search space, shown in black:

As you can see in the above graph, the rectangle stands for the cost matrix, and the black blocks mark the search space used to calculate the best path.

  • In the original DTW algorithm, we have to calculate the whole cost matrix to find the best path.
  • With Optimization 1 and Optimization 2, the search space is decreased, so DTW can be sped up.

Optimization 1 and Optimization 2 are described in paper [2]; read it if you are interested in how they work. But please note: the two optimizations do limit the search space and speed up DTW, but they do not guarantee finding the best path.
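One common way to shrink the search space in the spirit of these optimizations is a band constraint around the diagonal (a Sakoe-Chiba-style window; this is a generic sketch, not the exact optimizations from the paper):

```python
import math

def dtw_window(s1, s2, w):
    """DTW restricted to a band of radius w around the diagonal.

    Cells outside the band are never filled in, so only O(n*w)
    of the cost matrix is computed instead of all n*m cells.
    """
    n, m = len(s1), len(s2)
    w = max(w, abs(n - m))  # the band must at least cover the length gap
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            D[i][j] = abs(s1[i - 1] - s2[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

If the true best path wanders outside the band, the constrained result overestimates the distance, which is exactly the "no guarantee" trade-off noted above.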

2.2.3 Speed-Up: Data Abstraction

This kind of method takes another view and looks at the series itself. The core idea is to simplify (sample) the series first, find the best path on the simplified version, and then map it back to the original matrix.

See the following graph for illustration:

As you can see, this method finds the best path on a lower-resolution (sampled) cost matrix and then maps the path back to the original matrix. This does speed up DTW, but the disadvantage is the same as before: it does not guarantee that the result is the best path.
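The coarsening step can be sketched as follows (a simplified illustration of the idea, not the full FastDTW algorithm: the real method also projects the coarse path back onto the fine matrix and refines it within a narrow window):

```python
import math

def downsample(s):
    """Halve the resolution by averaging adjacent pairs of points."""
    return [(s[i] + s[i + 1]) / 2 for i in range(0, len(s) - 1, 2)]

def dtw(s1, s2):
    """Plain quadratic DTW, run here on the coarsened series."""
    n, m = len(s1), len(s2)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(s1[i - 1] - s2[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def coarse_dtw(s1, s2):
    # the coarse matrix is about 4x smaller, so this runs faster,
    # at the cost of no longer guaranteeing the true best path
    return dtw(downsample(s1), downsample(s2))
```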

For more information about this method, please refer to paper [2].

2.2.4 Speed-Up: Indexing

Indexing is more of an engineering method; we won't discuss it much in this article. You can find plenty of material about it by searching online.

2.2.5 Integrated Algorithm

In paper [2], integrated methods combining these speed-ups are provided for the DTW algorithm.

2.3 PPMCC/PCC

2.3.1 Introduction

The PPMCC[3], also called PCC or the Pearson correlation coefficient, is used to test whether two variables are linearly correlated. PPMCC outputs a number, which:

  • belongs to (0, 1]: the two variables are positively correlated
  • equals 0: the two variables are not correlated
  • belongs to [-1, 0): the two variables are negatively correlated

It’s noted that:

  • PPMCC's time complexity is O(n)
  • PPMCC requires the two variables to have the same length
  • PPMCC is usually used to test the linear correlation of two variables

2.3.2 Principles

The PPMCC is denoted as:

r = sum((xi - mean(x)) * (yi - mean(y))) / sqrt(sum((xi - mean(x))^2) * sum((yi - mean(y))^2))

where mean(x) and mean(y) are the means of the two variables.
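A direct O(n) implementation of the PPMCC formula (a minimal sketch in pure Python):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences, O(n)."""
    n = len(x)
    assert n == len(y), "PPMCC requires equal-length inputs"
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For perfectly linear data such as `pearson([1, 2, 3], [2, 4, 6])` the result is 1 (positive correlation), while reversing one series yields -1.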

3. Enjoy

A game for guessing correlations: http://guessthecorrelation.com/, which can help you develop more intuition for correlation :-)

4. References

[1] WIKI: DTW (Dynamic Time Warping)

[2] Paper: FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space

[3] Paper: SparseDTW: A Novel Approach to Speed up Dynamic Time Warping
