Singular spectrum analysis for time series

This notebook illustrates how to perform a singular spectrum analysis (SSA) with the pyActigraphy package.

Briefly, the SSA is related to the Principal Component Analysis (PCA) for time series. The input signal is decomposed into additive components and their relative importance (i.e. variance) is quantified.

More information about the SSA methodology:

In the context of Chronobiology, a nice overview of various techniques (including SSA) is given in:

  • Fossion, R., Rivera, A. L., Toledo-Roy, J. C., & Angelova, M. (2018). Quantification of Irregular Rhythms in Chronobiology: A Time-Series Perspective. In Circadian Rhythm - Cellular and Molecular Mechanisms. InTech. https://doi.org/10.5772/intechopen.74742

Imports and input data

[1]:
import pyActigraphy
[2]:
from pyActigraphy.analysis import SSA
[3]:
import numpy as np
[4]:
import plotly.graph_objs as go
[5]:
import os
[6]:
# Define the path to your input data
fpath = os.path.join(os.path.dirname(pyActigraphy.__file__),'tests/data/')

Read test file:

[7]:
raw = pyActigraphy.io.read_raw_awd(fpath+'example_01.AWD', start_time='1918-01-24 08:00:00', period='9 days')

SSA methodology

Embedding

The time series \(x=(x_0,x_1,\dots,x_n,\dots,x_{N-1})^\intercal\), of length \(N\), represents the signal under analysis. The mapping of this signal into a matrix \(A\) of dimension \(L \times K\), assuming \(L \leq K\), is called embedding and can be defined as:

\[\begin{split}A = \begin{bmatrix} x_{0} & x_{1} & x_{2} & \dots & x_{K-1} \\ x_{1} & x_{2} & x_{3} & \dots & x_{K} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{L-1} & x_{L} & x_{L+1} & \dots & x_{N-1} \end{bmatrix}\end{split}\]

where \(L\) is the window length, or embedding dimension, and \(K = N - L + 1\). \(A\) is a Hankel matrix, called the trajectory matrix.
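As an illustration, the embedding step can be sketched with plain NumPy on a toy signal (the `embed` helper below is a hypothetical sketch, not part of pyActigraphy):

```python
import numpy as np

def embed(x, L):
    """Map a 1-D signal into its L-by-K trajectory (Hankel) matrix."""
    N = len(x)
    K = N - L + 1
    # Row i holds the lagged subseries x[i], x[i+1], ..., x[i+K-1]
    return np.array([x[i:i + K] for i in range(L)])

x = np.arange(6)       # toy signal of length N = 6
A = embed(x, L=3)      # window length L = 3, so K = N - L + 1 = 4
print(A.shape)         # (3, 4)
print(A)               # antidiagonals are constant: a Hankel matrix
```

Note how every antidiagonal of `A` contains a single repeated value, which is the defining property of a Hankel matrix.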

During the embedding step, the window length \(L\) is a crucial parameter; only the components with a (quasi)periodicity \(T \leq L\) will be resolved.

In the context of human locomotor activities, a natural choice for \(L\) is \(24\) h.

[8]:
mySSA = SSA(raw.data,window_length='24h')
[9]:
# Access the trajectory matrix
mySSA.trajectory_matrix().shape
[9]:
(1440, 11522)

In our example file, the acquisition frequency is \(1\) min. Therefore, the first dimension of the trajectory matrix is 1440, i.e. \(24\) h.

Singular value decomposition

Factorization of the trajectory matrix A, using Singular Value Decomposition (SVD), yields:

\[A = U\Sigma V^\intercal =\sum_{r=1}^{R} \sigma_r u_r v_{r}^\intercal\]

where \(R = \operatorname{rank}(A) \leq L\), \(\{u_1,\ldots, u_R\}\) is the orthonormal system of eigenvectors of the matrix \(S = AA^\intercal\), such that \(u_i \cdot u_j = 0\) for \(i \neq j\) and \(\lVert u_r \rVert = 1\), \(v_r = A^{\intercal} u_r / \sigma_r\), and \(\Sigma \in \mathbb{R}^{L \times K}\) is a diagonal matrix whose diagonal elements \(\sigma_r\) are the singular values of A. The eigenvalues of \(AA^\intercal\) are given by \(\lambda_r = \sigma_r^2\).
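These identities are easy to verify numerically on a small random matrix (a generic NumPy sketch, independent of pyActigraphy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 8))            # toy trajectory matrix, L=5, K=8

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruct A as the sum over r of sigma_r * u_r * v_r^T
A_rec = sum(s[r] * np.outer(U[:, r], Vt[r, :]) for r in range(len(s)))
assert np.allclose(A, A_rec)

# The eigenvalues of A A^T are the squared singular values
eigvals = np.linalg.eigvalsh(A @ A.T)[::-1]   # sort in descending order
assert np.allclose(eigvals, s**2)
```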

Within the pyActigraphy package, the decomposition of the trajectory matrix is performed via the fit function:

[10]:
mySSA.fit()

Fractional partial variances and Scree diagram

One of the main results of an SSA analysis is the so-called scree diagram, which displays the partial variances \(\lambda_k = \sigma^2_k\), ordered from the most to the least dominant. Each \(\lambda_k\) can be interpreted as the variance of the “sub phase-space” of the time-series component \(g_k(n)\), and \(\lambda_{tot} = \sum^R_{k=1} \lambda_k\) is the total variance of the phase space of the original time series x(n).

[11]:
# By definition, the sum of the partial variances should be equal to 1:
mySSA.lambda_s.sum()
[11]:
1.0
[12]:
layout = go.Layout(
    height=600,
    width=800,
    title="Scree diagram",
    xaxis=dict(title="Singular value index", type='log', showgrid=True, gridwidth=1, gridcolor='LightPink', title_font = {"size": 20}),
    yaxis=dict(title=r'$\lambda_{k} / \lambda_{tot}$', type='log', showgrid=True, gridwidth=1, gridcolor='LightPink', ),
    showlegend=False
)
[13]:
go.Figure(data=[go.Scatter(x=np.arange(len(mySSA.lambda_s)), y=mySSA.lambda_s)], layout=layout)

Elementary matrices

The SVD of the trajectory matrix A can thus be written as:

\[A = A_1 + \ldots + A_R\]

where \(A_r = \sigma_r u_r v_{r}^\intercal = \sqrt{\lambda_r}\, u_r v_{r}^\intercal\).

The matrices \(A_r\) have rank 1; such matrices are sometimes called elementary matrices.
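The rank-1 property follows directly from the outer-product form, as this small NumPy sketch (unrelated to the pyActigraphy API) shows:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The r-th elementary matrix sigma_r * u_r v_r^T is an outer product
# of two vectors, hence it always has rank 1.
X0 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(np.linalg.matrix_rank(X0))   # 1
```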

[14]:
x_elem_0 = mySSA.X_elementary(r=0)
[15]:
x_elem_0.shape
[15]:
(1440, 11522)

A 1440-by-11522 matrix of double-precision floats takes roughly 130 MB of memory, so computing and storing all ~1440 elementary matrices is NOT a good idea.
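A back-of-the-envelope check of that figure (pure arithmetic, assuming 8-byte float64 entries):

```python
# Memory footprint of one elementary matrix of the size used in this
# example (1440 rows by 11522 columns of float64 values):
nbytes = 1440 * 11522 * 8
print(round(nbytes / 1e6, 1), "MB")   # 132.7 MB per elementary matrix
```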

Grouping

Prior to reconstructing the different signal components, it makes sense to group the elementary matrices based on the separability of their corresponding reconstructed series (i.e. the diagonal-averaged elementary matrices, \(\tilde{x}_n\); see the next step).

One measure of separability can be obtained via the so-called w-correlation matrix. This matrix is obtained by calculating the weighted correlation between all possible combinations of two reconstructed series (\(\tilde{\mathbb{X}}^{(i)}, \tilde{\mathbb{X}}^{(j)}\)):

\[\left( \tilde{\mathbb{X}}^{(i)}, \tilde{\mathbb{X}}^{(j)} \right)_w = \sum_{n=1}^N w_{n} \tilde{x}^{(i)}_n \tilde{x}^{(j)}_n.\]

The weights \(w_n\) are given by the number of elements on the corresponding antidiagonal of the trajectory matrix:

\[w_n = \# \{(i,j) | i+j = n+1, 1 \leq i \leq L, 1 \leq j \leq K \}\]
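The weight formula above can be sketched in a few lines of NumPy (the `ssa_weights` helper is hypothetical, for illustration only; it uses \(L^{\star} = \min(L, K)\) and \(K^{\star} = \max(L, K)\)):

```python
import numpy as np

def ssa_weights(N, L):
    """w_n = number of elements on the n-th antidiagonal of an
    L-by-K trajectory matrix, with K = N - L + 1."""
    K = N - L + 1
    L_star, K_star = min(L, K), max(L, K)
    w = np.empty(N, dtype=int)
    for n in range(1, N + 1):          # 1-based index, as in the formula
        if n < L_star:
            w[n - 1] = n               # growing antidiagonals
        elif n <= K_star:
            w[n - 1] = L_star          # full-length antidiagonals
        else:
            w[n - 1] = N - n + 1       # shrinking antidiagonals
    return w

print(ssa_weights(N=6, L=3))   # [1 2 3 3 2 1]
```

As a sanity check, the weights sum to \(L \times K\), the total number of entries of the trajectory matrix.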

Let us now calculate this w-correlation matrix for the reconstructed series obtained from the first 10 elementary matrices (ordered according to their corresponding partial variances).

[16]:
w_corr_mat = mySSA.w_correlation_matrix(10)
[17]:
go.Figure(data=[go.Heatmap(z=w_corr_mat)], layout=go.Layout(height=800,width=800))

From this plot, one can infer that the reconstructed components 1&2 (as well as 3&4, 5&6, etc.) are “correlated”, and their associated elementary matrices should be grouped together.

Side remark: the same conclusion can be drawn from the Scree diagram; the singular values \(1\) and \(2\) are almost degenerate.

Diagonal averaging

If the components of the series are separable and the indices are split accordingly, then all the matrices in the expansion \(X = X_{I_1} + \ldots + X_{I_m}\) are Hankel matrices. We thus immediately obtain the decomposition \(x_n = \sum_{k=1}^m \tilde{x}_n^{(k)}\) of the original series: for all \(k\) and \(n\), \(\tilde{x}_n^{(k)}\) is equal to all entries \(x^{(k)}_{ij}\) along the antidiagonal \(\{(i, j)\,|\, i + j = n+1\}\) of the matrix \(X_{I_k}\).

In practice, however, this situation is not realistic. In the general case, no antidiagonal consists of equal elements. We thus need a formal procedure of transforming an arbitrary matrix into a Hankel matrix and therefore into a series. As such, we shall consider the procedure of diagonal averaging, which defines the values of the time series \(\tilde{\mathbb{X}}^{(k)} = (\tilde{x}^{(k)}_1,\ldots,\tilde{x}^{(k)}_N)\) as averages for the corresponding antidiagonals of the matrices \(X_{I_k}\).

  • for \(1 \leq n < L^{\star}\):

    \[\tilde{x}_n^{(k)} = \frac{1}{n} \sum_{m=1}^{n} x^{\star}_{I_k, (m,n-m+1)}\]
  • for \(L^{\star} \leq n \leq K^{\star}\):

    \[\tilde{x}_n^{(k)} = \frac{1}{L^{\star}} \sum_{m=1}^{L^{\star}} x^{\star}_{I_k, (m,n-m+1)}\]
  • for \(K^{\star} < n \leq N\):

    \[\tilde{x}_n^{(k)} = \frac{1}{N-n+1} \sum_{m=n-K^{\star}+1}^{N-K^{\star}+1} x^{\star}_{I_k, (m,n-m+1)}\]

where \(L^{\star} = \min(L, K)\) and \(K^{\star} = \max(L, K)\).
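The three cases above simply say "average each antidiagonal"; a compact NumPy sketch (the `diagonal_average` helper is hypothetical, not the pyActigraphy implementation):

```python
import numpy as np

def diagonal_average(X):
    """Hankelize an L-by-K matrix into a series of length N = L + K - 1
    by averaging over its antidiagonals."""
    L, K = X.shape
    N = L + K - 1
    # Flipping the rows turns the antidiagonals of X into the ordinary
    # diagonals of X[::-1, :], at offsets n - L + 1 for n = 0, ..., N-1.
    return np.array([
        np.mean(X[::-1, :].diagonal(n - L + 1)) for n in range(N)
    ])

# A matrix that is already Hankel is mapped back to its series exactly:
X = np.array([[0., 1., 2., 3.],
              [1., 2., 3., 4.],
              [2., 3., 4., 5.]])
print(diagonal_average(X))   # [0. 1. 2. 3. 4. 5.]
```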

Trend (k=0)

[18]:
trend = mySSA.X_tilde(0)
[19]:
# By definition, the reconstructed components must have the same dimension as the original signal:
trend.shape[0] == len(raw.data.index)
[19]:
True

Rhythmic components (k=1,2)

[20]:
et12 = mySSA.X_tilde([1,2])

Rhythmic components (k=3,4)

[21]:
et34 = mySSA.X_tilde([3,4])
[22]:
layout = go.Layout(
    height=600,
    width=800,
    title="",
    xaxis=dict(title='Date Time'),
    yaxis=dict(title='Count'),
    shapes=[],
    showlegend=True
)
[23]:
go.Figure(data=[
    go.Scatter(x=raw.data.index,y=raw.data, name='Activity'),
    go.Scatter(x=raw.data.index,y=trend, name='Trend'),
    go.Scatter(x=raw.data.index,y=trend+et12, name='Circadian component'),
    go.Scatter(x=raw.data.index,y=trend+et34, name='Ultradian component')
], layout=layout)

Compare original signal with its reconstruction

Finally, it is possible to reconstruct a signal from an appropriate subset of components (i.e. the circadian and ultradian components).

For example, let’s reconstruct the signal from its first 7 components (0: trend, 1-2: circadian comp., 3-6: ultradian comp.):

[24]:
rec = mySSA.reconstructed_signal([0,1,2,3,4,5,6])
[25]:
go.Figure(data=[
    go.Scatter(x=raw.data.index, y=raw.data, name='Activity'),
    go.Scatter(x=raw.data.index, y=rec, name='Reconstructed signal')
], layout=go.Layout(height=600,width=800,showlegend=True))

In this respect, the SSA acts as a filter!

Et voilà!