Functional linear modelling¶
This notebook illustrates how to perform functional linear modelling of actigraphy data with pyActigraphy.
Briefly, the aim of such an analysis is to find a “smooth” representation (i.e. functions) of the data for subsequent analysis.
Data can be converted to a functional form using either:
basis function expansions (e.g. Fourier, B-spline)
kernel functions
Both types are implemented in pyActigraphy.
NB: at the moment, the pyActigraphy package allows users to model the daily activity profile only. However, more profiles might be added in the future if needed.
The reference book for functional data analysis, including information about functional modelling, is:
Ramsay, J. O., & Silverman, B. W. (2005). Functional Data Analysis. New York, NY: Springer New York. https://doi.org/10.1007/b98888
In the context of actigraphy, a nice example can be found in:
Wang, J., Xian, H., Licis, A., Deych, E., Ding, J., McLeland, J., … Shannon, W. (2011). Measuring the impact of apnea and obesity on circadian activity patterns using functional linear modeling of actigraphy data. Journal of Circadian Rhythms, 9(1), 11. https://doi.org/10.1186/1740-3391-9-11
Imports and input data¶
[1]:
import pyActigraphy
from pyActigraphy.analysis import FLM
[2]:
import numpy as np
[3]:
import plotly.graph_objs as go
[4]:
# create objects for layout and traces
layout = go.Layout(autosize=False, width=850, height=600, title="", xaxis=dict(title=""), shapes=[], showlegend=True)
[5]:
import os
[6]:
# Define the path to your input data
fpath = os.path.join(os.path.dirname(pyActigraphy.__file__),'tests/data/')
Read an example file:
[7]:
raw = pyActigraphy.io.read_raw_awd(fpath+'example_01.AWD', start_time='1918-01-24 08:00:00', period='9 days')
Basis function expansion¶
The idea behind a basis function expansion is to represent the data \(\{x_i\}_{i\in[0,N]}\) as:
\(x(t) = \sum_{i=1}^{n}\beta_{i}\phi_{i}(t)\)
where \(\{\beta_{i}\}_{i=1}^n\) are scalar coefficients and \(\{\phi_i(\cdot)\}_{i=1}^n\) are basis functions. Possible basis functions include Fourier bases and splines.
Such an expansion can be performed with a least-squares fit.
Using a Fourier basis expansion¶
In this case, the basis functions are simple (co)sine functions: \(\phi_{0}(t) = 1\), \(\phi_{2n}(t) = \cos(n\omega t)\) and \(\phi_{2n-1}(t) = \sin(n\omega t)\), with \(\omega = \frac{2\pi}{T}\) where T is the period.
For the daily activity profile, T = 24h.
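To make this concrete, here is a minimal, self-contained numpy sketch (independent of pyActigraphy’s own implementation) that builds such a Fourier design matrix and estimates the beta coefficients with an ordinary least-squares fit; the signal y is a synthetic stand-in for a real daily profile:

import numpy as np

T = 24 * 60                    # period of the daily profile, in minutes
t = np.arange(T)               # one sample per minute
omega = 2 * np.pi / T
max_order = 9

# Synthetic "activity" signal standing in for a real daily profile
y = 100 + 80 * np.sin(omega * t) + 20 * np.random.randn(T)

# Design matrix: phi_0(t) = 1, then sin(n*omega*t) and cos(n*omega*t) for n = 1..max_order
X = np.column_stack(
    [np.ones(T)]
    + [f(n * omega * t) for n in range(1, max_order + 1) for f in (np.sin, np.cos)]
)

# Least-squares estimate of the beta coefficients and signal reconstruction
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_est = X @ beta

With max_order = 9, X has 2*9+1 = 19 columns, matching the 19 coefficients (const plus x1 to x18) reported in the fit summary below.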
[9]:
# Resampling frequency for the daily activity profile
freq='1min'
[10]:
# The number of basis functions is max_order*2+1 (1 constant + max_order cosine + max_order sine functions), i.e. 19 here
max_order = 9
First, let’s define a FLM object with “Fourier” functions as a basis:
[11]:
flm = FLM(basis='fourier',sampling_freq=freq,max_order=max_order)
To estimate the scalar coefficients of the basis expansion (the “beta” coefficients), use the fit function:
[12]:
# By setting the "verbose" parameter to True, the result of least-square fit is displayed:
flm.fit(raw,verbose=True)
Create first the basis functions: fourier
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.731
Model: OLS Adj. R-squared: 0.728
Method: Least Squares F-statistic: 214.7
Date: Mon, 30 Jan 2023 Prob (F-statistic): 0.00
Time: 16:42:38 Log-Likelihood: -8390.5
No. Observations: 1440 AIC: 1.682e+04
Df Residuals: 1421 BIC: 1.692e+04
Df Model: 18
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 170.9896 2.178 78.523 0.000 166.718 175.261
x1 -139.4514 3.080 -45.283 0.000 -145.492 -133.410
x2 -73.6459 3.080 -23.915 0.000 -79.687 -67.605
x3 -10.5884 3.080 -3.438 0.001 -16.629 -4.547
x4 -29.8752 3.080 -9.701 0.000 -35.916 -23.834
x5 45.4418 3.080 14.756 0.000 39.401 51.483
x6 58.3750 3.080 18.956 0.000 52.334 64.416
x7 6.4342 3.080 2.089 0.037 0.393 12.475
x8 -28.0195 3.080 -9.099 0.000 -34.060 -21.979
x9 -13.4018 3.080 -4.352 0.000 -19.443 -7.361
x10 -23.0809 3.080 -7.495 0.000 -29.122 -17.040
x11 31.6536 3.080 10.279 0.000 25.613 37.695
x12 -13.2986 3.080 -4.318 0.000 -19.340 -7.258
x13 -27.9879 3.080 -9.088 0.000 -34.029 -21.947
x14 16.3752 3.080 5.317 0.000 10.334 22.416
x15 11.3963 3.080 3.701 0.000 5.355 17.437
x16 -33.7290 3.080 -10.953 0.000 -39.770 -27.688
x17 -15.9310 3.080 -5.173 0.000 -21.972 -9.890
x18 3.9516 3.080 1.283 0.200 -2.089 9.993
==============================================================================
Omnibus: 379.235 Durbin-Watson: 0.633
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1490.128
Skew: 1.222 Prob(JB): 0.00
Kurtosis: 7.343 Cond. No. 1.41
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Now, to reconstruct the signal using its expansion up to the 9th order:
[13]:
flm_est = flm.evaluate(raw)
And compare it to the original daily profile:
[14]:
daily_avg = raw.average_daily_activity(binarize=False,freq=freq)
[15]:
# set x-axis labels and their corresponding data values
labels = ['00:00', '06:00', '12:00', '18:00']
tickvals = ['00:00:00', '06:00:00', '12:00:00', '18:00:00']
layout = go.Layout(
autosize=False, width=900, height=600,
title="Daily profile",
xaxis=dict(
title="Time of day (HH:MM)",
ticktext=labels,
tickvals=tickvals),
yaxis=dict(title="Counts (a.u)"),
shapes=[], showlegend=True)
[16]:
go.Figure(data=[
go.Scatter(x=daily_avg.index.astype(str),y=daily_avg,name='Raw activity'),
go.Scatter(x=daily_avg.index.astype(str),y=flm_est,name='Fourier expansion (9th order)')
],layout=layout)
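As a quick sanity check, the agreement between the reconstruction and the raw profile can be quantified directly. A minimal sketch, assuming flm_est is a plain array aligned with daily_avg, that computes the coefficient of determination (it should be close to the R-squared value reported in the fit summary above):

# Coefficient of determination of the Fourier reconstruction
ss_res = np.sum((daily_avg.values - flm_est) ** 2)
ss_tot = np.sum((daily_avg.values - daily_avg.values.mean()) ** 2)
print('R-squared: {:.3f}'.format(1 - ss_res / ss_tot))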
Using B-splines¶
B-splines are piecewise polynomial curves. By definition, they ensure the aforementioned “smoothness” of the data representation.
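As an aside, the following minimal sketch (using scipy.interpolate rather than pyActigraphy’s own machinery, and a synthetic coarse profile) illustrates what a cubic B-spline representation of a signal looks like:

import numpy as np
from scipy.interpolate import splrep, splev

# Coarse synthetic profile: one point every 30 min over 24 h
x = np.arange(48)
y = 100 + 80 * np.sin(2 * np.pi * x / 48)

# Fit a cubic (k=3) smoothing B-spline; s > 0 enables smoothing
tck = splrep(x, y, k=3, s=len(x))

# Evaluate the spline on a 10x finer grid
x_fine = np.linspace(0, 47, 48 * 10)
y_spline = splev(x_fine, tck)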
[17]:
daily_avg = raw.average_daily_activity(binarize=False,freq="30min")
To check how the data are interpolated, let’s set the resampling frequency to 30 min.
[18]:
flm_spline = FLM(basis='spline',sampling_freq='30min',max_order=3)
Again, as for the Fourier basis, the fit of the B-spline representation is performed via the fit function:
[19]:
flm_spline.fit(raw, verbose=False)
Now, let’s evaluate the splines:
[20]:
# The "r" parameter represents the ratio between the number of points at which the spline is evaluated and the original number of points.
r = 10
[21]:
spline_est = flm_spline.evaluate(raw,r=r)
To visualize the result, one needs to create two different x-axes, as there are “r” times more points in the evaluated spline:
[22]:
t = np.linspace(0,daily_avg.index.size,daily_avg.index.size,endpoint=True)
[23]:
t_10 = np.linspace(0,daily_avg.index.size,daily_avg.index.size*r,endpoint=True)
[24]:
data = [go.Scatter(x=t,y=daily_avg,name='Raw activity'),
go.Scatter(x=t_10,y=spline_est,name='B-spline')
]
[25]:
go.Figure(data=data, layout=layout)
By zooming, you can verify the “smoothness” of the interpolation.
Gaussian kernel smoothing¶
As already mentioned, another way to obtain a smooth representation of the data is to apply a smoothing function locally. This can be achieved by convolving the data with a kernel function.
pyActigraphy provides functions to smooth the data using a Gaussian kernel.
When using kernel smoothing, the degree of smoothing is controlled by the so-called bandwidth parameter. In our case, this corresponds to the “sigma” parameter of the Gaussian kernel (i.e. its “width”).
This parameter must be chosen as a trade-off between bias and variance: too small a bandwidth yields high signal variability, while too large a bandwidth results in a loss of detail.
pyActigraphy implements two common rules of thumb (Scott and Silverman) to automatically choose the bandwidth; these usually provide a good starting point in the search for the optimal bandwidth.
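For reference, here is a minimal sketch of what such smoothing amounts to, using scipy.ndimage.gaussian_filter1d (the same function that FLM.smooth wraps, as its docstring below confirms). The bandwidth formula used here is the classic one-dimensional Scott rule, sigma_hat * n^(-1/5); that exact form is an assumption on our part, although it appears consistent with the kernel sizes printed by FLM.smooth further down:

import numpy as np
from scipy.ndimage import gaussian_filter1d

# Synthetic noisy daily profile, one point per minute
n = 24 * 60
y = 100 + 80 * np.sin(2 * np.pi * np.arange(n) / n) + 30 * np.random.randn(n)

# Scott's rule of thumb for the bandwidth (assumed form): sigma_hat * n^(-1/5)
bandwidth = y.std(ddof=1) * n ** (-1 / 5)

# Convolve the data with a Gaussian kernel of that width (in samples)
y_smooth = gaussian_filter1d(y, sigma=bandwidth)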
[26]:
help(FLM.smooth)
Help on function smooth in module pyActigraphy.analysis.flm:
smooth(self, raw, binarize=False, method='scott', verbose=False)
Smooth the actigraphy data using a gaussian kernel.
Wrapper for the scipy.ndimage.gaussian_filter1d function.
Parameters
----------
raw : instance of BaseRaw or its child classes
Raw measurements to be smoothed.
binarize: bool.
If True, the data are binarized (i.e 0 or 1).
Default is False.
method: str, float.
Method to calculate the width of the gaussian kernel.
Available methods are `scott`, `silverman`. Method can be
a scalar value too.
Default is `scott`.
verbose: bool.
If True, print the kernel size used to smooth the data.
Default is False.
Returns
-------
y_est : ndarray
Returns the smoothed form of the actigraphy data.
[27]:
daily_avg = raw.average_daily_activity(freq=flm.sampling_freq, binarize=False)
[28]:
names = ['Raw activity', 'Scott', 'Silverman', 'Bandwidth: 20']
[29]:
daily_avg_smoothed = []
[30]:
daily_avg_smoothed.append(flm.smooth(raw, method='scott', verbose=True))
Kernel size used to smooth the data: 39.16586876177629
[31]:
daily_avg_smoothed.append(flm.smooth(raw, method='silverman', verbose=True))
Kernel size used to smooth the data: 33.28544087403084
[32]:
daily_avg_smoothed.append(flm.smooth(raw, method=20))
[33]:
data = [go.Scatter(x=daily_avg.index.astype(str),y=daily_avg_smoothed[n], name=names[n+1]) for n in range(0,len(daily_avg_smoothed))]
[34]:
data.insert(0,go.Scatter(x=daily_avg.index.astype(str),y=daily_avg,name=names[0]))
[35]:
go.Figure(data=data,layout=layout)
Group analysis¶
In order to facilitate the analysis of large datasets, in addition to the ‘FLM.fit’ functions, pyActigraphy implements the ‘FLM.fit_reader’ functions. These functions make use of multiple CPUs, if available, to speed up the computation.
Instead of a simple “BaseRaw” object, which contains a single actigraphy recording, they take as argument a “RawReader” object, which allows users to read multiple recordings at once:
[36]:
reader = pyActigraphy.io.read_raw(fpath + 'example_*.AWD', 'AWD', n_jobs=10, prefer='threads', verbose=10)
[Parallel(n_jobs=10)]: Using backend ThreadingBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done 2 out of 6 | elapsed: 0.1s remaining: 0.1s
[Parallel(n_jobs=10)]: Done 3 out of 6 | elapsed: 0.1s remaining: 0.1s
[Parallel(n_jobs=10)]: Done 4 out of 6 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=10)]: Done 6 out of 6 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=10)]: Done 6 out of 6 | elapsed: 0.1s finished
The ‘reader’ object now contains the six .AWD recordings found in the test directory of the pyActigraphy package.
Using Fourier functions¶
[37]:
# Define a FLM Object that can be (re-)used to fit the data
flm_fourier = FLM(basis='fourier',sampling_freq='10min',max_order=10)
[38]:
# Fit all the recordings contained in the "reader":
flm_fourier.fit_reader(reader, verbose_fit=False, n_jobs=2, prefer='threads', verbose_parallel=10)
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 4 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished
Create first the basis functions: fourier
[39]:
y_est_group_fourier = flm_fourier.evaluate_reader(reader,r=10,n_jobs=2, prefer='threads', verbose_parallel=10)
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 4 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished
[40]:
daily_avg = raw.average_daily_activity(binarize=False,freq='10min')
[41]:
go.Figure(data=[go.Scatter(x=daily_avg.index.astype(str),y=v,name=k) for k,v in y_est_group_fourier.items()],layout=layout)
Using B-splines¶
This section is similar to the one for the Fourier basis. It revolves around two functions:
‘fit_reader’
‘evaluate_reader’
[42]:
flm_spline = FLM(basis='spline',sampling_freq='30min',max_order=3)
[43]:
# Fit all the rawAWD instances
flm_spline.fit_reader(
reader,
verbose_fit=False,
n_jobs=2,
prefer='threads',
verbose_parallel=10
)
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 4 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished
[44]:
y_est_group_spline = flm_spline.evaluate_reader(
reader,
r=10,
n_jobs=2,
prefer='threads',
verbose_parallel=10
)
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 4 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished
[45]:
daily_avg = raw.average_daily_activity(binarize=False,freq='3min') # The original profile was sampled at 30 min and the spline evaluated with r=10, hence 30/10 = 3 min here.
[46]:
go.Figure(data=[go.Scatter(x=daily_avg.index.astype(str),y=v,name=k) for k,v in y_est_group_spline.items()],layout=layout)
Gaussian kernel smoothing¶
Do not forget that the ‘reader’ object is just a container; in that case, a simple loop can do the trick:
[47]:
# Define a FLM object with the default Fourier basis (the basis functions won't be used for smoothing, so set the max order to 1)
flm_smooth = FLM(basis='fourier',sampling_freq='1min',max_order=1)
[48]:
y_smoothed_group = {
ireader.display_name: flm_smooth.smooth(
ireader,
method='scott',
verbose=True
) for ireader in reader.readers
}
Kernel size used to smooth the data: 16.055383376760133
Kernel size used to smooth the data: 27.681754609380107
Kernel size used to smooth the data: 39.269935742960044
Kernel size used to smooth the data: 49.341228558220386
Kernel size used to smooth the data: 32.458806988578594
Kernel size used to smooth the data: 29.162045711796324
[49]:
daily_avg = raw.average_daily_activity(binarize=False,freq='1min')
[50]:
go.Figure(data=[go.Scatter(x=daily_avg.index.astype(str),y=v,name=k) for k,v in y_smoothed_group.items()],layout=layout)
Et voilà!