Functional linear modelling¶
This notebook illustrates how to perform functional linear modelling of actigraphy data with pyActigraphy.
Briefly, the aim of such an analysis is to find a “smooth” representation (i.e. functions) of the data for subsequent analysis.
Data can be converted to a functional form using either:
basis function expansions (e.g. Fourier, B-spline)
kernel functions
Both types are implemented in pyActigraphy.
NB: at the moment, the pyActigraphy package allows users to model the daily activity profile only. However, more profiles might be added in the future if needed.
The reference book for functional data analysis, including information about functional modelling, is:
Ramsay, J. O., & Silverman, B. W. (2005). Functional Data Analysis. New York, NY: Springer New York. https://doi.org/10.1007/b98888
In the context of actigraphy, a nice example can be found in:
Wang, J., Xian, H., Licis, A., Deych, E., Ding, J., McLeland, J., … Shannon, W. (2011). Measuring the impact of apnea and obesity on circadian activity patterns using functional linear modeling of actigraphy data. Journal of Circadian Rhythms, 9(1), 11. https://doi.org/10.1186/1740-3391-9-11
Imports and input data¶
[1]:
import pyActigraphy
from pyActigraphy.analysis import FLM
[2]:
import numpy as np
[3]:
import plotly.graph_objs as go
[4]:
# create objects for layout and traces
layout = go.Layout(autosize=False, width=850, height=600, title="", xaxis=dict(title=""), shapes=[], showlegend=True)
[5]:
import os
[6]:
# Define the path to your input data
fpath = os.path.join(os.path.dirname(pyActigraphy.__file__),'tests/data/')
Read an example file:
[7]:
raw = pyActigraphy.io.read_raw_awd(fpath+'example_01.AWD', start_time='1918-01-24 08:00:00', period='9 days')
Basis function expansion¶
The idea behind a basis function expansion is to represent the data \(\{x_i\}_{i\in[0,N]}\) as:
\(x(t) = \sum_{i=1}^{n}\beta_{i}\phi_{i}(t)\)
where \(\{\beta_{i}\}_{i=1}^n\) are scalar coefficients and \(\{\phi_i(\cdot)\}_{i=1}^n\) are basis functions. Possible basis functions include Fourier bases and splines.
Such an expansion can be performed with a least-squares fit.
Using a Fourier basis expansion¶
In this case, the basis functions are simple (co)sine functions: \(\phi_{0}(t) = 1\), \(\phi_{2n}(t) = \cos(n\omega t)\) and \(\phi_{2n-1}(t) = \sin(n\omega t)\), with \(\omega = \frac{2\pi}{T}\) where T is the period.
For the daily activity profile, T = 24h.
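To make this concrete, here is a minimal, self-contained numpy sketch (independent of pyActigraphy’s own implementation) that builds such a Fourier design matrix and estimates the beta coefficients with an ordinary least-squares fit; the signal y is a synthetic stand-in for a real daily profile:

import numpy as np

T = 24 * 60                    # period of the daily profile, in minutes
t = np.arange(T)               # one sample per minute
omega = 2 * np.pi / T
max_order = 9

# Synthetic "activity" signal standing in for a real daily profile
y = 100 + 80 * np.sin(omega * t) + 20 * np.random.randn(T)

# Design matrix: phi_0(t) = 1, then sin(n*omega*t) and cos(n*omega*t) for n = 1..max_order
X = np.column_stack(
    [np.ones(T)]
    + [f(n * omega * t) for n in range(1, max_order + 1) for f in (np.sin, np.cos)]
)

# Least-squares estimate of the beta coefficients and signal reconstruction
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_est = X @ beta

With max_order = 9, X has 2*9+1 = 19 columns, matching the 19 coefficients (const plus x1 to x18) reported in the fit summary below.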
[9]:
# Resampling frequency for the daily activity profile
freq='1min'
[10]:
# The number of basis functions is max_order*2+1 (1 constant + max_order cosine + max_order sine functions), i.e. 19 here
max_order = 9
First, let’s define a FLM object with “Fourier” functions as a basis:
[11]:
flm = FLM(basis='fourier',sampling_freq=freq,max_order=max_order)
To estimate the scalar coefficients of the basis expansion (the “beta” coefficients), use the fit function:
[12]:
# By setting the "verbose" parameter to True, the result of least-square fit is displayed:
flm.fit(raw,verbose=True)
Create first the basis functions: fourier
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.731
Model: OLS Adj. R-squared: 0.728
Method: Least Squares F-statistic: 214.7
Date: Mon, 30 Jan 2023 Prob (F-statistic): 0.00
Time: 16:42:38 Log-Likelihood: -8390.5
No. Observations: 1440 AIC: 1.682e+04
Df Residuals: 1421 BIC: 1.692e+04
Df Model: 18
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 170.9896 2.178 78.523 0.000 166.718 175.261
x1 -139.4514 3.080 -45.283 0.000 -145.492 -133.410
x2 -73.6459 3.080 -23.915 0.000 -79.687 -67.605
x3 -10.5884 3.080 -3.438 0.001 -16.629 -4.547
x4 -29.8752 3.080 -9.701 0.000 -35.916 -23.834
x5 45.4418 3.080 14.756 0.000 39.401 51.483
x6 58.3750 3.080 18.956 0.000 52.334 64.416
x7 6.4342 3.080 2.089 0.037 0.393 12.475
x8 -28.0195 3.080 -9.099 0.000 -34.060 -21.979
x9 -13.4018 3.080 -4.352 0.000 -19.443 -7.361
x10 -23.0809 3.080 -7.495 0.000 -29.122 -17.040
x11 31.6536 3.080 10.279 0.000 25.613 37.695
x12 -13.2986 3.080 -4.318 0.000 -19.340 -7.258
x13 -27.9879 3.080 -9.088 0.000 -34.029 -21.947
x14 16.3752 3.080 5.317 0.000 10.334 22.416
x15 11.3963 3.080 3.701 0.000 5.355 17.437
x16 -33.7290 3.080 -10.953 0.000 -39.770 -27.688
x17 -15.9310 3.080 -5.173 0.000 -21.972 -9.890
x18 3.9516 3.080 1.283 0.200 -2.089 9.993
==============================================================================
Omnibus: 379.235 Durbin-Watson: 0.633
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1490.128
Skew: 1.222 Prob(JB): 0.00
Kurtosis: 7.343 Cond. No. 1.41
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Now, to reconstruct the signal using its expansion up to the 9th order:
[13]:
flm_est = flm.evaluate(raw)
And compare it to the original daily profile:
[14]:
daily_avg = raw.average_daily_activity(binarize=False,freq=freq)
[15]:
# set x-axis labels and their corresponding data values
labels = ['00:00', '06:00', '12:00', '18:00']
tickvals = ['00:00:00', '06:00:00', '12:00:00', '18:00:00']
layout = go.Layout(
autosize=False, width=900, height=600,
title="Daily profile",
xaxis=dict(
title="Time of day (HH:MM)",
ticktext=labels,
tickvals=tickvals),
yaxis=dict(title="Counts (a.u)"),
shapes=[], showlegend=True)
[16]:
go.Figure(data=[
go.Scatter(x=daily_avg.index.astype(str),y=daily_avg,name='Raw activity'),
go.Scatter(x=daily_avg.index.astype(str),y=flm_est,name='Fourier expansion (9th order)')
],layout=layout)
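As a quick sanity check, the agreement between the reconstruction and the raw profile can be quantified directly. A minimal sketch, assuming flm_est is a plain array aligned with daily_avg, that computes the coefficient of determination (it should be close to the R-squared value reported in the fit summary above):

# Coefficient of determination of the Fourier reconstruction
ss_res = np.sum((daily_avg.values - flm_est) ** 2)
ss_tot = np.sum((daily_avg.values - daily_avg.values.mean()) ** 2)
print('R-squared: {:.3f}'.format(1 - ss_res / ss_tot))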
Using B-splines¶
B-splines are piecewise polynomial curves. By definition, they ensure the aforementioned “smoothness” of the data representation.
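As an aside, the following minimal sketch (using scipy.interpolate rather than pyActigraphy’s own machinery, and a synthetic coarse profile) illustrates what a cubic B-spline representation of a signal looks like:

import numpy as np
from scipy.interpolate import splrep, splev

# Coarse synthetic profile: one point every 30 min over 24 h
x = np.arange(48)
y = 100 + 80 * np.sin(2 * np.pi * x / 48)

# Fit a cubic (k=3) smoothing B-spline; s > 0 enables smoothing
tck = splrep(x, y, k=3, s=len(x))

# Evaluate the spline on a 10x finer grid
x_fine = np.linspace(0, 47, 48 * 10)
y_spline = splev(x_fine, tck)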
[17]:
daily_avg = raw.average_daily_activity(binarize=False,freq="30min")
To check how the data are interpolated, let’s set the resampling frequency to 30 min.
[18]:
flm_spline = FLM(basis='spline',sampling_freq='30min',max_order=3)
Again, as for the Fourier basis, the fit of the B-spline representation is performed via the fit function:
[19]:
flm_spline.fit(raw, verbose=False)
Now, let’s evaluate the splines:
[20]:
# The "r" parameter represents the ratio between the number of points at which the spline is evaluated and the original number of points.
r = 10
[21]:
spline_est = flm_spline.evaluate(raw,r=r)
To visualize the result, one needs to create two different x-axes, as there are “r” times more points in the evaluated spline:
[22]:
t = np.linspace(0,daily_avg.index.size,daily_avg.index.size,endpoint=True)
[23]:
t_10 = np.linspace(0,daily_avg.index.size,daily_avg.index.size*r,endpoint=True)
[24]:
data = [go.Scatter(x=t,y=daily_avg,name='Raw activity'),
go.Scatter(x=t_10,y=spline_est,name='B-spline')
]
[25]:
go.Figure(data=data, layout=layout)
By zooming, you can verify the “smoothness” of the interpolation.
Gaussian kernel smoothing¶
As already mentioned, another way to obtain a smooth representation of the data is to apply a smoothing function locally. This can be achieved by convolving the data with a kernel function.
pyActigraphy provides functions to smooth the data using a Gaussian kernel.
When using kernel smoothing, the degree of smoothing is controlled by the so-called bandwidth parameter. In our case, this corresponds to the “sigma” parameter of the Gaussian kernel (i.e. its “width”).
This parameter must be chosen as a trade-off between bias and variance: too small a bandwidth yields high signal variability, while too large a bandwidth results in a loss of detail.
pyActigraphy implements two common rules of thumb (Scott and Silverman) to automatically choose the bandwidth; these usually provide a good starting point in the search for the optimal bandwidth.
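For reference, here is a minimal sketch of what such smoothing amounts to, using scipy.ndimage.gaussian_filter1d (the same function that FLM.smooth wraps, as its docstring below confirms). The bandwidth formula used here is the classic one-dimensional Scott rule, sigma_hat * n^(-1/5); that exact form is an assumption on our part, although it appears consistent with the kernel sizes printed by FLM.smooth further down:

import numpy as np
from scipy.ndimage import gaussian_filter1d

# Synthetic noisy daily profile, one point per minute
n = 24 * 60
y = 100 + 80 * np.sin(2 * np.pi * np.arange(n) / n) + 30 * np.random.randn(n)

# Scott's rule of thumb for the bandwidth (assumed form): sigma_hat * n^(-1/5)
bandwidth = y.std(ddof=1) * n ** (-1 / 5)

# Convolve the data with a Gaussian kernel of that width (in samples)
y_smooth = gaussian_filter1d(y, sigma=bandwidth)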
[26]:
help(FLM.smooth)
Help on function smooth in module pyActigraphy.analysis.flm:
smooth(self, raw, binarize=False, method='scott', verbose=False)
Smooth the actigraphy data using a gaussian kernel.
Wrapper for the scipy.ndimage.gaussian_filter1d function.
Parameters
----------
raw : instance of BaseRaw or its child classes
Raw measurements to be smoothed.
binarize: bool.
If True, the data are binarized (i.e 0 or 1).
Default is False.
method: str, float.
Method to calculate the width of the gaussian kernel.
Available methods are `scott`, `silverman`. Method can be
a scalar value too.
Default is `scott`.
verbose: bool.
If True, print the kernel size used to smooth the data.
Default is False.
Returns
-------
y_est : ndarray
Returns the smoothed form of the actigraphy data.
[27]:
daily_avg = raw.average_daily_activity(freq=flm.sampling_freq, binarize=False)
[28]:
names = ['Raw activity', 'Scott', 'Silverman', 'Bandwidth: 20']
[29]:
daily_avg_smoothed = []
[30]:
daily_avg_smoothed.append(flm.smooth(raw, method='scott', verbose=True))
Kernel size used to smooth the data: 39.16586876177629
[31]:
daily_avg_smoothed.append(flm.smooth(raw, method='silverman', verbose=True))
Kernel size used to smooth the data: 33.28544087403084
[32]:
daily_avg_smoothed.append(flm.smooth(raw, method=20))
[33]:
data = [go.Scatter(x=daily_avg.index.astype(str),y=daily_avg_smoothed[n], name=names[n+1]) for n in range(0,len(daily_avg_smoothed))]
[34]:
data.insert(0,go.Scatter(x=daily_avg.index.astype(str),y=daily_avg,name=names[0]))
[35]:
go.Figure(data=data,layout=layout)
Group analysis¶
In order to facilitate the analysis of large datasets, in addition to the ‘FLM.fit’ functions, pyActigraphy implements the ‘FLM.fit_reader’ functions. These functions make use of multiple CPUs, if available, to speed up the computation.
Instead of a simple “BaseRaw” object, which contains a single actigraphy recording, they take as argument a “RawReader” object, which allows users to read multiple recordings at once:
[36]:
reader = pyActigraphy.io.read_raw(fpath + 'example_*.AWD', 'AWD', n_jobs=10, prefer='threads', verbose=10)
[Parallel(n_jobs=10)]: Using backend ThreadingBackend with 10 concurrent workers.
[Parallel(n_jobs=10)]: Done 2 out of 6 | elapsed: 0.1s remaining: 0.1s
[Parallel(n_jobs=10)]: Done 3 out of 6 | elapsed: 0.1s remaining: 0.1s
[Parallel(n_jobs=10)]: Done 4 out of 6 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=10)]: Done 6 out of 6 | elapsed: 0.1s remaining: 0.0s
[Parallel(n_jobs=10)]: Done 6 out of 6 | elapsed: 0.1s finished
The ‘reader’ object now contains the six .AWD recordings found in the test directory of the pyActigraphy package.
Using Fourier functions¶
[37]:
# Define a FLM Object that can be (re-)used to fit the data
flm_fourier = FLM(basis='fourier',sampling_freq='10min',max_order=10)
[38]:
# Fit all the recordings contained in the "reader":
flm_fourier.fit_reader(reader, verbose_fit=False, n_jobs=2, prefer='threads', verbose_parallel=10)
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 4 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished
Create first the basis functions: fourier
[39]:
y_est_group_fourier = flm_fourier.evaluate_reader(reader,r=10,n_jobs=2, prefer='threads', verbose_parallel=10)
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 4 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished
[40]:
daily_avg = raw.average_daily_activity(binarize=False,freq='10min')
[41]:
go.Figure(data=[go.Scatter(x=daily_avg.index.astype(str),y=v,name=k) for k,v in y_est_group_fourier.items()],layout=layout)
Using B-splines¶
This section is similar to the one for the Fourier basis. It revolves around two functions:
‘fit_reader’
‘evaluate_reader’
[42]:
flm_spline = FLM(basis='spline',sampling_freq='30min',max_order=3)
[43]:
# Fit all the rawAWD instances
flm_spline.fit_reader(
reader,
verbose_fit=False,
n_jobs=2,
prefer='threads',
verbose_parallel=10
)
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 4 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished
[44]:
y_est_group_spline = flm_spline.evaluate_reader(
reader,
r=10,
n_jobs=2,
prefer='threads',
verbose_parallel=10
)
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 4 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s
[Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished
[45]:
daily_avg = raw.average_daily_activity(binarize=False,freq='3min') # The original profile was sampled at 30 min and the spline evaluated with r=10, hence 30/10 = 3 min here.
[46]:
go.Figure(data=[go.Scatter(x=daily_avg.index.astype(str),y=v,name=k) for k,v in y_est_group_spline.items()],layout=layout)
Gaussian kernel smoothing¶
Do not forget that the ‘reader’ object is just a container; in that case, a simple loop can do the trick:
[47]:
# Define a FLM object with the default Fourier basis (the basis functions won't be used for smoothing, so set the max order to 1)
flm_smooth = FLM(basis='fourier',sampling_freq='1min',max_order=1)
[48]:
y_smoothed_group = {
ireader.display_name: flm_smooth.smooth(
ireader,
method='scott',
verbose=True
) for ireader in reader.readers
}
Kernel size used to smooth the data: 16.055383376760133
Kernel size used to smooth the data: 27.681754609380107
Kernel size used to smooth the data: 39.269935742960044
Kernel size used to smooth the data: 49.341228558220386
Kernel size used to smooth the data: 32.458806988578594
Kernel size used to smooth the data: 29.162045711796324
[49]:
daily_avg = raw.average_daily_activity(binarize=False,freq='1min')
[50]:
go.Figure(data=[go.Scatter(x=daily_avg.index.astype(str),y=v,name=k) for k,v in y_smoothed_group.items()],layout=layout)
Et voilà!