Sleep EMA Linear Mixed Models (LMM)
Replication Data for Wu et al. (2020) “Multi-Modal Data Collection for Measuring Health, Behavior, and Living Environment of Large-Scale Participant Cohorts: Conceptual Framework and Findings from Deployments”: Ecological Momentary Assessment Data (Beiwe)
[1]:
import pandas as pd
import numpy as np
from glob import glob
import seaborn as sns
import matplotlib.pyplot as plt
from pymer4 import Lmer
from helpers import diagnostic_plots
from jmspack.utils import JmsColors
[2]:
if "jms_style_sheet" in plt.style.available:
    plt.style.use("jms_style_sheet")
[3]:
df = (pd.read_csv("data/UT1000_ema_wide.csv")
      .dropna()
      .assign(**{"date": lambda d: pd.to_datetime(d["survey.date"])})
      # .pipe(lambda d: d[d["survey.date"].str.contains("2019")])
      .drop("survey.date", axis=1)
      .pipe(lambda d: d[d["date"] <= pd.to_datetime("03-15-2019")])
      .pipe(lambda d: d[d["date"] >= pd.to_datetime("02-14-2019")])
      .sort_values(["pid", "date"])
      .reset_index(drop=True)
      )
display(df.head()); df.shape
| | pid | content | energy | lonely | refreshed | restful | sad | sleep | stress | date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1193rv5x | 3.0 | 3.0 | 2.0 | 2.0 | 3.0 | 1.0 | 5.0 | 1.0 | 2019-02-14 | 
| 1 | 1193rv5x | 2.0 | 3.0 | 3.0 | 1.0 | 2.0 | 2.0 | 6.0 | 1.0 | 2019-02-15 | 
| 2 | 1193rv5x | 1.0 | 2.0 | 3.0 | 1.0 | 2.0 | 2.0 | 9.0 | 1.0 | 2019-02-16 | 
| 3 | 1193rv5x | 1.0 | 1.0 | 3.0 | 2.0 | 2.0 | 3.0 | 9.0 | 2.0 | 2019-02-17 | 
| 4 | 1193rv5x | 3.0 | 3.0 | 1.0 | 1.0 | 2.0 | 1.0 | 5.0 | 1.0 | 2019-02-18 | 
[3]:
(4913, 10)
[4]:
print(df.date.min()); print(df.date.max())
2019-02-14 00:00:00
2019-03-15 00:00:00
[5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4913 entries, 0 to 4912
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   pid        4913 non-null   object
 1   content    4913 non-null   float64
 2   energy     4913 non-null   float64
 3   lonely     4913 non-null   float64
 4   refreshed  4913 non-null   float64
 5   restful    4913 non-null   float64
 6   sad        4913 non-null   float64
 7   sleep      4913 non-null   float64
 8   stress     4913 non-null   float64
 9   date       4913 non-null   datetime64[ns]
dtypes: datetime64[ns](1), float64(8), object(1)
memory usage: 384.0+ KB
[6]:
df.describe()
[6]:
| | content | energy | lonely | refreshed | restful | sad | sleep | stress |
|---|---|---|---|---|---|---|---|---|
| count | 4913.000000 | 4913.000000 | 4913.000000 | 4913.000000 | 4913.000000 | 4913.000000 | 4913.000000 | 4913.000000 | 
| mean | 1.871565 | 2.134948 | 0.513332 | 1.618563 | 1.834724 | 0.591696 | 6.237737 | 1.175453 | 
| std | 0.823113 | 0.830693 | 0.731918 | 0.881945 | 0.858564 | 0.773247 | 1.687056 | 0.882876 | 
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 
| 25% | 1.000000 | 2.000000 | 0.000000 | 1.000000 | 1.000000 | 0.000000 | 5.000000 | 1.000000 | 
| 50% | 2.000000 | 2.000000 | 0.000000 | 2.000000 | 2.000000 | 0.000000 | 6.000000 | 1.000000 | 
| 75% | 2.000000 | 3.000000 | 1.000000 | 2.000000 | 2.000000 | 1.000000 | 7.000000 | 2.000000 | 
| max | 3.000000 | 4.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 12.000000 | 3.000000 | 
[7]:
target = "sleep"
group = "pid"
checkpoint = "date"
feature_list = df.select_dtypes(float).drop(target, axis=1).columns.tolist()
_ = plt.figure(figsize=(5,3))
_ = sns.countplot(y=df[target])
[8]:
_ = plt.figure(figsize=(20, 5))
_ = sns.stripplot(data=df.select_dtypes(float).melt(), x="variable", y="value", alpha=0.5)
[9]:
corr = df.select_dtypes(float).corr(method="spearman")
mask = np.triu(np.ones_like(corr, dtype=bool))
_ = sns.heatmap(corr, mask=mask, annot=True)
[10]:
cluster_collection = "+".join(feature_list)
cluster_collection
[10]:
'content+energy+lonely+refreshed+restful+sad+stress'
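The formula in the next cell specifies a Gaussian linear mixed model with crossed random intercepts for participant (`pid`) and calendar day (`date`). Written out in generic mixed-model notation (a sketch of what the formula implies, not pymer4 output), for participant $i$ on day $t$:

$$\text{sleep}_{it} = \beta_0 + \sum_{k} \beta_k\, x_{k,it} + u_i + v_t + \varepsilon_{it}, \qquad u_i \sim \mathcal{N}(0, \sigma^2_{pid}),\; v_t \sim \mathcal{N}(0, \sigma^2_{date}),\; \varepsilon_{it} \sim \mathcal{N}(0, \sigma^2_{\varepsilon})$$

where the $x_{k,it}$ are the seven EMA predictors joined into `cluster_collection` above.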
[11]:
formula_string = f"{target} ~ {cluster_collection} + (1 | {group}) + (1 | {checkpoint})"
model = Lmer(data=df, formula=formula_string)
initial_mod_fixed_eff = model.fit(no_warnings=True)
initial_mod_fixed_eff
Formula: sleep~content+energy+lonely+refreshed+restful+sad+stress+(1|pid)+(1|date)
Family: gaussian         Inference: parametric
Number of observations: 4913     Groups: {'pid': 411.0, 'date': 30.0}
Log-likelihood: -8756.933        AIC: 17513.866
Random effects:
                 Name    Var    Std
pid       (Intercept)  0.710  0.843
date      (Intercept)  0.046  0.214
Residual               1.780  1.334
No random effect correlations specified
Fixed effects:
[11]:
| | Estimate | 2.5_ci | 97.5_ci | SE | DF | T-stat | P-val | Sig |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | 4.929 | 4.690 | 5.169 | 0.122 | 1006.443 | 40.325 | 0.000 | *** | 
| content | -0.064 | -0.135 | 0.007 | 0.036 | 4733.664 | -1.766 | 0.077 | . | 
| energy | 0.025 | -0.035 | 0.085 | 0.031 | 4876.512 | 0.826 | 0.409 | |
| lonely | 0.026 | -0.059 | 0.110 | 0.043 | 4871.830 | 0.596 | 0.551 | |
| refreshed | 0.338 | 0.268 | 0.408 | 0.036 | 4755.562 | 9.434 | 0.000 | *** | 
| restful | 0.433 | 0.361 | 0.505 | 0.037 | 4766.650 | 11.838 | 0.000 | *** | 
| sad | 0.010 | -0.070 | 0.090 | 0.041 | 4878.341 | 0.237 | 0.813 | |
| stress | -0.040 | -0.099 | 0.019 | 0.030 | 4885.360 | -1.327 | 0.184 | |
[12]:
cluster_collection="refreshed+restful"
formula_string = f"{target} ~ {cluster_collection} + (1 | {group}) + (1 | {checkpoint})"
model = Lmer(data=df, formula=formula_string)
fixed_eff = model.fit(no_warnings=True)
fixed_eff
Formula: sleep~refreshed+restful+(1|pid)+(1|date)
Family: gaussian         Inference: parametric
Number of observations: 4913     Groups: {'pid': 411.0, 'date': 30.0}
Log-likelihood: -8747.223        AIC: 17494.446
Random effects:
                 Name    Var    Std
pid       (Intercept)  0.714  0.845
date      (Intercept)  0.046  0.214
Residual               1.780  1.334
No random effect correlations specified
Fixed effects:
[12]:
| | Estimate | 2.5_ci | 97.5_ci | SE | DF | T-stat | P-val | Sig |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | 4.840 | 4.685 | 4.996 | 0.079 | 228.722 | 61.103 | 0.0 | *** | 
| refreshed | 0.339 | 0.270 | 0.409 | 0.035 | 4781.391 | 9.574 | 0.0 | *** | 
| restful | 0.429 | 0.357 | 0.500 | 0.036 | 4786.763 | 11.765 | 0.0 | *** | 
[13]:
try:
    fig, axs = diagnostic_plots(model_fit=model,
                                X=None,
                                y=None,
                                figsize=(8, 8),
                                limit_cooks_plot=False,
                                subplot_adjust_args={"wspace": 0.3, "hspace": 0.3})
except Exception:
    print("The function did not run - this is likely due to multiple random effects being specified")
[14]:
def r2_GLMM(model):
    # Fixed-effect variance: each design-matrix column is scaled by the mean per-cluster
    # coefficient for the first grouping factor (model.fixef[0]); per-column variances are summed
    sigma2_fe = model.design_matrix.multiply(model.fixef[0].mean()).var().sum()
    # Random-intercept variance for the grouping factor named by the global `group` (here "pid")
    sigma2_re = model.ranef_var.loc[group, "Var"]
    # Residual (observation-level) variance
    sigma2_ol = model.ranef_var.loc["Residual", "Var"]
    r2_marginal = (sigma2_fe) / (sigma2_fe + sigma2_re + sigma2_ol)
    r2_conditional = (sigma2_fe + sigma2_re) / (sigma2_fe + sigma2_re + sigma2_ol)
    return pd.DataFrame({"Marginal_R_squared": r2_marginal,
                         "Conditional_R_squared": r2_conditional},
                        index=[0])
Conditional and marginal R squared
Based on Nakagawa, S., & Schielzeth, H. (2013). A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2), 133-142.
Marginal R-squared is the proportion of variance explained by the fixed effects alone.
Conditional R-squared is the proportion of variance explained by the fixed and random effects together.
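In the paper's notation, with fixed-effect variance $\sigma^2_f$, random-intercept variance $\sigma^2_\alpha$, and residual variance $\sigma^2_\varepsilon$ (the three quantities `r2_GLMM` estimates above), the two statistics are:

$$R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon} \qquad\qquad R^2_{\text{conditional}} = \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}$$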
[15]:
r2_GLMM(model)
[15]:
| | Marginal_R_squared | Conditional_R_squared |
|---|---|---|
| 0 | 0.082767 | 0.34553 | 
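As a cross-check, the same quantities can be sketched directly from the fitted model's attributes. This is an assumption-laden variant, not the calculation used by `r2_GLMM` above: it takes the fixed-effect variance as the variance of the fixed-effects-only fitted values and sums the variances of both random intercepts (`pid` and `date`) from `model.ranef_var`, so its values will differ somewhat from the table above. It assumes pymer4's `Lmer` exposes `.design_matrix`, `.coefs`, and `.ranef_var` as DataFrames with matching column/index names.

```python
# Hedged sketch of Nakagawa & Schielzeth R² using both random-intercept variances.
import numpy as np

X = model.design_matrix[model.coefs.index]                  # fixed-effects design matrix, ordered like the coefficients
beta = model.coefs["Estimate"]                              # fixed-effect estimates
sigma2_f = np.var(X.to_numpy() @ beta.to_numpy())           # variance of the fixed-effect predictions
sigma2_a = model.ranef_var.drop("Residual")["Var"].sum()    # pid + date intercept variances
sigma2_e = model.ranef_var.loc["Residual", "Var"]           # residual variance

r2_marginal = sigma2_f / (sigma2_f + sigma2_a + sigma2_e)
r2_conditional = (sigma2_f + sigma2_a) / (sigma2_f + sigma2_a + sigma2_e)
print(r2_marginal, r2_conditional)
```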
[16]:
predict_df = df.assign(**{f"{target}_fitted": model.fits})
[19]:
_ = plt.figure(figsize=(10, 4))
ax = sns.lineplot(data=predict_df,
                 x=checkpoint,
                 y=target,
                 hue=group,
                 palette=list(np.repeat(JmsColors.OFFWHITE, repeats=predict_df[group].nunique())),
                 legend=False,
                 alpha=0.4)
_ = sns.lineplot(data=predict_df,
                 x=checkpoint,
                 y=target,
                 color=JmsColors.DARKBLUE,
                 ax=ax)
_ = sns.lineplot(data=predict_df,
                 x=checkpoint,
                 y=f"{target}_fitted",
                 color=JmsColors.PURPLE,
                 ax=ax)
_ = plt.xticks(rotation=90)
_ = plt.plot([], [], label=f"{target} per {group}", c=JmsColors.OFFWHITE)
_ = plt.plot([], [], label=f"{target} observed mean", c=JmsColors.DARKBLUE)
_ = plt.plot([], [], label=f"{target} fitted mean", c=JmsColors.PURPLE)
_ = plt.legend()
[18]:
_ = sns.lmplot(data=predict_df, x=target, y=f"{target}_fitted", hue=group, legend=False)