Data analysis

The data analysis process for the global quantitative endpoints consists of the following main steps:

  1. Statistical description of the independent variables for each DS
    The ecosystem determinants are described statistically for each DS using measures of central tendency (mean, mode, median) and of sample dispersion (range, sample variance, sample standard deviation), so that the sample distribution is properly represented.
  2. Statistical description of the global endpoints
    The global endpoint measurements (e.g. EuroQoL scale values) are aggregated according to different independent variables, primarily Use Case and DS. Each global variable is described with the reference descriptive statistics: mean and median for central tendency, and sample variance and sample standard deviation for dispersion. To obtain global values that effectively represent the heterogeneity of the LSP data set, each statistical description is weighted by the size of the local data sets (weighted mean and standard deviation). In addition, the distribution of each outcome at 12 months is compared with its distribution at baseline using two-tailed t-tests at a 5% significance level (95% confidence).
  3. Assessment of the outcomes and their statistical significance
    The expected outcomes are treated as differences between measurement groups of the same global variable at different stages of the trial. Significance tests are applied to determine whether the difference between the two group means most likely reflects a real difference in the population from which the groups were sampled (control/experiment groups) or is due to chance. The two groups may also be the same users measured before and after the intervention.
  4. Data Mining
    To better understand the outcomes and infer their impact, the ACTIVAGE protocol provides for identifying the ecosystem determinants, i.e. the input features (independent variables) that have the greatest impact on the endpoints (dependent variables).
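The size weighting in step 2 and the baseline/12-month comparison in steps 2 and 3 can be sketched in Python. This is a minimal sketch, assuming each DS reports its sample size, mean and sample standard deviation for an endpoint; the names `SiteSummary`, `pooled_summary` and `paired_t_statistic` are hypothetical. The pooled SD shown here is the standard deviation of the merged sample (within-DS spread plus between-DS spread), which is one reasonable reading of "weighted standard deviation" rather than a formula taken from the protocol. For the pre/post case the sketch computes only the paired t statistic with the standard library; the two-tailed p-value would come from a t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Hypothetical per-DS summary of one endpoint (assumed input format)."""
    name: str
    n: int        # number of participants at the DS
    mean: float   # sample mean of the endpoint
    sd: float     # sample standard deviation (n - 1 denominator)


def pooled_summary(sites):
    """Combine per-DS summaries into a size-weighted global mean and SD.

    Assumption: the global SD is the SD of the merged raw data, i.e. the
    within-DS sums of squares plus each DS's squared distance from the
    global mean, divided by (total n - 1).
    """
    total_n = sum(s.n for s in sites)
    # Each DS mean contributes in proportion to its sample size.
    global_mean = sum(s.n * s.mean for s in sites) / total_n
    sum_sq = sum(
        (s.n - 1) * s.sd ** 2 + s.n * (s.mean - global_mean) ** 2
        for s in sites
    )
    return total_n, global_mean, math.sqrt(sum_sq / (total_n - 1))


def paired_t_statistic(baseline, month12):
    """Paired t statistic and degrees of freedom for pre/post measurements.

    Both lists must be aligned per participant (same user at same index).
    """
    if len(baseline) != len(month12):
        raise ValueError("baseline and 12-month lists must be aligned per participant")
    diffs = [after - before for before, after in zip(baseline, month12)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

Pooling from summaries lets each DS share only aggregate statistics; when the raw records are available centrally, computing the mean and SD on the merged data gives the same values.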

The table below summarizes the statistical data analysis of the LSP protocol:

Statistical methodologies (LSP)

Statistical description of the independent variables: central tendency measures (mean, mode, median) and sample dispersion (range, sample variance, sample standard deviation)

Statistical description of global outcomes per RUC: central tendency measures (mean, mode, median) and sample dispersion (range, sample variance, sample standard deviation)

Correlation matrix among global variables and input features (Spearman correlation coefficient)

Normality test of the sample distribution (Shapiro-Wilk test, skewness, kurtosis)

Assessment of outcomes (baseline/final; experiment/control): paired Student's t-test and nonparametric tests (Kruskal-Wallis test, Wilcoxon test, Mann-Whitney U test)

Data mining and feature ranking: Random Forest algorithm and multivariate statistics
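The Spearman correlation matrix listed in the table can be illustrated with a short standard-library sketch: Spearman's rho is the Pearson correlation computed on the ranks of the data, with tied values given their average rank. The function names and the dict-of-columns input are hypothetical, and the sketch assumes complete, non-constant columns aligned by participant (no missing-value handling). In practice a library routine such as `scipy.stats.spearmanr` would typically be used instead.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_matrix(columns):
    """Spearman rho for every pair of variables.

    columns: hypothetical dict mapping a variable name (global variable or
    input feature) to its observations, aligned by participant.
    Returns a dict keyed by (name_a, name_b), symmetric, with 1.0 on the diagonal.
    """
    ranked = {name: average_ranks(vals) for name, vals in columns.items()}
    names = list(columns)
    matrix = {(a, a): 1.0 for a in names}
    for a, b in combinations(names, 2):
        rho = pearson(ranked[a], ranked[b])
        matrix[(a, b)] = matrix[(b, a)] = rho
    return matrix
```

Because it works on ranks, the coefficient captures monotonic rather than strictly linear association, which is why it is suited to ordinal or non-normal variables such as questionnaire scores.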