Lan Luo & Teague Henry
November 7, 2018
CS-GIMME
Confirmatory Subgroup GIMME (CS-GIMME) enables researchers to conduct GIMME both on the entire group and within predefined (e.g., observed) subgroups, such as clinical populations or biological sex. This allows researchers to identify aspects of dynamic processes that are shared across the subgroups (i.e., group-level paths) as well as how the subgroups differ (i.e., subgroup-level paths). This document is a brief tutorial on using CS-GIMME and contains several functions for extracting paths that replicate across people within a subgroup.
A sample dataset can be found at https://github.com/kgates/gimme/blob/master/Example%20Data.zip. In this dataset, each file contains the data for one individual (or session), each column is a variable, and each row is an observation. After setting up the GIMME directories and organizing each individual's time series, one needs to create an additional data frame that contains the subgroup information in order to use CS-GIMME. This data frame must contain two columns. The first column contains the name of each subject's time series file, without the file extension. For example, the first time series file in our sample dataset is named ‘group_1_1.txt’, so the corresponding entry in the data frame would be ‘group_1_1’. The second column contains integer-valued subgroup labels. A data frame that contains the subgroup information for our sample dataset can be made using the following code:
library(tools) # needed for file_path_sans_ext(), which drops file extensions
# Get the file names in the source folder, without extensions
filename <- file_path_sans_ext(list.files(path = "t_120_n_25_v_5", full.names = FALSE))
# Create a vector of subgroup assignments, one per person, in the same order as filename
# (here, the first 12 files are in subgroup 1 and the remaining 13 in subgroup 2)
subgroup <- rep(c(1, 2), times = c(12, 13))
# Create the data frame
confirm_dataframe <- data.frame(filename, subgroup)
Note that subgroup labels must be positive integers (1, 2, ...); zero is not a valid label.
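Because a mislabeled file name or a zero label is easy to introduce by hand, it can help to sanity-check the data frame before calling gimme(). The sketch below uses a hypothetical helper, check_confirm_df(), which is not part of the gimme package. It checks the two-column layout and the positive integer labels described above, and can optionally compare the first column with the files in the source folder. Requiring each file name to appear only once is an assumption of this sketch.

```r
# Hypothetical helper (not part of the gimme package): returns a character
# vector describing problems with a confirmatory subgroup data frame;
# character(0) means nothing was flagged.
check_confirm_df <- function(df, data_dir = NULL) {
  problems <- character(0)
  if (ncol(df) != 2) {
    return("data frame must have exactly two columns")
  }
  ids  <- as.character(df[[1]])  # file names without extensions
  subs <- df[[2]]                # subgroup labels
  # Assumption of this sketch: each file should appear only once
  if (anyDuplicated(ids) > 0) {
    problems <- c(problems, "file names in column 1 must be unique")
  }
  # Labels must be whole numbers of at least 1 (zero is not allowed)
  if (!is.numeric(subs) || anyNA(subs) ||
      any(subs != round(subs)) || any(subs < 1)) {
    problems <- c(problems, "subgroup labels must be positive integers")
  }
  # Optionally compare column 1 against the files in the source folder
  if (!is.null(data_dir)) {
    on_disk  <- tools::file_path_sans_ext(list.files(data_dir))
    no_label <- setdiff(on_disk, ids)
    no_file  <- setdiff(ids, on_disk)
    if (length(no_label) > 0) {
      problems <- c(problems, paste("no subgroup label for:",
                                    paste(no_label, collapse = ", ")))
    }
    if (length(no_file) > 0) {
      problems <- c(problems, paste("no file found for:",
                                    paste(no_file, collapse = ", ")))
    }
  }
  problems
}

# Usage with small in-memory examples
ok  <- data.frame(filename = c("group_1_1", "group_1_2"), subgroup = c(1, 2))
bad <- data.frame(filename = c("group_1_1", "group_1_2"), subgroup = c(0, 2))
check_confirm_df(ok)   # character(0): nothing flagged
check_confirm_df(bad)  # "subgroup labels must be positive integers"
```

With the sample data, a call such as check_confirm_df(confirm_dataframe, data_dir = "t_120_n_25_v_5") would additionally report any file in the folder that has no subgroup label.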
When running CS-GIMME, the source folder (here, ‘t_120_n_25_v_5’) should contain nothing other than the time series data files. The researcher can either specify an output directory (as shown below) or store the output as an object. Once the data frame is created, CS-GIMME can be run on the sample dataset using the following code:
output <- gimme(data             = 't_120_n_25_v_5',  # source directory
                out              = 'SampleOutput',    # output directory
                sep              = ",",               # how data are separated
                header           = FALSE,             # logical; do the files have a header row?
                subgroup         = TRUE,              # must be TRUE to perform confirmatory subgrouping
                confirm_subgroup = confirm_dataframe, # the data frame constructed above
                groupcutoff      = .75,               # proportion considered the majority at the group level
                subcutoff        = .75                # proportion considered the majority at the subgroup level
)
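Since the source folder should contain nothing other than the time series files, a quick check before running gimme() can catch stray files such as notes or readme files. The sketch below uses a hypothetical helper, check_source_dir(), which is not part of gimme. It reads every file with base R's read.table() using the same sep and header settings passed to gimme(), and flags files that cannot be read, contain non-numeric columns, or differ in their number of variables. Treating any non-numeric column as a problem is an assumption of this sketch.

```r
# Hypothetical helper (not part of the gimme package): reads every file in the
# source folder and returns a character vector of problems (character(0) if
# nothing was flagged).
check_source_dir <- function(data_dir, sep = ",", header = FALSE) {
  files    <- list.files(data_dir, full.names = TRUE)
  n_cols   <- integer(0)
  problems <- character(0)
  for (f in files) {
    dat <- tryCatch(read.table(f, sep = sep, header = header),
                    error = function(e) NULL)
    if (is.null(dat)) {
      problems <- c(problems, paste0(basename(f), ": could not be read"))
      next
    }
    # Assumption of this sketch: every column should be a numeric variable
    if (!all(vapply(dat, is.numeric, logical(1)))) {
      problems <- c(problems, paste0(basename(f), ": non-numeric columns"))
    }
    n_cols <- c(n_cols, ncol(dat))
  }
  # All readable files should have the same number of variables
  if (length(unique(n_cols)) > 1) {
    problems <- c(problems, "files differ in number of variables")
  }
  problems
}
```

For the sample dataset, check_source_dir("t_120_n_25_v_5", sep = ",", header = FALSE) should return character(0) if the folder holds only the 25 time series files.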
Please note that the user can specify the proportion of individuals who must have a significant path for that path to be considered present for the ‘majority’ of individuals. The argument ‘groupcutoff’ sets this proportion for the group-level paths; 0.75 is the default and has worked well in a variety of simulation studies, but emerging evidence suggests that 0.51 (51%) may work well in contexts where there may be a low signal-to-noise ratio or low power to detect effects due to shorter time series.
The argument ‘subcutoff’ sets the proportion that is considered the majority within each subgroup. CS-GIMME has performed well in studies with the majority defined as either 51% or 75% of individuals.
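To make the cutoffs concrete, the sketch below converts a proportion cutoff into the smallest number of individuals that would satisfy it for the sample design used above (25 individuals; subgroups of 12 and 13), and then applies both cutoffs to hypothetical toy significance indicators. It assumes a path is treated as a majority path when the proportion of individuals with a significant path is at least the cutoff; the exact comparison and rounding inside gimme may differ, and min_count() and the toy data are illustrative only.

```r
# Illustration only (not gimme code). ASSUMPTION: a path counts as a
# "majority" path when the proportion of individuals for whom it is
# significant is at least the cutoff; gimme's exact comparison may differ.

# Smallest number of individuals that meets a proportion cutoff
min_count <- function(n, cutoff) ceiling(n * cutoff)

# Sample design from this tutorial: 25 people, subgroups of 12 and 13
n_group <- 25
n_sub   <- c(subgroup_1 = 12, subgroup_2 = 13)

min_count(n_group, 0.75)   # 19 of 25
min_count(n_group, 0.51)   # 13 of 25
min_count(n_sub, 0.75)     # 9 of 12, 10 of 13
min_count(n_sub, 0.51)     # 7 of 12, 7 of 13

# Hypothetical toy significance indicators for one subgroup of 12 people
# (TRUE = the path is significant for that person)
sig <- cbind(pathA = rep(c(TRUE, FALSE), times = c(10, 2)),
             pathB = rep(c(TRUE, FALSE), times = c(7, 5)))
prop_sig <- colMeans(sig)   # pathA about 0.83, pathB about 0.58
prop_sig >= 0.75            # only pathA reaches the 75% cutoff
prop_sig >= 0.51            # both paths reach the 51% cutoff
```

The toy example shows why the lower cutoff is more lenient: a path that is significant for 7 of 12 people passes at 51% but not at 75%.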