bgfc_kit package
Submodules
bgfc_kit.bgfc_analyses module
- bgfc_kit.bgfc_analyses.compute_epoch_cond_connectome_ztrans_nobadframe(epoch_data: ndarray)
This function computes the Fisher z-transformed connectome for each epoch of each condition, so each subject has multiple connectomes per condition. This epoch-level output is useful for training machine learning models. The function also cleans up unqualified epochs and frames: 1) if all frames within an epoch are np.nan, the epoch is dropped; 2) if only some frames within an epoch are np.nan, those frames are dropped before the correlation matrix is computed.
Parameters:
- epoch_data:
A list of 3d arrays, one per participant. Each array has the shape (# of epochs, # of parcels, # of TRs per epoch).
Yields:
- all_connectomes:
A nested list. Each sublist contains the epoch-level (nParcel by nParcel) correlation matrices of one participant.
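The cleaning rules above can be sketched for a single subject as follows. This is a minimal NumPy sketch, not the package's own code: the helper name epoch_connectomes is hypothetical, it assumes bad frames are np.nan across all parcels, and it assumes the z-transform is Fisher's (np.arctanh), with the diagonal set to 0 so that arctanh(1) does not produce inf.

```python
import numpy as np


def epoch_connectomes(sub_epochs):
    """Hypothetical helper: Fisher z-transformed connectome per epoch.

    sub_epochs: array of shape (n_epochs, n_parcels, n_trs_per_epoch) for one
    subject; bad frames are assumed to be np.nan across all parcels.
    """
    connectomes = []
    for epoch in sub_epochs:
        # Rule 1: drop the epoch entirely if every frame is np.nan.
        if np.isnan(epoch).all():
            continue
        # Rule 2: keep only frames (columns) without any np.nan, then correlate.
        good_frames = ~np.isnan(epoch).any(axis=0)
        corr = np.corrcoef(epoch[:, good_frames])
        # Assumption: zero the diagonal so arctanh(1) = inf never appears.
        np.fill_diagonal(corr, 0.0)
        connectomes.append(np.arctanh(corr))  # Fisher z-transform
    return connectomes
```

Dropping NaN columns before np.corrcoef matches the "drop the frame, then compute the correlation matrix" rule, because a bad frame is missing for every parcel at once.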
- bgfc_kit.bgfc_analyses.compute_network_pc(pc, partition)
This function takes in the participation coefficient (PC) of each parcel, then computes the PC of each network.
Parameters
- pc:
A (subject-level) dict with 200 items, one PC value per parcel.
- partition:
A dictionary mapping each network (community) name to a list of its nodes (parcels).
Returns
- network_pc:
The PC of each network included in the input partition.
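How parcel-level PC values are combined into a network value is not stated above; one plausible reading is averaging across the parcels of each network. The sketch below assumes that, and network_pc_mean is a hypothetical name.

```python
import numpy as np


def network_pc_mean(pc, partition):
    """Hypothetical sketch: average the parcel-level PC within each network.

    pc: dict mapping parcel -> participation coefficient
    partition: dict mapping network name -> list of parcels
    """
    # Assumption: the network PC is the mean PC of its member parcels.
    return {
        network: float(np.mean([pc[node] for node in nodes]))
        for network, nodes in partition.items()
    }
```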
- bgfc_kit.bgfc_analyses.compute_sub_cond_connectome_ztrans_nobadframe(epoch_data: ndarray)
This function computes the Fisher z-transformed connectome for each subject and condition, so each subject has just one connectome per condition (averaged across epochs). This subject-level output is useful for building graphs and for measuring individual differences. The function also cleans up unqualified epochs and frames: 1) if all frames within an epoch are np.nan, the epoch is dropped; 2) if only some frames within an epoch are np.nan, those frames are dropped before the correlation matrix is computed.
Parameters:
- epoch_data:
A list of 3d arrays, one per participant. Each array has the shape (# of epochs, # of parcels, # of TRs per epoch).
Yields:
- sub_cond_connectome_ztrans:
A generator of 2d arrays: one nParcel by nParcel Z-transformed correlation matrix per subject (averaged across all epochs). The generator yields nsub items.
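A sketch of the subject-level averaging, assuming (not confirmed above) that the per-epoch Fisher z matrices are averaged in z space, and yielding one matrix per subject as the generator described above does. Both helper names are hypothetical.

```python
import numpy as np


def average_epoch_connectomes(epoch_connectomes):
    """Hypothetical sketch: collapse one subject's epoch-level z matrices."""
    stacked = np.stack(epoch_connectomes)  # (n_epochs, n_parcels, n_parcels)
    # Assumption: average in z space; nanmean ignores any NaN entries.
    return np.nanmean(stacked, axis=0)


def sub_cond_connectomes(all_subjects):
    """Hypothetical generator: yields one averaged connectome per subject."""
    for sub_epochs in all_subjects:
        yield average_epoch_connectomes(sub_epochs)
```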
- bgfc_kit.bgfc_analyses.compute_threshold(corMat, density)
This function computes the correlation threshold corresponding to a given edge density (i.e., the percentage of edges to keep).
Parameters:
- corMat:
One correlation matrix
- density:
The percentage of edges to keep (e.g., 15 means to keep the top 15%)
Returns:
- threshold:
The threshold computed based on the corMat and density
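One common way to turn an edge density into a threshold is to take a percentile of the unique off-diagonal correlations. This sketch assumes that approach; density_threshold is a hypothetical name, and the package may handle ties or percentile interpolation differently.

```python
import numpy as np


def density_threshold(cor_mat, density):
    """Hypothetical sketch: correlation value above which roughly `density`%
    of the unique edges (upper triangle, no diagonal) lie."""
    upper = cor_mat[np.triu_indices_from(cor_mat, k=1)]
    # The (100 - density)th percentile leaves about density% of edges above it.
    return np.percentile(upper, 100 - density)
```

With density=15, roughly 15% of unique parcel pairs lie above the returned value; ties at the threshold can shift the exact count.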
- bgfc_kit.bgfc_analyses.construct_graphs(corMats, threshold=0)
This function constructs an unthresholded, weighted graph for each connectome.
Parameters:
- corMats:
A list of connectomes (i.e., correlation matrices).
- threshold:
The cutoff that determines whether an edge is included; default is 0.
Returns:
- graph_list:
A list of NetworkX graphs created from the input correlation matrices.
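The core of building a weighted graph is collecting the parcel pairs that pass the threshold together with their correlation weights. The sketch below produces (i, j, weight) tuples in plain NumPy; it assumes an edge is kept when its weight is strictly greater than threshold, and weighted_edge_list is a hypothetical name. With NetworkX, `G = nx.Graph()`, `G.add_nodes_from(range(n))` and `G.add_weighted_edges_from(edges)` would turn the list into a graph.

```python
import numpy as np


def weighted_edge_list(cor_mat, threshold=0.0):
    """Hypothetical sketch: (i, j, weight) for each parcel pair whose
    correlation exceeds `threshold`; the diagonal is skipped."""
    rows, cols = np.triu_indices_from(cor_mat, k=1)
    return [
        (int(i), int(j), float(cor_mat[i, j]))
        for i, j in zip(rows, cols)
        if cor_mat[i, j] > threshold  # assumption: strict comparison
    ]
```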
- bgfc_kit.bgfc_analyses.construct_threshold_binary_graphs(corMats, density)
This function constructs a thresholded, binary graph for each connectome.
Parameters:
- corMats:
A list of connectomes (i.e., correlation matrices).
- density:
The percentage of edges to keep (e.g., 15 means to keep the top 15%)
Returns:
- graph_list:
A list of NetworkX graphs created from the input correlation matrices.
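A binary counterpart combines the density-based threshold with a 0/1 adjacency matrix. This is a hedged sketch: binary_adjacency is hypothetical, it assumes the percentile rule and a strict "greater than" comparison, and `networkx.from_numpy_array(adj)` would convert the result into a graph.

```python
import numpy as np


def binary_adjacency(cor_mat, density):
    """Hypothetical sketch: symmetric 0/1 adjacency keeping roughly the top
    `density`% of unique edges (ties may change the exact count)."""
    n = cor_mat.shape[0]
    iu = np.triu_indices(n, k=1)
    threshold = np.percentile(cor_mat[iu], 100 - density)
    adj = np.zeros((n, n), dtype=int)
    keep = cor_mat[iu] > threshold
    adj[iu[0][keep], iu[1][keep]] = 1
    return adj + adj.T  # mirror to the lower triangle; diagonal stays 0
```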
- bgfc_kit.bgfc_analyses.define_epoch(FIRdesignMat_conf_dir: str, postfMRIprep_conf_dir: str)
This function defines the TRs of each epoch of each condition. This is necessary for separating the residual timeseries into the epochs of each condition.
Parameters
- FIRdesignMat_conf_dir:
The directory for FIRdesignMat_conf.toml generated by the fir_design_matrix module. (see https://github.com/peetal/bgfc_kit/blob/main/bgfc_kit/fir_design_matrix.py)
- postfMRIprep_conf_dir:
The directory for postfMRIprep_pipeline_config.toml generated by the preprocessing_pipeline module. (https://github.com/peetal/bgfc_kit/blob/main/bgfc_kit/preprocessing_pipeline.py)
Return
- epoch:
A 3d array of shape (# of conditions, # of epochs, # of total TRs). This array defines the specific TRs associated with each epoch of each condition. For example, if each epoch contains 40 TRs, then sum(epoch[0, 0, :]) = 40.
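To make the shape of the epoch array concrete, here is a sketch that builds one from the run layout given in the FIR configuration notes (run_leadingin_tr + epoch_per_run*epoch_tr + run_leadingout_tr). It assumes, beyond what is stated here, that each run contains a single condition and that runs are concatenated condition by condition with rep repetitions; build_epoch_array is a hypothetical name.

```python
import numpy as np


def build_epoch_array(n_cond, rep, epoch_per_run, epoch_tr,
                      leadingin_tr, leadingout_tr):
    """Hypothetical sketch of an epoch-definition array.

    Assumes one condition per run, runs ordered condition by condition,
    and every run laid out as leading-in, epochs, leading-out.
    """
    run_len = leadingin_tr + epoch_per_run * epoch_tr + leadingout_tr
    n_total_tr = n_cond * rep * run_len
    n_epoch = rep * epoch_per_run
    epoch = np.zeros((n_cond, n_epoch, n_total_tr), dtype=int)
    for cond in range(n_cond):
        for r in range(rep):
            run_start = (cond * rep + r) * run_len
            for e in range(epoch_per_run):
                start = run_start + leadingin_tr + e * epoch_tr
                # Mark the TRs that belong to this epoch of this condition.
                epoch[cond, r * epoch_per_run + e, start:start + epoch_tr] = 1
    return epoch
```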
- bgfc_kit.bgfc_analyses.detect_bad_frame(sub_dir, signal, order, run_prop, spike)
Residual timeseries can be contaminated by head motion. This function systematically scans the residual timeseries, identifying and masking the time points near a head motion spike based on the framewise displacement (FD) measured by fMRIprep. Motion-corrected BGFC matrices can subsequently be computed using numpy.ma.corrcoef followed by a Fisher z-transformation; this operation is available for both epoch- and condition-level analyses.
Parameters
- sub_dir:
This is used to locate the directory containing this subject's fMRIprep confound outputs, which should be base_dir/derivative/sub_dir/func.
- signal:
The output of load_sub_data, with shape (nparcel, ts).
- order:
This information is contained in the configuration file. It is a dictionary whose keys are run names and whose values are the run order. This is necessary to locate all the confound files in the derivative folder and to sort them in the same order as they will be concatenated and modeled. fMRIprep output files are named like sub-%s_task-%s_run-%s_desc-preproc_bold.nii, so the keys need to be specified as ‘task-%s_run-%s’, for example ‘task-divPerFacePerTone_run-2’.
- run_prop:
This information is contained in the configuration file. It is the maximum proportion of frames with FD > 0.5 within a run; a run exceeding this proportion (e.g., 5%) is removed.
- spike:
This information is contained in the configuration file. It is the spike cutoff: a frame with FD greater than this cutoff (e.g., 2mm) is treated as a spike.
Return
ts: the signal with bad frames masked. The shape is still (nparcel, ts), but bad frames are np.nan across all parcels.
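Once bad frames are np.nan, the masked correlation mentioned above can be computed with numpy.ma. The sketch below is illustrative rather than the package's code: masked_fisher_z is a hypothetical name, and setting the diagonal to 0 before np.arctanh is an assumption made to avoid infinite values.

```python
import numpy as np


def masked_fisher_z(ts):
    """Hypothetical sketch: Fisher z connectome from an (n_parcels, n_trs)
    timeseries whose bad frames are np.nan."""
    masked = np.ma.masked_invalid(ts)  # mask every np.nan frame
    corr = np.ma.corrcoef(masked)      # statistics use unmasked frames only
    corr = corr.filled(np.nan)         # back to a plain ndarray
    np.fill_diagonal(corr, 0.0)        # assumption: avoid arctanh(1) = inf
    return np.arctanh(corr)
```

Because a bad frame is masked for every parcel, this gives the same result as dropping those frames and calling np.corrcoef.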
- bgfc_kit.bgfc_analyses.load_sub_data(input_data, atlas, mask) → ndarray
This function uses the selected parcellation scheme to divide the input data into parcels.
Parameters
- input_data:
A string pointing to the 4D timeseries niimg-like object (most likely to be the residual activity data).
- atlas:
A string pointing to the 3D predefined parcellation scheme.
- mask:
A string pointing to the 3D subject functional mask.
Returns
signal: a 2d numpy array whose first dimension is parcel and second dimension is TR.
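Parcellation with a label atlas typically averages the voxel timeseries within each parcel. The sketch below assumes mean aggregation and works on in-memory arrays instead of file paths (loading, e.g., with nibabel.load(path).get_fdata(), would come first); parcel_mean_timeseries is hypothetical. nilearn's NiftiLabelsMasker is an established alternative for the same operation.

```python
import numpy as np


def parcel_mean_timeseries(data_4d, atlas_3d, mask_3d):
    """Hypothetical sketch: mean timeseries per parcel.

    data_4d: (x, y, z, n_trs) array; atlas_3d: integer labels (0 = background);
    mask_3d: brain mask. Returns an (n_parcels, n_trs) array.
    """
    labels = atlas_3d * mask_3d.astype(bool)  # drop voxels outside the mask
    parcel_ids = np.unique(labels[labels > 0])
    # Boolean indexing with a 3D mask gives (n_voxels, n_trs); average voxels.
    return np.vstack([data_4d[labels == pid].mean(axis=0) for pid in parcel_ids])
```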
- bgfc_kit.bgfc_analyses.parcellate_rmMotion_batch(FIRdesignMat_conf_dir: str, postfMRIprep_conf_dir: str, sub, atlas)
This function performs parcellation and removes motion frames for a list of subjects. The residual timeseries computed by the postfMRIprep pipeline is first parcellated using the input atlas. The output signals are then further cleaned to remove the influence of head motion.
Parameters
- FIRdesignMat_conf_dir:
The directory for FIRdesignMat_conf.toml generated by the fir_design_matrix module. (see https://github.com/peetal/bgfc_kit/blob/main/bgfc_kit/fir_design_matrix.py)
- postfMRIprep_conf_dir:
The directory for postfMRIprep_pipeline_config.toml generated by the preprocessing_pipeline module. (https://github.com/peetal/bgfc_kit/blob/main/bgfc_kit/preprocessing_pipeline.py)
- sub:
A list of subject IDs without the ‘sub-’ prefix (e.g., 1, 2, 14).
- atlas:
A string pointing to the 3D predefined parcellation scheme.
Return
ts: the signal with bad frames masked. The shape is still (nparcel, ts), but bad frames are np.nan across all parcels.
- bgfc_kit.bgfc_analyses.participation_coefficient(G, module_partition)
Computes the participation coefficient of nodes of G with partition defined by module_partition. (Guimera et al. 2005).
Parameters
- G:
A networkx.Graph instance.
- module_partition:
A dictionary mapping each community name to a list of nodes in G
Returns
- dict:
A dictionary mapping the nodes of G to their participation coefficient under the partition specified by module_partition.
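For node i with total degree k_i and k_is edges (or edge weight) into module s, the participation coefficient of Guimera et al. is P_i = 1 − Σ_s (k_is / k_i)². A NumPy sketch on an adjacency matrix follows; participation_coefficient_np is a hypothetical name, and assigning 0 to isolated nodes is a convention assumed here. For a NetworkX graph, nx.to_numpy_array(G, nodelist=...) yields the matrix, with module_partition then expressed as row indices.

```python
import numpy as np


def participation_coefficient_np(adj, module_partition):
    """Hypothetical sketch of P_i = 1 - sum_s (k_is / k_i)^2.

    adj: (n, n) symmetric weighted or binary adjacency, nodes 0..n-1.
    module_partition: dict mapping module name -> list of node indices.
    Isolated nodes (k_i = 0) get 0.0 by assumption.
    """
    adj = np.asarray(adj, dtype=float)
    k = adj.sum(axis=1)  # total (weighted) degree of each node
    sum_sq = np.zeros(adj.shape[0])
    for nodes in module_partition.values():
        k_s = adj[:, nodes].sum(axis=1)  # degree of each node into module s
        with np.errstate(divide="ignore", invalid="ignore"):
            sum_sq += np.where(k > 0, (k_s / k) ** 2, 0.0)
    pc = np.where(k > 0, 1.0 - sum_sq, 0.0)
    return {i: float(pc[i]) for i in range(adj.shape[0])}
```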
- bgfc_kit.bgfc_analyses.plot_parcel_FIR_estimates(FIRdesignMat_conf_dir, postfMRIprep_conf_dir, parcel_id, sub_list, atlas)
This function plots the estimates of the FIR regressors across all subjects. These plots serve as sanity checks for the efficiency of the FIR model: they should reveal a boxcar shape reflecting the condition structure.
Parameters
- FIRdesignMat_conf_dir:
The directory for FIRdesignMat_conf.toml generated by the fir_design_matrix module. (see https://github.com/peetal/bgfc_kit/blob/main/bgfc_kit/fir_design_matrix.py)
- postfMRIprep_conf_dir:
The directory for postfMRIprep_pipeline_config.toml generated by the preprocessing_pipeline module. (https://github.com/peetal/bgfc_kit/blob/main/bgfc_kit/preprocessing_pipeline.py)
- parcel_id:
The parcel whose FIR estimates are plotted.
- sub_list:
A list of subject IDs without the ‘sub-’ prefix (e.g., 1, 2, 14).
- atlas:
A string pointing to the 3D predefined parcellation scheme.
Return
- plot:
FIR estimates for a specific parcel.
- bgfc_kit.bgfc_analyses.separate_epochs(activity_data, epoch_list)
This function is the first step in dividing the timeseries into a list of epochs with condition labels, using the experiment-specific epoch list. Specifically, it divides the timeseries by condition and extracts all TRs within each condition.
Parameters
- activity_data:
3D array of shape [nSub, nVoxels/nParcels, nTRs]. The masked activity data of all subjects, organized in voxel-by-TR format.
- epoch_list:
List of 3D arrays of shape [condition, nEpochs, nTRs]. Specification of epochs and conditions, assuming all subjects have the same number of epochs. len(epoch_list) equals the number of subjects.
Returns
- raw_data:
List of 2D arrays of shape [nParcels, timepoints]. The data split by subject and condition. len(raw_data) equals (# of subjects) * (# of conditions per subject).
- labels:
List of 1D arrays holding the condition labels of the entries in raw_data; len(labels) equals len(raw_data). Showing the
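A sketch of the first splitting step, assuming epoch_list holds 0/1 indicator arrays like the output of define_epoch and that a condition's TRs are those belonging to any of its epochs. split_by_condition is hypothetical, and for simplicity it returns integer labels rather than 1D arrays.

```python
import numpy as np


def split_by_condition(activity_data, epoch_list):
    """Hypothetical sketch of the first splitting step.

    activity_data: (n_sub, n_parcels, n_trs)
    epoch_list: list (one per subject) of (n_cond, n_epochs, n_trs) 0/1 arrays.
    Returns one (n_parcels, n_cond_trs) array per subject and condition,
    plus the matching condition labels.
    """
    raw_data, labels = [], []
    for sub_ts, sub_epochs in zip(activity_data, epoch_list):
        for cond, cond_epochs in enumerate(sub_epochs):
            # TRs that belong to any epoch of this condition, in time order.
            cond_trs = cond_epochs.any(axis=0)
            raw_data.append(sub_ts[:, cond_trs])
            labels.append(cond)
    return raw_data, labels
```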
- bgfc_kit.bgfc_analyses.separate_epochs_per_condition(raw_data, labels, condition_label, epoch_list)
This function is the second step in dividing the full (residual) timeseries into epochs. Specifically, it selects all TRs of one condition, then divides those TRs into epochs based on epoch_list.
Parameters
- raw_data:
Output from function separate_epochs
- labels:
Second output from function separate_epochs
- condition_label:
One of the condition labels output by separate_epochs. For example, with 6 conditions the condition labels are 0, 1, 2, 3, 4, 5.
- epoch_list:
List of 3D arrays of shape [condition, nEpochs, nTRs]. Specification of epochs and conditions, assuming all subjects have the same number of epochs. len(epoch_list) equals the number of subjects.
Return
- cond_epoch_ts:
A list of 3d arrays, one per subject. Each array has the shape (# of epochs of the condition, # of parcels, # of TRs per epoch).
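The second step can be sketched as a reshape, under the assumptions (not stated above) that all epochs of a condition have the same length and appear in temporal order; split_condition_into_epochs is a hypothetical name.

```python
import numpy as np


def split_condition_into_epochs(raw_data, labels, condition_label, epoch_list):
    """Hypothetical sketch of the second splitting step.

    Assumes equal-length epochs listed in temporal order in epoch_list,
    and raw_data entries ordered by subject as in epoch_list.
    """
    cond_entries = [ts for ts, lab in zip(raw_data, labels) if lab == condition_label]
    cond_epoch_ts = []
    for ts, sub_epochs in zip(cond_entries, epoch_list):
        cond_epochs = sub_epochs[condition_label]  # (n_epochs, n_trs) 0/1
        n_epochs = cond_epochs.shape[0]
        tr_per_epoch = int(cond_epochs[0].sum())
        # (n_parcels, n_epochs * tr) -> (n_epochs, n_parcels, tr)
        cond_epoch_ts.append(
            ts.reshape(ts.shape[0], n_epochs, tr_per_epoch).transpose(1, 0, 2))
    return cond_epoch_ts
```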
- bgfc_kit.bgfc_analyses.unpack_conf(FIRdesignMat_conf_dir: str, postfMRIprep_conf_dir: str)
This function takes in the configuration TOML files and unpacks them into structdict data structures. The configuration files contain important information that can be reused in subsequent analyses.
Parameters
- FIRdesignMat_conf_dir:
The directory for FIRdesignMat_conf.toml generated by the fir_design_matrix module. (see https://github.com/peetal/bgfc_kit/blob/main/bgfc_kit/fir_design_matrix.py)
- postfMRIprep_conf_dir:
The directory for postfMRIprep_pipeline_config.toml generated by the preprocessing_pipeline module. (https://github.com/peetal/bgfc_kit/blob/main/bgfc_kit/preprocessing_pipeline.py)
Returns
fir_cfg: dictionary-like data structure; keys are parameters and values are the experiment-specific information in the FIR design matrix configuration file.
preprocess_cfg: dictionary-like data structure; keys are parameters and values are the experiment-specific information in the post-fMRIprep pipeline configuration file.
- bgfc_kit.bgfc_analyses.vectorize_connectome(connectome_list)
This function extracts the upper triangle of any connectome, which will potentially serve as features for training machine learning models.
Parameters:
- connectome_list:
A nested list of connectomes (2d arrays of shape nParcel x nParcel).
Returns:
- connectome_vectors:
A list (len = # of subjects) of lists (len = # of epochs) of vectorized connectomes (upper triangle, including the diagonal).
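The upper triangle including the diagonal can be extracted with np.triu_indices_from; for nParcel parcels each vector has nParcel*(nParcel+1)/2 entries. This sketch mirrors the nested input and output described above, with a hypothetical function name.

```python
import numpy as np


def vectorize_connectomes(connectome_list):
    """Hypothetical sketch: upper triangle (diagonal included) of each matrix.

    connectome_list: list (subjects) of lists (epochs) of (n, n) arrays.
    """
    vectors = []
    for sub_connectomes in connectome_list:
        # k=0 keeps the diagonal; entries come out in row-major order.
        vectors.append([c[np.triu_indices_from(c, k=0)] for c in sub_connectomes])
    return vectors
```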
bgfc_kit.fir_design_matrix module
- bgfc_kit.fir_design_matrix.generate_FIRdesignMat_template_toml(output_dir: str)
This function generates the configuration file for constructing FIR design matrix at the specified output directory. You need to specify all the parameters listed below in order to build the FIR design matrix for your need. You can refer to the comments below for more information regarding each parameter. If you do not need to account for motion (i.e., running only ‘write_vanilla_FIRdesginMat’), you won’t need to fill out anything below ‘epoch_per_run’.
- Parameters:
output_dir (str) – Where the template configuration file will be created.
- Returns:
None
- Notes:
Below are the details for all the configuration parameters included in the template config file; the same information is also available in the configuration file itself. Note that the length of each run should be run_leadingin_tr + epoch_per_run*epoch_tr + run_leadingout_tr.
- conditions:
A list of conditions; The order should be IDENTICAL to how they will be concatenated when running the GLMs.
- rep:
The number of runs each condition is repeated.
- fir_regressors:
The names of the regressors (e.g., a TR within a block is usually one of 3 components: instruction, task, and IBI); the names are only for clarity. The regressors do not need to cover the entire block. For example, you can model only the first 36 TRs of each 40-TR block.
- epoch_tr:
The number of TR within each block/epoch.
- run_leadingin_tr:
The number of leading in TR.
- run_leadingout_tr:
The number of leading out TR.
- epoch_per_run:
The number of blocks/epochs within each run.
- fmriprep_dir:
Where fmriprep derivative is located.
- spike_cutoff:
The FD threshold above which a frame is ignored in the design matrix.
- prop_spike_cutoff:
A value between 0 and 100. If the percentage of frames within a run that have fd > fd_cutoff is greater than prop_spike_cutoff, the run is removed.
- sub_id:
Subject ID. It should follow how the subject folder is named in the fMRIprep derivative folder; do not include the ‘sub-’ prefix.
- order:
This is a dictionary whose keys are run names and whose values are the run order. This is necessary to locate all the confound files in the derivative folder and to sort them in the same order as they will be concatenated and modeled. fMRIprep output files are named like sub-%s_task-%s_run-%s_desc-preproc_bold.nii, so the keys need to be specified as ‘task-%s_run-%s’, for example ‘task-divPerFacePerTone_run-2’.
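As a quick check of the run-length rule in the notes above, the sketch below computes the expected TRs per run from a plain dict of template values. The flat dict layout and all values are illustrative assumptions, not the template's actual structure or defaults, and the total assumes each run contains a single condition.

```python
def expected_run_length(cfg):
    """Hypothetical check: TRs per run implied by the FIR template parameters."""
    return (cfg["run_leadingin_tr"]
            + cfg["epoch_per_run"] * cfg["epoch_tr"]
            + cfg["run_leadingout_tr"])


# Illustrative values only; not defaults from the generated template.
cfg = {
    "conditions": ["condA", "condB"],
    "rep": 2,
    "epoch_tr": 40,
    "run_leadingin_tr": 6,
    "run_leadingout_tr": 10,
    "epoch_per_run": 4,
}
run_len = expected_run_length(cfg)  # 6 + 4 * 40 + 10 = 176
# Assumes one condition per run, each condition repeated `rep` times.
total_trs = run_len * cfg["rep"] * len(cfg["conditions"])
```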
- bgfc_kit.fir_design_matrix.write_personalized_FIRdesginMat(cfg_dir: str, output_dir: str)
This function personalizes the FIR design matrix for each participant, aiming to remove bad runs and bad frames from the FIR model. Bad runs and bad frames are defined based on the framewise-displacement confound output by fMRIprep. The implementation is as follows: 1) if more than prop_spike_cutoff (e.g., 5%) of the frames within a run have fd > fd_cutoff (e.g., 0.5), then all regressors are 0 for all TRs within that run; 2) if the run is good, each frame is checked, and if a frame has fd > spike_cutoff (e.g., 2mm), all regressors of this frame and of its preceding and following frames are set to 0.
- Parameters:
cfg_dir (str) – The path to the TOML configuration file.
output_dir (str) – Where the text file and heatmap will be created.
- Returns:
None
- Notes:
This function generates FIR design matrix customized for each subject, removing spikes and bad runs (due to motion). For the specified subject, this function writes out fir_design_matrix.txt, which is finite impulse response (FIR) model design matrix, and fir_design_heatmap.png, which is the heatmap of the design matrix.
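The two motion rules described above can be sketched on a design-matrix array as follows. personalize_design is hypothetical; it assumes runs are concatenated in the order of fd_per_run, that the leading NaN in fMRIprep's FD column is treated as 0, and that prop_spike_cutoff is a percentage as in the template config.

```python
import numpy as np


def personalize_design(design, fd_per_run, prop_spike_cutoff=5.0,
                       fd_cutoff=0.5, spike_cutoff=2.0):
    """Hypothetical sketch: zero out FIR regressors for bad runs and spikes.

    design: (n_trs, n_regressors) vanilla FIR design matrix, runs concatenated.
    fd_per_run: list of 1d FD arrays in the same run order.
    """
    design = design.copy()
    offset = 0
    for fd in fd_per_run:
        fd = np.nan_to_num(np.asarray(fd, dtype=float))  # first FD value is NaN
        n = fd.size
        if 100 * np.mean(fd > fd_cutoff) > prop_spike_cutoff:
            design[offset:offset + n, :] = 0  # rule 1: bad run, all regressors 0
        else:
            for t in np.flatnonzero(fd > spike_cutoff):
                # rule 2: zero the spike frame and its neighbouring frames
                lo, hi = max(t - 1, 0), min(t + 2, n)
                design[offset + lo:offset + hi, :] = 0
        offset += n
    return design
```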
- bgfc_kit.fir_design_matrix.write_vanilla_FIRdesginMat(cfg_dir: str, output_dir: str)
This function writes out the vanilla FIR design matrix to output_dir. It also plots the matrix so you can eyeball its structure and make sure it looks correct. This function assumes the functional scans are ordered in the same way across subjects, even if they were acquired in different orders. This can easily be achieved by naming your EPI runs during the R&D session. As a result, this function generates a design matrix that works for all participants.
- Parameters:
cfg_dir (str) – The path to the TOML configuration file.
output_dir (str) – Where the text file and heatmap will be created.
- Returns:
None
- Notes:
This function does not account for spikes or bad runs, thus being ‘vanilla’. This function writes out fir_design_matrix.txt, which is finite impulse response (FIR) model design matrix, and fir_design_heatmap.png, which is the heatmap of the design matrix.
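A vanilla FIR design matrix puts one regressor on each modeled TR offset after an epoch onset. The sketch below assumes one regressor per (condition, TR offset) for the first n_fir TRs of every epoch, one condition per run, and runs concatenated condition by condition; vanilla_fir_design and its arguments are hypothetical.

```python
import numpy as np


def vanilla_fir_design(n_cond, rep, epoch_per_run, epoch_tr, n_fir,
                       leadingin_tr, leadingout_tr):
    """Hypothetical FIR design sketch: regressor (cond, k) is 1 at TR k after
    every epoch onset of that condition, for k < n_fir."""
    run_len = leadingin_tr + epoch_per_run * epoch_tr + leadingout_tr
    design = np.zeros((n_cond * rep * run_len, n_cond * n_fir), dtype=int)
    for cond in range(n_cond):
        for r in range(rep):
            run_start = (cond * rep + r) * run_len
            for e in range(epoch_per_run):
                onset = run_start + leadingin_tr + e * epoch_tr
                for k in range(n_fir):
                    # TR k after the epoch onset loads on regressor (cond, k).
                    design[onset + k, cond * n_fir + k] = 1
    return design
```

Plotting the result, e.g., with matplotlib's plt.imshow(design, aspect='auto'), gives the kind of heatmap this function saves for visual inspection.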
bgfc_kit.preprocessing_pipeline module
- bgfc_kit.preprocessing_pipeline.generate_postfMRIprep_pipeline_template_toml(output_dir)
This function generates the template configuration file for the post-fMRIprep preprocessing pipeline. This pipeline includes 1) smoothing and high-pass filtering, 2) a nuisance GLM using FSL FEAT implemented with nipype, 3) demeaning each run (i.e., z-scoring the whole run using the mean and SD of the ‘resting TRs’ defined in your configuration file) and concatenating all runs into one long timeseries (the order of concatenation is crucial: it should be identical to that of your FIR design matrix, as defined in your configuration file), and 4) the FIR GLM. You need to specify all the parameters listed below in order to run the pipeline. The important output files are 1) the evoked timeseries after regressing out confounds, in the sub-xx/before_FIR folder, 2) the residual timeseries after regressing out stimulus-evoked activity, in the sub-xx/FIR_residual folder, and 3) the FIR regressor beta values, in sub-xx/FIR_betas.
Function parameters
- output_dir:
Where the template configuration file will be created at.
Configuration parameters
- sub_id:
Subject ID; this fills the %s following ‘sub-’. Be consistent with the fMRIprep naming convention: sub-%s_task-%s_space-%s_desc-preproc_bold.nii.gz
- task_id:
A list of tasks; This is placed at %s following ‘task-’. IMPORTANT: make sure the order you provide is consistent with the design matrix
- space:
This is placed at the %s following ‘space-’ (e.g., MNI152NLin2009cAsym_res-2)
- base_dir:
Where the fMRIPrep derivative folder is located
- output_dir:
Output directory
- designMat_dir:
Directory for the FIR design matrix
- runRest_tr:
List of ‘rest’ TRs, which serve as the baseline activity level
- fwhm:
Smoothing kernel size
- hpcutoff:
High-pass filter cutoff; the default value of 50 corresponds to a high-pass filter of 100
- nproc:
Number of processes used for multithreading
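The rest-TR normalization in step 3 of the pipeline description can be sketched in NumPy. This is an illustration, not the pipeline's nipype/FSL implementation: zscore_runs_by_rest is hypothetical, it uses the population SD (ddof=0), and voxels with zero variance during rest would need separate handling.

```python
import numpy as np


def zscore_runs_by_rest(runs, rest_trs):
    """Hypothetical sketch: z-score each run with the mean and SD of its rest
    TRs, then concatenate the runs in the given order.

    runs: list of (n_voxels, n_trs) arrays, ordered as in the FIR design matrix.
    rest_trs: indices of 'rest' TRs within a run (same for every run).
    """
    normalized = []
    for run in runs:
        rest = run[:, rest_trs]
        mu = rest.mean(axis=1, keepdims=True)
        sd = rest.std(axis=1, keepdims=True)  # population SD (ddof=0)
        normalized.append((run - mu) / sd)
    return np.concatenate(normalized, axis=1)  # one long timeseries
```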
- bgfc_kit.preprocessing_pipeline.run_postfMRIprep_pipeline(cfg_dir)
This function takes in the configuration file, generates the corresponding Python command for the post-fMRIprep preprocessing pipeline, and then runs that command to submit a Python job.
Parameters
- cfg_dir:
The directory of the configuration file
- bgfc_kit.preprocessing_pipeline.submit_postfMRIprep_pipeline_SLURM(cfg_dir, shell_dir, account, partition, jobname, memory, time='1-00:00:00', log='%x_%A_%a.log', env='jupyterlab-tf-pyt-20211020')
This function first writes out a shell script, then submits the Python command to SLURM.
Parameters
- cfg_dir:
The directory of the configuration file
- shell_dir:
Where to write the shell script, including the script’s name
- account:
The lab account (e.g., hulacon)
- partition:
The node partition (e.g., long, short, fat)
- memory:
The amount of memory (e.g., 100GB)