stacker.kmeans.read_and_preprocess_data

read_and_preprocess_data((file1, file2, ...))

Reads and preprocesses SSF data for K-means analysis, one dataset at a time.

Reads SSF data from txt files for each dataset, decompresses the data, and attaches each trajectory to its frame-wise SSF results. The values are flattened SSF lists: a trajectory with 3200 frames and 127 residues yields an array of 3200 frames x 16129 residue-residue pairs (127 x 127 = 16129) rather than 3200 frames x 127 residues x 127 residues. For example, a 2-residue, 2-frame SSF of

[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

is flattened to:

[[1, 2, 3, 4], [5, 6, 7, 8]]
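
The flattening above can be reproduced with a NumPy reshape. This is only an illustrative sketch of the layout, assuming row-major flattening of each per-frame matrix. It is not taken from the stacker source, and the variable names are made up for the example.

```python
import numpy as np

# Toy SSF: 2 frames, each a 2 x 2 residue-by-residue matrix.
ssf = np.array([[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]])

# Keep the frame axis and collapse each per-frame matrix into a single row
# (row-major order, which is an assumption about the on-disk layout).
n_frames = ssf.shape[0]
flattened = ssf.reshape(n_frames, -1)

print(flattened.tolist())  # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(flattened.shape)     # (2, 4)
```

With 127 residues the same reshape turns each 127 x 127 frame into a row of 16129 values.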

Parameters:
file1, file2, … : list of str

List of filenames to read and preprocess. These files are produced by running with -s ssf -d output.txt and should be in the format {datapath}/{traj_name}.txt.gz.

Returns:
data_arrays : dict

Dictionary where keys are dataset names and values are the processed data arrays.

See also

create_kmeans_input

Stacks SSF data into a single 2D NumPy array.

Examples

>>> import stacker as st
>>> dataset_names = ['testing/5JUP_N2_tGGG_aCCU_+1GCU.txt.gz', 'testing/5JUP_N2_tGGG_aCCU_+1CGU.txt.gz']  # 3200 frames, SSFs of 127 x 127 residues
>>> data_arrays = st.read_and_preprocess_data(dataset_names)
>>> print(data_arrays['dataset1'].shape)
(3200, 16129)
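
For readers without the test files, the read-and-collect step can be approximated with plain NumPy. The sketch below is not stacker's implementation. It assumes each file is a gzip-compressed text matrix with one flattened frame per row, and that the dictionary key is the filename with .txt.gz removed. load_flattened_ssf and demo_traj are hypothetical names used only here.

```python
import os
import tempfile

import numpy as np


def load_flattened_ssf(filepaths):
    """Hypothetical helper (not part of stacker): load each .txt.gz file
    into a 2D array and key it by an assumed dataset name."""
    data_arrays = {}
    for path in filepaths:
        name = os.path.basename(path)
        if name.endswith(".txt.gz"):
            name = name[: -len(".txt.gz")]  # assumed naming rule
        # np.loadtxt decompresses .gz files transparently.
        data_arrays[name] = np.loadtxt(path, ndmin=2)
    return data_arrays


# Write a tiny 2-frame, 4-pair file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_traj.txt.gz")
    np.savetxt(demo_path, np.array([[1, 2, 3, 4], [5, 6, 7, 8]]))
    demo = load_flattened_ssf([demo_path])

print(demo["demo_traj"].shape)  # (2, 4)
```

Each value has one row per frame and one column per residue-residue pair, matching the (3200, 16129) shape shown above for the real data.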