stacker.kmeans.read_and_preprocess_data
- read_and_preprocess_data((file1, file2, ...))
Reads and preprocesses SSF data for K Means analysis per dataset.
Reads SSF data from txt files for each dataset, decompresses the data, and attaches each Trajectory to its frame-wise SSF results. The values are flattened SSF lists: for a 3200-frame trajectory with 127 residues, each dataset is stored not as 3200 frames x 127 residues x 127 residues but as 3200 frames x 16129 residue-residue pairs (127 x 127 = 16129). For example, a 2-residue, 2-frame SSF of
[ [[1, 2], [3, 4]],
[[5, 6], [7, 8]] ]
is flattened to:
[[1, 2, 3, 4], [5, 6, 7, 8]]
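The flattening above can be reproduced with a short NumPy sketch. This is an illustration only, not necessarily how stacker performs the step internally; the variable names `ssf` and `flattened` are hypothetical.

```python
import numpy as np

# A 2-frame, 2-residue SSF with shape (frames, residues, residues)
ssf = np.array([[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]])

# Collapse the two residue axes into one axis of residue-residue pairs,
# keeping one row per frame
n_frames = ssf.shape[0]
flattened = ssf.reshape(n_frames, -1)

print(flattened.tolist())  # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(flattened.shape)     # (2, 4)
```

Applied to a 3200 x 127 x 127 array, the same reshape would give an array of shape (3200, 16129).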
- Parameters:
- file1, file2, … : list of str
List of filenames to read and preprocess. These files are the output of -s ssf -d output.txt. Each should be in the format {datapath}/{traj_name}.txt.gz
- Returns:
- data_arrays : dict
Dictionary where keys are dataset names and values are the processed data arrays.
See also
create_kmeans_input
Stacks SSF data into a single 2D NumPy array.
Examples
>>> import stacker as st
>>> dataset_names = ['testing/5JUP_N2_tGGG_aCCU_+1GCU.txt.gz', 'testing/5JUP_N2_tGGG_aCCU_+1CGU.txt.gz']  # 3200 frames, SSFs of 127 x 127 residues
>>> data_arrays = st.read_and_preprocess_data(dataset_names)
>>> print(data_arrays['dataset1'].shape)
(3200, 16129)