stacker.kmeans.plot_pca

plot_pca(blinded_data, dataset_names, coloring='dataset', outdir='', cluster_labels=None, new_dataset_names=None)

Creates a PCA plot to compare systems in 2D.

Creates a PCA plot that can be colored by the KMeans clustering result or by dataset. Compares SSFs in a similar way to the KMeans analysis.

Parameters:
blinded_data : np.ndarray

A 2D numpy array containing all frames stacked together. Output of create_kmeans_input().

dataset_names : list of str

List of filenames to read and preprocess. Each file is the output of stacker -s ssf -d output.txt.gz and should have the form {datapath}/{traj_name}.txt.gz.

coloring : {'dataset', 'kmeans', 'facet'}

Method to color the points on the scatterplot. Options:
- dataset: Plot all points on the same scatterplot and color by dataset of origin.
- kmeans: Plot all points on the same scatterplot and color by KMeans cluster, as given in cluster_labels.
- facet: Same as dataset, but plot each dataset on a separate coordinate grid.

outdir : str, default=''

Directory to save the clustering results.

cluster_labels : np.ndarray, optional

The cluster label for each frame, as output by run_kmeans. Used when coloring='kmeans' to color points by cluster.

new_dataset_names : dict, optional

Dictionary to remap dataset names. Keys are original filenames in dataset_names and values are shortened names.
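
One possible way to build a new_dataset_names mapping is sketched below. The file paths and the short_name helper are hypothetical and exist only for illustration; the only thing taken from the description above is that keys are the original entries of dataset_names and values are the shortened names.

```python
from pathlib import Path

# Hypothetical file paths following the {datapath}/{traj_name}.txt.gz pattern
dataset_names = [
    "data/wildtype_run1.txt.gz",
    "data/mutant_run1.txt.gz",
]


def short_name(path):
    """Illustrative helper: drop the directory and the .txt.gz suffix."""
    name = Path(path).name
    suffix = ".txt.gz"
    return name[: -len(suffix)] if name.endswith(suffix) else name


# Keys are the original filenames, values are the shortened names
new_dataset_names = {path: short_name(path) for path in dataset_names}
print(new_dataset_names)
```

The resulting dictionary could then be passed as the new_dataset_names argument alongside the same dataset_names list.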

Returns:
None

See also

create_kmeans_input

blinds SSF data for input to KMeans

read_and_preprocess_data

reads and preprocesses SSF data for K Means analysis per dataset

sklearn.decomposition.PCA

Runs PCA
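
To illustrate the kind of 2D projection that such a plot is based on, here is a minimal sketch of PCA on stacked frames. It is not the stacker implementation: it uses a plain NumPy SVD as a stand-in for sklearn.decomposition.PCA, the random arrays are a hypothetical substitute for the output of create_kmeans_input(), and it assumes frames are stacked in dataset order. The function name project_to_2d and the dataset labels are made up for this example.

```python
import numpy as np


def project_to_2d(data):
    """Project each row of `data` onto its first two principal components."""
    # Center each feature so the SVD axes are the principal axes
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)

# Hypothetical stand-in data: two datasets of 50 frames, 16 features per frame
frames_a = rng.normal(0.0, 1.0, size=(50, 16))
frames_b = rng.normal(3.0, 1.0, size=(50, 16))
stacked = np.vstack([frames_a, frames_b])

# One label per frame, analogous to coloring by dataset of origin
dataset_of_frame = np.repeat(["dataset_a", "dataset_b"], 50)

coords = project_to_2d(stacked)
print(coords.shape)

# Mean position of each dataset along the first principal component
for name in ("dataset_a", "dataset_b"):
    print(name, coords[dataset_of_frame == name, 0].mean())
```

Each row of coords is one frame in the 2D PCA space. Grouping rows by dataset_of_frame corresponds to coloring by dataset, while grouping by an array of cluster labels would correspond to coloring by KMeans cluster. The sign of each component is arbitrary and may differ from what sklearn returns.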