stacker.kmeans.plot_pca

plot_pca(blinded_data, dataset_names, coloring='dataset', outdir='', cluster_labels=None, new_dataset_names=None)

Create a PCA plot to compare systems in 2D.

Creates a PCA plot whose points can be colored by KMeans clustering result or by dataset of origin. Like the KMeans analysis, it compares SSFs across systems.

Parameters:

blinded_data (np.ndarray): A 2D numpy array containing all frames stacked together. Output of create_kmeans_input().
dataset_names (list of str): List of filenames to read and preprocess, as output by stacker -s ssf -d output.txt.gz. Each should be in the format {datapath}/{traj_name}.txt.gz.
coloring ({'dataset', 'kmeans', 'facet'}): Method used to color the points on the scatterplot. Options:
    - dataset: Plot all points on the same scatterplot and color by dataset of origin.
    - kmeans: Plot all points on the same scatterplot and color by KMeans cluster (n_clusters clusters).
    - facet: Same as dataset, but plot each dataset on a separate coordinate grid.
outdir (str, default=''): Directory in which to save the clustering results.
cluster_labels (np.ndarray, optional): The cluster label for each frame, as output by run_kmeans. Used when coloring='kmeans' to color points by cluster.
new_dataset_names (dict, optional): Dictionary used to remap dataset names. Keys are the original filenames in dataset_names and values are shortened names.
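
To illustrate the idea behind the 'dataset' and 'kmeans' coloring modes, here is a minimal NumPy sketch of projecting stacked frames onto two principal components and grouping the resulting points by a per-frame label. It is not stacker's implementation: the helper names pca_2d, group_by_label, and scatter_groups are hypothetical, the random arrays stand in for real blinded_data, and it assumes one row per frame.

```python
import numpy as np


def pca_2d(data):
    """Project the rows of `data` onto the first two principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def group_by_label(coords, labels):
    """Split 2D coordinates into one array per distinct label."""
    labels = np.asarray(labels)
    return {label: coords[labels == label] for label in np.unique(labels).tolist()}


def scatter_groups(groups, path):
    """Draw one scatter series per group and save the figure (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to file without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, points in groups.items():
        ax.scatter(points[:, 0], points[:, 1], s=10, label=str(label))
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Hypothetical stand-in for blinded_data: 20 frames from each of two datasets.
rng = np.random.default_rng(0)
frames_a = rng.normal(0.0, 1.0, size=(20, 16))
frames_b = rng.normal(3.0, 1.0, size=(20, 16))
stacked = np.vstack([frames_a, frames_b])

# One label per frame: dataset of origin here; cluster labels would work the same way.
origin = ["traj_a"] * 20 + ["traj_b"] * 20

coords = pca_2d(stacked)
groups = group_by_label(coords, origin)
```

Passing per-frame cluster labels instead of origin to group_by_label would give the grouping that a 'kmeans'-style coloring needs, and scatter_groups(groups, "pca.png") would write the figure to a file of your choosing.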

Returns: