stacker.kmeans.plot_pca
- plot_pca(blinded_data, dataset_names, coloring='dataset', outdir='', cluster_labels=None, new_dataset_names=None)
Creates a PCA plot to compare systems in 2D.
Creates a PCA plot whose points can be colored by the KMeans clustering result or by dataset of origin. Like the KMeans analysis, it compares systems by their SSFs.
- Parameters:
- blinded_data : np.ndarray
A 2D numpy array containing all frames stacked together. Output of create_kmeans_input().
- dataset_names : list of str
List of filenames to read and preprocess, as output by stacker -s ssf -d output.txt.gz. Each filename should be in the format {datapath}/{traj_name}.txt.gz
- coloring : {'dataset', 'kmeans', 'facet'}
Method to color the points on the scatterplot. Options:
  - dataset: Plot all points on the same scatterplot and color by dataset of origin.
  - kmeans: Plot all points on the same scatterplot and color by KMeans cluster with n_clusters.
  - facet: Same as dataset, but plot each dataset on a different coordinate grid.
- outdir : str, default=''
Directory to save the clustering results.
- cluster_labels : np.ndarray, optional
The labels of the clusters for each frame, output from run_kmeans. Used if coloring = 'kmeans' to color points by cluster.
- new_dataset_names : dict, optional
Dictionary to remap dataset names. Keys are original filenames in dataset_names and values are shortened names.
- Returns:
- None
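Examples
The sketch below illustrates how dataset_names and new_dataset_names could combine into one display label per frame when coloring by dataset. It is not Stacker's implementation: frame_labels and frames_per_dataset are hypothetical names, and it assumes that frames are stacked in dataset_names order and that unmapped filenames keep their original name.

```python
# Hypothetical inputs; none of these values come from Stacker itself.
dataset_names = ["data/traj_a.txt.gz", "data/traj_b.txt.gz"]
frames_per_dataset = [3, 2]  # assumed frame counts, in dataset_names order
new_dataset_names = {"data/traj_a.txt.gz": "A"}  # traj_b is left unmapped


def frame_labels(names, counts, remap=None):
    """Return one display label per frame, in stacking order (hypothetical helper)."""
    remap = remap or {}
    labels = []
    for name, count in zip(names, counts):
        # Fall back to the original filename when no short name is given.
        labels.extend([remap.get(name, name)] * count)
    return labels


labels = frame_labels(dataset_names, frames_per_dataset, new_dataset_names)
```

Labels built this way line up row by row with a stacked 2D array such as blinded_data, so they could be used directly to color or facet a scatterplot.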
See also
create_kmeans_input
blinds SSF Data for input to K Means
read_and_preprocess_data
reads and preprocesses SSF data for K Means analysis per dataset
sklearn.decomposition.PCA
Runs PCA
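Examples
As a rough illustration of the 2D projection behind the plot, the sketch below computes the first two principal components with a NumPy SVD. This stands in for sklearn.decomposition.PCA and is not Stacker's code: pca_2d is a hypothetical helper, and the random array is only a placeholder for a real blinded_data array.

```python
import numpy as np


def pca_2d(data):
    """Project the rows of a 2D array onto their first two principal components."""
    # PCA operates on mean-centered features.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
# Placeholder for blinded_data: 10 frames with 6 features each (assumed shape).
blinded_data = rng.normal(size=(10, 6))
coords = pca_2d(blinded_data)  # one (PC1, PC2) pair per frame
```

Each row of coords gives the scatterplot position of one frame. The sign of each component is arbitrary, so axes may appear mirrored relative to another PCA implementation.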