stacker.kmeans.run_kmeans
- run_kmeans(data_arrays, n_clusters, max_iter=1000, n_init=20, random_state=1, outdir='')
Performs KMeans clustering on blinded SSF data and saves the results.
This function applies the KMeans clustering algorithm to the provided blinded SSF data, assigns each frame to a cluster, and counts the number of frames in each cluster for each dataset. The results are printed and, if outdir is set, saved to a file in that directory.
- Parameters:
- data_arrays : dict
Output of read_and_preprocess_data(). Dictionary where keys are dataset names and values are the processed data arrays.
- n_clusters : int
The number of clusters to form.
- max_iter : int, default=1000
Maximum number of iterations of the k-means algorithm for a single run.
- n_init : int, default=20
Number of times the k-means algorithm will be run with different centroid seeds.
- random_state : int, default=1
Determines random number generation for centroid initialization.
- outdir : str, default=''
Directory in which to save the clustering results. If empty, the results are only printed to standard output.
- Returns:
- np.ndarray
The labels of the clusters for each frame.
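The max_iter, n_init, and random_state arguments share their names and descriptions with the constructor of scikit-learn's KMeans. The following is a minimal sketch of what those settings control, assuming a scikit-learn-style estimator; it runs on synthetic data and is not taken from the stacker source.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for blinded data: 100 "frames" with 8 features each,
# drawn from two well-separated groups (illustrative values only).
frames = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 8)),
    rng.normal(loc=5.0, scale=0.1, size=(50, 8)),
])

# n_init restarts with different centroid seeds, max_iter caps each restart,
# and random_state makes the seeding reproducible.
model = KMeans(n_clusters=2, max_iter=1000, n_init=20, random_state=1)
labels = model.fit_predict(frames)  # one integer label per frame

print(labels.shape)
print(np.bincount(labels))  # number of frames assigned to each cluster
```

Fixing random_state gives the same labels on every run, which matters when cluster numbers are compared across analyses.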
See also
create_kmeans_input
Blinds SSF data for input to KMeans.
read_and_preprocess_data
Reads and preprocesses SSF data for KMeans analysis, per dataset.
Examples
>>> import numpy as np
>>> import stacker as st
>>> data_arrays = {
...     'dataset1': np.random.rand(3200, 16129),
...     'dataset2': np.random.rand(3200, 16129)
... }
>>> blinded_data = st.create_kmeans_input(data_arrays)
>>> st.run_kmeans(blinded_data, n_clusters=4)
Reading data: dataset1
Reading data: dataset2
(6400, 16129)
{'dataset1': array([800, 800, 800, 800]), 'dataset2': array([800, 800, 800, 800])}
Dataset: dataset1
Cluster 1: 800 matrices
Cluster 2: 800 matrices
Cluster 3: 800 matrices
Cluster 4: 800 matrices
Dataset: dataset2
Cluster 1: 800 matrices
Cluster 2: 800 matrices
Cluster 3: 800 matrices
Cluster 4: 800 matrices
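Per-dataset tallies like those in the example output can be derived from a flat label array. The sketch below assumes the frames of each dataset were concatenated in dictionary order and that labels are zero-based integers. The helper count_frames_per_cluster is hypothetical and not part of stacker.

```python
import numpy as np

def count_frames_per_cluster(labels, frames_per_dataset, n_clusters):
    """Split a flat label array by dataset and count frames per cluster.

    frames_per_dataset maps each dataset name to its number of frames,
    in the order the datasets were concatenated (an assumption here).
    """
    labels = np.asarray(labels)
    counts = {}
    start = 0
    for name, n_frames in frames_per_dataset.items():
        chunk = labels[start:start + n_frames]
        # minlength keeps empty clusters in the tally as zeros.
        counts[name] = np.bincount(chunk, minlength=n_clusters)
        start += n_frames
    return counts

# Toy labels for two datasets of four frames each.
labels = np.array([0, 1, 1, 2, 0, 0, 2, 2])
counts = count_frames_per_cluster(
    labels, {"dataset1": 4, "dataset2": 4}, n_clusters=3
)

for name, per_cluster in counts.items():
    print(f"Dataset: {name}")
    for i, n in enumerate(per_cluster, start=1):
        print(f"Cluster {i}: {n} matrices")
```

Passing minlength to np.bincount ensures every dataset reports the same number of clusters, even when one of them contributes no frames to a given cluster.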