stacker.kmeans.run_kmeans

run_kmeans(data_arrays, n_clusters, max_iter=1000, n_init=20, random_state=1, outdir='')

Performs KMeans clustering on blinded SSF data and saves the results.

This function applies the KMeans clustering algorithm to the provided blinded SSF data, assigns each frame to a cluster, and counts the number of frames in each cluster for each dataset. The results are printed and, if outdir is given, saved to a file in that directory.

Parameters:
data_arrays : dict

Output of read_and_preprocess_data(). Dictionary where keys are dataset names and values are the processed data arrays.

n_clusters : int

The number of clusters to form.

max_iter : int, default=1000

Maximum number of iterations of the k-means algorithm for a single run.

n_init : int, default=20

Number of times the k-means algorithm will be run with different centroid seeds.

random_state : int, default=1

Determines random number generation for centroid initialization.

outdir : str, default=''

Directory in which to save the clustering results. If empty, the results are only printed to standard output.
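
To make the roles of max_iter, n_init, and random_state concrete, here is a minimal NumPy sketch of k-means with random restarts. It is not stacker's implementation: the parameter names above resemble those of scikit-learn's KMeans, but which backend run_kmeans uses is an assumption, and kmeans_sketch and its helper are hypothetical names introduced only for illustration.

```python
import numpy as np


def _sq_dists(X, centroids):
    # Squared Euclidean distance from every row of X to every centroid.
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def kmeans_sketch(X, n_clusters, max_iter=1000, n_init=20, random_state=1):
    """Toy Lloyd's k-means with restarts (illustrative, not stacker's code)."""
    rng = np.random.default_rng(random_state)  # fixed seed -> repeatable runs
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):  # one run per set of centroid seeds
        # Seed the centroids with n_clusters distinct rows of X.
        centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
        for _ in range(max_iter):  # cap on iterations within a single run
            labels = _sq_dists(X, centroids).argmin(axis=1)
            # Move each centroid to the mean of its members;
            # an empty cluster keeps its previous centroid.
            new_centroids = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                for k in range(n_clusters)
            ])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        # Score this run by its within-cluster sum of squares (inertia).
        dists = _sq_dists(X, centroids)
        labels = dists.argmin(axis=1)
        inertia = dists.min(axis=1).sum()
        if inertia < best_inertia:  # keep the best of the n_init runs
            best_labels, best_inertia = labels, inertia
    return best_labels, best_inertia


# Two well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, inertia = kmeans_sketch(X, n_clusters=2, n_init=5)
```

In this sketch a larger n_init lowers the chance of keeping a poor local optimum, max_iter only bounds the work done per run, and a fixed random_state makes the chosen seeds, and therefore the labels, repeatable.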

Returns:
np.ndarray

The cluster label assigned to each frame.
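
The per-dataset frame counts described above can be recovered from a label array along the following lines. This is a sketch under two assumptions: that the labels follow the datasets in concatenation order, and that the number of frames in each dataset is known. count_frames_per_cluster is a hypothetical helper, not part of stacker.

```python
import numpy as np


def count_frames_per_cluster(labels, frames_per_dataset, n_clusters):
    """Split a flat label array by dataset and count frames per cluster.

    Assumes labels are ordered dataset by dataset, in the same order as
    the keys of frames_per_dataset (an assumption, not confirmed by stacker).
    """
    counts = {}
    start = 0
    for name, n_frames in frames_per_dataset.items():
        chunk = labels[start:start + n_frames]
        # minlength keeps a slot for clusters that received no frames.
        counts[name] = np.bincount(chunk, minlength=n_clusters)
        start += n_frames
    return counts


# Eight frames in total: four from each of two datasets.
demo_labels = np.array([0, 1, 1, 2, 0, 0, 2, 2])
demo_counts = count_frames_per_cluster(
    demo_labels, {'dataset1': 4, 'dataset2': 4}, n_clusters=3
)
# demo_counts['dataset1'] is array([1, 2, 1]); demo_counts['dataset2'] is array([2, 0, 2])
```

Note that the labels in this sketch are 0-based, whereas the printed summary in the example below numbers clusters from 1.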

See also

create_kmeans_input

Blinds SSF data for input to KMeans.

read_and_preprocess_data

Reads and preprocesses SSF data, per dataset, for KMeans analysis.

Examples

>>> import numpy as np
>>> import stacker as st
>>> data_arrays = {
...     'dataset1': np.random.rand(3200, 16129),
...     'dataset2': np.random.rand(3200, 16129)
... }
>>> blinded_data = st.create_kmeans_input(data_arrays)
>>> st.run_kmeans(blinded_data, n_clusters=4)
Reading data: dataset1
Reading data: dataset2
(6400, 16129)
{'dataset1': array([800, 800, 800, 800]), 'dataset2': array([800, 800, 800, 800])}
Dataset: dataset1
    Cluster 1: 800 matrices
    Cluster 2: 800 matrices
    Cluster 3: 800 matrices
    Cluster 4: 800 matrices
Dataset: dataset2
    Cluster 1: 800 matrices
    Cluster 2: 800 matrices
    Cluster 3: 800 matrices
    Cluster 4: 800 matrices