stacker.kmeans.run_kmeans

run_kmeans(data_arrays, n_clusters, max_iter=1000, n_init=20, random_state=1, outdir='')

Performs KMeans clustering on blinded SSF data and saves the results.

This function applies the KMeans clustering algorithm to the provided blinded SSF data, assigns each frame to a cluster, and counts the number of frames in each cluster for each dataset. The results are printed to standard output and, if outdir is set, also saved to a file in that directory.
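The per-dataset counting step can be illustrated with a minimal NumPy sketch. It assumes that the cluster labels come back in the same row order in which the datasets were concatenated (the insertion order of data_arrays); that ordering is an assumption, and count_frames_per_cluster is a hypothetical helper, not part of stacker.

```python
import numpy as np


def count_frames_per_cluster(data_arrays, labels, n_clusters):
    """Split concatenated cluster labels back into per-dataset frame counts.

    Hypothetical helper. ASSUMPTION: labels are ordered like the datasets
    were concatenated, i.e. the insertion order of data_arrays.
    """
    counts = {}
    start = 0
    for name, arr in data_arrays.items():
        n_frames = arr.shape[0]                      # one row per frame
        dataset_labels = labels[start:start + n_frames]
        # minlength keeps clusters with no frames as explicit zeros
        counts[name] = np.bincount(dataset_labels, minlength=n_clusters)
        start += n_frames
    return counts


data_arrays = {"dataset1": np.zeros((4, 3)), "dataset2": np.zeros((2, 3))}
labels = np.array([0, 1, 1, 2, 2, 2])
print(count_frames_per_cluster(data_arrays, labels, n_clusters=4))
# {'dataset1': array([1, 2, 1, 0]), 'dataset2': array([0, 0, 2, 0])}
```

Using np.bincount with minlength guarantees every dataset gets an array of length n_clusters, so the per-cluster counts line up across datasets even when a cluster is empty for one of them.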

Parameters:

data_arrays (dict): Output of read_and_preprocess_data(). A dictionary whose keys are dataset names and whose values are the processed data arrays.
n_clusters (int): The number of clusters to form.
max_iter (int, default=1000): Maximum number of iterations of the k-means algorithm for a single run.
n_init (int, default=20): Number of times the k-means algorithm is run with different centroid seeds.
random_state (int, default=1): Determines random number generation for centroid initialization.
outdir (str, default=''): Directory in which to save the clustering results. If empty, the results are only printed to standard output.
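The parameter names n_clusters, max_iter, n_init and random_state match those of scikit-learn's sklearn.cluster.KMeans, which suggests (though this page does not confirm it) that the function delegates to that class. To show what the three tuning parameters control, here is a minimal NumPy sketch of Lloyd's k-means with random restarts. kmeans_sketch is a hypothetical name, and it seeds centroids by picking random frames rather than with the k-means++ scheme scikit-learn uses by default.

```python
import numpy as np


def kmeans_sketch(X, n_clusters, max_iter=1000, n_init=20, random_state=1):
    """Lloyd's k-means with n_init random restarts (hypothetical, illustrative only)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(random_state)    # fixed seed -> reproducible labels
    sq_norms = (X ** 2).sum(axis=1)               # ||x||^2 for every frame
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):                       # n_init independent restarts
        # Seed the centroids with n_clusters distinct frames chosen at random
        centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
        for _ in range(max_iter):                 # at most max_iter Lloyd iterations
            # Squared distances via ||x||^2 - 2 x.c + ||c||^2 (avoids an n*k*d array)
            dists = sq_norms[:, None] - 2 * X @ centroids.T + (centroids ** 2).sum(axis=1)
            labels = dists.argmin(axis=1)         # nearest centroid for each frame
            new_centroids = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
                for k in range(n_clusters)        # an empty cluster keeps its old centroid
            ])
            converged = np.allclose(new_centroids, centroids)
            centroids = new_centroids
            if converged:                         # centroids stopped moving
                break
        # Sum of squared distances from each frame to its assigned centroid
        inertia = dists[np.arange(len(X)), labels].sum()
        if inertia < best_inertia:                # keep the lowest-inertia restart
            best_labels, best_inertia = labels, inertia
    return best_labels


X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
print(kmeans_sketch(X, n_clusters=2, n_init=5, random_state=0))
```

Because every restart draws from the same seeded generator, a fixed random_state makes the returned labels reproducible, while a larger n_init trades run time for a lower chance of settling in a poor local minimum.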

Returns:

np.ndarray: The cluster label assigned to each frame.

See also

create_kmeans_input: Blinds SSF data for input to k-means.
read_and_preprocess_data: Reads and preprocesses SSF data for k-means analysis, per dataset.
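In the example that follows, two 3200 × 16129 arrays produce a printed shape of (6400, 16129), which is consistent with blinding being a row-wise concatenation that discards the dataset names. A minimal sketch under that assumption; blind is a hypothetical name, not the create_kmeans_input source.

```python
import numpy as np


def blind(data_arrays):
    """Stack every dataset's frames into one unlabeled matrix (hypothetical sketch)."""
    # Dataset names are dropped; only row order records where each frame came from
    return np.concatenate(list(data_arrays.values()), axis=0)


# Fewer columns than real SSF data (16129) to keep the demo small
data_arrays = {"dataset1": np.random.rand(3200, 50), "dataset2": np.random.rand(3200, 50)}
print(blind(data_arrays).shape)  # (6400, 50)
```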

Examples

>>> import numpy as np
>>> import stacker as st
>>> data_arrays = {
...     'dataset1': np.random.rand(3200, 16129),
...     'dataset2': np.random.rand(3200, 16129)
... }
>>> st.run_kmeans(data_arrays, n_clusters=4)
Reading data: dataset1
Reading data: dataset2
(6400, 16129)
{'dataset1': array([800, 800, 800, 800]), 'dataset2': array([800, 800, 800, 800])}
Dataset: dataset1
    Cluster 1: 800 matrices
    Cluster 2: 800 matrices
    Cluster 3: 800 matrices
    Cluster 4: 800 matrices
Dataset: dataset2
    Cluster 1: 800 matrices
    Cluster 2: 800 matrices
    Cluster 3: 800 matrices
    Cluster 4: 800 matrices
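The per-dataset report format shown above can be reproduced from a dictionary of counts. This is a sketch only: report_counts is a hypothetical helper, and the file name it writes when outdir is set is an assumption, since this page does not document what the saved file is called or how its contents are laid out.

```python
import os

import numpy as np


def report_counts(counts, outdir=""):
    """Print per-dataset cluster counts; also write them to a file if outdir is set."""
    lines = []
    for name, cluster_counts in counts.items():
        lines.append(f"Dataset: {name}")
        for k, n in enumerate(cluster_counts, start=1):   # clusters numbered from 1
            lines.append(f"    Cluster {k}: {n} matrices")
    text = "\n".join(lines)
    print(text)
    if outdir:
        os.makedirs(outdir, exist_ok=True)
        # ASSUMPTION: hypothetical file name; the real output name is not documented
        path = os.path.join(outdir, "clustering_results.txt")
        with open(path, "w") as fh:
            fh.write(text + "\n")
    return text


report_counts({"dataset1": np.array([800, 800, 800, 800])})
```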
