Data Clustering

This program and the accompanying materials are made available under the terms of the Eclipse Public License v2.0, which accompanies this distribution and is available at https://www.eclipse.org/legal/epl-v20.html

SPDX-License-Identifier: EPL-2.0

Copyright Contributors to the Zincware Project.

Description: Methods to help with clustering data.

symdet.utils.data_clustering.compute_com(data: ndarray)

Compute the center of mass of some data.

Parameters

data (np.ndarray) – Data on which to compute the center of mass.
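The idea can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: it assumes that every row of the array is one point, that all points carry equal weight, and that the center of mass is therefore the mean over the first axis. The name center_of_mass_sketch is hypothetical.

```python
import numpy as np


def center_of_mass_sketch(data: np.ndarray) -> np.ndarray:
    """Equal-weight center of mass (assumption: one point per row)."""
    # Averaging over axis 0 gives one coordinate per spatial dimension.
    return np.mean(data, axis=0)


# Four corners of a square with side length 2.
points = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
com = center_of_mass_sketch(points)
```

For the square above, the result is the midpoint of the square.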

symdet.utils.data_clustering.compute_radius_of_gyration(data: ndarray, com: ndarray)

Compute the radius of gyration of some data.

Parameters
  • data (np.ndarray) – Data on which to compute the radius of gyration.

  • com (np.ndarray) – Center of mass of the data.
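A rough sketch of the calculation, under stated assumptions: the radius of gyration is taken here in its common equal-weight form, the root-mean-square distance of the points from the center of mass. Whether the library uses exactly this definition is not stated in this page, and the name radius_of_gyration_sketch is hypothetical.

```python
import numpy as np


def radius_of_gyration_sketch(data: np.ndarray, com: np.ndarray) -> float:
    """Root-mean-square distance of the points from the center of mass.

    Assumption: one point per row, all points weighted equally.
    """
    # Squared Euclidean distance of each point from the center of mass.
    squared_distances = np.sum((data - com) ** 2, axis=1)
    return float(np.sqrt(np.mean(squared_distances)))


# Four corners of a square with side length 2.
points = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
com = np.mean(points, axis=0)
rg = radius_of_gyration_sketch(points, com)
```

Every corner lies at the same distance from the midpoint, so the radius of gyration here equals that distance, the square root of 2.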

symdet.utils.data_clustering.range_binning(image: Tensor, domain: Tensor, value_range: list, bin_operation: list, representatives: int = 100) -> dict

A method to apply simple range binning to some data.

Parameters
  • image (tf.Tensor) – Data to cluster.

  • domain (tf.Tensor) – Data pool to return in clustered form.

  • representatives (int) – Number of class representatives to have for each bin.

  • value_range (list) – The range of values k over which to bin, e.g. k in [-5, 5].

  • bin_operation (list) – Operation to apply to the bins, e.g. [1/5, 1e-3] leads to bins of the form [k/5 - 1e-3, k/5 + 1e-3].

Returns

classes – Data class numbers and their data representatives as a dictionary.

Return type

dict
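The binning rule described by value_range and bin_operation can be sketched as follows. This is an illustration under several assumptions, not the library's code: it uses NumPy arrays instead of tf.Tensor, treats k as the integers in value_range with both ends included, reads bin_operation as a (scale, half-width) pair, keys the returned dictionary by k, and takes the first matching entries of the pool as representatives. The function name and all variable names are hypothetical.

```python
import numpy as np


def range_binning_sketch(values, pool, value_range, bin_operation,
                         representatives=100):
    """Group pool entries by which bin [k*scale - width, k*scale + width]
    the corresponding value falls into."""
    scale, width = bin_operation
    classes = {}
    # Assumption: k runs over the integers in value_range, ends included.
    for k in range(value_range[0], value_range[1] + 1):
        centre = k * scale
        mask = np.abs(values - centre) <= width
        members = pool[mask][:representatives]
        if len(members) > 0:
            classes[k] = members  # empty bins are skipped
    return classes


# One value per pool entry; 0.5 lies between bin centres and is dropped.
values = np.array([-1.0, 0.2, 0.2004, 0.5, 1.0])
pool = np.array([[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]])
classes = range_binning_sketch(values, pool, [-5, 5], [1 / 5, 1e-3])
```

With these inputs, only the bins for k = -5, 1 and 5 receive members, and the bin for k = 1 collects the two pool entries whose values lie within 1e-3 of 0.2.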