Clustering Challenges
Clustering is one of the most widely used methods in unsupervised analysis. Yet its results are often unstable, arbitrary, and difficult to justify, limitations that sit uneasily with the requirements of Responsible AI and the EU AI Act.
Why Clustering Is Problematic
Unlike supervised models, clustering has no ground truth to rely on. Algorithms must “invent” a structure from the data, which introduces:
- high variability of results
- dependence on hyperparameters
- arbitrary choices that are hard to justify
- lack of explainability
- limited reproducibility
These limitations become critical in a demanding regulatory context such as the AI Act.
Variability and Instability
Different Results at Each Execution
Many algorithms (k‑means, GMM, spectral clustering…) depend on a random initialization: two runs with identical settings on the same data can produce different clusters simply because the starting points differ.
The k‑means objective function aims to minimize:
$$ J = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2 $$
This objective is non-convex, so the algorithm only converges to a local minimum; which one it reaches depends heavily on the initialization of the centers $\mu_i$, which explains the variability.
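The effect is easy to reproduce. The sketch below (the `kmeans` and `objective` helpers are illustrative, not from any library) runs Lloyd's algorithm on the same tiny 1-D dataset from two different initializations: one reaches the global minimum of $J$, the other gets stuck in a much worse local minimum.

```python
def kmeans(points, centers, iters=50):
    """Lloyd's algorithm on 1-D data: alternate assignment and mean updates."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
            clusters[nearest].append(x)
        # Recompute each center as the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else m for c, m in zip(clusters, centers)]
    return centers

def objective(points, centers):
    """The k-means objective J: sum of squared distances to the nearest center."""
    return sum(min((x - m) ** 2 for m in centers) for x in points)

data = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
print(objective(data, kmeans(data, [0.0, 10.0, 20.0])))  # 1.5   (good initialization)
print(objective(data, kmeans(data, [0.0, 0.5, 15.0])))   # 101.0 (poor local minimum)
```

Both runs converge, yet one partition is roughly 67 times worse than the other; nothing in the algorithm itself signals which run to trust.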
Sensitivity to Data
A slight modification of the dataset can lead to a completely different segmentation. This instability makes the results difficult to defend in front of an auditor.
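A minimal sketch of this effect (the `kmeans_labels` helper is illustrative): moving a single point by 0.5 shifts a centroid just enough that an untouched point at 4.0 switches cluster, even though the initialization is identical in both runs.

```python
def kmeans_labels(points, centers, iters=20):
    """Lloyd's algorithm on 1-D data; returns the final label of each point."""
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
                  for x in points]
        for j in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    # Final assignment against the converged centers.
    return [min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
            for x in points]

# Same initialization; only the first point moves, from -3.0 to -3.5.
print(kmeans_labels([-3.0, 4.0, 5.1, 10.0], [0.0, 10.0]))  # [0, 0, 1, 1]
print(kmeans_labels([-3.5, 4.0, 5.1, 10.0], [0.0, 10.0]))  # [0, 1, 1, 1]
```

The point at 4.0 was never modified, yet its segment changes: a cluster membership can hinge on data it has no apparent relation to.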
Arbitrariness of Hyperparameters
Choosing the number of clusters, distance metrics or density parameters often relies on heuristics. These choices are rarely scientifically justifiable.
Example: the “silhouette score”, often used to choose $k$:
$$ s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} $$
where $a(i)$ is the mean distance from point $i$ to the other members of its own cluster and $b(i)$ is the mean distance from $i$ to the points of the nearest other cluster. This type of metric has no strong regulatory or scientific justification, limiting its use under the AI Act.
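The formula above can be computed directly. The `silhouette` helper below is an illustrative 1-D sketch (it assumes every cluster has at least two members, so $a(i)$ is well defined):

```python
def silhouette(points, labels, i):
    """s(i) from the formula above, for 1-D data.

    a(i): mean distance from point i to the other members of its cluster
    b(i): mean distance from point i to the nearest other cluster
    Assumes every cluster has at least two members.
    """
    same = [abs(points[i] - x) for x, lab in zip(points, labels)
            if lab == labels[i]]
    a = sum(same) / (len(same) - 1)  # the self-distance 0 drops out of the mean
    b = min(
        sum(abs(points[i] - x) for x, lab in zip(points, labels) if lab == c)
        / sum(1 for lab in labels if lab == c)
        for c in set(labels) if c != labels[i]
    )
    return (b - a) / max(a, b)

points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
print(silhouette(points, labels, 0))  # (10.5 - 1.0) / 10.5 ≈ 0.905
```

A value near 1 suggests a well-separated point, but "near 1" is itself a heuristic threshold: the score ranks candidate values of $k$ without certifying any of them.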
Problem for the AI Act
For high-risk systems, the AI Act requires decisions that are explainable, documented and reproducible. Hyperparameters chosen by heuristic alone are incompatible with these requirements.
Lack of Explainability
Clusters are often difficult to interpret: why does a given individual belong to a given group? Traditional algorithms do not provide clear narrative or mathematical justification.
This lack of explainability limits the use of clustering in sensitive or regulated contexts.
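To make the gap concrete, here is an illustrative sketch (`membership_report` is a hypothetical helper) of everything a k-means model can say about why a point was assigned to a cluster:

```python
def membership_report(x, centers):
    """All k-means can offer as a 'reason': squared distances to each center.
    The justification is purely geometric, with no link to domain-meaningful
    features that an auditor or affected individual could understand."""
    dists = [(x - m) ** 2 for m in centers]
    return dists.index(min(dists)), dists

print(membership_report(4.0, [0.0, 10.0]))  # (0, [16.0, 36.0])
```

"Closest centroid" is a mathematically valid answer, but it is not a narrative explanation: it says nothing about which attributes drove the assignment or what the cluster means.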
Building a Responsible AI Culture
Responsible AI is not only a regulatory requirement — it is a strategic capability. MathIAs+™ Academy helps your teams master modern, sovereign practices.
Explore the Academy