Specialists at the HSE Faculty of Computer Science and Sber AI Lab have developed a geometric oversampling technique known as Simplicial SMOTE. Tests on various datasets have shown that it significantly improves classification performance. This technique is particularly valuable in scenarios where rare cases are crucial, such as fraud detection or the diagnosis of rare diseases. The study’s results are available on arXiv.org, an open-access archive, and will be presented at the International Conference on Knowledge Discovery and Data Mining (KDD) in summer 2025 in Toronto, Canada.
The problem of imbalanced learning is becoming increasingly relevant across various fields, including banking and medicine. Conventional methods, such as random oversampling, often generate low-quality samples or fail to accurately model rare class data.
Simplicial SMOTE (Synthetic Minority Oversampling Technique), a novel solution proposed by scientists from HSE University and Sber AI Lab, addresses these issues by enabling more accurate modelling of complex topological data structures and improving classifier performance on imbalanced datasets.
It generates new examples of a rare class by leveraging information from multiple close instances (a ‘simplex’), rather than just two close points, as in the original SMOTE and its well-known modifications. This captures the structure of the data more faithfully and improves classifier performance. The technique improves training on imbalanced data, where one class (e.g., normal transactions) has many examples, while another class (e.g., fraud) has few.
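The difference between the two sampling schemes can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' implementation: the function names, the parameters `k` and `p`, and the toy data are hypothetical, and the simplex here is built from a point and `p` of its nearest neighbours, whereas the published method may select simplices differently.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote_sample(points, k, rng):
    """Classic SMOTE step: a random point on the segment between a
    minority point and one of its k nearest minority neighbours."""
    i = rng.randrange(len(points))
    j = rng.choice(k_nearest(points, i, k))
    t = rng.random()
    return [a + t * (b - a) for a, b in zip(points[i], points[j])]


def simplex_sample(points, k, p, rng):
    """Simplified simplicial step (an assumption of this sketch): a random
    convex combination of a minority point and p of its k nearest
    minority neighbours, i.e. a point inside a simplex with p + 1 vertices."""
    i = rng.randrange(len(points))
    vertices = [i] + rng.sample(k_nearest(points, i, k), p)
    # Normalised exponential draws give non-negative weights that sum to 1.
    raw = [rng.expovariate(1.0) for _ in vertices]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(points[0])
    return [
        sum(w * points[v][d] for w, v in zip(weights, vertices))
        for d in range(dim)
    ]


# Hypothetical minority-class points (for example, three fraud cases in 2D).
rng = random.Random(0)
fraud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
on_edge = smote_sample(fraud, k=2, rng=rng)               # lies on a triangle edge
in_triangle = simplex_sample(fraud, k=2, p=2, rng=rng)    # lies inside the triangle
```

With only two points involved, every synthetic example falls on a line segment; with three or more, it can fall anywhere inside the triangle (or higher-dimensional simplex) they span, which is what allows the rare class to be covered more fully.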
Researchers have experimentally shown on a large number of test datasets that the proposed approach achieves significantly better performance metrics, such as the F1 Score and Matthews Correlation Coefficient, than both the basic SMOTE and its modifications. In particular, an improvement was observed in gradient boosting, a classifier commonly used in practice.
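For readers unfamiliar with these two metrics, the sketch below computes them from the four cells of a confusion matrix using their standard definitions. The function name and the counts are hypothetical and chosen only for illustration; they are not results from the study.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """F1 Score and Matthews Correlation Coefficient from confusion-matrix
    counts, where the rare class (e.g., fraud) is treated as positive."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Hypothetical test set: 990 normal transactions and 10 fraud cases.
# A model that labels everything "normal" is 99% accurate but finds no fraud.
always_normal = f1_and_mcc(tp=0, fp=0, fn=10, tn=990)

# A model that catches 8 of the 10 fraud cases at the cost of 4 false alarms.
useful_model = f1_and_mcc(tp=8, fp=4, fn=2, tn=986)
```

Both metrics score the first model at zero despite its high accuracy, which is why they are preferred over accuracy when the rare class is the one that matters.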
‘Our technique is particularly effective for tasks involving imbalanced data, where the rare class holds greater significance. Banks can use Simplicial SMOTE to detect fraud more effectively, and medical centres can apply it to diagnose rare diseases,’ says Andrey Savchenko, co-author of the article and Leading Research Fellow at the Laboratories for Theoretical Modelling in AI of the HSE AI and Digital Science Institute.