New Machine Learning Framework Shows Promise in Detecting Tax Fraud

Tax fraud poses a considerable challenge for governments worldwide, resulting in significant financial losses. To enhance fraud detection capabilities and safeguard government revenues, tax authorities are increasingly turning to machine learning strategies. However, current detection strategies have limitations, prompting the need for a novel approach.

A recent publication by researchers from King Saud University introduces a groundbreaking machine learning framework for tax fraud detection. Unlike traditional approaches, this framework combines supervised and unsupervised models, utilizing ensemble learning paradigms to improve accuracy and comprehensiveness.

The framework consists of four modules:

1. Supervised Module: Implements an Extreme Gradient Boosting (XGBoost) model to classify tax returns into different groups. The model generates a matrix representing the assignment of the tax return to leaf nodes in each tree, which serves as input for the next module.

2. Unsupervised Module: Utilizes autoencoders to identify anomaly features in the original data. By encoding and regenerating the input, anomalies are detected based on the regeneration error. The resulting matrix and anomaly scores are fed into the next module.

3. Behavioral Module: Measures a compliance score for each taxpayer, considering audit outcomes and time. This score reflects compliance or non-compliance over time, providing valuable information for fraud detection.

4. Prediction Module: Combines all engineered features to predict tax fraud. It takes input from the supervised module, unsupervised module, and behavioral module, using two classifiers (Artificial Neural Network and Support Vector Machine) to test the performance of the engineered features.

The evaluation study conducted using data from the Saudi Zakat, Tax, and Customs Authority demonstrated promising results. The Artificial Neural Network model showed high precision in predicting the “fraud” class. The framework outperformed models using only original data, showcasing its potential for global adoption.

Despite its success, the framework has some limitations. These include assumptions of homogeneous behavior within sectors/business sizes and compliance scores close to zero for many taxpayers. Nevertheless, this innovative approach significantly enhances tax authorities’ capabilities in detecting tax fraud. The framework’s integration of supervised and unsupervised models with behavioral compliance scores offers a potential paradigm shift in fraud detection, promoting more accurate and comprehensive measures.

The source of the article is from the blog anexartiti.gr