In the fast-evolving area of AI, the AI-DAPT framework stands out for emphasizing data quality and valuation as
driving factors for more accurate and robust Machine Learning (ML) models. Data valuation is a critical process
that assesses the quality of data and its contribution to the performance of an ML model or an entire AI
system. It spans several aspects [6].

In the context of the AI-DAPT framework, data valuation is considered multi-dimensional and supportive of the
whole lifecycle of ML models. It features methods for assessing data quality, improving feature selection,
detecting biases and optimizing model interpretability. This blog post discusses state-of-the-art data valuation
methods and outlines how the AI-DAPT project will apply these methodologies to assess and enhance data
quality.

DATA VALUATION METHODS AND PURPOSES

Data valuation is crucial for model performance: the higher the data quality, the more accurate and precise the
model’s predictions will be [1]. A key aspect of assessing data quality is ensuring that models are not trained on
irrelevant features. One common approach is to train models systematically while excluding one feature at a
time [2]. Comparing each reduced model against the full one identifies the features with the most significant
impact, as well as those that might cause overfitting. Such methods are computationally expensive, because a
separate model must be retrained for every excluded feature.
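
The exclude-one-feature procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AI-DAPT implementation: the synthetic dataset, the nearest-centroid classifier, and the helper names (`nearest_centroid_accuracy`, `leave_one_feature_out`, `make_data`) are all hypothetical choices made to keep the example self-contained.

```python
import random

def nearest_centroid_accuracy(train, test, features):
    """Fit a nearest-centroid classifier on the chosen feature indices; return test accuracy."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((x[f] - c) ** 2 for f, c in zip(features, centroids[y])),
        )

    return sum(predict(x) == y for x, y in test) / len(test)

def leave_one_feature_out(train, test, n_features):
    """Retrain once per excluded feature and record the accuracy drop versus the full model."""
    all_features = list(range(n_features))
    baseline = nearest_centroid_accuracy(train, test, all_features)
    drops = {}
    for f in all_features:
        kept = [g for g in all_features if g != f]
        drops[f] = baseline - nearest_centroid_accuracy(train, test, kept)
    return baseline, drops

def make_data(n, rng):
    """Synthetic samples (an assumption): feature 0 carries the label signal, feature 1 is noise."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append(([3.0 * y + rng.gauss(0, 0.5), rng.gauss(0, 2.0)], y))
    return data

rng = random.Random(0)
train, test = make_data(200, rng), make_data(200, rng)
baseline, drops = leave_one_feature_out(train, test, n_features=2)
print(f"baseline accuracy: {baseline:.3f}")
for f, d in drops.items():
    # A large positive drop marks an important feature; a drop near zero
    # (or negative) marks one the model can do without.
    print(f"feature {f}: accuracy drop when excluded = {d:+.3f}")
```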
One of the most widely used techniques for evaluating feature importance and data relevance is the Shapley value,
a game-theoretic approach that fairly attributes contributions to individual features [3]. This method addresses
the “black-box” challenge associated with deep learning models, enhancing data and model explainability
while also supporting tasks such as feature importance analysis and bias detection, which are crucial in the
context of AI-DAPT.
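
To make the game-theoretic idea concrete, the sketch below computes exact Shapley values by averaging each player's marginal contribution over all subsets of the other players. It is an illustration only: the feature names (`f1`, `f2`, `f3`) and the `toy_value` function, which stands in for "model performance when trained on this subset", are hypothetical. The "players" could equally be individual data points rather than features.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: weighted average of marginal contributions over all subsets."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a subset of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Hypothetical per-feature gains (assumed numbers, not measured results).
GAINS = {"f1": 0.2, "f2": 0.1, "f3": 0.0}

def toy_value(subset):
    """Stand-in for model accuracy on a feature subset: base rate plus gains plus one interaction."""
    score = 0.5 + sum(GAINS[p] for p in subset)
    if "f1" in subset and "f2" in subset:
        score += 0.1  # f1 and f2 are worth more together than apart
    return score

phi = shapley_values(list(GAINS), toy_value)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

In this toy setup the interaction bonus is split evenly between `f1` and `f2`, and the values sum to the gap between the full-set score and the empty-set score. The exact computation enumerates every subset, so its cost grows exponentially with the number of players; practical tools rely on sampling or other approximations.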

DATA VALUATION IN AI-DAPT

The AI-DAPT framework rests on the principle that data quality plays a critical role in the
performance of ML models. In domains where data quality and interpretability are particularly crucial, robust
data valuation methods are essential. Additionally, AI-DAPT emphasizes computational efficiency and
scalability, recognizing that effective data valuation must account for the resources required to process, analyze,
and interpret large datasets in real time. By combining computational techniques, statistical methods, and
domain-specific considerations, AI-DAPT employs a multifaceted approach that ensures both the integrity and
effectiveness of its models, while optimizing for performance and scalability in complex, data-driven
environments.

Reweighting or resampling techniques will be used when necessary. Finally, we will utilize state-of-the-art
open-source tools like IBM AI Fairness 360 [5], which offers metrics and algorithms that detect and
mitigate biases in data, helping ensure fairness across different attributes.
Our findings will be instrumental in optimizing data collection strategies for our demonstrators.
The datasets that will be collected and utilized in AI-DAPT in the domains of healthcare, robotics, energy,
and manufacturing will therefore contribute to unbiased, high-quality findings and support future
research in these emerging domains.
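
As an illustration of the reweighting mentioned above, the sketch below assigns each sample the weight P(group) * P(label) / P(group, label), a standard reweighing scheme that makes the label statistically independent of the protected attribute in the weighted data. It is written in plain Python and does not use the AI Fairness 360 API. The group names, labels, and helper functions are hypothetical, and the source does not say which scheme AI-DAPT will adopt.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Return one weight per sample: w = P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (label == 1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    return positive / total

# Hypothetical toy data: group "a" has 4 of 6 positive labels, group "b" only 1 of 4.
groups = ["a"] * 6 + ["b"] * 4
labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
print(f"weighted positive rate, group a: {rate_a:.3f}")
print(f"weighted positive rate, group b: {rate_b:.3f}")
```

Under-represented combinations, such as positive labels in group `b`, receive weights above 1 and over-represented ones receive weights below 1, so both groups end up with the same weighted positive rate. The weights can then be passed to any learner that accepts per-sample weights.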

CONCLUSION

AI-DAPT's focus on data valuation reflects a simple yet important reality of AI: whatever
comes out is only as good as what goes in. By improving the methodology behind data valuation, AI-DAPT is
setting the stage for future AI applications that are not only powerful and efficient but also
reliable and fair.

REFERENCES

[1] K. Jiang, W. Liang, J. Y. Zou and Y. Kwon, “OpenDataVal: a unified benchmark for data valuation,” Advances in Neural
Information Processing Systems, vol. 36, 2023.
[2] A. Ghorbani and J. Zou, “Data Shapley: Equitable valuation of data for machine learning,” International
Conference on Machine Learning, PMLR, 2019.
[3] R. Jia et al., “Towards efficient data valuation based on the Shapley value,” The 22nd International Conference
on Artificial Intelligence and Statistics, PMLR, 2019.
[4] M. Huang and R. Rust, “A strategic framework for artificial intelligence in marketing,” Journal of the Academy of
Marketing Science, vol. 49, pp. 30-50, 2021.
[5] “AI Fairness 360 – IBM,” [Online]. Available: https://aif360.res.ibm.com/
[6] R. Miller et al., “A Framework for Current and New Data Quality Dimensions: An Overview,” Data, vol. 9, no. 12,
p. 151, 2024.
