Using Model Behaviour to Develop an Explainability-Driven Strategy for Responsible Data Auditing
By: Paul McInnis
When applying AI in the enterprise, developers often select a popular benchmark dataset to train their models, assuming these datasets are high quality. This is not always the case, and issues in the datasets can negatively affect the real-world performance of the resulting models. The methods generally employed to audit data consider only the data's characteristics and ignore model behaviour, relying heavily on human intuition and missing key factors that affect the model's results. Our research team took a different approach, examining an explainability-driven strategy for data auditing that provides actionable insights based on what a model actually does.
To conduct this explainability-driven data auditing, the team first built a dummy model prototype trained on the data in question. We then fed the data back into the prototype and applied DarwinAI's quantitative explainability technology to identify the critical factors driving the prototype's behaviour across the data. Finally, we studied those critical factors and uncovered several problems with the datasets. For this study, the team used the OSIC Pulmonary Fibrosis Progression dataset and the CNCB COVID-19 CT dataset. Both are healthcare datasets, but the results of the study can be extrapolated to any field using image-based datasets.
Explainability-driven data auditing workflow
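The workflow above can be outlined in code. This is a hypothetical sketch only: `train` and `explain` are placeholder callables standing in for the prototype-training and explainability steps, not DarwinAI's actual API.

```python
# Hypothetical outline of the audit workflow: train a dummy prototype,
# feed the data back in, extract critical factors, and collect findings
# for human review. `train` and `explain` are placeholder callables.
def audit_dataset(dataset, train, explain):
    prototype = train(dataset)                   # step 1: dummy model prototype
    findings = []
    for image, label in dataset:                 # step 2: feed the data back in
        prediction = prototype(image)
        mask = explain(prototype, image)         # step 3: critical factors
        findings.append({"label": label, "prediction": prediction, "mask": mask})
    return findings                              # step 4: study masks for issues
```

The key design point is that the output is not a score but a set of per-sample explanations, which a human (or a downstream check) can inspect for implausible critical factors.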
Discovering Hidden Data Quality Issues
We audited the data by randomly selecting correct and incorrect predictions and applying explainability to mask the areas of each image critical to the prediction. We then sanity-checked the masks for plausible explanations, i.e., whether the critical areas involved lung tissue. This explainability-driven auditing uncovered several hidden data quality issues that caused the dummy model prototypes to make predictions for the wrong reasons, even though overall performance appeared high on scalar metrics. These include:
- Incorrect calibration metadata caused the model to erroneously exploit the data's dynamic range when making predictions.
- Synthetic padding introduced during data curation erroneously guided predictions.
- Circular artifacts erroneously guided predictions.
- The model used imaging of the patient table surface to make predictions.
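To make the idea of "masking the areas critical to a prediction" concrete, here is a minimal occlusion-sensitivity sketch. It is a generic stand-in, not DarwinAI's quantitative explainability technology: the toy model below deliberately keys on the image corner, mimicking a prototype that exploits synthetic padding instead of anatomy, and the sensitivity map flags exactly that region.

```python
# Generic occlusion-sensitivity sketch (a stand-in for a quantitative
# explainability method): blank each patch and measure how much the
# prediction changes. Large changes mark regions critical to the prediction.
import numpy as np

def occlusion_map(image, predict, patch=4):
    """Return a per-pixel map of |prediction change| when that patch is blanked."""
    base = predict(image)
    h, w = image.shape
    heat = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 0.0  # blank this patch
            heat[y:y + patch, x:x + patch] = abs(base - predict(occluded))
    return heat

# Toy "model": its score depends only on the top-left corner, mimicking a
# prototype that keys on synthetic padding rather than lung tissue.
def toy_predict(img):
    return img[:8, :8].mean()

rng = np.random.default_rng(0)
img = rng.random((32, 32))
heat = occlusion_map(img, toy_predict)
# All sensitivity concentrates in the top-left 8x8 corner: a critical factor
# outside any plausible lung region, which is the audit's red flag.
```

A reviewer comparing such maps against anatomy is doing exactly the sanity check described above: if the critical factors fall on padding, artifacts, or the table rather than lung tissue, the prediction is right (or wrong) for the wrong reasons.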
Data quality issues with critical factors highlighted: 1) synthetic padding, 2) circular artifacts, 3) table surface detection
Once we determined why these errors were happening, we were able to take concrete actions to remedy the data issues: removing incorrect calibration metadata, applying domain-specific artifact mitigation (e.g., for circular artifacts and synthetic padding), and automatically removing the patient table (via table-specific HU thresholding).
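As one illustration of the remediation step, table removal via HU thresholding can be sketched as below. The HU band used here is an illustrative assumption, not the study's actual recipe: in CT, pixel intensities are calibrated in Hounsfield units, so material with a table-like HU range can be replaced with air.

```python
# Hypothetical sketch of table removal via HU thresholding. The table_band
# values are illustrative assumptions, not the paper's exact parameters.
import numpy as np

AIR_HU = -1000.0  # Hounsfield value of air

def remove_table(slice_hu, table_band=(150.0, 300.0)):
    """Replace pixels in a hypothetical table-material HU band with air."""
    lo, hi = table_band
    table_mask = (slice_hu >= lo) & (slice_hu <= hi)
    return np.where(table_mask, AIR_HU, slice_hu)

# Synthetic slice: a circular soft-tissue "body" at ~40 HU plus a thin
# "table" strip at ~200 HU near the bottom of the field of view.
hu = np.full((128, 128), AIR_HU)
yy, xx = np.ogrid[:128, :128]
hu[(yy - 60) ** 2 + (xx - 64) ** 2 < 40 ** 2] = 40.0   # body
hu[115:118, 20:108] = 200.0                            # table surface
cleaned = remove_table(hu)  # table strip becomes air; body is untouched
```

In practice a pure HU band is not sufficient on its own (bone overlaps this range), so a real pipeline would combine the threshold with spatial priors such as the table's position below the patient.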
Actionable insights are used to improve data quality
Results And Conclusion
Training on the cleaned data produced a deep CNN regression model with state-of-the-art performance on the OSIC dataset, one that makes its predictions for the right reasons.
We hope this explainability-driven strategy can complement data-driven strategies to facilitate more responsible machine learning-driven computer vision development.
Training on cleaned data yields a model with explainability masks showing predictions based on relevant visual anomalies