An Explainable and Interactive Distributed Machine Learning System
Artificial Intelligence is facing some of its most significant challenges to date, and they are no longer technical but ethical. Indeed, as models become more abstract and complex, and as the growing automation of tasks widens the gap between human experts and data, it becomes increasingly difficult to understand and predict model behavior. As a consequence, models often fail in production, or exhibit biased or discriminatory behavior.
The PI of this proposal has been tackling these challenges in the project CEDEs – Continuously Evolving Distributed Ensembles (EXPL/CCI-COM/0706/2021), which is developing a distributed learning system focused on explainability, observability, and transparency. These ethical principles are achieved mainly through:
1) a general-purpose explainable model that provides explanations for predictions. Explanations are produced by a purpose-built proxy model that is trained for each predictive model in the platform; they are therefore independent of the internal structure of the model being explained, which is treated as a black box (see the first sketch after this list). Explanations may include text, images, measures of quality/certainty, and counterfactuals. The CEDEs workplan also foresees the development of a chatbot for interacting with the explainable model, so that explanations can be provided interactively, adjusted to the needs and wishes of the user;
2) the use of a blockchain that records all interactions of the stakeholders with the system (see the second sketch below). This supports a federated digital data/model marketplace in which participants can contribute and use predictive models, which can be aggregated into powerful higher-level ensembles, while preserving data ownership and privacy;
3) an observability module that continuously monitors specific business and ethical indicators (e.g., data freshness, data bias, data quality) and allows the organization to detect unwanted states in the system as early as possible (see the third sketch below).
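To make the proxy-model idea of item 1 concrete, the following is a minimal Python sketch of a model-agnostic surrogate explainer. It assumes scikit-learn-style interfaces and synthetic data; all names are illustrative and this is not CEDEs' actual implementation. The key point is that the proxy is fitted to the black box's predictions, never to its internals, and its agreement rate with the black box doubles as a certainty measure for the explanations.

    # Hypothetical sketch: a black box is explained through a transparent
    # proxy (surrogate) model trained to mimic its predictions.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier  # stands in for any black box
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

    # Proxy model: trained on the black box's *predictions*, not the true
    # labels, so explanations do not depend on the black box's internals.
    proxy = DecisionTreeClassifier(max_depth=3, random_state=0)
    proxy.fit(X, black_box.predict(X))

    # Fidelity: how often the proxy agrees with the black box; this serves
    # as a quality/certainty measure attached to the explanations.
    fidelity = (proxy.predict(X) == black_box.predict(X)).mean()
    print(f"proxy fidelity: {fidelity:.2%}")
    print(export_text(proxy, feature_names=[f"f{i}" for i in range(4)]))

The printed decision rules are one possible textual explanation; the same surrogate could also be queried for counterfactuals (the smallest input change that flips its prediction).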
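For item 2, the following is a hypothetical sketch of the kind of append-only, hash-chained record a ledger of stakeholder interactions could keep. The schema, field names, and actions are assumptions for illustration, not CEDEs' actual blockchain format.

    # Hypothetical sketch: hash-chained interaction records. Each record
    # commits to its predecessor, so history cannot be silently rewritten.
    import hashlib, json, time

    def record(prev_hash: str, actor: str, action: str, payload: dict) -> dict:
        body = {"ts": time.time(), "actor": actor, "action": action,
                "payload": payload, "prev": prev_hash}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return {**body, "hash": digest}

    chain = [record("0" * 64, "org-A", "register_model", {"model_id": "m1"})]
    chain.append(record(chain[-1]["hash"], "org-B", "query_model", {"model_id": "m1"}))

    # Verify the chain: each stored hash must match a recomputation, and
    # must equal the next record's "prev" pointer.
    def verify(chain: list) -> bool:
        prev = "0" * 64
        for rec in chain:
            body = {k: rec[k] for k in ("ts", "actor", "action", "payload", "prev")}
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if rec["prev"] != prev or rec["hash"] != digest:
                return False
            prev = rec["hash"]
        return True

    print(verify(chain))  # True; altering any earlier field makes this False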
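For item 3, a hypothetical sketch of an observability check: a window of freshly streamed data is compared against a training-time reference, and an alert is raised when a drift indicator exceeds a threshold. The indicator chosen here (population stability index) and the threshold are illustrative, not the project's actual metrics.

    # Hypothetical sketch: a single observability indicator with a threshold.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Indicator:
        name: str
        value: float
        threshold: float

        @property
        def alert(self) -> bool:
            return self.value > self.threshold

    def population_stability_index(ref, cur, bins=10) -> float:
        """PSI: a common drift measure; larger values mean greater divergence."""
        edges = np.histogram_bin_edges(ref, bins=bins)
        # Values outside the reference range are ignored here for simplicity.
        p = np.histogram(ref, bins=edges)[0] / len(ref) + 1e-6
        q = np.histogram(cur, bins=edges)[0] / len(cur) + 1e-6
        return float(np.sum((p - q) * np.log(p / q)))

    rng = np.random.default_rng(1)
    reference = rng.normal(0.0, 1.0, 5000)  # distribution seen at training time
    current = rng.normal(0.5, 1.0, 5000)    # freshly streamed data, shifted

    ind = Indicator("input_drift_psi", population_stability_index(reference, current), 0.2)
    print(ind.name, f"{ind.value:.3f}", "ALERT" if ind.alert else "ok")

In a streaming deployment, checks of this kind would run continuously over sliding windows, one per monitored indicator.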
The main challenge faced so far in the CEDEs project is one of infrastructure: the project's outcomes must be tested in a streaming scenario, at scale. This, however, requires significant computational resources, given the number of models that CEDEs trains.
The main goal of this project is to test CEDEs at scale, in a real scenario. To this end, the team will collaborate with the startup AnyBrain, co-founded by the PI, which will provide the data and the case study for real-life validation. Specifically, AnyBrain develops software for e-sports athlete identification and fraud detection (e.g., cheating), based on behavioral analysis of the athletes' interaction with their gaming devices. Infringing athletes may face significant consequences, such as long-term or permanent bans. It is therefore crucial that the models used by AnyBrain are transparent and able to explain the main reasons behind their decisions to human decision-makers (i.e., the right to explanation, in line with the GDPR).
The objectives of this project are thus twofold:
1) to validate, from a technical and business perspective, the developments of the CEDEs project;
2) to assess, from an ethical perspective, the quality and usefulness of the explanations generated by CEDEs in a real-life application.