Machine learning (ML) is a driving force behind many successful applications of Artificial Intelligence. Certifying an ML pipeline means establishing guarantees on the system as a whole (horizontal certification) as well as on each individual component (vertical certification).
Horizontal certification covers the full pipeline, from data acquisition to data visualization or model deployment. The pipeline starts with data acquisition. Whereas scientific experiments are designed with the analysis in mind, companies normally store their data for purposes other than analysis. In production and communication, streaming data arrive from distributed sensors and need to be synchronized and combined. ML and database theory both investigate methods for data description, data compression, feature extraction and selection, and sampling. Data impurities may travel through an ML pipeline from data acquisition to the subsequent components and degrade the quality of everything downstream. Some approaches optimize the overall data analytics process. When errors propagate through an ML pipeline, data-driven debugging and explanation techniques (also known as data provenance) are required to determine where the errors originate.
Vertical certification exploits the theory of ML to guarantee error bounds, sample complexity, energy consumption, execution time, and memory and communication demands. Many approaches are based on statistical theory. Since every method is implemented in a particular programming paradigm and on a particular hardware architecture, which testing procedures are readily available for certifying a particular implementation of a method, and which have yet to be developed?
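To make "error bounds" and "sample complexity" concrete, here is one standard textbook guarantee from statistical learning theory. It is offered only as an illustration and is not necessarily the result used in the seminar. The assumptions are ours: a finite hypothesis class $\mathcal{H}$, a loss bounded in $[0,1]$, and $n$ i.i.d. samples, with $R(h)$ the true risk and $\hat{R}_n(h)$ the empirical risk.

```latex
% Hoeffding's inequality plus a union bound over the finite class H:
% with probability at least 1 - \delta, simultaneously for all h in H,
\[
  \bigl| R(h) - \hat{R}_n(h) \bigr|
  \;\le\;
  \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2n}} .
\]
% Solving for n gives a sufficient sample size for accuracy \varepsilon:
\[
  n \;\ge\; \frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2\varepsilon^{2}} .
\]
```

The first line bounds the error; the second reads the same statement as a sample complexity. Bounds on energy, execution time, or memory call for different, implementation-dependent arguments.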
The robustness of algorithms refers to the relationship between changes in the data and changes in the learning outcome. How can this be measured and tested efficiently?
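One simple way to probe this relationship empirically is to perturb a single data point and record how far the learned outcome moves. The sketch below is a minimal illustration under our own assumptions: the helper names `sensitivity` and `add_outlier` are hypothetical, and the "learners" are just the mean and the median from Python's standard library, standing in for real models.

```python
import random
import statistics

def sensitivity(learn, data, perturb, trials=200, seed=0):
    """Largest observed change in the learned value when one point is perturbed."""
    rng = random.Random(seed)
    baseline = learn(data)
    worst = 0.0
    for _ in range(trials):
        changed = list(data)              # copy, so the original data stay intact
        i = rng.randrange(len(changed))   # pick one point at random
        changed[i] = perturb(changed[i], rng)
        worst = max(worst, abs(learn(changed) - baseline))
    return worst

def add_outlier(x, rng):
    """Hypothetical perturbation: turn a point into a gross outlier."""
    return x + 1000.0

data = [float(x) for x in range(1, 22)]   # the values 1.0 .. 21.0

mean_sens = sensitivity(statistics.mean, data, add_outlier)
median_sens = sensitivity(statistics.median, data, add_outlier)
print(f"mean:   {mean_sens:.2f}")
print(f"median: {median_sens:.2f}")
```

Here the mean shifts by 1000/21 for every single-point outlier, while the median moves by at most one position in the sorted data. Random probing like this yields only a lower bound on the worst-case sensitivity; a certificate would need an analytical argument or an exhaustive search.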
Fairness relates not only to properties of the learned model but also to properties of the data, and to our knowledge of what is possible even when it runs counter to the observed data (e.g., women being leaders).
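That fairness concerns the data as well as the model can be illustrated with a single group-level statistic applied to both. The sketch below uses the demographic parity gap, which is our choice of metric and only one of many in the literature. The function names and the toy records are hypothetical, not data from any real study.

```python
from collections import defaultdict

def positive_rates(groups, outcomes):
    """Fraction of positive outcomes (1) within each group."""
    counts = defaultdict(lambda: [0, 0])   # group -> [positives, total]
    for g, y in zip(groups, outcomes):
        counts[g][0] += int(y)
        counts[g][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

def demographic_parity_gap(groups, outcomes):
    """Difference between the highest and lowest group-wise positive rate."""
    rates = positive_rates(groups, outcomes)
    return max(rates.values()) - min(rates.values())

# Toy example: 1 = "is a leader", four records per group.
groups      = ["f", "f", "f", "f", "m", "m", "m", "m"]
labels      = [1, 0, 0, 0, 1, 1, 1, 0]   # what the historical data record
predictions = [0, 0, 0, 0, 1, 1, 1, 1]   # what a hypothetical model outputs

data_gap = demographic_parity_gap(groups, labels)
model_gap = demographic_parity_gap(groups, predictions)
print(data_gap, model_gap)
```

In this toy case the gap is already present in the labels and is amplified by the model. A purely observational statistic like this cannot express the counterfactual aspect, namely what would be possible but is absent from the data.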
The explainability of an ML process can be viewed from the user and the system perspective. From the user perspective, we want to know what can be done to help users comprehend learned models and inspect their applications. From the system perspective, we want to know how the learned models can be characterized and finally certified. This is the key part of ML and of the theory underlying vertical certification.
Responsibility for the data, and for the services built upon the data, concerns the overall pipeline. Companies need a clear policy governing the overall ML pipeline. The policy introduces quality measures together with their testing routines. It also governs data rights. What are best-practice procedures for companies, and how can they be made easy to follow? Under digital privacy legislation (e.g., the GDPR in Europe), the donors of data have the right to be informed about how their data are stored and used. The High-Level Expert Group on AI has delivered Ethics Guidelines on Artificial Intelligence as well as Policy and Investment Recommendations. An approach to the horizontal certification of AI applications is under development at KI.NRW by Fraunhofer IAIS.
Date: Tuesdays, 16:15 - 17:45 h, online
Literature (excluding books):