Contributed Talk - Splinter eScience

Thursday, 15 September 2022, 17:10   (SFG 1040 / virtual eScience)

Exploring the Provenance of Astronomical Workflows

Michael Johnson, David J. Champion, Marta Dembska, Hans-Rainer Klöckner, Kristen Lackeos, Albina Muzafarova, Marcus Paradies, Sirko Schindler
Max Planck Institute for Radio Astronomy & DLR Institute of Data Science

We present VAMPIRA, a tool designed to automatically generate provenance for data-intensive astronomical pipelines. The generated provenance records the processing steps, the data and associated metadata, the infrastructure, and the users involved in a pipeline, as well as the relations between these items. With access to this information, astronomers will be able to make informed decisions about the trustworthiness of data products, pipelines, or pipeline components, thereby helping to address the so-called “black box problem” prevalent in artificial intelligence (AI) research. Modern AI systems have intricate architectures, which can reduce their understandability and in turn raise concerns over their trustworthiness and reliability. Provenance becomes all the more important for astronomical AI applications given the datasets of up to exabyte scale expected from the next generation of astronomical survey telescopes.
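
To make the kind of record described above concrete, the following is a minimal illustrative sketch in Python. It is not VAMPIRA's implementation or API: the abstract does not specify a data model, so the sketch assumes a simple graph of entities (data), activities (processing steps), and agents (users), with relation names borrowed loosely from the W3C PROV vocabulary. The class `ProvenanceGraph`, its methods, and all identifiers such as `calibration_run` and `node-01` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical provenance store: nodes plus (subject, relation, object) triples."""
    nodes: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add(self, node_id, kind, **attrs):
        # kind is one of "entity" (data), "activity" (processing), "agent" (user)
        self.nodes[node_id] = {"kind": kind, **attrs}

    def relate(self, subject, relation, obj):
        # Refuse relations that mention nodes which were never recorded
        for node_id in (subject, obj):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.relations.append((subject, relation, obj))

    def lineage(self, node_id):
        """Return every node reachable from node_id by following relations."""
        seen, stack = set(), [node_id]
        while stack:
            current = stack.pop()
            for subject, _, obj in self.relations:
                if subject == current and obj not in seen:
                    seen.add(obj)
                    stack.append(obj)
        return seen


# Example: one processing step in an imagined pipeline (all names are made up)
graph = ProvenanceGraph()
graph.add("raw_visibilities", "entity", format="measurement set")
graph.add("calibrated_image", "entity", format="FITS")
graph.add("calibration_run", "activity", host="node-01")  # infrastructure as metadata
graph.add("astronomer", "agent")

graph.relate("calibrated_image", "wasGeneratedBy", "calibration_run")
graph.relate("calibration_run", "used", "raw_visibilities")
graph.relate("calibration_run", "wasAssociatedWith", "astronomer")

# Everything the calibrated image depends on, directly or indirectly
print(sorted(graph.lineage("calibrated_image")))
```

Tracing the lineage of a data product in this way is one simple means by which a user could check which inputs, processing steps, and people contributed to it before deciding whether to trust it.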