eprintid: 35749
rev_number: 17
eprint_status: archive
userid: 8616
dir: disk0/00/03/57/49
datestamp: 2024-12-04 14:47:54
lastmod: 2025-01-07 14:54:12
status_changed: 2024-12-04 14:47:54
type: doctoralThesis
metadata_visibility: show
creators_name: Gierend, Kerstin
title: Collection and modeling of data provenance with an integrated metadata concept in the context of biomedical workflows in Data Integration Centers
subjects: ddc-004
subjects: ddc-600
subjects: ddc-610
divisions: i-65300
adv_faculty: af-06
keywords: Data provenance; secondary use; data integration center; research data management
cterms_swd: Biomedicine, Computer science
cterms_swd: Provenance, Data, Reuse
abstract: In the context of the Medical Informatics Initiative funded by the German government, medical data integration centers have implemented complex data flows to load routine health care data into research data repositories for secondary use. Data management practices for (sensitive) medical data elements are of key importance throughout these processes, yet little scientific work has so far examined and enforced data provenance in this specific medical use case. Insufficient knowledge about these medical data and processes can lead to validity risks and weaken the quality of the extracted data. This cumulative dissertation presents a two-stage methodological approach to facilitate extensive provenance information enrichment in the data integration pipelines. A MIRACUM-wide mixed-methods study investigated both the data management maturity status and the provenance readiness, and presented recommendations. The subsequent proof-of-concept study built on this outcome to model and implement an algorithm that continuously gathers, stores, and extracts relevant provenance information at the level of individual medical data elements, and achieved satisfactory pipeline execution times.
Overall, the implemented provenance tracking solution indicates a high degree of traceability, accuracy, and reliability of the transformed medical data elements, enabling a data integration center to meet its accountability obligations. In addition, this dissertation serves as a catalyst for deriving an overarching data management strategy that upholds data integrity and provenance as key factors for high-quality, sustainably FAIR health and research data. For the first time, this thesis enabled extensive provenance information enrichment in the data integration pipelines of a German medical data integration center. The dissertation anticipates that its recommendations will help enforce the quality of patient data dissemination and guide the implementation of auditable and measurable provenance approaches. This development has potentially broad applicability, since it contributes initial work toward the envisioned European Health Data Space.
date: 2024
id_scheme: DOI
id_number: 10.11588/heidok.00035749
ppn_swb: 1913724514
own_urn: urn:nbn:de:bsz:16-heidok-357497
date_accepted: 2024-11-18
language: eng
bibsort: GIERENDKERCOLLECTION
full_text_status: public
place_of_pub: Heidelberg
citation: Gierend, Kerstin (2024) Collection and modeling of data provenance with an integrated metadata concept in the context of biomedical workflows in Data Integration Centers. [Dissertation]
document_url: https://archiv.ub.uni-heidelberg.de/volltextserver/35749/1/KerstinGierend_KumulativeDissertation_pdfa.pdf