
A standardization concept for machine actionable and reusable scientific data

Wilbertz, Axel

PDF, English - main document
Size: 16MB | License: In Copyright - Rights reserved


Abstract

The reuse of scientific data through sophisticated algorithms has the potential to advance drug development (Mak & Pichika, 2019) (Narayanan et al., 2021) (Paul et al., 2021). This requires data standardization (Kush et al., 2020). Data standardization is the conversion of data into an agreed-upon format or against a reference. For biologics such as therapeutic monoclonal antibodies (mAbs), this is challenging because the data are diverse, the analytical methods are complex, and the data are inherently noisy (Taylor, 2021).
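To make the first half of this definition concrete (conversion into an agreed-upon format), here is a minimal Python sketch. It assumes a hypothetical target schema in which a liquid chromatography retention time is always reported in minutes. The record fields (`sample_id`, `SampleID`, `rt`, `unit`), the unit table, and the `standardize` function are illustrative inventions. They are not part of the thesis or of any existing standard.

```python
from dataclasses import dataclass

# Hypothetical agreed-upon target format: retention time always in minutes.
@dataclass(frozen=True)
class StandardRecord:
    sample_id: str
    retention_time_min: float

# Hypothetical divisors that convert a source unit into the reference unit (minutes).
UNIT_TO_MINUTES_DIVISOR = {"min": 1.0, "s": 60.0, "ms": 60000.0}

def standardize(raw: dict) -> StandardRecord:
    """Map one source-specific record onto the agreed-upon format."""
    # Sources may name the same field differently; these aliases are assumptions.
    sample_id = raw.get("sample_id", raw.get("SampleID"))
    if sample_id is None:
        raise ValueError("record has no sample identifier")
    # Records without an explicit unit are assumed to be in minutes.
    unit = raw.get("unit", "min")
    if unit not in UNIT_TO_MINUTES_DIVISOR:
        raise ValueError(f"unknown unit: {unit!r}")
    value_min = float(raw["rt"]) / UNIT_TO_MINUTES_DIVISOR[unit]
    return StandardRecord(str(sample_id), value_min)

# Two records describing the same quantity in different source formats.
raw_records = [
    {"sample_id": "mAb-01", "rt": "12.5", "unit": "min"},
    {"SampleID": "mAb-02", "rt": 750, "unit": "s"},
]
standardized = [standardize(r) for r in raw_records]
for record in standardized:
    print(record)
```

After conversion, both records share the same field names and unit. This lets a downstream algorithm process them together without any source-specific handling.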

Laboratory automation and high-throughput concepts enable process standardization to cope with these challenges at the lowest level, allowing standardized data sets to be generated quickly and reliably. Unfortunately, manual data integration is still required to determine which data sets are of sufficient quality to be reused through advanced analytics. Human scientists therefore rely on their scientific expertise to decide on data set comparability. Data set comparability is the capability to relate data sets to one another based on appropriate criteria, with the goal of selecting only the data relevant for answering a specific question. Machines, however, cannot substitute for this human factor, because standardization concepts for biologics that would enable data standardization are either missing or unsuitable. The few existing concepts do not offer the metadata relevant for assessing the quality of biologics data sets, which involve sophisticated analytical techniques such as liquid chromatography, a critical method in drug development.
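Data set comparability, in the sense defined above, means relating data sets by explicit criteria and keeping only those relevant to a question. A minimal Python sketch can illustrate this. The metadata fields (`method_id`, `column_lot`, `system_suitability_passed`) and the rules in `comparable` are hypothetical stand-ins chosen for illustration. They are not the criteria or the semantic concept developed in the thesis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataSetMetadata:
    dataset_id: str
    method_id: str                   # hypothetical identifier of the analytical method
    column_lot: str                  # hypothetical chromatography column lot
    system_suitability_passed: bool  # hypothetical quality flag for the run

def comparable(a: DataSetMetadata, b: DataSetMetadata) -> bool:
    """Illustrative criteria: same method, same column lot, and both runs passed QC."""
    return (
        a.method_id == b.method_id
        and a.column_lot == b.column_lot
        and a.system_suitability_passed
        and b.system_suitability_passed
    )

def select_comparable(
    reference: DataSetMetadata, candidates: list[DataSetMetadata]
) -> list[DataSetMetadata]:
    """Keep only the candidates that can be related to the reference data set."""
    return [c for c in candidates if comparable(reference, c)]

reference = DataSetMetadata("DS-1", "LC-M1", "LOT-A", True)
candidates = [
    DataSetMetadata("DS-2", "LC-M1", "LOT-A", True),   # same method and lot, passed
    DataSetMetadata("DS-3", "LC-M1", "LOT-B", True),   # different column lot
    DataSetMetadata("DS-4", "LC-M1", "LOT-A", False),  # failed quality check
]
selected = select_comparable(reference, candidates)
print([d.dataset_id for d in selected])  # prints ['DS-2']
```

The selection rule itself is trivial. The difficulty the abstract points to is having metadata relevant enough for such rules to reflect a scientist's judgment.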

The Findable, Accessible, Interoperable and Reusable (FAIR) principles are one such data standardization concept. The goal of these guiding principles is to make scientific data machine actionable for reuse by both machines and humans (Wilkinson et al., 2016). Semantic web technologies are established in the FAIR community and are often used to enable FAIR data. The question is whether these concepts are sufficient to make biologics analytical data machine actionable to the point where a machine can act on the data as capably as a human. To achieve this, the machine requires human-like knowledge to determine the comparability of biologics data sets. FAIR and existing semantic concepts address this problem only partially, because FAIR does not explicitly cover data set comparability. Additional steps are therefore required to enable the machine to assess data set comparability automatically.

In this thesis, the current level of biologics standardization is reviewed. To this end, the suitability of standardization concepts such as Allotrope and FAIR for standardizing biologics data is assessed. It is further shown that, for biologics data, these concepts are only partially usable for producing comparable data. As a result, a new standardization concept based on semantic technologies is developed that enables a machine to decide automatically on biologics data set comparability, similar to a human scientist. The concept can be applied to other domains that face similar challenges with complex data integration and a lack of standardization.

Document type: Dissertation
Supervisor: Stallkamp, Prof. Dr.-Ing. Jan
Place of Publication: Heidelberg
Date of thesis defense: 11 December 2024
Date Deposited: 26 Feb 2025 14:21
Date: 2025
Faculties / Institutes: Medizinische Fakultät Mannheim > Mannheim Institute for Intelligent Systems in Medicine (MIISM)
DDC-classification: 000 Generalities, Science; 004 Data processing, Computer science
Uncontrolled Keywords: FAIR data, Formulation development, Reusable data, Machine actionable data