
Visual and Interactive Exploration of Omics Data

Ovchinnikova, Svetlana

PDF, English - main document

Abstract

Today, many biological studies rely on high-throughput techniques that yield data on thousands of samples or cells with tens of thousands of measured features. Exploring this much data poses a visualisation challenge that can be addressed by switching from static plots to interactive ones. Interactive plots offer an intuitive way to navigate large datasets and help the user grasp the bigger picture visually. Interactive visualisation of biological data is an actively developing field. However, it is mostly used for data presentation and is still uncommon during the early, exploratory stages of a research project. This project aims to explore and propose solutions to fill this “visualisation gap”.

I investigate the possible benefits for biological studies of combining the JavaScript and R programming languages. R is one of the most common tools in biostatistics and provides a wide variety of libraries for processing omics data. JavaScript is a language for enabling user interaction with a web page and is nowadays used by most web resources. Thus, each language is very effective in its own field of application, and interactive visualisation of omics data lies precisely at their intersection. As an outcome of the project, I present three R packages for data visualisation, one of which also provides a pure JavaScript interface.

The first one, "jrc", is intended for package developers and serves as a foothold for further project steps. "jrc" provides direct communication between a web page and a running R session. It allows the user to run R code from a web page and execute JavaScript code from an R session. In addition, it provides a basis for publicly available interactive apps deployed on a server. With this, "jrc" can be used as a foundation for the packages that use JavaScript to visualise data stored and processed in an R session.

The second one is "sleepwalk". It is a simple but effective tool to explore distortions introduced by dimensionality reduction techniques when visualising biological data. These approaches (such as MDS, t-SNE and UMAP) are particularly, but not exclusively, popular in single-cell studies. There, researchers commonly visualise cells as points in a 2D embedding and then study the resulting clusters and trajectories. "sleepwalk" helps one explore the patterns underlying such plots by interactively comparing the displayed neighbourhoods to the original distances in the high-dimensional feature space.
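
The comparison between embedding neighbourhoods and feature-space distances can be illustrated with a minimal Python sketch. This is not the "sleepwalk" implementation or its API; the function names, the toy data and the overlap score are all assumptions made for illustration. For one focal point, it computes the distances to all other points in the original feature space and checks how many of its nearest neighbours in the 2D embedding are also nearest neighbours in that space.

```python
import math


def distances_from(focal, points):
    """Euclidean distance from points[focal] to every point (including itself)."""
    origin = points[focal]
    return [math.dist(origin, p) for p in points]


def nearest(focal, points, k):
    """Indices of the k nearest neighbours of points[focal], excluding itself."""
    d = distances_from(focal, points)
    others = (i for i in range(len(points)) if i != focal)
    return set(sorted(others, key=lambda i: d[i])[:k])


def neighbourhood_overlap(focal, features, embedding, k):
    """Fraction of the k embedding neighbours that are also feature-space neighbours."""
    shared = nearest(focal, features, k) & nearest(focal, embedding, k)
    return len(shared) / k


# Hypothetical toy data: five "cells" with three features each.
features = [(0, 0, 0), (1, 0, 0), (0, 1.2, 0), (0, 0, 10), (0, 0, 11)]
# Hypothetical 2D embedding in which cell 3 is wrongly placed next to cell 0.
embedding = [(0, 0), (1, 0), (0, 1.2), (0.5, 0.5), (5, 5)]

print(distances_from(0, features))  # feature-space distances from cell 0
print(neighbourhood_overlap(0, features, embedding, k=2))
```

A low overlap for a point suggests that its surroundings in the 2D plot do not reflect its neighbours in the original data, which is the kind of distortion an interactive tool lets the user spot point by point.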

Finally, "rlc" (or LinkedCharts) is a plotting library that allows one to build one's own interactive app with minimal effort and coding skills. It is designed so that users do not need to learn a special, complex syntax. Instead, I adapted it to practices routinely used in omics data exploration. I also left ample room for customisation so that users can adapt the apps to their particular tasks rather than trying to fit their data into predefined templates. With all this, "rlc" can be a powerful tool to facilitate exploratory data analysis through interactive visualisation. It centres on, but is not limited to, the idea of linking multiple charts, so that a user's interaction with one plot affects another (for example, clicking on a point shows more specific information on the selected sample).
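
The idea of linking charts can be sketched in a few lines of Python. This is not the "rlc" API; the classes `LinkedSelection` and `DetailChart` and the toy data are hypothetical and stand in for real plots. A shared selection object notifies every subscribed chart when the user picks a sample, so a click in one plot updates another.

```python
class LinkedSelection:
    """Shared state: remembers the selected sample and notifies listeners."""

    def __init__(self):
        self.selected = None
        self._listeners = []

    def subscribe(self, callback):
        """Register a function to call whenever the selection changes."""
        self._listeners.append(callback)

    def select(self, sample_id):
        """Record the new selection and propagate it to all linked charts."""
        self.selected = sample_id
        for callback in self._listeners:
            callback(sample_id)


class DetailChart:
    """Stand-in for a second plot that shows the values of one sample."""

    def __init__(self, expression):
        self.expression = expression
        self.shown = None

    def update(self, sample_id):
        self.shown = self.expression[sample_id]


# Hypothetical toy data: expression of three genes per sample.
expression = {"sample_A": [5.1, 0.2, 3.3], "sample_B": [0.4, 7.8, 1.0]}

selection = LinkedSelection()
detail = DetailChart(expression)
selection.subscribe(detail.update)

# Simulate a click on the point for sample_B in an overview plot.
selection.select("sample_B")
print(detail.shown)
```

Keeping the selection in one shared object means that any number of charts can be linked by subscribing them, without the charts needing to know about each other.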

Overall, the packages presented here can be helpful when applied in everyday analyses and serve as a basis and inspiration for new solutions in the interactive visualisation of biological data.

Document type: Dissertation
Supervisor: Anders, Prof. Dr. Simon
Place of Publication: Heidelberg
Date of thesis defense: 26 November 2021
Date Deposited: 18 Mar 2022 08:51
Date: 2022
Faculties / Institutes: The Faculty of Bio Sciences > Dean's Office of the Faculty of Bio Sciences
DDC-classification: 004 Data processing Computer science
500 Natural sciences and mathematics
570 Life sciences