title: Exploring Graph Structures: Learning and Analysis creator: Fita Sanmartin, Enrique subject: ddc-004 subject: 004 Data processing Computer science subject: ddc-510 subject: 510 Mathematics description: Graphs are versatile data structures with great flexibility to represent various kinds of information due to their capability to model relationships between entities. In this work, we explore diverse applications of graphs in machine learning, with a particular focus on spanning trees. First, we introduce the ``Directed Probabilistic Watershed'' algorithm, a graph-based semi-supervised learning method for classification on directed graphs. This algorithm builds a probability distribution over directed forests to infer the label of a node. It does this by considering the probability of a directed forest connecting the node to one of the given labeled nodes within the same subtree. Next, building on a similar probabilistic framework, we investigate the node degree distribution in a random spanning tree. We provide analytical expressions for both the expected value and variance of the degree of a node in a spanning tree. Subsequently, we analyze different node metrics in graphs, focusing on distances computed based on the paths connecting nodes. We frame several popular distance metrics, namely the shortest path distance, the commute cost distance, the minimax distance and the potential distance, as instances of the ``algebraic path problem'', which is a generalization of the shortest path problem. We introduce the ``log-norm'' distance, a node metric that interpolates between the aforementioned metrics. Furthermore, we establish certain conditions that are sufficient and necessary for an algebraic path problem to define a metric across any given graph. Finally, returning to the study of spanning trees, we focus on the geometrical stability. We introduce the ``central spanning tree,''a parameterized family of spanning trees embedded in Euclidean space. By conducting empirical tests, we showcase its resilience against perturbations such as noise in node coordinates. We formulate the central spanning tree as a minimization problem, establishing connections to other well-known spanning tree problems such as the minimum spanning tree or the Euclidean Steiner tree. Additionally, we explore a variant of the central spanning tree, referred to as ``branched central spanning tree'', that allows for the inclusion of Steiner points. We demonstrate the efficacy of this variant in modeling the skeleton of 3D point clouds representing plants. date: 2024 type: Dissertation type: info:eu-repo/semantics/doctoralThesis type: NonPeerReviewed format: application/pdf identifier: https://archiv.ub.uni-heidelberg.de/volltextserverhttps://archiv.ub.uni-heidelberg.de/volltextserver/34851/1/PhD_thesis.pdf identifier: DOI:10.11588/heidok.00034851 identifier: urn:nbn:de:bsz:16-heidok-348510 identifier: Fita Sanmartin, Enrique (2024) Exploring Graph Structures: Learning and Analysis. [Dissertation] relation: https://archiv.ub.uni-heidelberg.de/volltextserver/34851/ rights: info:eu-repo/semantics/openAccess rights: http://archiv.ub.uni-heidelberg.de/volltextserver/help/license_urhg.html language: eng