Morphism-Based Learning for Structured Data
In mathematics, morphism is a term that indicates structure-preserving mappings between mathematical structures of the same type. Linear transformations for linear spaces, homomorphisms for algebraic structures and continuous functions for topological spaces are examples. Many data researched in machine learning, on the other hand, can include mathematical structures in them. Strings are totally ordered sets, and trees can be understood not only as graphs but also as partially ordered sets with respect to an ancestor-to-descendent order and semigroups with respect to the binary operation to determine nearest common ancestor. In this paper, we propose a generic and theoretic framework to investigate similarity of structured data through structure-preserving one-to-one partial mappings, which we call morphisms. Through morphisms, useful and important methods studied in the literature can be abstracted into common concepts, although they have been studied separately. When we study new structures of data, we will be able to extend the legacy methods for the purpose of studying the new structure, if we can define morphisms properly. Also, this view reveals hidden relations between methods known in the literature and can let us understand them more clearly. For example, we see that the center star algorithm, which was originally developed to compute sequential multiple alignments, can be abstracted so that it not only applies to data structures other than strings but also can be used to solve problems of pattern extraction. The methods that we study in this paper include edit distance, multiple alignment, pattern extraction and kernel, but it is sure that there exist much more methods that can be abstracted within our framework.