Enumeration and symmetry of edit metric spaces
Is Version Of
This thesis addresses the problem of enumerating the size of the k-spheres of a fixed word for the edit metric as well as discussing the symmetry group of the edit graph in order to create an edit metric error correcting code. The application of such error correcting codes is to the correction of sequencing errors in DNA barcodes, short stretches of DNA incorporated into genomic libraries to identify source tissue or other information about the source of a given DNA sequence. As sequencing errors not only substitute characters but potentially insert and delete them, the traditional Hamming metric used in standard error correcting codes is not useful. The natural metric, the edit distance, is algebraically complex as compared to the Hamming metric. In this thesis the exact size of edit spheres of radius 1 and 2 are computed for any binary or q-ary string. The number of edit distance d neighbors of two special types of strings is also presented. Structural information about the edit metric space, in particular a representation as a pyramid of hypercubes and the symmetry group of the space, is given. This result begins the process of reconstructing the theory of error correcting codes for the edit metric and yields practical advice for the design of heuristic algorithms, e.g. evolutionary algorithms, used in practice to create error correcting DNA barcodes.