Problem-3 This problem is referred to as exon chaining in bioinformatics. Each gene corresponds to a subregion of the overall genome (the DNA sequence); however, part of this region might be "junk DNA". Frequently, a gene consists of several pieces called exons, which are separated by junk fragments called introns. This complicates the process of identifying genes in a newly sequenced genome. Suppose we have a new DNA sequence and we want to check whether a certain gene (a string) is present in it. Because we cannot expect the gene to appear as a contiguous subsequence, we look for partial matches: fragments of the DNA that are also present in the gene (in practice, even these partial matches will be approximate, not perfect). We then attempt to assemble these fragments. Let x[1...n] denote the DNA sequence. Each partial match can be represented by a triple (l_i, r_i, w_i), where x[l_i...r_i] is the fragment and w_i is a weight representing the strength of the match (it might be a local alignment score or some other statistical quantity). Many of these potential matches could be false, so the goal is to find a subset of the triples that are consistent (nonoverlapping) and have maximum total weight. Show how to do this in O(n + m) time, where m is the number of partial matches.
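One possible approach, given here as a minimal sketch and not as the book's official solution, is a dynamic program over positions of x. It assumes that positions are 1-based and inclusive, and that two fragments are consistent exactly when one ends strictly before the other begins. Let best[j] be the maximum total weight obtainable from fragments lying entirely within x[1...j]. Then best[0] = 0 and best[j] is the larger of best[j-1] and best[l_i - 1] + w_i over all triples with r_i = j. Bucketing the triples by right endpoint lets every triple be examined once, which gives O(n + m) time. The function name `max_weight_chain` and the helper arrays `ending_at` and `choice` are hypothetical names introduced for illustration.

```python
def max_weight_chain(n, matches):
    """Return (max total weight, indices of chosen matches).

    n       -- length of the DNA sequence x[1..n]
    matches -- list of (l, r, w) triples with 1 <= l <= r <= n
               (assumed 1-based, inclusive endpoints)
    """
    # Bucket match indices by right endpoint: O(n + m) time and space.
    ending_at = [[] for _ in range(n + 1)]
    for idx, (l, r, w) in enumerate(matches):
        ending_at[r].append(idx)

    best = [0] * (n + 1)       # best[j]: max weight using only x[1..j]
    choice = [None] * (n + 1)  # match that ends at j and achieves best[j], if any

    for j in range(1, n + 1):
        best[j] = best[j - 1]  # option 1: no chosen fragment ends at j
        for idx in ending_at[j]:
            l, _, w = matches[idx]
            # option 2: take this fragment plus the best chain ending before l
            if best[l - 1] + w > best[j]:
                best[j] = best[l - 1] + w
                choice[j] = idx

    # Walk back through the table to recover one optimal subset.
    chosen = []
    j = n
    while j > 0:
        if choice[j] is None:
            j -= 1
        else:
            chosen.append(choice[j])
            j = matches[choice[j]][0] - 1  # jump to just before the fragment
    chosen.reverse()
    return best[n], chosen


if __name__ == "__main__":
    example = [(1, 3, 5), (2, 5, 6), (4, 7, 5), (6, 10, 4), (8, 10, 3)]
    print(max_weight_chain(10, example))  # (13, [0, 2, 4])
```

The outer loop runs n times and the inner loop touches each triple exactly once across the whole run, so the table is filled in O(n + m) time; the traceback adds at most O(n) more steps. If the triples arrived already sorted by right endpoint, the bucketing step could be skipped, but the bound would stay the same.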