Question

3. [Bonus Problem] DNA Subsequence

A DNA sequence is a sequence of some combination of the characters A (adenine), C (cytosine), G (guanine), and T (thymine), which correspond to the four nucleobases that make up DNA. Given a long DNA sequence, it is often necessary to compute the number of instances of a certain subsequence. For this exercise, you will develop a program that processes a DNA sequence from a file and, given a subsequence s, searches the DNA sequence and counts the number of times s appears. As an example, consider the following sequence:

GGAAGTAGCAGGCCGCATGCTTGGAGGTAAAGTTCATGGTTCCCTGGCCC

If we were to search for the subsequence GTA, it appears twice. You will write a program (place your source in a file named dnaSearch.c) that takes, as command line inputs, an input file name and a valid DNA (sub)sequence. That is, it should be callable from the command line as follows:

./dnaSearch dna01.txt GTA
GTA appears 2 times

dna01.txt

ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCC
CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGC
CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGG
AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCC
CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAG
TTTAATTACAGACCTGAA

Answer #1

The program below reads the DNA sequence from the file and counts how many times the given subsequence appears:

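#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

int main(int argc, char *argv[]) {

   char *filename;
   char *data;       /* subsequence to search for */
   char *sequence;   /* DNA sequence read from the file */
   char *p;
   FILE *fp;
   size_t len = 0, cap = 1024;
   int c;
   int count = 0;

   /* Require both arguments; an empty subsequence cannot be searched for. */
   if (argc != 3 || argv[2][0] == '\0') {
      fprintf(stderr, "Usage: ./dnaSearch <file> <subsequence>\n");
      return 1;
   }
   filename = argv[1];
   data = argv[2];

   fp = fopen(filename, "r");
   if (fp == NULL) {
      printf("Error in opening file\n");
      return 1;
   }

   sequence = malloc(cap);
   if (sequence == NULL) { fclose(fp); return 1; }

   /* Read the whole file, skipping newlines and other whitespace, so that
      every line is searched and matches spanning a line break are found. */
   while ((c = fgetc(fp)) != EOF) {
      if (isspace(c)) continue;
      if (len + 1 >= cap) {
         char *tmp = realloc(sequence, cap *= 2);
         if (tmp == NULL) { free(sequence); fclose(fp); return 1; }
         sequence = tmp;
      }
      sequence[len++] = (char)c;
   }
   sequence[len] = '\0';
   fclose(fp);

   /* After each match, resume one character past its start, so
      overlapping occurrences are counted as well. */
   p = sequence;
   while ((p = strstr(p, data)) != NULL) {
      count++;
      p++;
   }
   printf("%s appears %d times\n", data, count);

   free(sequence);
   return 0;
}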
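
The counting step can also be pulled out into a small function so it can be checked without a file. The following is only a sketch: the names count_occurrences and is_valid_dna are made up for illustration and are not part of the assignment, and it assumes that a "valid DNA (sub)sequence" means a non-empty string containing only the uppercase letters A, C, G, and T.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if s is non-empty and consists only of
   the characters A, C, G, T (assumed definition of "valid"), else 0. */
int is_valid_dna(const char *s) {
   /* strspn gives the length of the leading run made up of "ACGT";
      the string is valid if that run reaches the terminating '\0'. */
   return s[0] != '\0' && s[strspn(s, "ACGT")] == '\0';
}

/* Hypothetical helper: counts occurrences of pattern in text,
   including overlapping ones. An empty pattern counts as 0. */
int count_occurrences(const char *text, const char *pattern) {
   int count = 0;
   const char *p = text;

   if (pattern[0] == '\0') {
      return 0;
   }
   while ((p = strstr(p, pattern)) != NULL) {
      count++;
      p++;   /* step one character past the start of this match */
   }
   return count;
}
```

Called on the example sequence from the question with the subsequence GTA, count_occurrences returns 2. Because the search resumes one character after the start of each match, overlapping matches are counted too: searching AAAA for AA gives 3. If the assignment intends non-overlapping matches only, advance p by strlen(pattern) instead.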
