Question

The file genomic_dna.txt contains a section of genomic DNA, and the file exons.txt contains a list...

The file genomic_dna.txt contains a section of genomic DNA, and the file exons.txt contains a list of
start/stop positions of exons. Each exon is on a separate line and the start and stop positions are
separated by a comma. The start and stop positions follow Python conventions; they start from zero
and are inclusive at the start and exclusive at the end. Write a program that will extract the exon
segments, concatenate them, and write them to a new file.
This is a tricky exercise with several parts. Before starting, think about how to divide the complexity of
the problem to easier tasks.
Your program will have to:
 read the exon file line by line
 split each exon line into two numbers
 turn those numbers into integers
 extract the matching part of the genomic DNA sequence
 concatenate all the exon sequences together

FILES:

exon.txt

5,58
72,133
190,276
340,398

gennomic_dna.txt

TCGATCGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGACTGATCGATCGATCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGATCGATCATATGTCAGTCGATGCATCGTAGCATCGTATAGTAGCTACGTAGCTACGATCGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTAGCTAGTACGATCGCGTAGCTAGCATGCTACGTAGATCGATCGATGCATGCTAGCTAGCTAGCTACGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGTAGCTAGCTACGATCGATGCTACGTAGATCGATCGCTAGTAGATCGATCGCTAGCTAGCTGACTAGTACGCTGCTAGTAGTCAGCTAGATCGATGCTAGTCA

python
0 0
Add a comment Improve this question Transcribed image text
Answer #1

# dna.py

# Open file genomic_data.txt and store into fin file object
fin = open("genonmic_dna.txt","r")
# Read the contents of the file and store to dna_sequence
dna_sequence = fin.read()
# Close the file object
fin.close()

# Open file exons.txt and store into fin file object
fin = open("exon.txt","r")
# Read the contents of the file and split it by the lines and store to a list called exons
exons = fin.read().splitlines()
# Close the file object
fin.close()

# String to store the final sequence
final_sequence = ""

# Iterate through each range in the exons list (every value is in the form 'number,number' )
for exon in exons:
# Split it by the comma and store the two values to starting_position and ending_position
starting_position, ending_position = exon.split(',')
# Currently starting_position and ending_position are simply strings that represent a number
# Convert it to integer using int()
starting_position = int(starting_position)
ending_position = int(ending_position)
# Extract the matching part from the dna_sequence
final_sequence = final_sequence + dna_sequence[starting_position:ending_position]

# Open file object for multiple.txt in write mode
fout = open("multiple.txt","w")
# Write the final_sequence
fout.write(final_sequence)
# Close the file object
fout.close()

Sample Input/Output

Suppose our genomic_dna.txt is as follows:

TCGATCGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGACTGATCGATCGATCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGATCGATCATATGTCAGTCGATGCATCGTAGCATCGTATAGTAGCTACGTAGCTACGATCGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTAGCTAGTACGATCGCGTAGCTAGCATGCTACGTAGATCGATCGATGCATGCTAGCTAGCTAGCTACGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGTAGCTAGCTACGATCGATGCTACGTAGATCGATCGCTAGTAGATCGATCGCTAGCTAGCTGACTAGTACGCTGCTAGTAGTCAGCTAGATCGATGCTAGTCA

And exons.txt as follows:

5,58
72,133
190,276
340,398

then by running the python program, we get a new file multiple.txt whose content is as follows:

ACTTACGTACTA

Add a comment
Know the answer?
Add Answer to:
The file genomic_dna.txt contains a section of genomic DNA, and the file exons.txt contains a list...
Your Answer:

Post as a guest

Your Name:

What's your source?

Earn Coins

Coins can be redeemed for fabulous gifts.

Not the answer you're looking for? Ask your own homework help question. Our experts will answer your question WITHIN MINUTES for Free.
Similar Homework Help Questions
  • Write a Python function makeDict() that takes a filename as a parameter. The file contains DNA...

    Write a Python function makeDict() that takes a filename as a parameter. The file contains DNA sequences in Fasta format, for example: >human ATACA >mouse AAAAAACT The function returns a dictionary of key:value pairs, where the key is the taxon name and the valus is the total number of 'A' and 'T' occurrences. For example, for if the file contains the data from the example above the function should return: {'human':4,'mouse':7}

  • Python programming with file handling. File.csv contains a list of numbers, one integer on each line....

    Python programming with file handling. File.csv contains a list of numbers, one integer on each line. Calculate the average of every 10 lines of integers and write them into another file. For example, the first 10 lines that is line 1 to line 10 contains 61, 70, 71, 90, 81, 82, 85, 90, 61, 51 each on every line

  • The following genomic DNA sequence comes from the first exon of a human gene and contains...

    The following genomic DNA sequence comes from the first exon of a human gene and contains the 3'-end of the 5'-untranslated region and the start of a long open reading frame that codes for 200 amino acids (a.k.a. coding sequence). Note: There are no introns in this short portion and only one strand of the genomic DNA is shown. Which of the following answers lists the first three amino acids of the translated protein correctly? Seconed Position tyr ser leu...

  • PYTHON The text file motifFinding.txt contains two strings of DNA code, separated by a new-line character....

    PYTHON The text file motifFinding.txt contains two strings of DNA code, separated by a new-line character. Write a program that opens the file, and stores its contents into two strings, s and t, respectively.   Write code that will find all instances of the string t within the string s. At the end, your program should output the number of times the sub-string t occurs within s, along with the index of the starting position of each occurrence of t within...

  • In python Attached is a file called sequences.txt, it contains 3 sequences (one sequence per line)....

    In python Attached is a file called sequences.txt, it contains 3 sequences (one sequence per line). Also attached is a file called AccessionNumbers.txt. Write a program that reads in those files and produces 3 separate FATSA files. Each accession number in the AccessionNumbers.txt file corresponds to a sequence in the sequences.txt file. Remember a FASTA formatted sequence looks like this: >ABCD1234 ATGCTTTACGTCTACTGTCGTATGCTTTACGTCTACTGACTGTCGTATGCTTACGTCTACTGTCG The file name should match the accession numbers, so for 1st one it should be called ABCD1234.txt. Note:...

  • In python Attached is a file called sequences.txt, it contains 3 sequences (one sequence per line)....

    In python Attached is a file called sequences.txt, it contains 3 sequences (one sequence per line). Also attached is a file called AccessionNumbers.txt. Write a program that reads in those files and produces 3 separate FATSA files. Each accession number in the AccessionNumbers.txt file corresponds to a sequence in the sequences.txt file. Remember a FASTA formatted sequence looks like this: >ABCD1234 ATGCTTTACGTCTACTGTCGTATGCTTTACGTCTACTGACTGTCGTATGCTTACGTCTACTGTCG The file name should match the accession numbers, so for 1st one it should be called ABCD1234.txt. Note:...

  • Write a Python program that converts an input file in FASTA format, called "fasta.txt", to an...

    Write a Python program that converts an input file in FASTA format, called "fasta.txt", to an output file in PHYLIP format called "phylip.txt". For example, if the input file contains: >human ACCGTTATAC CGATCTCGCA >chimp ACGGTTATAC CGTACGATCG >monkey ACCTCTATAC CGATCGATCC >gorilla ATCTATATAC CGATCGATCG Then the output file should be human ACCGTTATACCGATCTCGCA chimp ACGGTTATACCGTACGATCG monkey ACCTCTATACCGATCGATCC gorilla ATCTATATACCGATCGATCG FASTA format has a description (indicated with a '>') followed by 1 or more lines of a DNA sequence. PHYLIP format has a description...

  • The deoxyribonucleic acid (DNA) is a molecule that contains the genetic instructions required for the development...

    The deoxyribonucleic acid (DNA) is a molecule that contains the genetic instructions required for the development and functioning of all known living organisms. The basic double-helix structure of the DNA was co-discovered by Prof. Francis Crick, a long-time faculty member at UCSD 0 The DNA molecule consists of a long sequence of four nucleotide bases: adenine (A), cytosine (C), gua- nine (G) and thymine (T). Since this molecule contains all the genetic information of a living organism, geneticists are interested...

  • Random accesses to a file. A file contains a formatted list of 9999 integers that are...

    Random accesses to a file. A file contains a formatted list of 9999 integers that are randomly generated in the range of [1,9999]. Each integer occupies one single line and takes 4 characters' space per line. Alternatively, you can think that each number takes 5 characters' space, four for the number and one for the newline character. Write a C++ program using the seekg() and seekp() functions to insert the numbers 7777 through 7781 between the 6000-th and 6001-st numbers...

  • Write a Java program called EqualSubsets that reads a text file, in.txt, that contains a list...

    Write a Java program called EqualSubsets that reads a text file, in.txt, that contains a list of positive and negative integers (duplicates are possible) separated by spaces and/or line breaks. Zero may be included. After reading the integers, the program saves them in a singly linked list in the same order in which they appear in the input file. Then, without changing the linked list, the program should print whether there exists two subsets of the list whose sums are...

ADVERTISEMENT
Free Homework Help App
Download From Google Play
Scan Your Homework
to Get Instant Free Answers
Need Online Homework Help?
Ask a Question
Get Answers For Free
Most questions answered within 3 hours.
ADVERTISEMENT
ADVERTISEMENT