Roadmap To start, use the provided template file (on Blackboard): project_01_template.py. Replace the pass statements with...

Question

Roadmap

To start, use the provided template file (on Blackboard): project_01_template.py. Replace the pass statements with your code. Notice that we included test cases under every function. If you run the project_01_template.py file at this point, it should print False for each test. After you write the correct code for each function and then run the file, it should print True for each test.

1. Write a function named gc_content that takes one argument seq and performs the following tasks:
   a. Counts the number of G and C characters in the sequence and adds these counts.
   b. Divides the GC count total by the length of the sequence and returns this fraction.

2. Write a function named get_orf that takes one argument seq and performs the following tasks:
   a. The function assumes that the given DNA sequence begins with a start codon ATG.
   b. Finds the first in-frame stop codon, and returns the sequence from the start to that stop codon. The sequence that is returned should include the start codon but not the stop codon.
   c. If there is no in-frame stop codon, get_orf should assume that the reading frame extends through the end of the sequence and simply return the entire sequence.
   Note: This function is very similar (but not identical) to the one you wrote for Assignment 5.

3. Write a function named one_frame that takes one argument seq and performs the following tasks:
   a. The function searches the given DNA string from left to right in multiples of three nucleotides (in a single reading frame).
   b. When it hits a start codon ATG, it calls get_orf on the slice of the string beginning at that start codon.
   c. The ORF returned by get_orf is added to a list of ORFs.
   d. The function skips ahead in the DNA string to the point right after the ORF that we just found and starts looking for the next ORF.
   e. Steps a through d are repeated until we have traversed the entire DNA string.
   The function should return a list of all ORFs it has found.
Hint: A while loop that uses a call to get_orf will be very convenient here. The function could be written using a for loop, but it may be more difficult.

Here is one example of the one_frame function in action:

>>> one_frame('AATGCCATGTGAATGCCCTAA')
['ATG', 'ATGCCC']

Note that the first ATG (index [1:4]) was not returned. This is because the first ATG is not in the frame that one_frame is searching. We will search those other frames by additional calls to one_frame in the next steps of this project.

Here is another example of the one_frame function in action:

>>> one_frame('ATGCCCATGGGGAAATTTTGACCC')
['ATGCCCATGGGGAAATTT']

Note that, in this case, there is a second ATG in the sequence (index [6:9]). This ATG is part of an ORF which ends with the same stop codon as the first ORF (that is, it is a smaller nested open reading frame). one_frame skipped this second ORF when it jumped ahead to the end of the first ORF while looking for a stop codon. This is exactly what we want one_frame to do; we are focusing on large open reading frames, and will skip small nested ones.
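To see why the ATG at index [1:4] is missed in the first example, it can help to print the codons of each reading frame. The following is a minimal sketch; split_codons is a hypothetical helper used only for illustration and is not part of the template.

```python
def split_codons(seq, frame=0):
    """Hypothetical helper: cut seq into codons starting at offset `frame`.

    Any trailing partial codon (1 or 2 leftover bases) is dropped.
    """
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


seq = 'AATGCCATGTGAATGCCCTAA'
for frame in range(3):
    # Frame 0 is the one a single call to one_frame(seq) searches.
    print(frame, split_codons(seq, frame))
```

In frame 0 the codons are AAT, GCC, ATG, TGA, ATG, CCC, TAA, so the two start codons found there are followed by ORFs of 'ATG' and 'ATGCCC'. The ATG at index [1:4] only appears as a codon in frame 1, which is what a call such as one_frame(seq[1:]) would search.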

Answer 1

Code implemented in Python:

Note: The code is commented, and minimal tests were run to check that it works.

Code:

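def read_one_seq_fasta(fasta_file):
    '''Read a FASTA file that contains one sequence.'''
    seq = ''
    with open(fasta_file, 'r') as f:
        f.readline()  # skip the header line
        for line in f:
            # strip() removes the newline without cutting a base off a last line that has no newline
            seq += line.strip()
    return seq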

def get_orf(seq):
    '''Return the ORF in seq, which is assumed to start with ATG; the stop codon is not included.'''
    cod = -3  # start at -3 because the counter is advanced before each codon is read
    while cod < len(seq):
        cod += 3
        codon = seq[cod:cod+3]
        if codon in ['TGA', 'TAG', 'TAA']:  # cut the stop codon off the returned sequence
            return seq[:cod]
    return seq

def one_frame(seq):
    '''Return a list of the ORFs found in a single reading frame of seq.'''
    nuc = -3  # start at -3 because the counter is advanced before each codon is read
    orf_list = []  # final list of ORFs
    while nuc < len(seq):
        nuc += 3
        if seq[nuc:nuc+3] == 'ATG':
            orf = get_orf(seq[nuc:])  # call get_orf once when an 'ATG' is found
            orf_list.append(orf)
            nuc += len(orf)  # jump to the stop codon; the next += 3 steps past it
    return orf_list

def forward_frames(seq):
    '''Return one list with the ORFs from all three forward reading frames of seq.'''
    total_list = []  # final list of ORFs
    slic = 0
    while slic < 3:
        total_list.extend(one_frame(seq[slic:]))  # extend keeps a single flat list of ORFs
        slic += 1
    return total_list

# copied from lab #5
def gc_content(seq):
    '''Return the fraction of G and C characters in a DNA sequence.'''
    num_g = seq.count('G')
    num_c = seq.count('C')
    tot_gc = num_c + num_g
    fract_gc = tot_gc / len(seq)
    return fract_gc

def gene_finder(file_name, min_len, minGC):
    '''Print and return [sequence, length, GC content] for each ORF in the file that meets both thresholds.'''
    final_list = []
    contents = read_one_seq_fasta(file_name)  # skips the FASTA header, unlike a raw read()
    # The original answer called find_all_orfs here, which is not defined in this code.
    # forward_frames (forward strand only) is substituted as an assumption.
    orfs = forward_frames(contents)
    for seq in orfs:  # for each sequence in that list
        if len(seq) >= min_len and gc_content(seq) >= minGC:  # parameter requirements
            final_list.append([seq, len(seq), gc_content(seq)])
    print(final_list)
    return final_list

print(gc_content('ATGTGAA'))
print(get_orf('ATGTGAA'))
print(forward_frames('ATGATGAGATGAACCATGGGGTAA'))

Code Output (Few tests):

0.2857142857142857
ATG
['ATGATGAGA', 'ATGGGG', 'ATGAACCATGGGGTAA']
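
The tests above do not exercise gene_finder. Below is a minimal sketch of its filtering step alone, applied to the three ORFs printed above. filter_orfs is a hypothetical helper name, and the thresholds 6 and 0.5 are illustrative values, not the arguments given in the assignment.

```python
def gc_content(seq):
    """Fraction of G and C characters in seq (same definition as step 1)."""
    return (seq.count('G') + seq.count('C')) / len(seq)


def filter_orfs(orfs, min_len, min_gc):
    """Hypothetical helper: keep ORFs that meet both thresholds.

    Each kept ORF is reported as a [sequence, length, GC fraction] row.
    """
    rows = []
    for orf in orfs:
        gc = gc_content(orf)
        if len(orf) >= min_len and gc >= min_gc:
            rows.append([orf, len(orf), gc])
    return rows


orfs = ['ATGATGAGA', 'ATGGGG', 'ATGAACCATGGGGTAA']
rows = filter_orfs(orfs, 6, 0.5)  # illustrative thresholds (assumed)
print(rows)
```

With these assumed thresholds only 'ATGGGG' is kept: its GC fraction is 4/6, while the other two ORFs have GC fractions of 3/9 and 7/16, both below 0.5.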
