QUESTION: Look at the entire length of the
sequences. What do you notice about the electropherogram peaks and
quality scores at nucleotide positions labeled “N”? Describe the
quality scores at these “N” positions. Where do you find more Ns: at the
5’ and 3’ ends of the sequence, or in the middle of the
sequence? Why is it important to remove excess Ns from the
sequences?
At "N" positions, peaks representing different nucleotides have similar amplitudes (heights) and overlap, or no single peak rises above the background of lower-amplitude peaks. The base cannot be called with confidence, so quality scores are very low at "N" positions.
Ns are most common at the 5’ and 3’ ends, because sequence quality is poorest at the ends of a read.
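Trimming N-rich ends can be sketched in a few lines of Python. This is only an illustration, not the method used by any particular tool: the function name `trim_n_ends` and the `window` and `max_n` thresholds are assumptions chosen for the example. It slides inward from each end until a window of bases contains few enough Ns.

```python
def trim_n_ends(seq, window=20, max_n=1):
    """Trim both ends until a window of `window` bases holds at most `max_n` Ns.

    The thresholds are illustrative assumptions, not values from a real tool.
    """
    seq = seq.upper()
    start, end = 0, len(seq)
    # Advance from the 5' end while the leading window is too N-rich.
    while end - start >= window and seq[start:start + window].count("N") > max_n:
        start += 1
    # Retreat from the 3' end while the trailing window is too N-rich.
    while end - start >= window and seq[end - window:end].count("N") > max_n:
        end -= 1
    # Drop any N still sitting directly at either edge.
    return seq[start:end].strip("N")


# Hypothetical read with poor-quality ends and a clean middle.
read = "NNANACGTACGTACGTNNTN"
print(trim_n_ends(read, window=4, max_n=0))
```

Ns in the middle of a read are left alone here; only the ends, where quality is poorest, are trimmed.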
Each "N" is scored as a misalignment, so experimental sequences appear less related to reference sequences than they actually are. This can significantly distort tree building, potentially placing related sequences in different clades.
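The effect on apparent relatedness can be seen with a toy calculation. The sketch below assumes, as the answer above states, that every "N" counts as a mismatch. The function `percent_identity` and the short sequences are made up for illustration, and real alignment programs may score ambiguous bases differently.

```python
def percent_identity(a, b):
    """Percent of aligned positions that match; an N never matches A, C, G, or T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


reference = "ACGTACGTAC"    # hypothetical reference sequence
clean_read = "ACGTACGTAC"   # same organism, every base called
noisy_read = "NNGTACGTNN"   # same organism, Ns at both ends

print(percent_identity(reference, clean_read))  # 100.0
print(percent_identity(reference, noisy_read))  # 60.0
```

Both reads come from the same underlying sequence, yet the untrimmed one looks only 60% identical to the reference. This kind of artificial distance is what can push related sequences into different clades.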