This question comes from DNA codon analysis. To put it simply, let's say I have a file like this:
atgaaaccaaag...
and I want to count the number of 'aaa' triplets present in this file. Importantly, the triplets are read from the very beginning (that is, atg, aaa, cca, aag, ...), so the result for this example should be 1, not 2.
Are there any Python or shell script methods to do this? Thanks!
First read in the file, then split it into chunks of three characters, then count the chunks that match, like so:
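The code for this answer is not preserved here, so the following is only a minimal sketch of the three steps. The function name `count_codon_in_file` is hypothetical, and the sketch assumes the file holds nothing but the sequence (no FASTA header line), possibly wrapped across several lines.

```python
def count_codon_in_file(path, codon="aaa"):
    """Count in-frame occurrences of `codon` in a plain sequence file.

    Assumption: the file contains only sequence characters and whitespace.
    """
    with open(path) as handle:
        # Drop all whitespace so line breaks do not shift the reading frame.
        seq = "".join(handle.read().split())
    # Split into non-overlapping triplets starting at position 0.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return codons.count(codon)
```

Calling `count_codon_in_file("seq.txt")` on a file containing the example sequence would count only the triplet at positions 3 to 5, because the second 'aaa' run straddles two codons.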
Using plain shell, assuming your FASTA file contains only one sequence.
The obvious solution is to split the string into 3-character pieces and then count the number of occurrences of "aaa":
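A minimal sketch of that split-and-count approach, assuming the sequence is already in a string; the name `count_codon` is illustrative, not from the original answer.

```python
def count_codon(seq, codon="aaa"):
    # Build the full list of non-overlapping 3-character pieces,
    # starting at index 0 so the reading frame matches the question.
    pieces = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return pieces.count(codon)


print(count_codon("atgaaaccaaag"))  # pieces are atg, aaa, cca, aag
```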
If the string is really long, this solution will use memory unnecessarily by creating the full list of substrings.
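One way to avoid the temporary list is to sum a generator expression over the same slices. This is a sketch under the same assumptions as before (sequence held in a string, hypothetical function name `count_codon_lazy`).

```python
def count_codon_lazy(seq, codon="aaa"):
    # Each comparison yields True or False; sum() adds them as 1 or 0,
    # so no intermediate list of triplets is ever built.
    return sum(seq[i:i + 3] == codon for i in range(0, len(seq), 3))


print(count_codon_lazy("atgaaaccaaag"))
```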
This uses a generator expression instead of creating a temporary list, so it will be more memory efficient. It takes advantage of the fact that True == 1, i.e. True + True == 2.

You could first break the string into triples, using something like:
Then check for "aaa", and sum it up:
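The snippets for these two steps are missing here, so this is only one possible sketch. It assumes a regular expression is an acceptable way to break the string into triples; the original answer may have used a different method. Note that `re.findall` with the pattern `...` silently drops a trailing piece shorter than three characters.

```python
import re

seq = "atgaaaccaaag"

# Step 1: break the string into non-overlapping triples from the start.
triples = re.findall(r"...", seq)

# Step 2: check each triple for "aaa" and sum the matches.
count = sum(1 for t in triples if t == "aaa")

print(triples)
print(count)
```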