Moving some code from Python to C++.
BASEPAIRS = { "T": "A", "A": "T", "G": "C", "C": "G" }
Thinking maps might be overkill? What would you use?
A table out of char array:
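The code for this answer did not survive, so the following is only a minimal sketch of what a char-array lookup table could look like. The names make_basepair_table and complement are hypothetical, as is the choice to leave every non-base entry as '\0'.

```cpp
#include <array>

// Hypothetical helper: builds a 256-entry table indexed by character code.
// Every entry starts as '\0'; only the four bases get a complement.
std::array<char, 256> make_basepair_table() {
    std::array<char, 256> table{};  // value-initialized to all zeros
    table['T'] = 'A';
    table['A'] = 'T';
    table['G'] = 'C';
    table['C'] = 'G';
    return table;
}

// Look up the complement of a base; returns '\0' for anything else.
char complement(char base) {
    static const std::array<char, 256> table = make_basepair_table();
    // Cast to unsigned char so a negative char never indexes out of range.
    return table[static_cast<unsigned char>(base)];
}
```

The lookup is a single array index, which is about as cheap as it gets, at the cost of 256 bytes for four useful entries.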
While a std::map or a 256-entry char table would both work, you could save yourself an enormous amount of space agony by simply using an enum. If you have C++11 features, you can use enum class for strong typing, which also keeps usage simple.
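The original snippets are missing here, so this is a sketch under assumptions rather than the answer's actual code. The type name Base, the enumerator order, and both function names are hypothetical. The order is chosen so that complementary bases differ only in their lowest bit.

```cpp
#include <vector>

// Assumed ordering: A/T = 0/1 and C/G = 2/3, so each value fits in 2 bits
// and a base and its complement differ only in the lowest bit.
enum class Base { A = 0, T = 1, C = 2, G = 3 };

// Usage: a strand is a sequence of strongly typed values, not raw chars.
std::vector<Base> example_strand() {
    return { Base::G, Base::A, Base::T, Base::T, Base::A, Base::C, Base::A };
}

// Without helpers, taking the complement means casting through int:
// flipping the lowest bit swaps A<->T and C<->G under the assumed ordering.
Base complement_with_casts(Base b) {
    return (Base)((int)b ^ 1);
}
```

Because enum class does not convert implicitly to int, a stray char or integer cannot be passed where a Base is expected, but it also means the casts above have to be spelled out.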
If this is too much for you, you can define some helpers to get human-readable ASCII characters and to get the base pair complement, so you're not doing (int) casts all the time. It's clean, it's simple, and it's efficient.
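One possible shape for such helpers is sketched below. The names to_char, from_char, and complement are assumptions, as is the decision to throw on an unrecognized character. The enum is repeated so the snippet compiles on its own.

```cpp
#include <stdexcept>

// Same assumed ordering as before: complementary bases differ in the lowest bit.
enum class Base : unsigned char { A = 0, T = 1, C = 2, G = 3 };

// Base -> human-readable ASCII character.
inline char to_char(Base b) {
    static const char letters[] = { 'A', 'T', 'C', 'G' };
    return letters[static_cast<unsigned char>(b)];
}

// ASCII character -> Base; throws on anything that is not A, T, C, or G.
inline Base from_char(char c) {
    switch (c) {
        case 'A': return Base::A;
        case 'T': return Base::T;
        case 'C': return Base::C;
        case 'G': return Base::G;
        default:  throw std::invalid_argument("not a base character");
    }
}

// Complement without casts at the call site: flip the lowest bit.
inline Base complement(Base b) {
    return static_cast<Base>(static_cast<unsigned char>(b) ^ 1u);
}
```

With these in place, a call such as to_char(complement(from_char('G'))) yields 'C', and the casts stay hidden inside the helpers.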
Now, suddenly, you don't have a 256-byte table. You're also not storing characters (1 byte each), so if you're writing this to a file, you can write 2 bits per base pair instead of 1 byte (8 bits). I had to work with bioinformatics files that stored data as 1 character per base pair. The benefit was that they were human-readable. The con was that what should have been a 250 MB file ended up taking 1 GB of space. Moving, storing, and using it was a nightmare. Of course, 250 MB is being generous when accounting for even worm DNA. No human is going to read through 1 GB worth of base pairs anyhow.
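To make the 2-bits-per-base idea concrete, here is a rough sketch of packing four bases into each byte. The function names pack and unpack_at and the bit order (first base in the lowest two bits) are assumptions, not a standard file format.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed 2-bit codes, one per base.
enum class Base : std::uint8_t { A = 0, T = 1, C = 2, G = 3 };

// Pack four bases into each byte, first base in the lowest two bits.
std::vector<std::uint8_t> pack(const std::vector<Base>& bases) {
    std::vector<std::uint8_t> out((bases.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < bases.size(); ++i) {
        const unsigned code = static_cast<unsigned>(bases[i]);
        out[i / 4] |= static_cast<std::uint8_t>(code << ((i % 4) * 2));
    }
    return out;
}

// Read base number i back out of the packed bytes.
Base unpack_at(const std::vector<std::uint8_t>& packed, std::size_t i) {
    return static_cast<Base>((packed[i / 4] >> ((i % 4) * 2)) & 0x3u);
}
```

The last byte may be only partly filled, so the number of bases has to be stored alongside the packed data. This is the 4x saving described above: one byte now holds four bases instead of one.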