Simple dictionary in C++

2019-03-11 00:28发布

Moving some code from Python to C++.

BASEPAIRS = { "T": "A", "A": "T", "G": "C", "C": "G" }

Thinking maps might be overkill? What would you use?

8条回答
我只想做你的唯一
2楼-- · 2019-03-11 00:54

Here's the map solution:

#include <iostream>
#include <map>

typedef std::map<char, char> BasePairMap;

int main()
{
    BasePairMap m;
    m['A'] = 'T';
    m['T'] = 'A';
    m['C'] = 'G';
    m['G'] = 'C';

    std::cout << "A:" << m['A'] << std::endl;
    std::cout << "T:" << m['T'] << std::endl;
    std::cout << "C:" << m['C'] << std::endl;
    std::cout << "G:" << m['G'] << std::endl;

    return 0;
}
查看更多
淡お忘
3楼-- · 2019-03-11 00:56

BASEPAIRS = { "T": "A", "A": "T", "G": "C", "C": "G" } What would you use?

Maybe:

static const char basepairs[] = "ATAGCG";
// lookup:
if (const char* p = strchr(basepairs, c))
    // use p[1]

;-)

查看更多
▲ chillily
4楼-- · 2019-03-11 01:04

If you are into optimization, and assuming the input is always one of the four characters, the function below might be worth a try as a replacement for the map:

char map(const char in)
{ return ((in & 2) ? '\x8a' - in : '\x95' - in); }

It works based on the fact that you are dealing with two symmetric pairs. The conditional works to tell apart the A/T pair from the G/C one ('G' and 'C' happen to have the second-least-significant bit in common). The remaining arithmetics performs the symmetric mapping. It's based on the fact that a = (a + b) - b is true for any a,b.

查看更多
姐就是有狂的资本
5楼-- · 2019-03-11 01:04

This is the fastest, simplest, smallest space solution I can think of. A good optimizing compiler will even remove the cost of accessing the pair and name arrays. This solution works equally well in C.

#include <iostream>

enum Base_enum { A, C, T, G };
typedef enum Base_enum Base;
static const Base pair[4] = { T, G, A, C };
static const char name[4] = { 'A', 'C', 'T', 'G' };
static const Base base[85] = 
  { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1,  A, -1,  C, -1, -1,
    -1,  G, -1, -1, -1, -1, -1, -1, -1, -1, 
    -1, -1, -1, -1,  T };

const Base
base2 (const char b)
{
  switch (b)
    {
    case 'A': return A;
    case 'C': return C;
    case 'T': return T;
    case 'G': return G;
    default: abort ();
    }
}

int
main (int argc, char *args) 
{
  for (Base b = A; b <= G; b++)
    {
      std::cout << name[b] << ":" 
                << name[pair[b]] << std::endl;
    }
  for (Base b = A; b <= G; b++)
    {
      std::cout << name[base[name[b]]] << ":" 
                << name[pair[base[name[b]]]] << std::endl;
    }
  for (Base b = A; b <= G; b++)
    {
      std::cout << name[base2(name[b])] << ":" 
                << name[pair[base2(name[b])]] << std::endl;
    }
};

base[] is a fast ascii char to Base (i.e. int between 0 and 3 inclusive) lookup that is a bit ugly. A good optimizing compiler should be able to handle base2() but I'm not sure if any do.

查看更多
放荡不羁爱自由
6楼-- · 2019-03-11 01:09

You can use the following syntax:

std::map<char, char> my_map = {
    { 'A', '1' },
    { 'B', '2' },
    { 'C', '3' }
};
查看更多
三岁会撩人
7楼-- · 2019-03-11 01:11

Until I was really concerned about performance, I would use a function, that takes a base and returns its match:

char base_pair(char base)
{
    switch(base) {
        case 'T': return 'A';
        ... etc
        default: // handle error
    }
}

If I was concerned about performance, I would define a base as one fourth of a byte. 0 would represent A, 1 would represent G, 2 would represent C, and 3 would represent T. Then I would pack 4 bases into a byte, and to get their pairs, I would simply take the complement.

查看更多
登录 后发表回答