Short introduction to my questions: I'm trying to implement a "sort of" relational database using STL containers. This is just for fun/educational purposes, so no need for answers like "use this library", "this is absolutely useless" and so on. I know the title is a little bit confusing at this point, but we will get to the point (suggestions for improving the title are really welcome).
I proceeded in small steps:
1. I can build a table as a vector of maps from column names to their values => std::vector<std::map<std::string, some_variant>>. It's simple and it represents what I need.
2. Wait, I can just store the column names once and access the values by their index => std::vector<std::vector<some_variant>>. As simple as point 1, but faster.
3. Wait wait, in a database a table is literally a sequence of tuples => std::vector<std::tuple<args...>>. This is cool: it represents exactly what I'm doing, with the correct types and no variant, and it's even faster than the others.
Note: the "faster than" was measured for 1000000 records with a simple loop like this:
#include <iostream>
#include <map>
#include <random>
#include <string>
#include <variant>
#include <vector>

// Shared random source and distributions used to generate the column values.
std::random_device dev;
std::mt19937 gen(dev());
std::uniform_int_distribution<long> rand1_1000(1, 1000);
std::uniform_real_distribution<double> rand1_10(1.0, 10.0);

// Solution 1: every row is a map from column name to a variant value.
void fill_1()
{
    using my_variant = std::variant<long, long long, double, std::string>;
    using values = std::map<std::string, my_variant>;
    using table = std::vector<values>;
    table t;
    for (int i = 0; i < 1000000; ++i)
        t.push_back({ {"col_1", rand1_1000(gen)}, {"col_2", rand1_1000(gen)}, {"col_3", rand1_10(gen)} });
    std::cout << "size:" << t.size() << "\n"; // just to prevent optimization
}
1. 2234101600 ns - avg: 2234 ms
2. 446344100 ns - avg: 446 ms
3. 132075400 ns - avg: 132 ms
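Only the loop for solution 1 is shown above. For comparison, the loops for solutions 2 and 3 could look roughly like this. This is a sketch, not the exact code that was timed: the names fill_2, fill_3, table_2 and table_3 are made up by analogy with fill_1, and the generator is passed in as a parameter instead of being a global.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <tuple>
#include <variant>
#include <vector>

using my_variant = std::variant<long, long long, double, std::string>;
using table_2 = std::vector<std::vector<my_variant>>;          // solution 2
using table_3 = std::vector<std::tuple<long, long, double>>;   // solution 3

// Hypothetical counterpart of fill_1 for solution 2: one vector of variants per row.
table_2 fill_2(std::size_t n, std::mt19937& gen)
{
    std::uniform_int_distribution<long> rand1_1000(1, 1000);
    std::uniform_real_distribution<double> rand1_10(1.0, 10.0);
    table_2 t;
    for (std::size_t i = 0; i < n; ++i)
        t.push_back(std::vector<my_variant>{rand1_1000(gen), rand1_1000(gen), rand1_10(gen)});
    return t;
}

// Hypothetical counterpart of fill_1 for solution 3: one typed tuple per row.
table_3 fill_3(std::size_t n, std::mt19937& gen)
{
    std::uniform_int_distribution<long> rand1_1000(1, 1000);
    std::uniform_real_distribution<double> rand1_10(1.0, 10.0);
    table_3 t;
    for (std::size_t i = 0; i < n; ++i)
        t.emplace_back(rand1_1000(gen), rand1_1000(gen), rand1_10(gen));
    return t;
}
```

Note that solution 2 still allocates one small vector per row, while solution 3 keeps all rows in a single contiguous buffer, which is probably a large part of the difference.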
INSERT: no problem with any of these solutions; inserts are as simple as pushing back elements, as in the example.
SELECT: 1 and 2 are simple, but 3 is tricky.
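To show why the select is simple for solution 2, here is a minimal sketch of a projection on run-time column indexes. The helper name select_columns and the aliases row and table are only illustrative, and no filtering (WHERE) is attempted.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

using my_variant = std::variant<long, long long, double, std::string>;
using row = std::vector<my_variant>;
using table = std::vector<row>;

// Hypothetical helper: keep only the requested columns of every row,
// in the order in which they were requested.
table select_columns(const table& t, const std::vector<std::size_t>& columns)
{
    table result;
    result.reserve(t.size());
    for (const row& r : t) {
        row projected;
        projected.reserve(columns.size());
        for (std::size_t c : columns)
            projected.push_back(r.at(c)); // at() throws std::out_of_range on a bad index
        result.push_back(std::move(projected));
    }
    return result;
}
```

Since every cell has the same static type (the variant), the column indexes can be plain run-time values. That is exactly what is missing with tuples.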
So, finally, questions:
Memory usage: solutions 1 and 2 carry a lot of memory overhead, so 3 again seems to be the right choice here. For the example with 1 million records of 2 longs and a double, I was expecting something near 4 MB * 2 for the longs and 8 MB for the doubles, plus some overhead for the vectors, maps and variants where used. Instead we have (measured with Windows Task Manager, not extremely accurate, I know):
1. 340 MB
2. 120 MB
3. 31 MB
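To make the expectation concrete, the per-row sizes can be checked with sizeof. The sketch below only gives lower bounds: the helper names (payload_bytes, min_bytes_layout_2, min_bytes_layout_3, report) are made up for illustration, all sizeof values are implementation-defined, and the allocator bookkeeping for every map node and every inner vector is not counted at all.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <tuple>
#include <variant>
#include <vector>

using my_variant = std::variant<long, long long, double, std::string>;
using row_2 = std::vector<my_variant>;        // one row of solution 2
using row_3 = std::tuple<long, long, double>; // one row of solution 3

// The data actually asked for: two longs and a double per record.
constexpr std::size_t payload_bytes = 2 * sizeof(long) + sizeof(double);

// Lower bound for solution 3: n tuples stored contiguously.
constexpr std::size_t min_bytes_layout_3(std::size_t n)
{
    return n * sizeof(row_3);
}

// Lower bound for solution 2: per row, the inner vector object itself plus
// three variants on the heap. Each variant is at least as large as its
// biggest alternative (std::string here), whatever it currently holds.
constexpr std::size_t min_bytes_layout_2(std::size_t n)
{
    return n * (sizeof(row_2) + 3 * sizeof(my_variant));
}

// Print the numbers for n records on the current platform.
void report(std::size_t n)
{
    std::cout << "payload per record : " << payload_bytes << " bytes\n"
              << "sizeof(my_variant) : " << sizeof(my_variant) << " bytes\n"
              << "solution 2, at least: " << min_bytes_layout_2(n) << " bytes\n"
              << "solution 3, at least: " << min_bytes_layout_3(n) << " bytes\n";
}
```

Two things are not visible in these numbers: a vector grown with push_back usually ends up with a capacity larger than its size, and during a reallocation the old and the new buffer exist at the same time.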
Am I missing something, other than reserving the right size in advance or calling shrink_to_fit after the insert loop?
Is there a way to retrieve some tuple field at run time, as in the case of a select statement?
using my_tuple = std::tuple<long, long, std::string, double>;
std::vector<my_tuple> table;
int to_select; // this could be a vector of columns to select, obviously
std::cin >> to_select;
auto result = select(table, to_select);
Do you see any chance to implement this last line in any way? There are two problems as far as I can see: the result type has to be derived from the types of the starting tuple, and then the selection of the desired fields actually has to be performed.
I read a lot of answers about this; they all deal with contiguous indexes using make_index_sequence or with indexes known at compile time.
I also found this article, very interesting, but not really useful for this case.
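One direction that might work for the single-column case is to turn the run-time index into a compile-time one by trying every index in turn. The return type cannot depend on a run-time value, so the sketch below returns a variant over the column types. Everything in it is an assumption for illustration: the names cell, get_runtime and select_column are made up, the variant is written out by hand (with the duplicate long collapsed) rather than derived from the tuple, and it needs C++17 for the fold expression.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <tuple>
#include <utility>
#include <variant>
#include <vector>

using my_tuple = std::tuple<long, long, std::string, double>;

// A selected value can be any of the column types of my_tuple.
using cell = std::variant<long, std::string, double>;

// Compare the run-time index against every compile-time index Is;
// only the matching branch calls std::get<Is> and stores the value.
template <std::size_t... Is>
cell get_runtime(const my_tuple& row, std::size_t column, std::index_sequence<Is...>)
{
    cell out;
    const bool found = ((column == Is ? (out = std::get<Is>(row), true) : false) || ...);
    if (!found)
        throw std::out_of_range("no such column");
    return out;
}

// Hypothetical select for one column chosen at run time.
std::vector<cell> select_column(const std::vector<my_tuple>& table, std::size_t column)
{
    std::vector<cell> result;
    result.reserve(table.size());
    for (const my_tuple& row : table)
        result.push_back(get_runtime(
            row, column, std::make_index_sequence<std::tuple_size_v<my_tuple>>{}));
    return result;
}
```

With this, something like select_column(table, to_select) compiles for an index read from std::cin, but the variant comes back, so this may simply move the cost of solution 2 from storage to the query result.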