I have a struct that looks like this:
struct rtok {
char type;
std::string val;
bool term;
};
I'm writing a simple interpreter and this "rtok" struct is how I represent a token. I have a vector of "rtoks" that I iterate through to generate the parse tree.
My question is, if I have 3 members in my struct and I only give a value to 1 member, will the other members still take up memory?
What I mean is, if I set "val" equal to "test" would my token take up just 4 bytes or would it take up 6 bytes? (4 bytes for "val", 1 byte for type, 1 byte for term)
Assuming you don't have additional members or virtual functions, your struct will always occupy sizeof(char) + sizeof(std::string) + sizeof(bool) + possible padding. The string member allocates itself a separate chunk of memory for its characters, which it deallocates at destruction. However, that memory is not technically part of the memory allocated for the struct.
So no matter which values you give (or omit) for the members, the struct will always have the same size.
Do not worry, it would take considerably more than you think.
There are two factors: data alignment and internal type implementation.
First, about data alignment: each member is placed at an address that is a multiple of its type's alignment. A char can be at any address, but a pointer such as void* may require alignment of 4 or 8 bytes, depending on the architecture.
So, if we assume that std::string simply uses a char* internally to keep your string, the layout on a 32-bit platform would be:
struct rtok {
char type;
char* val; // here char * for simplicity
bool term;
};
sizeof(rtok) would give 12 bytes, not 6, and the memory layout would look like:
00: type (one byte)
01: padding
02: padding
03: padding
04-07: char * (4 bytes)
08: term (one byte)
09-0b: padding (3 bytes)
Now, if we replace char* with std::string, we would find that the structure size has grown, as sizeof(std::string) is typically larger than 4 bytes.
BUT we have not yet counted the string value itself, and here we get into the area of heap management and allocation.
The memory for storing the value is typically allocated on the heap, and the code usually requests as much as it needs, so for a string of 10 characters it would be 11 bytes (10 characters plus 1 byte for the null terminator).
The heap has its own complex structure, with small-block heaps and so on. In practice, this means that the minimum amount consumed per allocation is something like 16 bytes or more. Not all of it is usable by you: part of it holds the heap's internal bookkeeping, and the usable portion can be as little as 1 byte.
If you add everything up, you will find that even when you plan to store only two characters plus a type, the amount of memory consumed is far larger.
A given type of struct always has the same size. This is a guarantee from the Standard. When you define a struct, you are saying "I have an object of this size (sum of sizes of members + possible padding for alignment for each member), and they are to be in this order in memory (same order of member definitions in containing struct definition)":
(N4296) §9.2
/12 [ Example: A simple example of a class definition is
struct tnode {
char tword[20];
int count;
tnode* left;
tnode* right;
};
which contains an array of twenty characters, an integer, and two pointers to objects of the same type. [...] -end example ]
/13 Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified (Clause 11). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).
Note the "with the same access control" qualifier. If your structure has a mix of data members with different access specifiers, the layout may not be what you might expect, other than the guarantee that given something like:
public:
some_type public_1;
private:
some_type private_1;
public:
some_type public_2;
public_2 will be at a higher address than public_1. Beyond that, the order is unspecified: private_1 could be at a lower or higher address.
Regarding your other question (asked in comments):
Would it be better to use a class instead of a struct then?
In C++, a struct and a class are essentially the same, the only difference being that members (and inheritance) of a struct are public by default, whereas with a class they are private by default. This is made even clearer in a note and example from the Standard:
§3.1 Declarations and definitions [basic.def]
/3 [ Note: In some circumstances, C++ implementations implicitly define the default constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move assignment operator (12.8), or destructor (12.4) member functions. —end note ] [ Example: given
#include <string>
struct C {
std::string s; // std::string is the standard library class (Clause 21)
};
int main() {
C a;
C b = a;
b = a;
}
the implementation will implicitly define functions to make the definition of C equivalent to
struct C {
std::string s;
C() : s() { }
C(const C& x): s(x.s) { }
C(C&& x): s(static_cast<std::string&&>(x.s)) { }
// : s(std::move(x.s)) { }
C& operator=(const C& x) { s = x.s; return *this; }
C& operator=(C&& x) { s = static_cast<std::string&&>(x.s); return *this; }
// { s = std::move(x.s); return *this; }
~C() { }
};
—end example ]
Note that the example from the Standard uses a struct rather than a class to illustrate this point for non-POD structs. This is even more clear when you consider that the definition of a struct in the Standard is in Section 9 - "Classes."
As said earlier, a struct is always fixed-size.
There are several ways to work around this limitation:
- Store a pointer and allocate heap memory for the data it points to.
- Use an "unbounded" array of char[1] as the last member, and allocate memory for the struct itself on the heap (a C idiom; indexing past the declared bound is not sanctioned by the C++ Standard).
- Use a union to save some space for overlapping members.