Memory efficient AI objects for physics game

I am creating a physics game in java using box2d.

I am writing an AI class and want to make sure my data is being stored as efficiently as possible, taking into consideration memory alignment.

The tiniest increase will likely make a huge difference because I am literally running 'as many AI objects as I can' until the system slows down.The program is already using a lot of memory on the collision detection, because once again, I want to be able to support as many agents as possible.

What I understand so far, is that the smallest Java type is 8bytes, and that objects are padded into multiples of 8. I have structured my AI controls in boolean arrays, representing movement: x+/-1, y+/-1, and clockwise/CCW rotations for certain fixtures.

Since Java doesn't have null settings for Booleans, I nested the controls in command objects with the bool values on_off, and pos_neg. With the movements and rotations, I'm dealing with about 7 command objects per one 'default' action, (such as move right). So I create Command arrays for each action.

My question is: Am I doing this efficiently?

I haven't finalized the design, so I'm not certain about the size of each array. But, given memory-alignment requirements, I'm guessing I will have at least some padding, which is ultimately wasted memory.I'm considering doing something like cutting the object size to fit the padding limit and then pushing the remaining data from multiple objects into an 'overflow' object... or something like that.

Will this speed things up? Why or why not?

I'm also considering using bitsets, though I think my command objects may have achieved a similar result, and I've been told bit-shifting is slow.

public class Command {

        boolean on_off  = false;
        boolean pos_neg = false;
}

public class DefaultMoves {

    //Note: these arrays are 2d in the event multiples are necessary
    //to complete a single action, I may change this later.
    Command[][] mvRight =    
        { 
              {     new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(true, true), //moveX
                    new Command(false, false)  //   
              },   
        };
    Command[][] mvLeft =    
        { 
              {     new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(false, false), //
                    new Command(true, false), //moveX
                    new Command(false, false)  //   
              },   
        };
}

This was going to be just a comment, but got a bit lengthy, and I didn't feel like writing it as 3 comments.

Since this is a follow-up question of another, I would start with "Stop worrying about padding". Worry about how you store YOUR data.

And, if you are going to worry about how much space your things take, allocate an array of 7 objects instead of 7 individual objects. I'm sure Java has overhead in each allocation. In a typical C or C++ implementation, each allocation with new or malloc takes up 16-32 bytes above and beyond the size of the actual data allocated, and the size is rounded to 16 or 32 bytes. In java there is a suggestion here that the memory overhead of an object is 8 bytes - this may not be true for ALL Java VMs and implementations.

Further, all space&time optimisation is a compromise between space and time [nearly always at least], so storing your data in a more compact form will cost in time for the saving in space. I can for example think that having pairs of on_off and pos_neg as two bits in a larger integer structuer. So your 7 commands would be stored in one integer. However, now you have to do shifts and masking to get the right data out. Likewise, shifting and oring if you are going to store something. (I'm writing this as C, since I don't know Java well enough).

/* This is big enough for 16 pairs of on_off and pos_neg */
/* In a pair of bits, bit 0 = on_off, bit 1 = pos_neg */
uint32_t cmdData;

/* Get values of on_off and pos_neg for element number n */
void getData(int n, bool& on_off, bool& pos_neg)
{
    uint32_t shifted = cmdData >> (2 * n);
    on_off = (shifted & 1) != 0;
    pos_neg = (shifted & 2) != 0;
}

/* Store values for element n */
void setData(int n, bool on_off, bool pos_neg)
{
    uint32_t bits = (int)on_off + (2 * (int)pos_neg); 
    uint32_t mask = 3 << (n * 2);
    cmdData &= ~mask; /* Clear bits */
    cmdData |= bits << (n * 2);
}

As you can see, this stores the data more efficiently, as we can store 16 pairs of {on_off, pos_neg} in 4 bytes, rather than each (probably) taking up a byte each. But to get to each, you have to do a few extra operations each time (and the code gets messier). Whether this is "worth having" or not highly depends on the situation, how often you access these versus how low on memory the system is (assuming the memory usage for THESE objects is what is causing the problem - if you have 100 command structures and 40000000 objects that USE the commands, then the commands aren't going to be the issue).

The way I would store the commands for direction/movement is probably as two integer values (int8_t [byte in java] if space is tight), holding a +1 for moving for example right or down, -1 for moving left or up. This is not the most compact form, but it makes for easy access and easy calculation of a new position.

This pair can then be used to describe all possible directions:

struct direction
{
     int x, y;
};

direction directions[] =
{
     { 0, 0 },    // Don't move.
     { 0, 1 },    // Right.
     { 0, -1 },   // Left.
     { 1, 0 },    // Down.
     { -1, 0 },    // Up.
 };

If you want to move diagonally too, you'll have to add another four pairs with the combinations of { 1, 1 }, {-1, 1}, etc.

The same can be applied to an object that can move, as a pair of xDir, yDir values.

But the key here is that you want to first of all come up with a good understanding of what is more important: space or computation. From that, find out what objects are taking up most of the space (what has the highest count). Fiddling with the size of objects that you have one or a few dozen of won't make any big different, something you have millions of will). If space is not an issue (and to be fair, it's really hard to write code that efficiently uses large enough amounts of data in a system with gigabytes of RAM - it is usually the CPU that runs out of speed before memory is exhausted if you do something to each object for every frame).

Suitcase analogy:

Imagine that you have a suitcase that can hold exactly 4 little boxes in it's width [and any number at the length - it's a weird suitcase!], and you have larger boxes that are 1, 2, 3 or 4 units of little boxes. The boxes are made with "velcro", so they stick together and can be split as you like, but you have to keep track of which belong together, and each time you "split" or "put back together" the units, it takes extra time.

If you want to be lazy and make it simple, you just stick your three-box ones into the suitcase with one empty space next to each.

 1 2 3 4
 a a a x
 b b b x
 c c c x
 d d d x

and so on.

If you want to pack it tightly, you take one 3 unit box, and then cut one unit of the next one, and stick it next to the first one, then the remnant two units on the next space, then cut a two unit piece from the next one, and stick it next to packet 2, and so on.

1 2 3 4
a a a b
b b c c
c d d d

Now, you have used 25% less space to store them, but you spent time split them, and you have to spend time again to fetch them out into units of three when you later need to use the data.

Now, imagine that you get paid to put things into the suitcase, and you get paid per item you put, which method do you choose?

Then consider that you instead of being paid per item, you have to pay for suitcase space (because you are the owner of the company). Now you want to squeeze as much into the space as possible. But it takes extra time, right? If suitcase is expensive, then it may be worth it. If it's not so expensive, you may prefer to save time over space. It's a compromise.

[We could do the same thing in more realistic units of 8, 32 or 64, but I think that would just make it harder to read and definitely make it harder to type]