I'm trying to create a byte array whose size is of type long
. For example, think of it as:
long x = _________;
byte[] b = new byte[x];
Apparently you can only specify an int
for the size of a byte array.
Before anyone asks why I would need a byte array so large, I'll say I need to encapsulate data of message formats that I am not writing, and one of these message types has a length of an unsigned int (long
in Java).
Is there a way to create this byte array?
I am thinking if there's no way around it, I can create a byte array output stream and keep feeding it bytes, but I don't know if there's any restriction on a size of a byte array...
(It is probably a bit late for the OP, but it might still be useful for others)
Unfortunately Java does not support arrays with more than 231−1 elements. The maximum consumption is 2 GiB of space for a byte[]
array, or 16 GiB of space for a long[]
array.
While it is probably not applicable in this case, if the array is going to be sparse, you might be able to get away with using an associative data structure like a Map
to match each used offset to the appropriate value. In addition, Trove provides an more memory-efficient implementation for storing primitive values than standard Java collections.
If the array is not sparse and you really, really do need the whole blob in memory, you will probably have to use a two-dimensional structure, e.g. with a Map
matching offsets modulo 1024 to the proper 1024-byte array. This approach might be be more memory efficient even for sparse arrays, since adjacent filled cells can share the same Map
entry.
A byte[]
with size of the maximum 32-bit signed integer would require 2GB of contiguous address space. You shouldn't try to create such an array. Otherwise, if the size is not really that large (and it's just a larger type), you could safely cast it to an int
and use it to create the array.
You should probably be using a stream to read your data in and another to write it out. If you are gong to need access to data later on in the file, save it. If you need access to something you haven't ran into yet, you need a two-pass system where you run through once and store the "stuff you'll need for the second pass, then run through again".
Compilers work this way.
The only case for loading in the entire array at once is if you have to repeatedly randomly access many locations throughout the array. If this is the case, I suggest you load it into multiple byte arrays all stored in a single container class.
The container class would have an array of byte arrays, but from outside all the accesses would seem contiguous. You would just ask for byte 49874329128714391837 and your class would divide your Long by the size of each byte array to calculate which array to access, then use the remainder to determine the byte.
It could also have methods to store and retrieve "Chunks" that could span byte-array boundaries that would require creating a temporary copy--but the cost of creating a few temporary arrays would be more than made up for by the fact that you don't have a locked 2gb space allocated which I think could just destroy your performance.
Edit: ps. If you really need the random access and can't use streams then implementing a containing class is a Very Good Idea. It will let you change the implementation on the fly from a single byte array to a group of byte arrays to a file-based system without any change to the rest of your code.
It's not of immediate help but creating arrays with larger sizes (via longs) is a proposed language change for Java 7. Check out the Project Coin proposals for more info
One way to "store" the array is to write it to a file and then access it (if you need to access it like an array) using a RandomAccessFile. The api for that file uses long as an index into file instead of int. It will be slower, but much less hard on the memory.
This is when you can't extract what you need during the initial input scan.