I've got a text file that contains 1 000 002 numbers in the following format:
123 456
1 2 3 4 5 6 .... 999999 100000
Now I need to read that data and assign the very first two numbers to int variables, and all the rest (1 000 000 numbers) to an int[] array.
It's not a hard task, but it's horribly slow.
My first attempt was java.util.Scanner:
Scanner stdin = new Scanner(new File("./path"));
int n = stdin.nextInt();
int t = stdin.nextInt();
int array[] = new int[n];
for (int i = 0; i < n; i++) {
array[i] = stdin.nextInt();
}
It works as expected, but it takes about 7500 ms to execute. I need to fetch that data in at most several hundred milliseconds.
Then I tried java.io.BufferedReader:

Using BufferedReader.readLine() and String.split(), I got the same result in about 1700 ms, but that's still too slow.
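The question doesn't show that second attempt, but given the file layout described above it presumably looked something like this sketch (error handling omitted):

BufferedReader reader = new BufferedReader(new FileReader("./path"));
String[] header = reader.readLine().split(" "); // first line: "123 456"
int n = Integer.parseInt(header[0]);
int t = Integer.parseInt(header[1]);
String[] tokens = reader.readLine().split(" "); // second line: all n numbers
int array[] = new int[n];
for (int i = 0; i < n; i++) {
    array[i] = Integer.parseInt(tokens[i]);
}
reader.close();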
How can I read that amount of data in less than 1 second? The final result should be equal to:
int n = 123;
int t = 456;
int array[] = { 1, 2, 3, 4, ..., 999999, 100000 };
According to trashgod's answer, the StreamTokenizer solution is fast (it takes about 1400 ms), but it's still too slow:
StreamTokenizer st = new StreamTokenizer(new FileReader("./test_grz"));
st.nextToken();
int n = (int) st.nval;
st.nextToken();
int t = (int) st.nval;
int array[] = new int[n];
for (int i = 0; st.nextToken() != StreamTokenizer.TT_EOF; i++) {
array[i] = (int) st.nval;
}
PS. There is no need for validation. I'm 100% sure that the data in the ./test_grz file is correct.
Thanks for every answer, but I've already found a method that meets my criteria: it takes only about 300 ms to read 1 million integers!
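That method's code is not quoted here, but a hand-rolled byte-level parser along these lines can reach that ballpark. This is a sketch, not necessarily the original code; it relies on the file containing only digits, spaces, and newlines, as the PS above guarantees:

InputStream in = new BufferedInputStream(new FileInputStream("./test_grz"));
int n = readInt(in);
int t = readInt(in);
int array[] = new int[n];
for (int i = 0; i < n; i++) {
    array[i] = readInt(in);
}
in.close();

// Reads one unsigned integer, skipping any leading separator bytes.
static int readInt(InputStream in) throws IOException {
    int value = 0;
    boolean digitSeen = false;
    for (int b = in.read(); b != -1; b = in.read()) {
        if (b >= '0' && b <= '9') {
            value = value * 10 + (b - '0'); // accumulate decimal digits
            digitSeen = true;
        } else if (digitSeen) {
            break; // first separator after the digits ends the number
        }
    }
    return value;
}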
How much memory do you have in the computer? You could be running into GC issues.
The best thing to do is to process the data one line at a time if possible. Don't load it into an array. Load what you need, process, write it out, and continue.
This will reduce your memory footprint while still using the same amount of file I/O.
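A minimal sketch of that streaming pattern (process() is a hypothetical placeholder for whatever work needs doing):

BufferedReader in = new BufferedReader(new FileReader("./path"));
try {
    String line;
    while ((line = in.readLine()) != null) {
        process(line); // hypothetical: handle one line, then discard it
    }
} finally {
    in.close();
}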
You can reduce the time for the StreamTokenizer result by using a BufferedReader. Also, don't forget to close your files, as the sketch below shows.
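A minimal sketch of that change, reusing the code from the question:

BufferedReader reader = new BufferedReader(new FileReader("./test_grz"));
try {
    // The tokenizer now pulls from a buffered stream instead of a raw FileReader.
    StreamTokenizer st = new StreamTokenizer(reader);
    st.nextToken();
    int n = (int) st.nval;
    st.nextToken();
    int t = (int) st.nval;
    int array[] = new int[n];
    for (int i = 0; st.nextToken() != StreamTokenizer.TT_EOF; i++) {
        array[i] = (int) st.nval;
    }
} finally {
    reader.close(); // release the file handle
}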
You can also shave some more time off by using a custom tokenizer just for your purposes:
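The tokenizer itself is not shown here; one possible sketch, reading characters straight off the Reader:

// Assumes the input contains only digits, spaces, and newlines (see below).
static int nextInt(Reader in) throws IOException {
    int value = 0;
    boolean inNumber = false;
    for (int c = in.read(); c != -1; c = in.read()) {
        if (c >= '0' && c <= '9') {
            value = value * 10 + (c - '0'); // accumulate decimal digits
            inNumber = true;
        } else if (inNumber) {
            break; // separator after digits: the number is complete
        }
    }
    return value;
}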
Remember to use a BufferedReader for this. This custom tokenizer assumes the input data is always completely valid and contains only spaces, newlines, and digits.

If you read these results a lot and they do not change much, you should probably save the array and keep track of the file's last modified time. Then, if the file has not changed, just use the cached copy of the array; this will speed things up significantly. For example:
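A sketch of that caching idea (the field names and the parseFile() helper are illustrative only):

static int[] cachedArray;    // hypothetical cache field
static long cachedTimestamp; // last-modified time of the cached file

static int[] readCached(File file) throws IOException {
    long modified = file.lastModified();
    if (cachedArray == null || modified != cachedTimestamp) {
        cachedArray = parseFile(file); // hypothetical: whichever parser you settled on
        cachedTimestamp = modified;    // remember when the file last changed
    }
    return cachedArray;
}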
I would extend FilterReader and parse the string as it is read in the read() method. Have a getNextNumber method return the numbers. Code left as an exercise for the reader.
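One possible solution to that exercise, sketched (the class and method names here are mine, not from the answer):

import java.io.*;

class NumberReader extends FilterReader {
    NumberReader(Reader in) {
        super(in);
    }

    // Returns the next integer, or -1 once the end of the stream is reached.
    int getNextNumber() throws IOException {
        int c = read();
        while (c != -1 && (c < '0' || c > '9')) {
            c = read(); // skip separators
        }
        if (c == -1) {
            return -1; // no more numbers
        }
        int value = 0;
        while (c >= '0' && c <= '9') {
            value = value * 10 + (c - '0');
            c = read();
        }
        return value;
    }
}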
If it's possible to reformat the input so that each integer is on a separate line (instead of one long line with one million integers), you should see much improved performance using Integer.parseInt(BufferedReader.readLine()), due to smarter buffering by line and not having to split the long string into a separate array of Strings.

Edit: I tested this and managed to read the output produced by seq 1 1000000 into an array of int well under half a second, but of course this depends on the machine.
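A sketch of that variant, assuming the file has been rewritten with one value per line (the two header values included):

BufferedReader in = new BufferedReader(new FileReader("./test_grz"));
int n = Integer.parseInt(in.readLine());
int t = Integer.parseInt(in.readLine());
int array[] = new int[n];
for (int i = 0; i < n; i++) {
    array[i] = Integer.parseInt(in.readLine()); // one integer per line
}
in.close();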
Using a StreamTokenizer on a BufferedReader will give you quite good performance already. You shouldn't need to write your own readInt() function.
Here is the code I used to do some local performance testing:
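That code is not reproduced above; a minimal harness in the same spirit might look like this (readFile() stands in for whichever parser is being measured):

long start = System.currentTimeMillis();
int array[] = readFile("./test_grz"); // hypothetical parser under test
long elapsed = System.currentTimeMillis() - start;
System.out.println("Parsed " + array.length + " ints in " + elapsed + " ms");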
Results I got: