Does anyone happen to know if there is any difference in performance between the two methods of reading an input file below? Thanks.
1) Reading a file with Scanner and File
Scanner input = new Scanner(new File("foo.txt"));
2) Reading a file with InputStreamReader and FileInputStream
InputStreamReader input = new InputStreamReader(new FileInputStream("foo.txt"));
In terms of performance, Scanner is definitely the slower one, at least in my experience. It's made for parsing, not for reading huge blocks of data. InputStreamReader, when read through a large enough buffer, can perform on par with BufferedReader, which I remember being a few times faster than Scanner for reading a dictionary word list.
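The Scanner versus BufferedReader claim above can be checked roughly on your own data. The following is only a naive timing sketch, not a rigorous benchmark: it ignores JIT warm-up and disk caching, and the class name ReadTiming, the method names, and the UTF-8 encoding are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

public class ReadTiming {
    // Counts lines by letting Scanner split the input into lines.
    static long countLinesWithScanner(Path file) throws IOException {
        long lines = 0;
        try (Scanner in = new Scanner(file, "UTF-8")) {
            while (in.hasNextLine()) {
                in.nextLine();
                lines++;
            }
        }
        return lines;
    }

    // Counts lines with BufferedReader.readLine().
    static long countLinesWithBufferedReader(Path file) throws IOException {
        long lines = 0;
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (in.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "foo.txt" is the file name from the question; pass another path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "foo.txt");

        long t0 = System.nanoTime();
        long scannerLines = countLinesWithScanner(file);
        long t1 = System.nanoTime();
        long readerLines = countLinesWithBufferedReader(file);
        long t2 = System.nanoTime();

        System.out.printf("Scanner:        %d lines in %d ms%n", scannerLines, (t1 - t0) / 1_000_000);
        System.out.printf("BufferedReader: %d lines in %d ms%n", readerLines, (t2 - t1) / 1_000_000);
    }
}
```

Run it a few times on a file large enough to matter; the first run usually includes class loading and cold-cache costs, so a single measurement says little.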
The first point is that neither of those code samples reads a file. This may sound fatuous or incorrect, but it is true. What they actually do is open a file for reading. And in terms of what they actually do, there's probably not a huge difference in their respective efficiency.
When it comes to actually reading the file, the best approach will depend on what the file contains, what form the data has to be in for your in-memory algorithms, and so on. This will determine whether it is better to use Scanner or a raw Reader, from a performance perspective and, more importantly, from the perspective of making your code reliable and maintainable.
Finally, the chances are that this won't make a significant difference to the overall performance of your code. What I'm saying is that you are optimizing your application prematurely. You are better off ignoring performance for now and choosing the version that will make the rest of your code simpler. When the application is working, profile it with some representative input data. The profiling will tell you how much time is spent reading the file, in absolute terms and relative to the rest of the application. This will tell you whether it is worth the effort to try to optimize the file reading.
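To illustrate how the file's content can drive the choice, here is a minimal sketch: Scanner for whitespace-separated numbers, BufferedReader for plain lines. The class name ContentDrivenReading and both method names are hypothetical, and the methods take a Reader so they can be tried on a string as well as on a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ContentDrivenReading {
    // Input made of whitespace-separated integers: Scanner tokenizes and parses for us.
    static long sumInts(Reader source) {
        long sum = 0;
        try (Scanner in = new Scanner(source)) {
            while (in.hasNextInt()) {
                sum += in.nextInt();
            }
        }
        return sum;
    }

    // Input made of plain text lines: BufferedReader.readLine() is enough.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sumInts(new StringReader("1 2 3\n4 5")));        // prints 15
        System.out.println(readLines(new StringReader("first\nsecond")));   // prints [first, second]
    }
}
```

For a real file, either method could be given something like new InputStreamReader(new FileInputStream("foo.txt")), as in the question.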
The only bit of performance advice I'd give is that character by character reading from an unbuffered input stream or reader is inefficient. If the file needs to be read that way, you should add a BufferedReader to the stack.
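As a sketch of that advice, the InputStreamReader from the question can be wrapped in a BufferedReader so that each read() call is served from an in-memory buffer rather than going to the underlying stream. The class name BufferedCharCount, the method names, and the UTF-8 encoding are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BufferedCharCount {
    // Reads one character per read() call until end of input and counts them.
    static long countChars(Reader reader) throws IOException {
        long count = 0;
        while (reader.read() != -1) {
            count++;
        }
        return count;
    }

    // Opens the file with a BufferedReader on top of the InputStreamReader,
    // so the character-by-character loop above reads from a buffer.
    static long countCharsInFile(String path) throws IOException {
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            return countChars(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // "foo.txt" is the file name from the question; pass another path as the first argument.
        System.out.println(countCharsInFile(args.length > 0 ? args[0] : "foo.txt"));
    }
}
```

The loop itself does not change; only the reader stack does, which is why adding the buffer is usually a cheap fix.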
One difference, and I guess the principal one, is that with BufferedReader/InputStreamReader you can read the whole document character by character if you want. Scanner has no method for reading a single character, so this is not directly possible with it. This means that InputStreamReader gives you more control over the content of the document. ;)