I have nearly 500 text files with 10 million words. I have to index those words. What is the fastest way to read from a text file character by character? Here is my initial attempt:
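// Needs: java.io.BufferedReader, java.io.FileInputStream, java.io.InputStreamReader, java.util.Locale
// try-with-resources closes the reader (and the underlying file) when done
try (BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(this.path + "/" + doc)))) {
    String line;
    while ((line = in.readLine()) != null) {
        line = line.toUpperCase(Locale.ENGLISH);
        String word = "";
        // j < line.length(): charAt(line.length()) would throw StringIndexOutOfBoundsException
        for (int j = 0; j < line.length(); j++) {
            char c = line.charAt(j);
            // OPERATIONS
        }
    }
}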
Switching to read() will not make a considerable difference in performance. Read more: Peter Lawrey's comparison of read() and readLine()
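If you want to check the read() versus readLine() claim on your own machine, here is a rough timing sketch. It is not a rigorous benchmark (single run, no JIT warm-up; a harness such as JMH gives more trustworthy numbers), and it assumes synthetic text read from memory through a StringReader, so disk I/O is left out and only the per-character overhead is compared. The class and method names are hypothetical. Note that readLine() drops line terminators, so its character count is smaller.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadVsReadLine {
    // Counts characters with BufferedReader.read(), one char per call.
    static long countWithRead(String text) throws IOException {
        long n = 0;
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            while (in.read() != -1) {
                n++;
            }
        }
        return n;
    }

    // Counts characters with readLine(); line terminators are dropped, so they are not counted.
    static long countWithReadLine(String text) throws IOException {
        long n = 0;
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                n += line.length();
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        String text = "hello how are you?\n".repeat(200_000); // synthetic input (Java 11+)
        long t0 = System.nanoTime();
        long a = countWithRead(text);
        long t1 = System.nanoTime();
        long b = countWithReadLine(text);
        long t2 = System.nanoTime();
        System.out.printf("read():     %d chars in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("readLine(): %d chars in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```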
Now, coming back to your original question.
Input string:
hello how are you?
You need to split this line into its words and index each one.
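A minimal sketch of the splitting step, assuming the line is split with String.split and the regex \\s+ (the class and method names are hypothetical):

```java
import java.util.Arrays;

public class SplitWords {
    // Splits a line into words, treating any run of whitespace as a single delimiter.
    static String[] words(String line) {
        // trim() avoids a leading empty token when the line starts with whitespace
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // "".split(...) would return [""], not an empty array
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("hello how are you?"))); // prints [hello, how, are, you?]
    }
}
```

Punctuation stays attached to the word ("you?" is one token); strip it separately if your index should not contain it.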
Note: the pattern \\s+ treats any run of whitespace (spaces, tabs, etc.) as the delimiter.

Don't read lines and then rescan the lines char by char. That way you are processing every character twice. Just read chars via BufferedReader.read().
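A sketch of that single-pass approach: read one char at a time with BufferedReader.read() and build each word in a StringBuilder, so no line String is created and then rescanned. The word-boundary rule (Character.isWhitespace), the names, and the index itself (a hypothetical word to occurrence-count HashMap) are assumptions for illustration; replace the map update with whatever your index actually stores.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class CharTokenizer {
    // Reads source one char at a time and adds every word (a maximal run of
    // non-whitespace chars, upper-cased as in the question) to a hypothetical
    // word -> occurrence-count index. The caller owns and closes source.
    static void indexWords(Reader source, Map<String, Integer> index) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuilder word = new StringBuilder();
        int ch;
        while ((ch = in.read()) != -1) {
            if (Character.isWhitespace(ch)) {
                addWord(word, index); // whitespace ends the current word
            } else {
                word.append(Character.toUpperCase((char) ch));
            }
        }
        addWord(word, index); // last word when the input does not end with whitespace
    }

    // Records the buffered word (if any) and clears the buffer for the next one.
    private static void addWord(StringBuilder word, Map<String, Integer> index) {
        if (word.length() > 0) {
            index.merge(word.toString(), 1, Integer::sum);
            word.setLength(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> index = new HashMap<>();
        indexWords(new StringReader("hello how are you?\nHow are you"), index);
        System.out.println(index.get("HOW")); // prints 2
    }
}
```

Character.toUpperCase works on single chars, so it can differ from String.toUpperCase(Locale.ENGLISH) for a few characters: "ß".toUpperCase() gives "SS", while Character.toUpperCase('ß') leaves 'ß' unchanged.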
InputStreamReader's read() method reads a single character at a time.
You can use a FileReader (a subclass of InputStreamReader) instead, for example, and wrap either one in a BufferedReader.
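A sketch of the two reader chains mentioned above, assuming the files are UTF-8 (the question does not say which encoding they use) and Java 11 or later for the FileReader(String, Charset) constructor. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReaderChains {
    // Option 1: InputStreamReader decodes the file's bytes as UTF-8 (assumed encoding),
    // and BufferedReader keeps a buffer of decoded chars for cheap read() calls.
    static Reader viaInputStreamReader(String path) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
    }

    // Option 2: FileReader is a subclass of InputStreamReader; this charset
    // constructor requires Java 11+.
    static Reader viaFileReader(String path) throws IOException {
        return new BufferedReader(new FileReader(path, StandardCharsets.UTF_8));
    }

    // Reads every character with read(), one char per call, then closes the reader.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = reader) {
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(viaInputStreamReader(args[0]));
        System.out.println(text.length() + " chars");
    }
}
```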
Hope this helps!