I have nearly 500 text files with 10 million words. I have to index those words. What is the fastest way to read from a text file character by character? Here is my initial attempt:
InputStream ist = new FileInputStream(this.path + "/" + doc);
BufferedReader in = new BufferedReader(new InputStreamReader(ist));
String line;
while ((line = in.readLine()) != null) {
    line = line.toUpperCase(Locale.ENGLISH);
    String word = "";
    for (int j = 0; j < line.length(); j++) {
        char c = line.charAt(j);
        // OPERATIONS
    }
}
Using read() instead of readLine() will not make a considerable difference in performance.
Read more: Peter Lawrey's comparison of read() and readLine()
Now, coming back to your original question:
Input string: hello how are you?
So you need to index the words of the line, i.e.:
BufferedReader r = new BufferedReader(new InputStreamReader(inputStream));
String line;
while ((line = r.readLine()) != null) {
String[] splitString = line.split("\\s+");
//Do stuff with the array here, i.e. construct the index.
}
Note: The pattern \\s+ treats any run of whitespace (spaces, tabs, etc.) as the delimiter, so the words themselves never contain whitespace.
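To make the "construct the index" step a bit more concrete, here is a minimal sketch that counts word occurrences using the same split("\\s+") approach. The class name SplitIndexer, the method countWords, and the choice of a HashMap of counts as the "index" are my own assumptions for illustration, not something from the question. Note that punctuation stays attached to the word (e.g. "YOU?"), since only whitespace is treated as a delimiter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: the "index" here is simply word -> occurrence count.
class SplitIndexer {

    // Reads the source line by line and counts each upper-cased, whitespace-separated token.
    static Map<String, Integer> countWords(Reader source) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader r = new BufferedReader(source)) {
            String line;
            while ((line = r.readLine()) != null) {
                // trim() avoids an empty first token when the line starts with whitespace
                for (String word : line.trim().split("\\s+")) {
                    if (word.isEmpty()) {
                        continue; // a blank line splits into a single empty string
                    }
                    counts.merge(word.toUpperCase(Locale.ENGLISH), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts =
                countWords(new StringReader("hello how are you?\nhow  are\tyou"));
        System.out.println(counts);
    }
}
```

For real files you would pass something like new FileReader(file) (or an InputStreamReader with an explicit charset) instead of the StringReader.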
InputStreamReader's read() method can read a character at a time.
You can use a FileReader, or wrap the reader in a BufferedReader, for example.
Hope this helps!
Don't read lines and then rescan the lines char by char. That way you are processing every character twice. Just read chars via BufferedReader.read().
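A rough sketch of that single-pass approach, assuming a "word" is a run of letters or digits and that anything else is a separator (the question does not say how words are delimited, so adjust that test as needed). The names CharScanner and readWords are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans the input once, char by char, without readLine().
class CharScanner {

    // Collects runs of letters/digits as upper-cased words; everything else separates words.
    static List<String> readWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            int c;
            // read() returns -1 at end of stream, otherwise the char as an int
            while ((c = in.read()) != -1) {
                char ch = (char) c;
                if (Character.isLetterOrDigit(ch)) {
                    current.append(Character.toUpperCase(ch));
                } else if (current.length() > 0) {
                    words.add(current.toString());
                    current.setLength(0); // reuse the builder for the next word
                }
            }
        }
        // flush the last word if the input did not end with a separator
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readWords(new StringReader("hello how are you?")));
    }
}
```

The BufferedReader matters here: calling read() on an unbuffered reader would hit the underlying stream far more often.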