Java - Fastest Way to Read Text Files Char by Char

Posted 2020-06-21 12:57

I have nearly 500 text files with 10 million words. I have to index those words. What is the fastest way to read from a text file character by character? Here is my initial attempt:

InputStream ist = new FileInputStream(this.path + "/" + doc);
BufferedReader in = new BufferedReader(new InputStreamReader(ist));

String line;

while ((line = in.readLine()) != null) {
    line = line.toUpperCase(Locale.ENGLISH);
    String word = "";

    // Loop while j < line.length(); charAt(line.length()) would throw
    // StringIndexOutOfBoundsException.
    for (int j = 0; j < line.length(); j++) {
        char c = line.charAt(j);
        // OPERATIONS
    }
}

Tags: java file-io
3 Answers
啃猪蹄的小仙女
#2 · 2020-06-21 13:14

Using read() instead of readLine() will not make a considerable difference in performance.

Read more: Peter Lawrey's comparison of read() and readLine()

Now, coming back to your original question:
Suppose the input line is: hello how are you?
You need to index the individual words of that line, which you can get by splitting it:

BufferedReader r = new BufferedReader(new InputStreamReader(inputStream));
String line;
while ((line = r.readLine()) != null) {
   String[] splitString = line.split("\\s+");
   //Do stuff with the array here, i.e. construct the index.
}

Note: The pattern \\s+ treats any run of whitespace (spaces, tabs, etc.) as the delimiter, so the delimiters themselves do not appear in the resulting array.
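As a rough sketch of what "construct the index" could look like with the split approach, the example below counts occurrences of each token. The class name SplitIndexSketch, the method countWords, and the choice of a word-count map as the "index" are all assumptions made for illustration; the original answer does not say what the index should contain.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; the name and the word-count "index" are illustrative only.
class SplitIndexSketch {

    // Counts how often each whitespace-separated token occurs in the input.
    static Map<String, Integer> countWords(BufferedReader reader) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            for (String token : line.split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank lines and leading whitespace yield one empty token
                }
                // Upper-case as in the question so "Hello" and "hello" share a key.
                counts.merge(token.toUpperCase(Locale.ENGLISH), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample input so the sketch runs without any files.
        String sample = "hello how are you?\n  hello again";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(countWords(reader));
        }
    }
}
```

Note that splitting on whitespace alone keeps punctuation attached, so the last token of the first line is "YOU?" rather than "YOU". If that matters for the index, the pattern or a post-processing step would need to strip it.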

放我归山
#3 · 2020-06-21 13:17

Don't read lines and then rescan the lines char by char. That way you are processing every character twice. Just read chars via BufferedReader.read().
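A minimal sketch of that single-pass idea follows. The class name CharReaderSketch, the method readWords, and the rule that a word is a run of letters or digits are assumptions for illustration; the question leaves the actual per-character operations unspecified.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names and the word rule are illustrative only.
class CharReaderSketch {

    // Reads the source one char at a time and collects upper-cased words.
    static List<String> readWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            int c;
            // read() returns -1 at end of stream, otherwise a char value as an int.
            while ((c = in.read()) != -1) {
                char ch = (char) c;
                if (Character.isLetterOrDigit(ch)) {
                    // Per-char upper-casing; this can differ from String.toUpperCase
                    // for characters whose upper-case form is more than one char.
                    current.append(Character.toUpperCase(ch));
                } else if (current.length() > 0) {
                    words.add(current.toString()); // a non-word char ends the word
                    current.setLength(0);
                }
            }
        }
        if (current.length() > 0) {
            words.add(current.toString()); // flush the last word at end of input
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Prints [HELLO, HOW, ARE, YOU]
        System.out.println(readWords(new StringReader("hello how are you?")));
    }
}
```

For a real file, the same method could be given an InputStreamReader over a FileInputStream with an explicit charset; which charset the files use is not stated in the question, so that choice would be an assumption.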

手持菜刀,她持情操
#4 · 2020-06-21 13:37

InputStreamReader's read() method reads one character at a time.

You can use a FileReader (a subclass of InputStreamReader) or wrap the reader in a BufferedReader, for example.

Hope this helps!
