How to determine file size in terms of number of characters

Posted 2019-07-16 14:08

I am reading a file using Java and jcifs on Windows. I need to determine the size of the file, which contains multi-byte as well as ASCII characters.

How can I achieve this efficiently? Is there an existing API in Java for it?

Thanks,

Tags: java jcifs
2 answers
爷、活的狠高调
Reply #2 · 2019-07-16 14:34

No doubt, to get the exact number of characters you have to read the file with the proper encoding. The question is how to read the file efficiently. Java NIO is the fastest way I know of to do that.

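// Read the whole file into memory (the int cast assumes it is smaller than 2 GB).
byte[] barray = new byte[(int) f.length()];
ByteBuffer bb = ByteBuffer.wrap(barray);
try (FileChannel fChannel = new FileInputStream(f).getChannel()) {
    // A single read() may not fill the buffer, so loop until it is full or EOF.
    while (bb.hasRemaining() && fChannel.read(bb) != -1) { }
}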

then

String str = new String(barray, charsetName);
int charCount = str.length();  // number of chars (UTF-16 code units)

Reading into the byte buffer runs at close to the maximum available speed (for me it was about 60 Mb/sec, while a disk speed test gives about 70-75 Mb/sec).
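One caveat about str.length(): it counts UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) counts as two. If you need code points instead, String.codePointCount can be used. A minimal sketch, where the class name CodePointCount and the sample string are only illustrative:

```java
public class CodePointCount {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, which needs a surrogate pair in UTF-16
        String s = "a\uD83D\uDE00";
        System.out.println(codeUnits(s));   // prints 3
        System.out.println(codePoints(s));  // prints 2
    }
}
```

Which of the two numbers is the "right" one depends on what you mean by a character, so this is a design choice rather than a correction.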

迷人小祖宗
Reply #3 · 2019-07-16 14:43

To get the character count, you'll have to read the file. By specifying the correct file encoding, you ensure that Java correctly reads each character in your file.
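To see why the byte size alone is not enough, here is a small sketch that compares the UTF-8 byte count of a string with its character count after decoding. The class name ByteVsCharCount and the sample text are only for illustration, and UTF-8 is an assumed encoding; your file may use a different one.

```java
import java.nio.charset.StandardCharsets;

public class ByteVsCharCount {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8ByteCount(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of chars obtained after decoding the bytes as UTF-8.
    static int decodedCharCount(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8).length();
    }

    public static void main(String[] args) {
        // "naïve 日本": ï takes 2 bytes in UTF-8, each CJK character takes 3
        String text = "na\u00EFve \u65E5\u672C";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8ByteCount(text));      // prints 13
        System.out.println(decodedCharCount(bytes));  // prints 8
    }
}
```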

BufferedReader.read() returns the character read as an int in the range 0 to 65535 (a UTF-16 code unit), or -1 at the end of the stream. So the simple way to do it would be like this:

int countCharsSimple(File f, String charsetName) throws IOException {
    // try-with-resources closes the reader even if read() throws
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(f), charsetName))) {
        int charCount = 0;
        // read() returns -1 at the end of the stream
        while (reader.read() != -1) {
            charCount++;
        }
        return charCount;
    }
}

You will get faster performance using Reader.read(char[]):

int countCharsBuffer(File f, String charsetName) throws IOException {
    // try-with-resources closes the reader even if read() throws
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(f), charsetName))) {
        int charCount = 0;
        char[] cbuf = new char[1024];
        int read;
        // read(char[]) returns the number of chars read, or -1 at the end of the stream
        while ((read = reader.read(cbuf)) != -1) {
            charCount += read;
        }
        return charCount;
    }
}

Out of interest, I benchmarked these two and the NIO version suggested in Andrey's answer. I found the second example above (countCharsBuffer) to be the fastest.

(Note that all these examples include line separator characters in their counts.)
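If you do not want line separators in the count, one option is to skip '\r' and '\n' while counting. The following is only a sketch: the class and method names are made up for illustration, and it takes a Reader so that it can be tried with a StringReader as well as with a file reader opened as in the examples above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CountWithoutLineSeparators {
    // Counts the chars supplied by the reader, skipping '\r' and '\n'.
    static long countExcludingLineSeparators(Reader reader) throws IOException {
        long count = 0;
        char[] cbuf = new char[1024];
        int read;
        while ((read = reader.read(cbuf)) != -1) {
            for (int i = 0; i < read; i++) {
                if (cbuf[i] != '\r' && cbuf[i] != '\n') {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = new StringReader("ab\r\ncd\n")) {
            System.out.println(countExcludingLineSeparators(r)); // prints 4
        }
    }
}
```

Checking every char is likely to cost some of the speed advantage of the buffered version, so it is only worth doing if the separators really must be excluded.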
