Reading/writing a BINARY File with Strings?

Posted 2019-02-16 01:59

How can I write/read a string from a binary file?

I've tried using writeUTF / readUTF (DataOutputStream/DataInputStream) but it was too much of a hassle.

Thanks.

2 answers
叛逆
Reply #2 · 2019-02-16 02:38

If you want to write text, you can use Writers and Readers.
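A minimal sketch of the Writer/Reader approach, assuming one string per line and an explicit UTF-8 charset (the class name `TextLines` is hypothetical). It uses `Files.newBufferedWriter` / `Files.newBufferedReader` from `java.nio.file`:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class TextLines {
    // Writes each string on its own line, encoded as UTF-8.
    static void writeLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                w.write(line);
                w.newLine();
            }
        }
    }

    // Reads the lines back, decoding with the same charset used for writing.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Because `readLine` splits on line terminators, strings that themselves contain `\n` or `\r` will not round-trip this way; for arbitrary strings a length prefix is the safer format.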

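You can use Data*Stream writeUTF/readUTF, but each encoded string has to fit in 65535 bytes of modified UTF-8, so a string can be at most 65535 characters long, and fewer if it contains non-ASCII characters (which take 2 or 3 bytes each).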


import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class WordsIO {
    public static void main(String... args) throws IOException {
        // generate a million "words" (hex strings derived from System.nanoTime())
        List<String> words = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++)
            words.add(Long.toHexString(System.nanoTime()));

        writeStrings("words", words);
        List<String> words2 = readWords("words");
        System.out.println("Words are the same is " + words.equals(words2));
    }

    public static List<String> readWords(String filename) throws IOException {
        // try-with-resources closes the stream even if reading fails
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream(filename)))) {
            int count = dis.readInt(); // the writer stores the word count first
            List<String> words = new ArrayList<String>(count);
            while (words.size() < count)
                words.add(dis.readUTF()); // 2-byte length + modified UTF-8 bytes
            return words;
        }
    }

    public static void writeStrings(String filename, List<String> words) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(filename)))) {
            dos.writeInt(words.size());
            for (String word : words)
                dos.writeUTF(word); // throws UTFDataFormatException if encoded form > 65535 bytes
        }
    }
}

prints

Words are the same is true
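If you need strings whose encoded form exceeds the 65535-byte writeUTF limit, one option is to write the length prefix yourself. This is a sketch under stated assumptions, not part of the original answer: the class `LongStrings` and its helpers `writeLongString`/`readLongString` are hypothetical, and they store a 4-byte int length followed by standard UTF-8 bytes (from `String.getBytes`) instead of writeUTF's 2-byte length and modified UTF-8.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LongStrings {
    // Hypothetical helper: int byte length, then the UTF-8 bytes,
    // so the string is not limited to 65535 encoded bytes.
    static void writeLongString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Hypothetical helper: reads back a string written by writeLongString.
    static String readLongString(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes); // reads exactly length bytes or throws EOFException
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

One trade-off: `getBytes(StandardCharsets.UTF_8)` replaces unpaired surrogate chars with `?`, so only well-formed strings round-trip, whereas writeUTF/readUTF preserve them.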
一夜七次
Reply #3 · 2019-02-16 02:58

Forget about FileWriter and DataOutputStream for a moment.

  • For binary data one uses OutputStream and InputStream classes. They handle byte[].
  • For text data one uses Reader and Writer classes. They handle Strings, which can store any kind of text because Java uses Unicode internally.

The crossover between text and binary data is done by specifying an encoding; if you don't specify one, the platform default encoding is used (UTF-8 since Java 18).

  • new OutputStreamWriter(outputStream, encoding)
  • string.getBytes(encoding)
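A small sketch of that crossover in both directions, assuming in-memory byte arrays rather than files (the class `Crossover` is hypothetical). It passes `StandardCharsets.UTF_8` instead of an encoding name; the `Charset` overloads of `OutputStreamWriter`, `InputStreamReader` and `getBytes` behave like the name-based ones but do not throw `UnsupportedEncodingException`.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class Crossover {
    // Text -> bytes: a Writer bridged onto an OutputStream with an explicit charset.
    static byte[] encodeWithWriter(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            w.write(text);
        } // closing the writer flushes its internal encoder buffer
        return bytes.toByteArray();
    }

    // Bytes -> text: a Reader bridged onto an InputStream with the same charset.
    static String decodeWithReader(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

For a single String, `text.getBytes(StandardCharsets.UTF_8)` and `new String(bytes, StandardCharsets.UTF_8)` produce the same results without the stream plumbing.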

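So if you want to avoid byte[] and keep arbitrary binary data in a String, you must abuse an encoding that maps every one of the 256 byte values to a char and back, for any sequence of bytes. That rules out "UTF-8". "ISO-8859-1" does this, since it maps each byte 0x00-0xFF to the char with the same value; "windows-1252" (also named "Cp1252") leaves a few byte values such as 0x81 undefined, so it may not round-trip every byte.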
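To check the "covers all 256 byte values" requirement, a sketch like the following decodes every byte value into a String and encodes it back (the class `ByteStringRoundTrip` is hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteStringRoundTrip {
    // Every possible byte value 0x00..0xFF, in order.
    static byte[] allBytes() {
        byte[] b = new byte[256];
        for (int i = 0; i < 256; i++) {
            b[i] = (byte) i;
        }
        return b;
    }

    // Decodes bytes to a String and encodes back with the same charset,
    // reporting whether the original bytes survived unchanged.
    static boolean roundTrips(byte[] data, Charset cs) {
        String s = new String(data, cs);
        return Arrays.equals(s.getBytes(cs), data);
    }

    public static void main(String[] args) {
        byte[] data = allBytes();
        System.out.println("ISO-8859-1: " + roundTrips(data, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + roundTrips(data, StandardCharsets.UTF_8));
    }
}
```

Running `main` prints `true` for ISO-8859-1 and `false` for UTF-8: bytes such as 0x80 are not valid UTF-8 on their own, so the decoder replaces them with U+FFFD and the original bytes are lost.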

But a conversion happens internally, and in rare cases it can cause problems. For instance, é can be represented in Unicode as one code point (U+00E9) or as two: e followed by the combining acute accent (U+0301). The java.text.Normalizer class converts between these forms.
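A short illustration of the two forms of é and of java.text.Normalizer; the class `NormalizeDemo` and its `sameText` helper are hypothetical, and NFC (composed form) is just one reasonable choice of target form:

```java
import java.text.Normalizer;

class NormalizeDemo {
    // Compares two strings after normalizing both to NFC (composed form).
    static boolean sameText(String x, String y) {
        return Normalizer.normalize(x, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(y, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "\u00E9";    // e-acute as a single code point, U+00E9
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT, U+0301

        System.out.println(composed.equals(decomposed));                       // false
        System.out.println(composed.length() + " vs " + decomposed.length());  // 1 vs 2
        System.out.println(sameText(composed, decomposed));                    // true
    }
}
```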

One place where this has already caused problems is file names across operating systems: macOS uses a different Unicode normalization than Windows, so version control systems need special handling for it.

So in principle it is better to use the more cumbersome byte arrays, ByteArrayInputStream, or java.nio buffers. Also keep in mind that the chars in a String are 16-bit values, not bytes.
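As a sketch of the java.nio buffer route, the following stores a string as raw bytes in a ByteBuffer. The layout (a 4-byte big-endian length followed by UTF-8 bytes) and the class `BufferStrings` are assumptions for illustration, not a standard format:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BufferStrings {
    // Hypothetical layout: 4-byte big-endian length, then that many UTF-8 bytes.
    static ByteBuffer encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + utf8.length); // big-endian by default
        buf.putInt(utf8.length);
        buf.put(utf8);
        buf.flip(); // switch to reading: position = 0, limit = bytes written
        return buf;
    }

    // Reads one length-prefixed string starting at the buffer's current position.
    static String decode(ByteBuffer buf) {
        int length = buf.getInt();
        byte[] utf8 = new byte[length];
        buf.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

The flipped buffer can then be handed to something like `FileChannel.write(ByteBuffer)` to put it in a file; the String only exists at the edges, and everything in between stays as bytes.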
