
Why is FileInputStream read() method wrongly readi

Posted 2019-03-05 06:59

Question:

There are some similar questions on the site, but they all deal with different scenarios, so I'm asking here:

package Assign6B;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class FileOpsDemo {
    public static void main(String[] args) throws IOException 
    {

        FileInputStream inputFile = null;
        FileOutputStream outputFile = null;

        try
        {
            inputFile = new FileInputStream("s:/inputFile.txt");
            outputFile = new FileOutputStream("s:/outputFile.txt");
            char c;
            while(( c = (char) inputFile.read()) != -1)
            {
                System.out.println((char)c);
                outputFile.write(c);
            }

            System.out.println("File transfer complete!");
        }

        finally
        {
            if (inputFile != null)
                inputFile.close();

            if (outputFile != null)
                outputFile.close();
        }
    }
}

This is my code. At first, my while loop condition cast the int returned by read() to a char. The result was an infinite loop in which every character was printed as '?' (ASCII 63). Then I realized my mistake with the char conversion and changed it.

But when I changed the while condition to compare against -2 (without the char conversion), a condition that is never met and so also loops forever, the first (say, 10) valid characters of the file were still converted to '?', even though there was no char conversion. (After it reaches EOF, all invalid chars become '?'; I assume that is expected.)

Why is this happening? The valid characters of the file should at least be read correctly until EOF is reached and the loop starts consuming invalid values!

Answer 1:

Why is this happening?

The problem is in this line:

 while(( c = (char) inputFile.read()) != -1)

You are doing the following:

  1. Reading a byte from the file. This gives you an int which is either a byte in the range 0 to 255, or -1.

  2. You are casting that value to a char. For the byte, that gives a char value in the range 0 to 255. For -1 the cast will give you '\uffff'.

  3. You assign that value to c.

  4. You then test the value against -1. This is where it goes wrong. In the case where read returned -1, you are now evaluating '\uffff' != -1. The LHS is promoted to the int value 0x0000ffff, and that is compared to 0xffffffff. They are different, so the condition is always true and the loop never ends.

Then you print '\uffff', which is converted to a '?' when output as a character in your default charset.
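The steps above can be reproduced without any file at all. The following is a minimal sketch; the class name CastDemo and its two helper methods are made up for illustration and do not appear in the question.

```java
class CastDemo {
    /** Mimics the loop condition from the question for a given read() result. */
    static boolean loopContinuesWithCharCast(int readResult) {
        char c = (char) readResult; // -1 (0xffffffff) is truncated to '\uffff'
        return c != -1;             // c is promoted to the int 65535, which is never -1
    }

    /** The corrected condition: keep the int and compare it before any cast. */
    static boolean loopContinuesWithInt(int readResult) {
        return readResult != -1;
    }

    public static void main(String[] args) {
        int eof = -1; // what read() returns at end of stream

        System.out.println((int) (char) eof);               // prints 65535
        System.out.println(loopContinuesWithCharCast(eof)); // prints true: the loop never stops
        System.out.println(loopContinuesWithInt(eof));      // prints false: the loop stops at EOF
    }
}
```

For ordinary byte values (0 to 255) both conditions say "continue"; they only disagree at end of stream, which is exactly where the loop in the question fails to terminate.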


There are two major mistakes in the code. First, the conversion int -> char -> int is not going to work; see above.

Second, and more important:

  • you should not be trying to use an InputStream (which is byte oriented) to read data as characters, and

  • you should not be trying to write character data to an OutputStream.

Depending on what you are actually trying to achieve here, you should either:

  • read and write bytes ... without a spurious "conversion" to char in the middle, OR

  • use a FileReader and FileWriter to do the conversions properly for the platform default character set.

(There are some other points that could be made about buffering, choosing an alternate charset, etc, but this Answer is already getting too long.)
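As a rough sketch of the two options listed above, the code below copies a file once as raw bytes and once as characters. The class name CopyDemo, the method names, the 4096 buffer size, and the temporary files are assumptions chosen for illustration; the paths from the question are not used so that the example can run anywhere.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Reader;
import java.io.Writer;

class CopyDemo {
    /** Option 1: copy raw bytes, with no conversion to char anywhere. */
    static void copyBytes(File source, File target) throws IOException {
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[4096];
            int count; // number of bytes read, or -1 at end of stream
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
        }
    }

    /** Option 2: copy characters, decoded and encoded with the platform default charset. */
    static void copyChars(File source, File target) throws IOException {
        try (Reader in = new FileReader(source);
             Writer out = new FileWriter(target)) {
            char[] buffer = new char[4096];
            int count; // number of chars read, or -1 at end of stream
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temporary files, used only for this demonstration.
        File source = File.createTempFile("copy-demo-in", ".txt");
        File bytesOut = File.createTempFile("copy-demo-bytes", ".txt");
        File charsOut = File.createTempFile("copy-demo-chars", ".txt");
        try (Writer w = new FileWriter(source)) {
            w.write("hello\nworld\n");
        }

        copyBytes(source, bytesOut);
        copyChars(source, charsOut);
        System.out.println(bytesOut.length() + " bytes and " + charsOut.length() + " bytes copied");
    }
}
```

The try-with-resources statements close both streams even if an exception is thrown, which replaces the explicit finally block in the question. In both methods the result of read stays an int and is compared with -1 before anything else is done with it.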



Answer 2:

Just change this section of code. Once you convert the value to a char, you can no longer compare it to -1 successfully, so your while exit condition is never met.

int c;
while ((c = inputFile.read()) != -1) {
    System.out.println((char) c);
    outputFile.write(c);
}

Also, with Java 8, using the java.nio.file classes is much simpler:

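// Needs java.nio.file.Files, java.nio.file.Paths, java.util.List and java.io.IOException.
public static void main(String[] args) throws IOException {
    // Both calls treat the file as UTF-8 text, one String per line.
    List<String> lines = Files.readAllLines(Paths.get("s:/inputFile.txt"));
    Files.write(Paths.get("s:/outputFile.txt"), lines);
}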


Answer 3:

Typecasting the result of in.read() to char is bad style. Characters should only be read from a Reader; in your case you could use an InputStreamReader:

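    inputFile = new FileInputStream("s:/inputFile.txt");
    outputFile = new FileOutputStream("s:/outputFile.txt");
    // Wrap the byte streams so that bytes are decoded and encoded as UTF-8.
    Reader inputReader = new InputStreamReader(inputFile, StandardCharsets.UTF_8);
    Writer outputWriter = new OutputStreamWriter(outputFile, StandardCharsets.UTF_8);
    char[] cbuf = new char[4096];
    int read;
    // read(char[]) returns the number of chars read, or -1 at end of stream.
    while( (read = inputReader.read(cbuf)) >= 0)
    {
        System.out.println(new String(cbuf, 0, read));
        outputWriter.write(cbuf, 0, read);
    }
    // Push any chars still buffered in the writer to the file before it is closed.
    outputWriter.flush();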

Furthermore, this example copies in blocks of up to 4096 chars rather than byte by byte (a massive speed improvement), and it uses UTF-8 as the charset.
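To illustrate why characters should come from a Reader (or an explicit charset decode) rather than from a cast, here is a small sketch that compares the two on a multi-byte UTF-8 character. The class name DecodeDemo and its methods are hypothetical, and the sample text is an assumption chosen only to make the difference visible.

```java
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    /** What a loop doing (char) in.read() effectively produces: one char per byte. */
    static String castEachByte(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append((char) (b & 0xFF)); // read() returns the byte as 0..255
        }
        return sb.toString();
    }

    /** Proper decoding: several bytes may combine into a single char. */
    static String decodeUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
        byte[] data = "\u00e9".getBytes(StandardCharsets.UTF_8);

        System.out.println(data.length);                 // prints 2
        System.out.println(castEachByte(data).length()); // prints 2: two wrong chars
        System.out.println(decodeUtf8(data).length());   // prints 1: the original char
    }
}
```

For pure ASCII input the two approaches give the same text, which is why the cast can appear to work until the file contains a character outside that range.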