Why is my String returning “\ufffd\ufffdN a m e”

This is my method

public void readFile3() throws IOException
{
    try
    {
        FileReader fr = new FileReader(Path3);
        BufferedReader br = new BufferedReader(fr);
        String s = br.readLine();
        int a = 1;
        while (a != 2)
        {
            s = br.readLine();
            a++;
        }
        Storage.add(s);

        br.close();
    }
    catch (IOException e)
    {
        System.out.println(e.getMessage());
    }
}

For some reason I am unable to read the file, which contains only this: " Name Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz "

When I debug the code, the String s comes back as "\ufffd\ufffdN a m e", and I have no idea where those extra characters are coming from. This is preventing me from reading the file properly.

Tags: java bufferedreader filereader

3 answers

forever°为你锁心

Reply #2 · 2020-02-12 09:29

You must specify the encoding when reading the file; in your case it is probably UTF-16. FileReader, as used in the question, decodes with the platform default charset.

Reader reader = new InputStreamReader(new FileInputStream(fileName), "UTF-16");
BufferedReader br = new BufferedReader(reader);

See the documentation of the InputStreamReader class for more details.
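A fuller version of the snippet above, as a minimal sketch. It assumes the file really is UTF-16 with a byte order mark and that the value wanted is on the second line, as in the question. The names Utf16LineReader and readSecondLine are made up for illustration and do not come from the original post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, named for illustration only.
final class Utf16LineReader {

    // Returns the second line of the file, decoded with the given charset,
    // or null if the file has fewer than two lines.
    static String readSecondLine(Path path, Charset charset) throws IOException {
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), charset))) {
            br.readLine();          // skip the first line (the "Name" header in the question)
            return br.readLine();   // the line the question stores
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the path to a UTF-16 encoded file with a BOM.
        System.out.println(readSecondLine(Paths.get(args[0]), StandardCharsets.UTF_16));
    }
}
```

If the file turns out to use a different encoding, only the Charset argument needs to change.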


Juvenile、少年°

Reply #3 · 2020-02-12 09:38

Check whether the file is actually an .odt, .rtf, or some other format rather than plain .txt; that could explain the extra characters. Even if it is a .txt file, check which encoding it was saved in: the charset used to read the file has to match the encoding it was written with (for example UTF-8), otherwise the bytes are decoded incorrectly.

Perhaps your document contains non-ASCII characters such as '®'.


祖国的老花朵

Reply #4 · 2020-02-12 09:39

\ufffd is the Unicode replacement character (U+FFFD). A decoder emits it when it encounters bytes that are not valid in the charset it is decoding with. I suppose you are on a Windows platform (or at least that the file you read was created on Windows). Windows supports several encodings for text files; the most common is "ANSI", where each character is represented by its code in the system's ANSI code page.
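To see where the two replacement characters can come from, here is a small sketch that reproduces the symptom in memory. It assumes the file starts with the UTF-16 little-endian byte order mark (bytes FF FE) and that the reader decoded it as UTF-8; the default charset on the asker's machine is not stated in the question, so that part is a guess. ReplacementCharDemo and misdecode are illustrative names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named for illustration only.
final class ReplacementCharDemo {

    // Encodes the text as UTF-16LE with a leading BOM, then decodes those
    // bytes with the wrong charset (UTF-8) on purpose.
    static String misdecode(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] withBom = new byte[body.length + 2];
        withBom[0] = (byte) 0xFF;   // first byte of the UTF-16LE BOM
        withBom[1] = (byte) 0xFE;   // second byte of the UTF-16LE BOM
        System.arraycopy(body, 0, withBom, 2, body.length);
        // 0xFF and 0xFE never occur in valid UTF-8, so each one is
        // replaced by U+FFFD during decoding.
        return new String(withBom, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Print every code unit so that invisible characters show up too.
        for (char c : misdecode("Name").toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

The decoded string starts with two U+FFFD characters, and every letter is followed by a U+0000, because the high byte of each 16-bit unit is zero. A debugger that renders U+0000 as a blank would display this as "N a m e".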

But Windows can also use UTF-16 directly, where each character is represented by its Unicode code as a 16-bit integer, so 2 bytes per character (for characters in the Basic Multilingual Plane). Such files start with a special marker (the Byte Order Mark, in Windows terminology) that indicates:

- that the file is encoded with 2 (or even 4) bytes per character
- whether the encoding is little or big endian

(Reference: "Using Byte Order Marks" on MSDN)
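As a rough illustration of how such a marker can be recognized, the sketch below inspects the first bytes of a file. It only knows the UTF-8 and UTF-16 marks; the 4-byte (UTF-32) marks are deliberately left out, so a UTF-32LE file would be misreported as UTF-16LE. BomSniffer and detect are hypothetical names, not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named for illustration only.
final class BomSniffer {

    // Returns the charset suggested by a leading byte order mark,
    // or null if the bytes do not start with a BOM known to this sketch.
    static Charset detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;      // EF BB BF
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;   // FE FF: big endian
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;   // FF FE: little endian
        }
        return null;                            // no BOM recognized
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x4E, 0x00};
        System.out.println(detect(sample));
    }
}
```

A missing BOM does not prove anything about the encoding, so a null result means "unknown" rather than "ANSI".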

Since what you see after the first two replacement characters is N a m e rather than Name, I suppose you have a UTF-16 encoded text file. Notepad can edit such files transparently (without even telling you the actual encoding), but other tools do have problems with them. The excellent vim can read files in different encodings and convert between them.

If you want to read this kind of file directly in Java, you have to use the UTF-16 charset. From the Java SE 7 javadoc on Charset: "UTF-16: Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark".
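The byte-order mark handling differs between the UTF-16 variants, which the following sketch tries to show. It decodes the same in-memory bytes (a little-endian BOM followed by "Name") with two charsets; the byte values are an assumption about what the file in the question looks like. Utf16BomDemo and decode are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named for illustration only.
final class Utf16BomDemo {

    // FF FE (little-endian BOM) followed by "Name" in UTF-16LE.
    private static final byte[] SAMPLE = {
        (byte) 0xFF, (byte) 0xFE,
        0x4E, 0x00, 0x61, 0x00, 0x6D, 0x00, 0x65, 0x00
    };

    // Decodes the sample bytes with the given charset.
    static String decode(Charset charset) {
        return new String(SAMPLE, charset);
    }

    public static void main(String[] args) {
        // UTF-16 reads the BOM to pick the byte order and drops it.
        System.out.println(decode(StandardCharsets.UTF_16).length());    // 4 chars: "Name"
        // UTF-16LE has a fixed byte order and keeps the BOM as U+FEFF.
        System.out.println(decode(StandardCharsets.UTF_16LE).length());  // 5 chars
    }
}
```

So for a file that starts with a BOM, plain UTF-16 is usually the more convenient choice. Without a BOM, Java's UTF-16 decoder falls back to big endian, and a little-endian file would then need UTF-16LE explicitly.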

