Why is Java BufferedReader() not reading Arabic an

I'm trying to read a file which contain English & Arabic characters on each line and another file which contains English & Chinese characters on each line. However the characters of the Arabic and Chinese fail to show correctly - they just appear as question marks. Any idea how I can solve this problem?

Here is the code I use for reading:

try {
        String sCurrentLine;
        BufferedReader br = new BufferedReader(new FileReader(directionOfTargetFile));
        int counter = 0;

        while ((sCurrentLine = br.readLine()) != null) {
            String lineFixedHolder = converter.fixParsedParagraph(sCurrentLine);
            System.out.println("The line number "+ counter
                               + " contain : " + sCurrentLine);
            counter++;
        }
    }

Edition 01

After reading the line and getting the Arabic and Chinese word I use a function to translate them by simply searching for Given Arabic Text in an ArrayList (which contain all expected words) (using indexOf(); method). Then when the word's index is found it's used to call the English word which has the same index in another Arraylist. However this search always returns false because it fails when searching the question marks instead of the Arabic and Chinese characters. So my System.out.println print shows me nulls, one for each failure to translate.

*I'm using Netbeans 6.8 Mac version IDE

Edition 02

Here is the code which search for translation:

        int testColor = dbColorArb.indexOf(wordToTranslate);
        int testBrand = -1;
        if ( testColor != -1 ) {
            String result = (String)dbColorEng.get(testColor);
            return result;
        } else {
            testBrand = dbBrandArb.indexOf(wordToTranslate);
        }
        //System.out.println ("The testBrand is : " + testBrand);
        if ( testBrand != -1 ) {
            String result = (String)dbBrandEng.get(testBrand);
            return result;
        } else {
            //System.out.println ("The first null");
            return null;
        }

I'm actually searching 2 Arraylists which might contain the the desired word to translate. If it fails to find them in both ArrayLists, then null is returned.

Edition 03

When I debug I found that lines being read are stored in my String variable as the following:

 "3;0000000000;0000001001;1996-06-22;;2010-01-27;����;;01989;������;"

Edition 03

The file I'm reading has been given to me after it has been modified by another program (which I know nothing about beside it's made in VB) the program made the Arabic letters that are not appearing correctly to appear. When I checked the encoding of the file on Notepad++ it showed that it's ANSI. however when I convert it to UTF8 (which replaced the Arabic letter with other English one) and then convert it back to ANSI the Arabic become question marks!

标签： java encoding utf-8 arabic

3条回答

Rolldiameter

2楼-- · 2019-01-09 04:38

IT is most likely Reading the information in correctly, however your output stream is probably not UTF-8, and so any character that cannot be shown in your output character set is being replaced with the '?'.

You can confirm this by getting each character out and printing the character ordinal.

0人赞添加讨论(0) 举报

别忘想泡老子

3楼-- · 2019-01-09 04:40

public void writeTiFile(String fileName,String str){
    try {
        FileOutputStream out = new FileOutputStream(fileName);
        out.write(str.getBytes("windows-1256"));
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}

0人赞添加讨论(0) 举报

可以哭但决不认输i

4楼-- · 2019-01-09 04:49

FileReader javadoc:

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

So:

Reader reader = new InputStreamReader(new FileInputStream(fileName), "utf-8");
BufferedReader br = new BufferedReader(reader);

If this still doesn't work, then perhaps your console is not set to properly display UTF-8 characters. Configuration depends on the IDE used and is rather simple.

Update : In the above code replace utf-8 with cp1256. This works fine for me (WinXP, JDK6)

But I'd recommend that you insist on the file being generated using UTF-8. Because cp1256 won't work for Chinese and you'll have similar problems again.

0人赞添加讨论(0) 举报

Why is Java BufferedReader() not reading Arabic an

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间