-->

Changing the Character set of a .dbf file

Published 2019-05-31 05:08

Question:

I have a Java application that should load and print data containing French special characters from a .dbf (dBase III) file, but it doesn't work: the characters are not showing.

I asked this question thinking that the problem was related only to printing, but as you can see from the comments, I worked out that the problem lies with the database and not with printing: when I add a special character to my JTextPane directly, it prints normally. I also tried changing the character set of the text pane, but the problem remained.

To complicate the question even further for those who love solving difficult problems: when I open my .dbf file in MS Access, the characters are there. So I think the error probably happens while loading the data from the database. By the way, to fetch the data I'm using an API called xBaseJ, which doesn't use SQL but its own implementation.

I hope I have given all the necessary details. I'd really appreciate any help; any idea could help me figure out the solution (and the problem too).

Edit: Now, thanks to Ethan Furman's answer, we know that the problem is related to the encoding of the database, which is plain old ASCII, and not to the xBaseJ API.

So the question should now be: is it possible to change the encoding of a dBase database, and how can I do it? Thank you @Ethan Furman, and thanks in advance for any help with this question.

Answer 1:

Finally, I found the answer...

First of all, as mentioned, thanks to Ethan Furman I figured out that the problem was related to the encoding of the dbf database and not to the xBaseJ API.

Then I had to search for hours for a tool that could change the charset of the database, which is ASCII. I found out that Apache OpenOffice does that, but I don't have OpenOffice on my Windows machine. I tried to download it 5 or 6 times, but every time the download was interrupted because my internet connection is really, really bad (it downloads at 6 to 7 Kb/s) and the .exe file is 209 MB. So I had to search even more for other software to do the job. I don't know how, but I found DBF Commander, which does more than just change the charset. Anyway, I downloaded the trial version, which does everything but shows a window telling you to buy it every time you do anything :D.

Finally, I changed the charset from ASCII (850 International MS-DOS or something) to 1252 Windows ANSI... aaaaand boom! It works!
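To illustrate why the code page setting matters, here is a minimal Python sketch (Python because another answer in this thread already uses it). The sample word is made up and no .dbf file is involved; it only shows that the same stored bytes read as different characters under code page 850 and code page 1252.

```python
# Hypothetical sample: a French word as a Windows-1252 writer would store it.
raw = "été".encode("cp1252")      # two 0xE9 bytes around an ASCII "t"

# The bytes stay the same; only the code page used to interpret them changes.
as_1252 = raw.decode("cp1252")    # interpreted as Windows ANSI
as_850 = raw.decode("cp850")      # interpreted as International MS-DOS

print(repr(raw))
print(as_1252)
print(as_850)
```

If a reader picks the wrong code page for the stored bytes, accented letters come out as other symbols even though the data itself is intact.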

I still think there's a difference between the terms "code page", "charset" and "encoding", and I'm using them interchangeably. But at least now I know they exist, and that's something new I learned.

Anyway, thank you again Ethan Furman, and I'd like to thank Google too for making this possible :D!



Answer 2:

All dbf files use some encoding, and it is not UTF-8. Which encoding was used is part of the metadata stored in the header at the start of the file. You are facing one of two scenarios:

  • The encoding is stored properly in the dbf file

    If this is happening then MS Access is properly using that information to decode the raw dbf data into Unicode, and xBaseJ is not.

  • The encoding is not stored properly in the file

    If this is happening then MS Access is getting a lucky guess on the encoding, and xBaseJ is refusing to guess.

You need to find a tool that will examine the dbf file and tell you which encoding was stored in it. If you don't know of any, and you don't mind having Python on your machine, you can use a dbf module I wrote to figure it out:

import dbf

table = dbf.Table('/path/to/some_table.dbf')
print(table)

which will print out the encoding, number of fields, size of a record, field names, etc.
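If installing a module is not an option, the stored code page can also be inspected with the standard library alone. The following is only a sketch, assuming the common header layout in which byte 29 of the 32-byte header holds the language driver ID (code page mark). The function name and the small lookup table are illustrative; the table lists just a few well-known values and is far from complete.

```python
import struct

# Deliberately incomplete lookup of language driver IDs (header byte 29).
CODE_PAGE_MARKS = {
    0x00: None,       # no code page recorded in the file
    0x01: "cp437",    # U.S. MS-DOS
    0x02: "cp850",    # International MS-DOS
    0x03: "cp1252",   # Windows ANSI
}

def read_dbf_header(path):
    """Return a few header fields of a .dbf file, including the code page mark."""
    with open(path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to contain a .dbf header")
    # Byte 0: version, bytes 1-3: last update date (skipped),
    # bytes 4-7: record count, 8-9: header length, 10-11: record length.
    version, n_records, header_len, record_len = struct.unpack_from("<B3xIHH", header, 0)
    mark = header[29]
    return {
        "version": version,
        "records": n_records,
        "header_length": header_len,
        "record_length": record_len,
        "code_page_mark": mark,
        # "unknown" means the mark is simply not in the small table above.
        "encoding": CODE_PAGE_MARKS.get(mark, "unknown"),
    }
```

Calling `read_dbf_header('/path/to/some_table.dbf')` returns a dictionary. A `code_page_mark` of 0 corresponds to the second scenario above, where no encoding was stored in the file.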

Note on installing (which can be such a pain)

Ideally, you should be able to install pip, and then do a pip install enum34 dbf --upgrade which will put the latest versions of those two libraries in the correct spot on your system.

Failing that, you'll want to grab both enum34 and dbf from PyPI and put enum.py and dbf.py in your Python's site-packages folder:

c:\python27\lib\site-packages  # I think, it's been a while since I used Windows

Update

If, after doing all that, you discover that the codepage/encoding was never set in the file (it's amazing how often this happens), then you can also use dbf to change it (if you know what it should be):

table.open()
table.codepage = dbf.CodePage('cp1252') # for example
table.close()
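Setting the code page this way relabels the file; it does not convert the stored text, so it only helps when the bytes are already in the encoding you declare. As a rough stdlib-only illustration of what such a relabel amounts to (not how the dbf module does it internally), the sketch below overwrites the single code page byte in a copy of the file. It assumes the common layout in which byte 29 of the header is the language driver ID; the function name and constants are made up for this example.

```python
import shutil

LANGUAGE_DRIVER_OFFSET = 29   # assumed position of the code page mark
MARK_CP1252 = 0x03            # Windows ANSI

def relabel_copy(src, dst, mark=MARK_CP1252):
    """Copy src to dst, set the code page byte in the copy, return the old mark."""
    shutil.copyfile(src, dst)            # never modify the original file
    with open(dst, "r+b") as f:
        f.seek(LANGUAGE_DRIVER_OFFSET)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError("file is too short to contain a .dbf header")
        f.seek(LANGUAGE_DRIVER_OFFSET)
        f.write(bytes([mark]))           # only this one byte changes
    return old[0]
```

Working on a copy keeps the original intact in case the chosen code page turns out to be wrong.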


Answer 3:

I could be wrong but try setting your database to UTF-8. I'm guessing this problem has to do with character encoding.



Answer 4:

You can try this library: xbase4j. As I learned, in many DBF files the "language" flag is set incorrectly or not set at all. To solve this problem, just specify the proper language before opening the DBF file. Something like this:

new XBase().withLanguage(Language.WinANSI).open(new File("..."));

Feel free to contact me if you need some help.

Regards,