In my Grails application I use Groovy's XmlParser to parse an XML file. The value of one of the attributes in my XML file is a string that equals a character hex code. I want to save that string in my database:
&#xD1;
Unfortunately, the attribute method returns the Ñ character, and what actually gets stored in the database is c391. When the field is read back out I also get the Ñ character, which is not what I want.
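For context, c391 looks like the UTF-8 encoding of Ñ (U+00D1) rather than corruption: the parser resolves the character reference to the character itself, and a UTF-8 column would hold it as the two bytes C3 91. A minimal check, assuming the reference in the file is &#xD1;:

```groovy
// U+00D1 is the character that the reference &#xD1; resolves to.
def resolved = '\u00D1'

// Encode it as UTF-8 and render the bytes as lowercase hex.
def utf8Hex = resolved.getBytes('UTF-8').encodeHex().toString()

println utf8Hex
```

If that assumption holds, the database is storing the parsed character faithfully, and what has been lost is only the fact that the file spelled it as a reference.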
How can I store the hex code as a string in my database and make sure it gets read back out as a hex code as well?
Update #1:
The reason this is a problem for me is that once I read the XML file into my database I must be able to reconstruct it exactly as it was. An additional problem is that the field in question isn't always a character hex code. It could just be some arbitrary string.
Update #2:
I guess it doesn't matter how the character is stored in the database, so long as I can write it back out in its expanded hex code format. I am using Groovy MarkupBuilder to reconstruct my XML file from the database and I am unclear why this isn't happening by default.
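One way to get the expanded form back is to convert characters to references just before writing. The sketch below is only an illustration: toHexReferences is a hypothetical helper, not part of Groovy or Grails, and it assumes that every non-ASCII character in the original file was written as an uppercase hex reference, which the database alone cannot confirm.

```groovy
// Hypothetical helper: rewrites every non-ASCII code point as a
// hexadecimal character reference and leaves ASCII text untouched.
String toHexReferences(String text) {
    def out = new StringBuilder()
    int i = 0
    while (i < text.length()) {
        int cp = text.codePointAt(i)
        if (cp > 0x7F) {
            out.append(String.format('&#x%X;', cp))
        } else {
            out.appendCodePoint(cp)
        }
        // Step by one or two chars so surrogate pairs stay together.
        i += Character.charCount(cp)
    }
    return out.toString()
}

println toHexReferences('Espa\u00D1a')
```

The result still has to reach the output without further escaping, otherwise the leading ampersand of each reference is escaped again by whatever writes the XML.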
Update #3:
I overrode getTableTypeString in my custom MySQL dialect and that seems to have helped somewhat. At least now the value I pass to MySQL is the value that gets stored in the database.
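    import org.hibernate.dialect.MySQL5InnoDBDialect

    class CustomMySQL5InnoDBDialect extends MySQL5InnoDBDialect {
        // Appended to every CREATE TABLE statement so new tables default to utf8.
        @Override
        public String getTableTypeString() {
            return " ENGINE=InnoDB DEFAULT CHARSET=utf8"
        }
    }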
I also created my own version of groovy.util.XmlParser. My version is pretty much an exact duplicate of groovy.util.XmlParser, except that in the startElement method I changed:

    String value = list.getValue(i)

to this:
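    // Default to the normalized value, as the stock parser does.
    String value = list.getValue(i)
    // Relies on Xerces-internal fields; keep the raw text only when the
    // whole attribute is a single hex character reference.
    def rawValue = list.fAttributes.fAttributes[i].nonNormalizedValue
    if (rawValue ==~ /&#x([0-9A-F]+?);/) {
        value = rawValue
    }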
This allows the exact text of hex code elements to be stored in the database.
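Because ==~ requires the whole string to match, the pattern used in startElement only recognises attribute values that consist of exactly one uppercase hex reference. A short sketch of what it accepts and rejects (the sample values are made up for illustration):

```groovy
def hexReference = /&#x([0-9A-F]+?);/

// A value that is exactly one uppercase hex reference matches.
assert '&#xD1;' ==~ hexReference

// Lowercase hex digits are not covered by the character class.
assert !('&#xd1;' ==~ hexReference)

// A reference embedded in other text fails the full-string match.
assert !('A&#xD1;B' ==~ hexReference)

// Decimal references are a different syntax and do not match.
assert !('&#209;' ==~ hexReference)
```

Values in the rejected groups would still be stored as resolved characters, so they would not round-trip as references.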
Now there are two new problems, possibly three:

1. Recreating a file with the exact values stored in the database. Up till now I had been using MarkupBuilder, but it does extra encoding on ampersands, causing the value &#xD1; to be written out as &amp;#xD1;. I can probably get around this by abandoning MarkupBuilder and building my XML strings manually, but I would rather not.
2. Running an XSLT transform on an XML file using the Saxon-HE 9.4 processor causes some hex code values, such as &#xFF;, to be changed to something like ÿ, yet others, like the one for ™, are left unchanged.
3. I'm not sure yet whether this is going to be a problem, but when I recreate the file I would like it to be in ANSI encoding, since that is the encoding used for the original file.
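For the first and third problems, one possible direction is to keep MarkupBuilder but switch off its attribute escaping, then choose the file encoding explicitly when writing. This is a sketch under assumptions: attributeText is a hypothetical helper, the element and attribute names are invented, and "ANSI" is taken to mean windows-1252. With escapeAttributes set to false, MarkupBuilder no longer protects attribute values, so anything that is not a hex reference is escaped by hand here.

```groovy
import groovy.xml.MarkupBuilder

// Hypothetical helper: pass a lone hex character reference through
// untouched, and escape XML special characters in any other value.
String attributeText(String value) {
    if (value ==~ /&#x([0-9A-F]+?);/) {
        return value
    }
    def escaped = value.replace('&', '&amp;')
    escaped = escaped.replace('<', '&lt;')
    escaped = escaped.replace('>', '&gt;')
    escaped = escaped.replace('"', '&quot;')
    escaped = escaped.replace("'", '&apos;')
    return escaped
}

def reference = attributeText('&#xD1;')   // stays &#xD1;
def plain = attributeText('R&D')          // becomes R&amp;D

def writer = new StringWriter()
// Declare the encoding the file will actually be written in.
writer << '<?xml version="1.0" encoding="windows-1252"?>\n'

def builder = new MarkupBuilder(writer)
builder.escapeAttributes = false          // attribute values are written verbatim

builder.items {
    item(code: reference)
    item(code: plain)
    item(name: 'Espa\u00F1a')             // a literal non-ASCII character
}
def xml = writer.toString()

// Write with an explicit charset instead of the platform default.
def target = File.createTempFile('reconstructed', '.xml')
target.withWriter('windows-1252') { it << xml }
```

The trade-off is that every attribute value now has to go through something like attributeText, since the builder will happily emit malformed XML in this mode.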