Which strings are valid in an XML encoding declaration? For instance, which is the correct way to specify UTF-8:
encoding="utf8"
- etc.
Or for Windows-1251:
encoding="windows-1251"
encoding="windows1251"
encoding="cp-1251"
- etc.
I am writing a character decoder as well as an XML parser, so I need to set the encoding of my StreamReader based on the value of the encoding attribute.
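A minimal Python sketch of the lookup described above: read the encoding pseudo-attribute from the XML declaration and resolve it to a decoder. The question uses a .NET StreamReader; there, `Encoding.GetEncoding(string)` should play roughly the role that `codecs.lookup` plays here. The function names are hypothetical, and the sketch assumes an ASCII-compatible document (UTF-8, windows-1251, ISO-8859-x, ...), so UTF-16/UTF-32 detection via the byte order mark is left out.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration such as
# <?xml version="1.0" encoding="windows-1251"?>; either quote style is allowed.
_DECL_ENCODING = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def sniff_declared_encoding(head: bytes, default: str = "utf-8") -> str:
    """Return the codec name declared at the start of an XML document.

    Assumes an ASCII-compatible encoding; UTF-16/UTF-32 are not handled.
    """
    # Drop a UTF-8 byte order mark so the declaration starts at offset 0.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    match = _DECL_ENCODING.match(head)
    name = match.group(2).decode("ascii") if match else default
    # codecs.lookup ignores case, resolves aliases (e.g. "windows-1251" to
    # "cp1251") and raises LookupError for names Python does not know.
    return codecs.lookup(name).name


def decode_xml(data: bytes) -> str:
    """Decode a whole document using the encoding its declaration names."""
    encoding = sniff_declared_encoding(data[:256])
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode(encoding)
```

With no declaration the sketch falls back to UTF-8, which is what the spec expects when there is neither a declaration nor a byte order mark.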
Any ideas where I could find a list of the official encoding strings?
The best I could find is this, but it seems to be IE-specific.
Thanks!
If all else fails, read the spec :-).
Source: http://www.w3.org/TR/REC-xml/
So UTF-8 is written as
encoding="UTF-8"
For other character sets not listed above, use the names given in the IANA character set list.
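The spec also fixes the syntax of the name itself, in the EncName production of section 4.3.3: `EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*`. A small Python check of that grammar (the function name is hypothetical) shows that all three Windows-1251 spellings from the question are syntactically legal. Whether a given parser actually understands a name is a separate question, which is why the registered IANA name (windows-1251) is the safe choice.

```python
import re

# EncName production from XML 1.0, section 4.3.3:
#   EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_valid_enc_name(name: str) -> bool:
    """True if `name` is syntactically allowed in an encoding declaration.

    Only the grammar is checked; support for the encoding is not.
    """
    return ENC_NAME.fullmatch(name) is not None
```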
The case of the letters in the character set name is not significant: "However, no distinction is made between use of upper and lower case letters." (IANA character set list). So you could also write
encoding="uTf-8"
if you feel like it ;-).
BTW: Are you really, really certain you want to write your own XML parser? This sounds suspiciously like reinventing the wheel.
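Since the XML spec also says processors should match encoding names case-insensitively, a hand-written decoder can normalize the name before looking it up. A sketch, assuming a hypothetical table of the encodings your own decoder implements:

```python
# Hypothetical table for a hand-written decoder: lower-cased IANA names (plus
# one registered alias) mapped to the canonical name the decoder uses.
SUPPORTED_ENCODINGS = {
    "utf-8": "UTF-8",
    "windows-1251": "windows-1251",
    "iso-8859-1": "ISO-8859-1",
    "latin1": "ISO-8859-1",  # IANA alias of ISO-8859-1
}


def resolve_encoding(name: str) -> str:
    """Map an encoding declaration value to a canonical name, ignoring case."""
    # Encoding names are ASCII-only, so str.lower() is enough here.
    key = name.lower()
    if key not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {name!r}")
    return SUPPORTED_ENCODINGS[key]
```

Unknown names raise instead of silently falling back, so a typo in a declaration surfaces as an error rather than as mojibake.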
Use the command locale -A to see all the encodings: http://dwbitechguru.blogspot.ca/2014/07/check-foreign-characters-support-on.html
Option A: Add the encoding using the tags below:
You can edit the encoding attribute in the DTD using XMLSpy.
Related links: http://dwbitechguru.blogspot.ca/2014/07/issue-xml-reader-error.html
This should be fine for UTF-8.
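To illustrate putting the encoding into the XML declaration when generating a document, here is a sketch with Python's xml.etree.ElementTree (the xml_declaration argument of tostring needs Python 3.8 or later); the element name and text are made up:

```python
import xml.etree.ElementTree as ET

root = ET.Element("greeting")
root.text = "Привет"

# xml_declaration=True writes a declaration naming the same encoding that is
# used to serialize the bytes.
utf8_bytes = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

# For encodings other than UTF-8/US-ASCII, ElementTree adds the declaration
# even without xml_declaration=True.
cp1251_bytes = ET.tostring(root, encoding="windows-1251")
```

ElementTree writes the declaration with single quotes, which the spec allows just as well as double quotes.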