Hello, I am looking for a way to detect whether a string has been encoded.
For example:
String name = "Hellä world";
String encoded = new String(name.getBytes("utf-8"), "iso8859-1");
The output of this encoded variable is:
HellÃ¤ world
As you can see, there is an A with a tilde and another symbol. Is there a way to check whether the output contains encoded characters like these?
Your question doesn't make sense. A Java String is a list of characters. They don't have an encoding until you convert them into bytes, at which point you need to specify one (although you will see a lot of code that uses the platform default, which is what e.g. String.getBytes() with no argument does). I suggest you read this: http://kunststube.net/encoding/.
Sounds like you want to check if a string that was decoded from bytes in latin1 could have been decoded in UTF-8, too. That's easy because illegal byte sequences are replaced by the character \ufffd:
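A minimal sketch of that check, assuming the string really was produced by decoding bytes as ISO-8859-1. The class and method names (Utf8Check, couldBeUtf8) are hypothetical and chosen only for illustration. Note that a pure ASCII string also passes, since ASCII is valid UTF-8, and that input which legitimately encodes U+FFFD is rejected as well.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    // Hypothetical helper: true if the Latin-1 bytes behind this string
    // also form a valid UTF-8 byte sequence.
    static boolean couldBeUtf8(String latin1Decoded) {
        // A char above U+00FF cannot come from a Latin-1 decode, and
        // getBytes would silently turn it into '?', so reject it up front.
        for (int i = 0; i < latin1Decoded.length(); i++) {
            if (latin1Decoded.charAt(i) > 0xFF) {
                return false;
            }
        }
        // Recover the original bytes, then decode them as UTF-8.
        byte[] bytes = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        // This constructor replaces malformed input with U+FFFD.
        return asUtf8.indexOf('\uFFFD') < 0;
    }
}
```

With this sketch, the mojibake string from the question passes (its bytes are valid UTF-8), while the original string with a real "ä" does not, because the single Latin-1 byte 0xE4 is not a complete UTF-8 sequence.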
I'm not really sure what you are trying to do or what your problem is.
This line doesn't make any sense:
String encoded = new String(name.getBytes("utf-8"), "iso8859-1");
You are encoding your name into "UTF-8" and then trying to decode it as "iso8859-1". If you want to encode your name as "iso8859-1", just do name.getBytes("iso8859-1").
Please tell us what problem you encountered so that we can help more.
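To illustrate the point about using the same charset on both sides, here is a small sketch. The class and method names (MatchingCharsets, roundTripLatin1, latin1Length, utf8Length) are made up for this example. It shows that encoding and decoding with the same charset gives back the original string, and that the two charsets produce different byte sequences for "ä".

```java
import java.nio.charset.StandardCharsets;

final class MatchingCharsets {
    // Encode and decode with the same charset: the string survives unchanged
    // as long as every character exists in ISO-8859-1.
    static String roundTripLatin1(String name) {
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    // Number of bytes the string occupies in ISO-8859-1 (one per character).
    static int latin1Length(String name) {
        return name.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Number of bytes the string occupies in UTF-8 (two for "ä").
    static int utf8Length(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Because the byte sequences differ, decoding UTF-8 bytes as ISO-8859-1 cannot give back the original text.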
You can check whether your string is encoded or not with this code:
If I understood your question correctly, this code may help you. The function isEncoded checks whether its parameter can be encoded as ASCII or whether it contains non-ASCII characters.
You can also check against other charsets by changing the charset variable or turning it into a parameter.
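The answer's original code is not reproduced here, so the following is only a sketch of what such a function might look like, assuming isEncoded should return true when the text contains characters outside the given charset. The class name EncodingCheck and the two-argument overload are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingCheck {
    // Returns true if the text contains at least one character that the
    // given charset cannot represent.
    static boolean isEncoded(String text, Charset charset) {
        return !charset.newEncoder().canEncode(text);
    }

    // Default variant: checks against US-ASCII, as described above.
    static boolean isEncoded(String text) {
        return isEncoded(text, StandardCharsets.US_ASCII);
    }
}
```

Keep in mind that this only detects non-ASCII characters. It reports true for the correct string "Hellä world" as well as for the corrupted one, so it cannot tell the two apart.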
This code is just a character corruption bug. You take a UTF-16 string, transcode it to UTF-8, pretend it is ISO-8859-1 and transcode it back to UTF-16, resulting in incorrectly encoded characters.
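The corruption described here can be reproduced and undone in a few lines. This is a sketch with hypothetical names (MojibakeDemo, corrupt, repair). It relies on Java's ISO-8859-1 charset mapping every byte value to exactly one character, which is why the damage is reversible in this particular case.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Reproduces the bug: UTF-8 bytes are misread as ISO-8859-1.
    static String corrupt(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undoes it: recover the bytes via ISO-8859-1 and decode them as UTF-8.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }
}
```

Here corrupt("Hellä world") yields "HellÃ¤ world", and passing that result to repair returns the original string.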