I am trying to extract the response charset in a Java web app that uses Apache HttpClient.
For example, one possible value obtained from "Content-Type" header is
text/html; charset=UTF-8
Then my code will extract all text after the "=" sign...
So the charset as extracted will be
UTF-8
Is this method of obtaining the response charset correct? Are there scenarios where it will not work, or is there something I am missing?
Doesn't HttpClient (or HttpCore) already provide that functionality? Something like this:
HttpResponse response = ...
String charset = EntityUtils.getContentCharSet(response.getEntity());
The method suggested by forty-two works, but it is deprecated. I found that this website has a good example of how to find the charset:
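// Imports for HttpClient/HttpCore 4.x
import java.nio.charset.Charset;
import org.apache.http.HttpEntity;
import org.apache.http.entity.ContentType;
HttpEntity entity = response.getEntity();
ContentType contentType = ContentType.getOrDefault(entity);
Charset charset = contentType.getCharset(); // null when the header carries no charset parameter
System.out.println("Charset = " + charset); // concatenation is null-safe, unlike charset.toString()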
Well, the approach in the question (taking all text after the "=" sign) will fail when
- the charset value is quoted, e.g. charset="UTF-8"
- the quoted value uses backslash escapes
- there are parameters other than "charset"
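To make those three cases concrete, here is a minimal sketch of a hand-rolled parser that copes with them. It uses only the JDK; the class name ContentTypeCharset and the method extractCharset are hypothetical names chosen for illustration and are not part of HttpClient. It is not a full implementation of the HTTP grammar, so in real code the library's own parser is probably the safer choice.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not an Apache HttpClient API.
final class ContentTypeCharset {

    /** Returns the value of the "charset" parameter of a Content-Type header value, if any. */
    static Optional<String> extractCharset(String headerValue) {
        final int n = headerValue.length();
        int i = headerValue.indexOf(';');            // skip the media type (e.g. "text/html")
        while (i >= 0 && i < n) {
            i++;                                     // step over the ';' that starts a parameter
            int start = i;
            while (i < n && headerValue.charAt(i) != '=' && headerValue.charAt(i) != ';') {
                i++;                                 // read the parameter name
            }
            String name = headerValue.substring(start, i).trim();
            StringBuilder value = new StringBuilder();
            if (i < n && headerValue.charAt(i) == '=') {
                i++;                                 // step over '='
                if (i < n && headerValue.charAt(i) == '"') {
                    i++;                             // quoted value: read up to the closing quote
                    while (i < n && headerValue.charAt(i) != '"') {
                        if (headerValue.charAt(i) == '\\' && i + 1 < n) {
                            i++;                     // backslash escape: keep the next char literally
                        }
                        value.append(headerValue.charAt(i));
                        i++;
                    }
                    i++;                             // step over the closing quote
                    while (i < n && headerValue.charAt(i) != ';') {
                        i++;                         // ignore anything before the next ';'
                    }
                } else {
                    while (i < n && headerValue.charAt(i) != ';') {
                        value.append(headerValue.charAt(i));
                        i++;                         // unquoted value ends at the next ';'
                    }
                }
            }
            if (name.equalsIgnoreCase("charset")) {  // parameter names are case-insensitive
                String v = value.toString().trim();
                return v.isEmpty() ? Optional.empty() : Optional.of(v);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractCharset("text/html; charset=UTF-8"));
        System.out.println(extractCharset("text/html; charset=\"UTF-8\"; foo=bar"));
        System.out.println(extractCharset("text/html"));
    }
}
```

The result is still just a string; turning it into a java.nio.charset.Charset with Charset.forName can throw for unknown or malformed names, so that step needs its own error handling.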