One of my requirements says "Text Box Name should accept only UTF-8 Character set". I want to perform a negative test by entering input that is not valid UTF-8. How can I do this?
If you are asking how to construct a non-UTF-8 character, that should be easy from this definition from Wikipedia:
For code points U+0000 through U+007F, each code point is one byte long and looks like this: 0xxxxxxx
For code points U+0080 through U+07FF, each code point is two bytes long and looks like this: 110xxxxx 10xxxxxx
And so on.
So, to construct an illegal UTF-8 sequence that is one byte long, the highest bit must be 1 (to differ from the one-byte pattern) and the second-highest bit must be 0 (to differ from the lead byte of the two-byte pattern): 10xxxxxx
or a lone 111xxxxx byte, which also differs from both patterns.
With the same logic, you can construct illegal code unit sequences that are more than two bytes long.
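As a rough illustration of the reasoning above, here is a minimal Java sketch that builds a few such byte sequences and checks them with a strict decoder. The class name `InvalidUtf8Samples` and the helper `isValidUtf8` are made up for this example; only the JDK's `CharsetDecoder` is used, configured to report malformed input instead of silently replacing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named only for this example.
class InvalidUtf8Samples {

    // Returns true only if the bytes decode as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] loneContinuation = {(byte) 0x80};          // 10000000: continuation byte with no lead byte
        byte[] loneLead = {(byte) 0xE0};                  // 11100000: lead byte with nothing after it
        byte[] truncatedTwoByte = {(byte) 0xC2};          // 110xxxxx without its 10xxxxxx byte
        byte[] validTwoByte = {(byte) 0xC2, (byte) 0x80}; // well-formed encoding of U+0080

        System.out.println(isValidUtf8(loneContinuation)); // false
        System.out.println(isValidUtf8(loneLead));         // false
        System.out.println(isValidUtf8(truncatedTwoByte)); // false
        System.out.println(isValidUtf8(validTwoByte));     // true
    }
}
```

Note that these are raw bytes, not Java `String` values. For a negative test they would have to reach the application as bytes (for example in a request body or a file), since a Java `String` is decoded text already.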
You did not tag a language, but I had to test it, so I used Java:
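A minimal sketch of such a test, not necessarily the code originally used: it decodes every possible single byte as UTF-8 and prints the result next to its decimal, hex and binary value. The class name `SingleByteDecodeTable` and the helper `decodeSingleByte` are assumptions for illustration. It relies on `new String(byte[], Charset)` replacing malformed input with U+FFFD, the replacement character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class, named only for this example.
class SingleByteDecodeTable {

    // Decodes one byte as UTF-8; malformed input comes back as U+FFFD.
    static String decodeSingleByte(int value) {
        return new String(new byte[] {(byte) value}, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (int i = 0; i <= 255; i++) {
            // Left-pad the binary form to 8 digits so the bit patterns line up.
            String binary = String.format("%8s", Integer.toBinaryString(i)).replace(' ', '0');
            String decoded = decodeSingleByte(i);
            boolean valid = !decoded.equals("\uFFFD");
            System.out.println(i + " 0x" + Integer.toHexString(i) + " " + binary + " "
                    + (valid ? decoded : "<invalid>"));
        }
    }
}
```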
In the output, 0 to 31 are non-printable characters, then 32 is space, followed by printable characters. Delete is 0x7F, and after it, from 128 inclusive up to 254, no valid characters are printed. You can also see this from the UTF-8 chart: code point U+007F is represented with one byte, 0x7F (bits 01111111), while code point U+0080 is represented with two bytes, 0xC2 0x80 (bits 11000010 10000000).
If you are not familiar with UTF-8, I strongly recommend reading this excellent article:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
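The one-byte/two-byte boundary between U+007F and U+0080 described above can be checked directly. This is a small sketch; the class name `Utf8BoundaryDemo` and the helper `toHex` are made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class, named only for this example.
class Utf8BoundaryDemo {

    // Encodes the string as UTF-8 and renders each byte as 0xNN, space-separated.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so bytes above 0x7F are not printed as negative values.
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("\u007F")); // 0x7F
        System.out.println(toHex("\u0080")); // 0xC2 0x80
    }
}
```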