Difference between &#32; and &nbsp;

Posted 2019-02-01 12:23

Question:

Can anyone explain the difference between &#32; and &nbsp;?

I have HTML data stored in a database in binary form, and a space in it can be any of &#32;, &nbsp;, or sometimes &#160;.

Another issue: when I convert this HTML to plain text using the JSoup library, the conversion looks correct, but when I then use Java's String.contains(myString) method, the data that had one kind of space appears to differ from the data that had the other. The string from one is not found in the other, and vice versa.

Example:

HTML1 : This is my test string

HTML2 : This is my test string

If I convert them to plain text using JSoup, it returns:

HTML 1 : This is my test string

HTML 2 : This is my test string

But the two strings are still not equal. Why is that?

Answer 1:

&#32; is the classic space, the one you get when you hit your space bar, represented by its HTML entity equivalent.

&#160; and &nbsp; represent the non-breaking space, often used to prevent the browser from collapsing multiple consecutive spaces:

"&#32;&#32;&#32;&#32;" => " " (collapsed into a single space)

"&nbsp;&nbsp;&nbsp;&nbsp;" => "    " (not collapsed)

If you are parsing a string that contains both classic and non-breaking spaces, you can safely replace one with the other.
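
The replacement suggested above could be sketched in Java roughly as follows. This is only an illustration: the class name, the helper toPlainSpaces, and the sample string are made up for the example, and it assumes the entities have already been decoded, so that &nbsp; and &#160; have become the character U+00A0 and &#32; has become U+0020.

```java
final class NbspReplaceDemo {
    // U+00A0 is what &#160; and &nbsp; decode to; U+0020 is what &#32; decodes to.
    static final char NBSP = '\u00A0';

    /** Replaces every non-breaking space with an ordinary space. */
    static String toPlainSpaces(String text) {
        return text.replace(NBSP, ' ');
    }

    public static void main(String[] args) {
        // Hypothetical sample: the first gap is a non-breaking space.
        String withNbsp = "This\u00A0is my test string";
        String needle = "This is"; // typed with an ordinary space

        System.out.println(withNbsp.contains(needle));                // false
        System.out.println(toPlainSpaces(withNbsp).contains(needle)); // true
    }
}
```

Normalizing both the stored text and the search string the same way before calling contains keeps the comparison consistent.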



Answer 2:

&#32; is the character reference for the ordinary space, the one the space key produces.

&#160; and &nbsp; are both character references for the non-breaking space.

If your data comes from different sources, the spaces may have been encoded differently.

The ordinary space (U+0020) and the non-breaking space (U+00A0) are different characters, so a direct comparison will report the strings as different.
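
One way to see the difference is to print the code points of each string. The sketch below is illustrative only; the class name and the describe helper are invented for this example, and the two inputs stand in for text decoded from the two kinds of entity.

```java
final class CodePointDemo {
    /** Returns the code points of the text as space-separated U+XXXX labels. */
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("a b"));        // U+0061 U+0020 U+0062
        System.out.println(describe("a\u00A0b"));   // U+0061 U+00A0 U+0062
        System.out.println("a b".equals("a\u00A0b")); // false
    }
}
```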



Answer 3:

&#32; is just a space character, nothing more. Consecutive occurrences of this character collapse into a single space when rendered.

Whereas &#160; and &nbsp; both represent the non-breaking space character, and if they occur one after another they will not collapse or break into a single space.

The only difference between them is that &#160; is the numeric form and &nbsp; is the named form.

Basically, all of these are HTML entities. You can learn more about them from the following links.

  1. Link 1
  2. Link 2


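Answer 4:

From Java 8 onwards, the following should work:

string = string.replaceAll("\\h", " ");

where \h matches any horizontal whitespace character, including the non-breaking space, as described in the java.util.regex.Pattern documentation. Note that replaceAll is needed because String.replace treats its argument as literal text rather than a regular expression.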
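
A slightly fuller sketch of this approach, applied to the comparison from the question, might look like the following. The class name, the normalize helper, and the two sample strings are hypothetical; the strings only stand in for the plain text obtained from HTML1 and HTML2.

```java
import java.util.regex.Pattern;

final class HorizontalWhitespaceDemo {
    // \h (Java 8+) matches horizontal whitespace: space, tab, U+00A0 and others.
    private static final Pattern HORIZONTAL_WS = Pattern.compile("\\h");

    /** Replaces each horizontal whitespace character with an ordinary space. */
    static String normalize(String text) {
        return HORIZONTAL_WS.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String text1 = "This is\u00A0my test string"; // contains a non-breaking space
        String text2 = "This is my test string";      // ordinary spaces only

        System.out.println(text1.equals(text2));                       // false
        System.out.println(normalize(text1).equals(normalize(text2))); // true
    }
}
```

Because \h also matches tabs, this turns tabs into single spaces as well, which may or may not be wanted depending on the data.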