Is there a known XSS or other attack that makes it past a
$content = "some HTML code";
$content = strip_tags($content);
echo $content;
?
The manual has a warning:
This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.
but that is related to using the allowable_tags parameter only.
With no allowed tags set, is strip_tags() vulnerable to any attack?
Chris Shiflett seems to say it's safe:
Use Mature Solutions
When possible, use mature, existing solutions instead of trying to create your own. Functions like strip_tags() and htmlentities() are good choices.
Is this correct? If possible, please quote sources.
I know about HTML Purifier, htmlspecialchars(), etc. I am not looking for the best method to sanitize HTML. I just want to know about this specific issue. This is a theoretical question that came up here.
Reference: strip_tags() implementation in the PHP source code
According to this online tool, this string will be "perfectly" escaped, but the result is another malicious one!
In the string the "real" tags are <a> and </a>, since < and script> alone aren't tags.

I hope I'm wrong or that it's just because of an old version of PHP, but it's better to check in your environment.
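One way to run that check locally is sketched below. The input is an assumed nested-tag example in the spirit of the description above, not the string that was given to the online tool, and the variable names are made up for illustration.

```php
<?php
// Hypothetical nested-tag input, assumed for illustration only.
$input = '<<a>script>alert(1)<</a>/script>';

$stripped = strip_tags($input);

// Print the raw result, then whether a script tag survived the call.
var_dump($stripped);
var_dump(stripos($stripped, '<script') !== false);
```

If the second var_dump() prints bool(true) on your PHP version, the behaviour described above applies to your environment.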
strip_tags() is perfectly safe if all you are doing is outputting the text to the HTML body.
It is not necessarily safe to put the result into MySQL queries or URL attributes.
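A minimal sketch of why those other contexts differ: neither value below contains a tag, so strip_tags() has nothing to remove. The sample values and the http/https allow-list are assumptions chosen for illustration.

```php
<?php
// Neither string contains a tag, so strip_tags() returns both unchanged.
$url  = strip_tags('javascript:alert(1)');
$name = strip_tags("x' OR '1'='1");

var_dump($url === 'javascript:alert(1)'); // bool(true)
var_dump($name === "x' OR '1'='1");       // bool(true)

// A URL placed in href or src still needs its own check, for example an
// allow-list of schemes (assumed here to be http and https only).
$urlIsAllowed = preg_match('#^https?://#i', $url) === 1;
var_dump($urlIsAllowed); // bool(false)

// The quote characters in $name are untouched, so a value headed for SQL
// still needs a parameterized query or database-specific escaping.
```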
As its name suggests, strip_tags should remove all HTML tags. The only way we can prove it is by analyzing the source code. The following analysis applies to a strip_tags('...') call, without a second argument for whitelisted tags.

First of all, some theory about HTML tags: a tag starts with a < followed by non-whitespace characters. If this string starts with a ?, it should not be parsed. If this string starts with !--, it is considered a comment and the following text should not be parsed either. A comment is terminated with -->; inside such a comment, characters like < and > are allowed. Attributes can occur in tags; their values may optionally be surrounded by a quote character (' or "). If such a quote exists, it must be closed; otherwise, if a > is encountered, the tag is not closed.

The code
<a href="example>xxx</a><a href="second">text</a>
is interpreted in Firefox as:

The PHP function strip_tags is referenced in line 4036 of ext/standard/string.c. That function calls the internal function php_strip_tags_ex.

Two buffers exist, one for the output, the other for "inside HTML tags". A counter named depth holds the number of open angle brackets (<). The variable in_q contains the quote character (' or ") if any, and 0 otherwise. The last character is stored in the variable lc.

The function holds five states, three of which are mentioned in the description above the function. Based on this information and the function body, the following states can be derived:
- State 0 is the output state (not in any tag).
- State 1 means we are inside a normal HTML tag (the tag buffer contains <).
- State 2 means we are inside a PHP tag.
- State 3: we came from the output state and encountered the < and ! characters (the tag buffer contains <!).
- State 4: inside an HTML comment.

We just need to make sure that no tag can be inserted. That is,
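< followed by a non-whitespace character. Line 4326 checks the case of the < character, which is handled as follows:
- If inside quotes (e.g. <a href="inside quotes">), the < character is ignored (removed from the output).
- If the next character is a whitespace character, < is added to the output buffer.
- If outside an HTML tag, the state becomes 1 ("inside HTML tag") and the last character lc is set to <.
- Otherwise, if inside an HTML tag, the counter named depth is incremented and the character ignored.

If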
> is met while the tag is open (state == 1), in_q becomes 0 ("not in a quote") and state becomes 0 ("not in a tag"). The tag buffer is discarded.

Attribute checks (for characters like ' and ") are done on the tag buffer, which is discarded. So the conclusion is that the output of strip_tags without a tag whitelist is safe to include outside tags.

By "outside tags", I mean not in tags as in
<a href="in tag">outside tag</a>. Text may contain < and > though, as in >< a>>. The result is not valid HTML though; <, > and & still need to be escaped, especially the &. That can be done with htmlspecialchars().

The description for strip_tags without a whitelist argument would be:

I cannot predict future exploits, especially since I haven't looked at the PHP source code for this. However, there have been exploits in the past due to browsers accepting seemingly invalid tags (like
<s\0cript>). So it's possible that in the future someone might be able to exploit odd browser behavior.

That aside, sending the output directly to the browser as a full block of HTML should never be insecure:

However, this is not safe:

because one could easily end the quote via " and insert a script handler.

I think it's much safer to always convert stray < into &lt; (and the same with quotes).
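The difference between the two contexts can be sketched as follows. The variable $foo, its value, and the surrounding markup are assumptions made up for illustration, not code from the posts above.

```php
<?php
// Hypothetical attacker-controlled value; it contains no tags at all.
$foo = '" onmouseover="alert(1)';

// strip_tags() finds nothing to remove, so the quote survives.
$stripped = strip_tags($foo);

// HTML body context: no tag can start here because no '<' is present.
$body = '<div>' . $stripped . '</div>';

// Attribute context: the leading quote closes value="" and the rest
// becomes a new onmouseover attribute.
$unsafe = '<input value="' . $stripped . '" />';
// $unsafe is: <input value="" onmouseover="alert(1)" />

// Escaping quotes as well keeps the whole value inside the attribute.
$safe = '<input value="' . htmlspecialchars($foo, ENT_QUOTES, 'UTF-8') . '" />';
// $safe is: <input value="&quot; onmouseover=&quot;alert(1)" />
```

ENT_QUOTES is used so that single quotes are escaped too, which matters when the attribute is delimited with ' instead of ".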