Is htmlspecialchars() required on ALL output?

Posted 2019-03-06 10:16

Question:

I am writing some scripts for Expression Engine and have been told that every single piece of data we output to the page requires 'sanitizing' to prevent XSS.

For example, here I am fetching all categories from the database, sorting them into an array, and returning them to Expression Engine.

PHP Function

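public function categories()
{
    // Fetch all active categories, sorted by name.
    $query = $this->crm_db->select('name, url_name')
        ->order_by('name', 'asc')
        ->get_where('activities_categories', array('active' => 1));

    // Initialise the array so it is defined even when the query returns no rows.
    $activityCategories = array(array('cats' => array()));

    foreach ($query->result() as $row)
    {
        $activityCategories[0]['cats'][] = array(
            'categoryName' => $row->name,
            'categoryURL'  => $row->url_name,
        );
    }

    // Hand the data to the template parser, which fills in the {cats} pair.
    return $this->EE->TMPL->parse_variables($this->EE->TMPL->tagdata, $activityCategories);
}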

Template Code

            {exp:activities:categories}
                {cats}
                    <a href="/{categoryURL}">{categoryName}</a>
                {/cats}
            {/exp:activities:categories}

I am being told that I need to use the htmlspecialchars() function on every single piece of data that is output.

Is this necessary?

Is this correct?

Example:

foreach($query->result() as $row)
{
    $activityCategories[0]['cats'][] = array(
                'categoryName' => htmlspecialchars($row->name),
                'categoryURL' => htmlspecialchars($row->url_name),
            );
}   

Many thanks! :)

Answer 1:

htmlspecialchars() is required on ALL HTML output unless you are told otherwise.

Other output formats (such as JS, JSON, etc.) require their own escaping.
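
A minimal sketch of what "their own escaping" can look like in PHP, assuming PHP 7.3+ (for JSON_THROW_ON_ERROR). The escape_for() function and its context names are hypothetical, made up for illustration; htmlspecialchars(), json_encode() and rawurlencode() are the standard PHP functions. The same input comes out differently for each medium:

```php
<?php
// Hypothetical dispatcher: pick the escaping function by output medium.
function escape_for(string $value, string $context): string
{
    switch ($context) {
        case 'html':
            // HTML text or a quoted attribute value.
            return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
        case 'json':
            // A JSON document: json_encode() quotes and escapes the string.
            return json_encode($value, JSON_THROW_ON_ERROR);
        case 'url':
            // A single path segment or query value inside a URL.
            return rawurlencode($value);
        default:
            throw new InvalidArgumentException("Unknown context: $context");
    }
}

$input = 'Tom & "Jerry"';
echo escape_for($input, 'html'), PHP_EOL; // Tom &amp; &quot;Jerry&quot;
echo escape_for($input, 'json'), PHP_EOL; // "Tom & \"Jerry\""
echo escape_for($input, 'url'), PHP_EOL;  // Tom%20%26%20%22Jerry%22
```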



Answer 2:

Whether htmlspecialchars() suffices depends on the context the data is put into, because it only escapes certain characters (&, <, > and quotes) using character references, which are only a mitigation in certain contexts:

  • If it’s HTML text (outside of HTML tags), it suffices:

    <p>❌</p>
    
  • If it’s inside a quoted HTML attribute value, it suffices (see also the flags parameter for single-quoted attributes):

    <span title="❌"> … </span>
    

    However, some attributes can still be used for XSS even when the value is escaped, e.g., attributes holding URIs such as href, where a javascript: URI needs no special characters at all.
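
A sketch of the quoted-attribute cases and the URI caveat above. The first part shows why the flags matter for single-quoted attributes: ENT_COMPAT (part of the default before PHP 8.1) leaves ' unescaped, while ENT_QUOTES encodes it. The safe_href() helper is hypothetical and only a simple allowlist of http/https and scheme-less (relative) URLs, not a complete URL validator.

```php
<?php
// Hypothetical helper: only http(s) and scheme-less (relative) URLs are allowed.
function safe_href(string $url): string
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if ($scheme === false) {
        return '#'; // seriously malformed URL
    }
    if ($scheme !== null && !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return '#'; // e.g. javascript:, data:
    }
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

$value = "x' onmouseover='alert(1)";

// ENT_COMPAT encodes only double quotes, so this value breaks out of a
// single-quoted attribute.
$compat = htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
// ENT_QUOTES also encodes ' as &#039;.
$quotes = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

echo "<span title='$compat'>unsafe</span>", PHP_EOL;
echo "<span title='$quotes'>safe</span>", PHP_EOL;

// Escaping leaves a javascript: URI intact, hence the scheme check.
echo '<a href="', safe_href('javascript:alert(1)'), '">x</a>', PHP_EOL;
echo '<a href="', safe_href('/arts-crafts'), '">x</a>', PHP_EOL;
```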

Any other context may require escaping other characters as well. For example, an unquoted attribute value would also require every whitespace character to be escaped, as whitespace would otherwise end the attribute value.
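
A small illustration of the unquoted case, assuming an attacker-controlled value that contains no quotes or angle brackets: htmlspecialchars() has nothing to encode, so the value comes back unchanged, and only the quoted form keeps it inside the attribute. In practice the simplest fix is to always quote attribute values.

```php
<?php
// No quotes or angle brackets, so htmlspecialchars() has nothing to encode.
$value = 'x onmouseover=alert(1)';
$escaped = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// Unquoted: the space ends the value and onmouseover becomes a new attribute.
echo "<span title=$escaped>unquoted</span>", PHP_EOL;

// Quoted: the whole string stays inside title="...".
echo "<span title=\"$escaped\">quoted</span>", PHP_EOL;
```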

Also note that a context may require several layers of encoding. For example, if you print a JavaScript value inside a <script> element, you have to obey both JavaScript and HTML syntax rules.
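
For the <script> case, one common approach (a sketch, not the only option) is json_encode() with the JSON_HEX_* flags: json_encode() takes care of JavaScript string syntax, and the flags encode <, >, &, ' and " as \u003C-style escapes so the value cannot close the surrounding script element. htmlspecialchars() alone is the wrong tool here, because the contents of a <script> element are not entity-decoded, so &lt; would reach the JavaScript literally. The variable name categoryName is illustrative, and JSON_THROW_ON_ERROR assumes PHP 7.3+.

```php
<?php
// Attacker-controlled value that tries to close the script element.
$name = '</script><script>alert(1)</script>';

// json_encode() produces a valid JavaScript string literal; the JSON_HEX_*
// flags turn < > & ' " inside it into \u003C, \u003E, \u0026, \u0027, \u0022.
$js = json_encode(
    $name,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
);

echo "<script>var categoryName = $js;</script>", PHP_EOL;
```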



Answer 3:

Well, if you don't want XSS problems, then using htmlspecialchars() is a good idea. If you didn't, someone could store a malicious <iframe> or <script> that then gets injected into your page.

Now, you don't necessarily need to escape at output time. If you sanitize the data coming in, you can store the sanitized data instead. Doing it this way also lets you deliberately store unsanitized data (for example, markup used for styling) that is ready for output as-is.
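
One thing to keep in mind with the store-escaped-data approach: the stored value is already HTML-encoded, so output code must not escape it a second time. This sketch (the sample string is made up) shows the double encoding that happens otherwise, and htmlspecialchars()'s fourth parameter, double_encode, which leaves existing entities alone when set to false.

```php
<?php
$submitted = 'Fish & Chips';

// Escaped once when saved ("sanitize on input").
$stored = htmlspecialchars($submitted, ENT_QUOTES, 'UTF-8');

// Escaped again by output code that expects raw data: &amp; becomes &amp;amp;
$twice = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// With double_encode = false, existing entities such as &amp; are left as-is.
$once = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8', false);

echo $stored, PHP_EOL; // Fish &amp; Chips
echo $twice, PHP_EOL;  // Fish &amp;amp; Chips
echo $once, PHP_EOL;   // Fish &amp; Chips
```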

As far as your example is concerned, it is correct. That is one way you can do it.
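
For reference, a self-contained version of the escape-everything approach from the question, with a plain array of objects standing in for $query->result() (an assumption, so it runs without a database or Expression Engine). The escape_html() helper is hypothetical; it passes ENT_QUOTES and an explicit 'UTF-8' charset so the result is also safe in single-quoted attributes and does not depend on the PHP version's defaults.

```php
<?php
// Hypothetical helper: escape a value for HTML text or a quoted attribute.
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Stand-in for $query->result(): an array of row objects (assumed shape).
$rows = [
    (object) ['name' => 'Arts & Crafts', 'url_name' => 'arts-crafts'],
    (object) ['name' => '<script>alert(1)</script>', 'url_name' => 'evil'],
];

$activityCategories = [['cats' => []]];
foreach ($rows as $row) {
    $activityCategories[0]['cats'][] = [
        'categoryName' => escape_html($row->name),
        'categoryURL'  => escape_html($row->url_name),
    ];
}

echo $activityCategories[0]['cats'][1]['categoryName'], PHP_EOL;
// &lt;script&gt;alert(1)&lt;/script&gt;
```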

EDIT: You only need to apply htmlspecialchars() to strings.
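
A sketch of that distinction, with a hypothetical escape_value() helper: integers and floats are cast to string (their string form cannot contain <, >, & or quotes), and everything else is cast to string and escaped. Casting first also turns null into an empty string; passing null directly to htmlspecialchars() is deprecated as of PHP 8.1.

```php
<?php
// Hypothetical helper: numbers are passed through, everything else is escaped.
function escape_value($value): string
{
    if (is_int($value) || is_float($value)) {
        // The string form of a number has no <, >, & or quote characters.
        return (string) $value;
    }
    // (string) null === '', which avoids the PHP 8.1 null deprecation.
    return htmlspecialchars((string) $value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo escape_value(42), PHP_EOL;            // 42
echo escape_value('<b>bold</b>'), PHP_EOL; // &lt;b&gt;bold&lt;/b&gt;
```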