Is htmlspecialchars() required on ALL output?

Posted 2019-03-06 10:08

I am writing some scripts for ExpressionEngine and have been told that every single piece of data we output to the page requires 'sanitizing' to prevent XSS.

For example, here I am fetching all categories from the database, sorting them into an array, and returning them to ExpressionEngine.

PHP Function

public function categories()
{
    // Fetch the active categories, sorted alphabetically by name.
    $query = $this->crm_db->select('name, url_name')
        ->order_by("name", "asc")
        ->get_where('activities_categories', array('active'=>1));

    // Initialise the structure so it is defined even when no rows come back.
    $activityCategories = array(array('cats' => array()));

    foreach($query->result() as $row)
    {
        $activityCategories[0]['cats'][] = array(
            'categoryName' => $row->name,
            'categoryURL' => $row->url_name,
        );
    }
    return $this->EE->TMPL->parse_variables($this->EE->TMPL->tagdata, $activityCategories);
}

Template Code

            {exp:activities:categories}
                {cats}
                    <a href="/{categoryURL}">{categoryName}</a>
                {/cats}
            {/exp:activities:categories}

I am being told that I need to use the htmlspecialchars() function on every single piece of data that is output.

Is this necessary?

Is this correct?

Example:

foreach($query->result() as $row)
{
    $activityCategories[0]['cats'][] = array(
                'categoryName' => htmlspecialchars($row->name),
                'categoryURL' => htmlspecialchars($row->url_name),
            );
}   

Many thanks! :)

3 answers
Rolldiameter
#2 · 2019-03-06 10:21

htmlspecialchars() is required on ALL HTML output unless told otherwise.

Other output media (such as JS, JSON etc.) require their own escaping.
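
As a minimal sketch of what escaping HTML output could look like for the loop in the question: the helper name e() and the sample rows are assumptions made for illustration, not part of ExpressionEngine, and the charset is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: escape a value for HTML text or a quoted attribute value.
function e(string $value): string
{
    // ENT_QUOTES escapes both " and '; ENT_SUBSTITUTE replaces invalid UTF-8
    // sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Sample rows standing in for the database result in the question.
$rows = [
    ['name' => 'Arts & Crafts', 'url_name' => 'arts-crafts'],
    ['name' => '<script>alert(1)</script>', 'url_name' => 'x" onclick="alert(1)'],
];

$cats = [];
foreach ($rows as $row) {
    $cats[] = [
        'categoryName' => e($row['name']),
        'categoryURL'  => e($row['url_name']),
    ];
}
```

With this in place the markup characters in the second row reach the template as character references, so the browser shows them as text instead of interpreting them.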

小情绪 Triste *
#3 · 2019-03-06 10:22

Whether htmlspecialchars suffices depends on the context the data is put into, because it only replaces certain characters with character references, and that is a mitigation only in certain contexts:

  • If it’s HTML text (outside of HTML tags), it suffices:

    <p>❌</p>
    
  • If it’s inside a quoted HTML attribute value, it suffices (see also the flags parameter for single-quoted attributes):

    <span title="❌"> … </span>
    

    However, certain attributes can still be used for XSS even when the value is escaped, e.g., attributes that take URIs.
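
To illustrate the URI case: htmlspecialchars leaves a javascript: URL untouched, because it contains none of the characters the function escapes. The sketch below adds a scheme check before escaping. The helper name safeHref() and its allow-list (http, https, and site-relative paths) are assumptions for illustration; a real application may need a different policy.

```php
<?php
// Hypothetical helper: allow only http(s) URLs and site-relative paths in an
// href, then escape the result for a quoted attribute value.
function safeHref(string $url): string
{
    $url = trim($url);
    $isHttp = preg_match('#^https?://#i', $url) === 1;
    // Site-relative path: one leading slash, not "//" (protocol-relative) or "/\".
    $isLocalPath = preg_match('#^/(?![/\\\\])#', $url) === 1;

    if (!$isHttp && !$isLocalPath) {
        return '#'; // Fallback for anything else, e.g. javascript: or data: URLs.
    }
    return htmlspecialchars($url, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Escaping alone does not neutralise the scheme.
$escapedOnly = htmlspecialchars('javascript:alert(1)', ENT_QUOTES, 'UTF-8');
$checked     = safeHref('javascript:alert(1)');
```

Here $escapedOnly is still a working javascript: URL, while $checked falls back to "#".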

Any other context may require escaping other characters. For example, an unquoted attribute value would also need every whitespace character escaped, because whitespace would otherwise end the attribute value.

Also note that a single context may require several types of encoding at once. For example, if you want to print a JavaScript value embedded in <script>, you have to obey both JavaScript and HTML syntax rules.
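
One common way to satisfy both sets of rules in PHP is json_encode with the JSON_HEX_* flags, which produces a valid JavaScript literal containing no raw <, >, &, ' or " inside the string. This is a sketch rather than a complete solution; the helper name jsLiteral() is an assumption, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: encode a PHP value as a JavaScript literal that can be
// placed inside a <script> element.
function jsLiteral($value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// A value that would close the script element early if printed as-is.
$name = '</script><script>alert(1)</script>';
$snippet = '<script>var categoryName = ' . jsLiteral($name) . ';</script>';
```

The angle brackets come out as \u003C and \u003E, so the HTML parser never sees a closing script tag inside the string, while JavaScript still decodes the literal back to the original value.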

我想做一个坏孩纸
#4 · 2019-03-06 10:37

Well, if you don't want XSS problems, then using htmlspecialchars() is a good idea. If you don't, someone could inject a malicious <iframe> or <script> into your page.

Now, you don't necessarily need to protect the output. If you sanitize the data coming in, you can store the sanitized data instead. Doing it this way also lets you deliberately store unsanitized data, perhaps markup for styling purposes, that is ready for output as-is.
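
One thing to watch with the store-sanitized approach is escaping the same value twice, once on input and again on output. A small sketch with an assumed sample value shows the effect, along with the double_encode parameter of htmlspecialchars, which leaves existing character references alone:

```php
<?php
$input = 'Arts & Crafts';

// Escaped once on the way in (this is what would be stored).
$stored = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Escaped a second time on output: the & of &amp; is escaped again.
$twice = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// Fourth argument double_encode = false: existing entities are not re-encoded.
$once = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8', false);
```

$twice would display as "Arts &amp; Crafts" in the browser, whereas $once displays as "Arts & Crafts".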

As far as your example is concerned, it is correct. That is one way you can do it.

EDIT: You only need to apply htmlspecialchars() to strings.
