Extend PHP regex to cover “srcset” and “style” attributes

Posted 2019-05-24 12:49

I've created a WordPress plugin that turns all links into protocol-relative URLs (removing http: and https:) based on the tags and attributes that I list in the $tag and $attribute variables. This is part of the function. To save space, the rest of the code can be found here.

$content_type = NULL;
# Check for 'Content-Type' headers only
foreach ( headers_list() as $header ) {
    if ( strpos( strtolower( $header ), 'content-type:' ) === 0 ) {
        $pieces = explode( ':', strtolower( $header ) );
        $content_type = trim( $pieces[1] );
        break;
    }
}
# If the content-type is 'NULL' or 'text/html', apply rewrite
if ( is_null( $content_type ) || substr( $content_type, 0, 9 ) === 'text/html' ) {
    $tag = 'a|base|div|form|iframe|img|link|meta|script|svg';
    $attribute = 'action|content|data-project-file|href|src|srcset|style';
    # If 'Protocol Relative URL' option is checked, only apply change to internal links
    if ( $this->option == 1 ) {
        # Remove protocol from home URL
        $website = preg_replace( '/https?:\/\//', '', home_url() );
        # Remove protocol from internal links
        $links = preg_replace( '/(<(' . $tag . ')([^>]*)(' . $attribute . ')=["\'])https?:\/\/' . $website . '/i', '$1//' . $website, $links );
    }
    # Else, remove protocols from all links
    else {
        $links = preg_replace( '/(<(' . $tag . ')([^>]*)(' . $attribute . ')=["\'])https?:\/\//i', '$1//', $links );
    }
}
# Return protocol relative links
return $links;

This works as intended for most links, but it fails on these examples:

<!-- Within the 'style' attribute -->
<div class="some-class" style='background-color:rgba(255,255,255,0);background-image:url("http://placehold.it/300x200");background-position:center center;background-repeat:no-repeat'>
<!-- Within the 'srcset' attribute -->
<img src="http://placehold.it/600x300" srcset="http://placehold.it/500 500x, http://placehold.it/100 100w">

Running them through the code gives only a partial result:

<div class="some-class" style='background-color:rgba(255,255,255,0);background-image:url("http://placehold.it/300x200");background-position:center center;background-repeat:no-repeat'>
<img src="http://placehold.it/600x300" srcset="//placehold.it/500 500x, http://placehold.it/100 100w">
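One way to see where the partial result comes from is to run the pattern from the function on the two inputs in isolation. In this sketch the variable names $pattern, $img, and $div are only for illustration. The pattern requires https?:// to sit directly after the attribute's opening quote, and every match has to start at a new <tag, so at most one attribute per tag is rewritten. Because [^>]* is greedy, that attribute is the last qualifying one in the tag (here srcset), and only the URL at the very start of its value is touched.

```php
<?php
$tag       = 'a|base|div|form|iframe|img|link|meta|script|svg';
$attribute = 'action|content|data-project-file|href|src|srcset|style';
// Same pattern as in the function: the URL must follow the opening quote directly.
$pattern   = '/(<(' . $tag . ')([^>]*)(' . $attribute . ')=["\'])https?:\/\//i';

$img = '<img src="http://placehold.it/600x300" srcset="http://placehold.it/500 500x, http://placehold.it/100 100w">';
$div = '<div class="some-class" style=\'background-image:url("http://placehold.it/300x200")\'>';

// Only the first URL of srcset changes; src and the second srcset URL stay as they were.
$img_result = preg_replace( $pattern, '$1//', $img );
echo $img_result, "\n";
// <img src="http://placehold.it/600x300" srcset="//placehold.it/500 500x, http://placehold.it/100 100w">

// The style value starts with "background-image", not with a URL, so nothing matches.
$div_result = preg_replace( $pattern, '$1//', $div );
echo $div_result, "\n";
```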

I've played around with adding additional values to the $tag and $attribute variables, but that didn't help. I assume I need to update the rest of my regex to cover these two additional attributes? Or is there a different way to approach it, such as DOMDocument?
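One possible way to stay with regular expressions, sketched here under assumptions rather than taken from the plugin, is to work in passes: match each whole listed tag, then each listed attribute inside it, then replace every http:// or https:// inside that attribute's value. The function name make_protocol_relative() is hypothetical, and the sketch assumes quoted attribute values that contain no ">" character.

```php
<?php
// Hypothetical helper: the name and signature are illustrative, not part of the plugin.
function make_protocol_relative( string $html ): string {
    $tag       = 'a|base|div|form|iframe|img|link|meta|script|svg';
    $attribute = 'action|content|data-project-file|href|src|srcset|style';

    // Pass 1: one whole opening tag from the list (assumes no ">" inside attribute values).
    $tag_pattern = '/<(?:' . $tag . ')\b[^>]*>/i';
    // Pass 2: one listed attribute with its complete quoted value; \2 is the matching quote.
    $attr_pattern = '/(\s(?:' . $attribute . ')\s*=\s*)(["\'])(.*?)\2/is';

    return preg_replace_callback(
        $tag_pattern,
        function ( array $tag_match ) use ( $attr_pattern ) {
            return preg_replace_callback(
                $attr_pattern,
                function ( array $m ) {
                    // Pass 3: strip the protocol from every URL inside the value.
                    $value = preg_replace( '/https?:\/\//i', '//', $m[3] );
                    return $m[1] . $m[2] . $value . $m[2];
                },
                $tag_match[0]
            );
        },
        $html
    );
}

echo make_protocol_relative(
    '<img src="http://placehold.it/600x300" srcset="http://placehold.it/500 500x, http://placehold.it/100 100w">'
), "\n";
// <img src="//placehold.it/600x300" srcset="//placehold.it/500 500x, //placehold.it/100 100w">
```

To limit the change to internal links, the innermost pattern could be extended with the site's host name escaped through preg_quote(). URLs in tags or attributes that are not listed, and URLs in plain text, are left alone.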

1 answer
对你真心纯属浪费
Reply #2 · 2019-05-24 13:26

I was able to simplify the code by doing the following:

$content_type = NULL;
# Check for 'Content-Type' headers only
foreach ( headers_list() as $header ) {
    if ( strpos( strtolower( $header ), 'content-type:' ) === 0 ) {
        $pieces = explode( ':', strtolower( $header ) );
        $content_type = trim( $pieces[1] );
        break;
    }
}
# If the content-type is 'NULL' or 'text/html', apply rewrite
if ( is_null( $content_type ) || substr( $content_type, 0, 9 ) === 'text/html' ) {
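    # Host name of the current site (it contains no protocol)
    $website = $_SERVER['HTTP_HOST'];
    # Remove protocol from internal links; str_replace() treats 'https?' literally, so use preg_replace()
    $links = preg_replace( '|https?://' . preg_quote( $website, '|' ) . '|', '//' . $website, $links );
    # Remove protocol from every remaining URL, anywhere in the output (not only in the listed tags and attributes)
    $links = preg_replace( '|https?://|', '//', $links );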
}
# Return protocol relative links
return $links;
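
The question also asks about DOMDocument. A minimal sketch of that route follows, with several assumptions: the function name make_protocol_relative_dom() is hypothetical, every element carrying one of the listed attributes is processed (not only the tags in $tag), the input is a non-empty HTML document, and PHP's dom extension is available. The parser re-serializes the markup, so quoting and whitespace in the output may differ from the input, and non-ASCII pages may need explicit encoding handling.

```php
<?php
// Hypothetical helper: the name and the choice to scan all elements are assumptions.
function make_protocol_relative_dom( string $html ): string {
    $attributes = array( 'action', 'content', 'data-project-file', 'href', 'src', 'srcset', 'style' );

    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors( true ); // collect parser warnings instead of printing them
    $doc->loadHTML( $html, LIBXML_HTML_NODEFDTD );  // do not add a default doctype
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath = new DOMXPath( $doc );
    foreach ( $attributes as $name ) {
        // Every element that carries this attribute.
        foreach ( $xpath->query( '//*[@' . $name . ']' ) as $element ) {
            $value = preg_replace( '/https?:\/\//i', '//', $element->getAttribute( $name ) );
            $element->setAttribute( $name, $value );
        }
    }

    return $doc->saveHTML();
}

echo make_protocol_relative_dom(
    '<html><body><img src="http://placehold.it/600x300" srcset="http://placehold.it/500 500x, http://placehold.it/100 100w"></body></html>'
);
```

Compared with a blanket replacement over the whole output, this only touches attribute values, so URLs that appear as visible text or inside inline scripts are not rewritten.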