PHP code to generate safe URL?

Posted 2019-02-03 18:53

We need to generate a unique URL from the title of a book, where the title can contain any character. How can we search and replace all the 'invalid' characters so that a valid, neat-looking URL is generated?

For instance:

"The Great Book of PHP"

www.mysite.com/book/12345/the-great-book-of-php

"The Greatest !@#$ Book of PHP"

www.mysite.com/book/12345/the-greatest-book-of-php

"Funny title     "

www.mysite.com/book/12345/funny-title

9 answers
姐就是有狂的资本
#2 · 2019-02-03 19:21

You can use a simple regular expression for this purpose:

<?php
    function safeurl( $v )
    {
        $v = strtolower( $v );
        $v = preg_replace( "/[^a-z0-9]+/", "-", $v );
        $v = trim( $v, "-" );
        return $v;
    }
    echo "<br>www.mysite.com/book/12345/" . safeurl( "The Great Book of PHP" );
    echo "<br>www.mysite.com/book/12345/" . safeurl( "The Greatest !@#$ Book of PHP" );
    echo "<br>www.mysite.com/book/12345/" . safeurl( "  Funny title  " );
    echo "<br>www.mysite.com/book/12345/" . safeurl( "!!Even Funnier title!!" );
?>
Luminary・发光体
#3 · 2019-02-03 19:25

Use a regex replace to turn each run of non-letter characters into a hyphen. For example:

preg_replace('/[^a-zA-Z]+/', '-', $input)
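
A quick sketch of why the regex function matters here: str_replace() treats its first argument as literal text, while preg_replace() expects a delimited pattern. The sample input and variable names below are only illustrative. It also shows that a letters-only class drops digits, which may or may not be what you want.

```php
<?php
// Illustrative input; not taken from the answer above.
$input = 'The Greatest !@#$ Book of PHP 7';

// str_replace() searches for the literal text "[^a-zA-Z]+". That text is
// not in the input, so the string comes back unchanged.
$literal = str_replace('[^a-zA-Z]+', '-', $input);

// preg_replace() needs delimiters around the pattern (here "/").
// Every run of non-letters, the digit 7 included, becomes one hyphen.
$lettersOnly = preg_replace('/[^a-zA-Z]+/', '-', $input);

// Adding 0-9 to the character class keeps the digits.
$alnum = preg_replace('/[^a-zA-Z0-9]+/', '-', $input);
```

Neither variant lowercases the result or trims a leading or trailing hyphen, so those steps would still be needed for a tidy URL.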

爷的心禁止访问
#4 · 2019-02-03 19:29

Sanitizing special characters is not an easy task, IMHO. Take a look at WordPress's excellent sanitize_title function, and also look at its source.

Update: Sorry guys, I should downvote every answer that doesn't deal with accented characters. Do you understand what "the title can contain any character" means?

Update 2: Go, guys, go! Please downvote me as many times as you can!

Note: and please don't get surprised when you meet a special character. Just eliminate it with str_replace!
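
To make the accented-character point concrete, here is a minimal sketch that folds a handful of accented Latin letters to ASCII before slugifying. The function name slugify_accented and the tiny mapping table are assumptions made for illustration; this is not how WordPress does it, and a real application would need a far larger table or a transliteration library. The source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: fold a few accented letters to ASCII, then slugify.
function slugify_accented(string $title): string
{
    // Deliberately small, illustrative map (both cases listed, because
    // strtolower() is only relied on for ASCII letters here).
    $map = [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ö' => 'o', 'ü' => 'u', 'ç' => 'c', 'ñ' => 'n',
        'É' => 'E', 'Ö' => 'O', 'Ü' => 'U', 'Ç' => 'C',
        'ß' => 'ss',
    ];

    // strtr() with an array replaces each key with its value in one pass.
    $title = strtr($title, $map);

    // Lowercase, turn every run of other characters into one hyphen,
    // then trim hyphens from both ends.
    $title = strtolower($title);
    $title = preg_replace('/[^a-z0-9]+/', '-', $title);

    return trim($title, '-');
}

echo slugify_accented('Les Misérables: Édition Spéciale'), PHP_EOL; // les-miserables-edition-speciale
echo slugify_accented('Über Café'), PHP_EOL;                        // uber-cafe
```

Characters that are not in the map are simply replaced by a hyphen, so a title written entirely in a non-Latin script would still lose its content.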

可以哭但决不认输i
#5 · 2019-02-03 19:30

This code comes from CodeIgniter's url helper. It should do the trick.

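function url_title($str, $separator = 'dash', $lowercase = FALSE)
{
    // Pick the separator; $search is assigned but not used below.
    if ($separator == 'dash')
    {
        $search  = '_';
        $replace = '-';
    }
    else
    {
        $search  = '-';
        $replace = '_';
    }

    // Patterns are applied in order, case-insensitively.
    $trans = array(
        '&\#\d+?;'       => '',        // numeric HTML entities
        '&\S+?;'         => '',        // named HTML entities
        '\s+'            => $replace,  // whitespace runs
        '[^a-z0-9\-\._]' => '',        // anything not alphanumeric, "-", "." or "_"
        $replace.'+'     => $replace,  // repeated separators
        $replace.'$'     => $replace,  // trailing separator replaced by itself (no effect)
        '^'.$replace     => $replace,  // leading separator replaced by itself (no effect)
        '\.+$'           => ''         // trailing dots
    );

    $str = strip_tags($str);

    foreach ($trans as $key => $val)
    {
        $str = preg_replace("#".$key."#i", $val, $str);
    }

    if ($lowercase === TRUE)
    {
        $str = strtolower($str);
    }

    return trim(stripslashes($str));
}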
女痞
#6 · 2019-02-03 19:35

If “invalid” means non-alphanumeric, you can do this:

function foo($str) {
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($str)), '-');
}

This will turn $str into lowercase, replace any sequence of one or more non-alphanumeric characters by one hyphen, and then remove leading and trailing hyphens.

var_dump(foo("The Great Book of PHP") === 'the-great-book-of-php');
var_dump(foo("The Greatest !@#$ Book of PHP") === 'the-greatest-book-of-php');
var_dump(foo("Funny title     ") === 'funny-title');
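
Since the URLs in the question already carry a numeric ID, the slug only has to be readable, not unique. A possible sketch of putting the two together follows. The names title_slug and book_url are made up for illustration, and the fallback for an empty slug is an assumption rather than something the question asks for.

```php
<?php
// Hypothetical helpers; the URL layout follows the examples in the question.
function title_slug(string $title): string
{
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-');
}

function book_url(int $id, string $title): string
{
    $slug = title_slug($title);

    // A title made only of symbols yields an empty slug, so fall back
    // to the ID alone instead of emitting a trailing slash.
    $path = $slug === '' ? "/book/$id" : "/book/$id/$slug";

    return 'www.mysite.com' . $path;
}

echo book_url(12345, 'The Greatest !@#$ Book of PHP'), PHP_EOL; // www.mysite.com/book/12345/the-greatest-book-of-php
echo book_url(12345, '!@#$'), PHP_EOL;                          // www.mysite.com/book/12345
```

With this layout the server can look the book up by ID alone and treat the slug as decoration.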
叛逆
#7 · 2019-02-03 19:37

If you want to allow only letters, digits and underscore (usual word characters) you can do:

$str = strtolower(preg_replace(array('/\W/','/-+/','/^-|-$/'),array('-','-',''),$str));

It first replaces every non-word character (\W) with a -.
Next, it collapses any run of consecutive - into a single -.
Finally, it deletes a leading or trailing -.
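
The three steps can also be run one at a time to inspect the intermediate strings. This is only a sketch: the helper name slug_steps is hypothetical, and it does the same work as the one-liner, just spread out.

```php
<?php
// Hypothetical helper that returns the result of each step in turn.
function slug_steps(string $str): array
{
    $step1 = preg_replace('/\W/', '-', $str);      // each non-word character becomes "-"
    $step2 = preg_replace('/-+/', '-', $step1);    // runs of "-" collapse to one
    $step3 = preg_replace('/^-|-$/', '', $step2);  // drop a leading and a trailing "-"

    return [$step1, $step2, $step3, strtolower($step3)];
}

print_r(slug_steps('Funny title     '));
// Steps: "Funny-title-----", "Funny-title-", "Funny-title", "funny-title"
```

Because \W leaves underscores alone, an input such as 'snake_case Title!' ends up as 'snake_case-title'.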
