Question:
We need to generate a unique URL from the title of a book, where the title can contain any character. How can we search and replace all the 'invalid' characters so that a valid and neat-looking URL is generated?
For instance:
"The Great Book of PHP"
www.mysite.com/book/12345/the-great-book-of-php
"The Greatest !@#$ Book of PHP"
www.mysite.com/book/12345/the-greatest-book-of-php
"Funny title "
www.mysite.com/book/12345/funny-title
Answer 1:
Ah, slugification
// This function expects the input to be UTF-8 encoded.
function slugify($text)
{
    // Swap out every run of characters that are not letters or digits for a -
    $text = preg_replace('/[^\\pL\d]+/u', '-', $text);
    // Trim leading and trailing -'s
    $text = trim($text, '-');
    // Convert the letters we have left to the closest ASCII representation
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    // Make the text lowercase
    $text = strtolower($text);
    // Strip out anything we haven't been able to convert
    $text = preg_replace('/[^-\w]+/', '', $text);
    return $text;
}
This works fairly well. It first uses the Unicode properties of each character to decide whether it is a letter (\pL) or a digit (\d), and replaces every run of anything else with a single -. It then trims stray -'s from both ends, transliterates the remaining letters to ASCII, lowercases the result, and strips anything that could not be converted. (Fabrik's test returns "arvizturo-tukorfurogep".) Note that the output of //TRANSLIT depends on the system's iconv implementation and locale.
I also tend to add a list of stop words, such as "the", "of", "or" and "a", so that those are removed from the slug. (Don't filter on word length instead, or you will strip out things like "php".)
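The stop-word idea can be sketched as a separate pass over an already generated slug. This is only a minimal sketch: the function name remove_stop_words and the word list are assumptions made for illustration, not part of the answer's code.

```php
<?php
// Hypothetical helper: drops stop words from an already slugified string.
// The default word list is an illustrative assumption, not a complete one.
function remove_stop_words(string $slug, array $stopWords = ['the', 'of', 'or', 'a', 'an', 'and']): string
{
    // Split on hyphens, ignoring empty pieces.
    $words = preg_split('/-+/', $slug, -1, PREG_SPLIT_NO_EMPTY);

    // Keep only the words that are not in the stop list.
    $kept = array_filter($words, function ($word) use ($stopWords) {
        return !in_array($word, $stopWords, true);
    });

    // If every word was a stop word, fall back to the original slug
    // rather than returning an empty string.
    return $kept ? implode('-', $kept) : $slug;
}

echo remove_stop_words('the-great-book-of-php'), "\n"; // great-book-php
```

Matching whole words against a list, rather than filtering by length, is what keeps short but meaningful words such as "php" in the slug.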
Answer 2:
If “invalid” means non-alphanumeric, you can do this:
function foo($str) {
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($str)), '-');
}
This will turn $str into lowercase, replace any sequence of one or more non-alphanumeric characters with one hyphen, and then remove leading and trailing hyphens.
var_dump(foo("The Great Book of PHP") === 'the-great-book-of-php');
var_dump(foo("The Greatest !@#$ Book of PHP") === 'the-greatest-book-of-php');
var_dump(foo("Funny title ") === 'funny-title');
Answer 3:
You can use a simple regular expression for this purpose:
<?php
function safeurl( $v )
{
    $v = strtolower( $v );
    $v = preg_replace( "/[^a-z0-9]+/", "-", $v );
    $v = trim( $v, "-" );
    return $v;
}
echo "<br>www.mysite.com/book/12345/" . safeurl( "The Great Book of PHP" );
echo "<br>www.mysite.com/book/12345/" . safeurl( "The Greatest !@#$ Book of PHP" );
echo "<br>www.mysite.com/book/12345/" . safeurl( " Funny title " );
echo "<br>www.mysite.com/book/12345/" . safeurl( "!!Even Funnier title!!" );
?>
Answer 4:
If you want to allow only letters, digits and underscore (usual word characters) you can do:
$str = strtolower(preg_replace(array('/\W/','/-+/','/^-|-$/'),array('-','-',''),$str));
It first replaces any non-word character (\W) with a -. Next it replaces any run of consecutive - with a single -. Finally it deletes any leading or trailing -.
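To make the three replacements easier to follow, here is a sketch that applies them one at a time to one of the titles from the question. The intermediate variable names are introduced here for illustration only; the one-liner above does the same work in a single preg_replace call.

```php
<?php
// Single quotes, so that "$" is treated as a literal character.
$str = '  The Greatest !@#$ Book of PHP ';

// Step 1: every single non-word character becomes a hyphen.
$step1 = preg_replace('/\W/', '-', $str);
// Step 2: runs of hyphens collapse into one.
$step2 = preg_replace('/-+/', '-', $step1);
// Step 3: a leading or trailing hyphen is removed.
$step3 = preg_replace('/^-|-$/', '', $step2);
// Finally, lowercase the result.
$slug = strtolower($step3);

echo $slug, "\n"; // the-greatest-book-of-php
```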
Answer 5:
This code comes from CodeIgniter's url helper. It should do the trick.
function url_title($str, $separator = 'dash', $lowercase = FALSE)
{
    if ($separator == 'dash')
    {
        $search = '_';
        $replace = '-';
    }
    else
    {
        $search = '-';
        $replace = '_';
    }
    $trans = array(
        '&\#\d+?;' => '',
        '&\S+?;' => '',
        '\s+' => $replace,
        '[^a-z0-9\-\._]' => '',
        $replace.'+' => $replace,
        $replace.'$' => $replace,
        '^'.$replace => $replace,
        '\.+$' => ''
    );
    $str = strip_tags($str);
    foreach ($trans as $key => $val)
    {
        $str = preg_replace("#".$key."#i", $val, $str);
    }
    if ($lowercase === TRUE)
    {
        $str = strtolower($str);
    }
    return trim(stripslashes($str));
}
Answer 6:
Replace the special characters with spaces, and then replace the spaces with "-". Perhaps with str_replace?
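A minimal sketch of that two-step idea follows. The function name slug_with_str_replace and the list of "special" characters are assumptions, and the list is far from complete. The second step uses preg_split rather than a plain str_replace(' ', '-'), because the latter would turn several consecutive spaces into several consecutive hyphens.

```php
<?php
// Hypothetical helper following the two-step idea above.
// The list of "special" characters is an assumption and is not exhaustive.
function slug_with_str_replace(string $title): string
{
    $special = ['!', '@', '#', '$', '%', '&', '*', '?', ',', '.', ':', ';', "'", '"'];

    // Step 1: replace each listed special character with a space.
    $title = str_replace($special, ' ', $title);

    // Step 2: split on whitespace (dropping empty pieces) and join with hyphens,
    // so consecutive spaces do not produce consecutive hyphens.
    $words = preg_split('/\s+/', $title, -1, PREG_SPLIT_NO_EMPTY);

    return strtolower(implode('-', $words));
}

echo slug_with_str_replace('The Greatest !@#$ Book of PHP'), "\n"; // the-greatest-book-of-php
```

Any character missing from the list passes through untouched, which is the main weakness of a blacklist compared with the whitelist patterns used in the other answers.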
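Answer 7:
Use a regex replace to swap every run of non-letter characters for a hyphen. Note that str_replace does not understand regular expressions, so use preg_replace. For example:
preg_replace('/[^a-zA-Z]+/', '-', $input)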
Answer 8:
<?php
$input = " The Great Book's of PHP ";
$output = trim(preg_replace(array("`'`", "`[^a-z]+`"), array("", "-"), strtolower($input)), "-");
echo $output; // the-great-books-of-php
This trims trailing dashes and doesn't do things like "it's raining" -> "it-s-raining" as most solutions tend to do.
Answer 9:
Sanitizing special characters is not an easy task, IMHO. Take a look at WordPress's awesome sanitize_title function, and also look at its source.
Update:
Sorry guys, I should downvote every answer that doesn't deal with accented characters. Do you understand what "the title can contain any character" means?
Update 2:
Go, guys, go! Please downvote me as many times as you can!
Note: and please don't be surprised when you meet a special character. Just eliminate it with str_replace!
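To illustrate the accented-character concern: an ASCII-only pattern such as /[^a-z0-9]+/ treats every accented letter as a separator. One way around that, sketched below, is to map accented letters to ASCII before applying the pattern. The function name slugify_with_map and the character map are assumptions; the map is a small hand-written sample covering only the Hungarian test phrase mentioned in Answer 1, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: the character map is a small hand-written sample,
// just enough for the Hungarian test phrase; a real one would be much larger.
function slugify_with_map(string $text): string
{
    $map = [
        'Á' => 'A', 'á' => 'a', 'É' => 'E', 'é' => 'e', 'Í' => 'I', 'í' => 'i',
        'Ó' => 'O', 'ó' => 'o', 'Ö' => 'O', 'ö' => 'o', 'Ő' => 'O', 'ő' => 'o',
        'Ú' => 'U', 'ú' => 'u', 'Ü' => 'U', 'ü' => 'u', 'Ű' => 'U', 'ű' => 'u',
    ];

    // Replace accented letters first, so the ASCII-only regex does not drop them.
    $text = strtolower(strtr($text, $map));

    // Then collapse every run of other characters into one hyphen and trim.
    return trim(preg_replace('/[^a-z0-9]+/', '-', $text), '-');
}

echo slugify_with_map('Árvíztűrő tükörfúrógép'), "\n"; // arvizturo-tukorfurogep
```

An explicit map is deterministic across systems, at the cost of having to maintain the table by hand.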