PHP function to make slug (URL string)

Posted 2019-01-01 10:24

Question:

function gen_slug($str){
    # special accents
    $a = array(\'À\',\'Á\',\'Â\',\'Ã\',\'Ä\',\'Å\',\'Æ\',\'Ç\',\'È\',\'É\',\'Ê\',\'Ë\',\'Ì\',\'Í\',\'Î\',\'Ï\',\'Ð\',\'Ñ\',\'Ò\',\'Ó\',\'Ô\',\'Õ\',\'Ö\',\'Ø\',\'Ù\',\'Ú\',\'Û\',\'Ü\',\'Ý\',\'ß\',\'à\',\'á\',\'â\',\'ã\',\'ä\',\'å\',\'æ\',\'ç\',\'è\',\'é\',\'ê\',\'ë\',\'ì\',\'í\',\'î\',\'ï\',\'ñ\',\'ò\',\'ó\',\'ô\',\'õ\',\'ö\',\'ø\',\'ù\',\'ú\',\'û\',\'ü\',\'ý\',\'ÿ\',\'A\',\'a\',\'A\',\'a\',\'A\',\'a\',\'C\',\'c\',\'C\',\'c\',\'C\',\'c\',\'C\',\'c\',\'D\',\'d\',\'Ð\',\'d\',\'E\',\'e\',\'E\',\'e\',\'E\',\'e\',\'E\',\'e\',\'E\',\'e\',\'G\',\'g\',\'G\',\'g\',\'G\',\'g\',\'G\',\'g\',\'H\',\'h\',\'H\',\'h\',\'I\',\'i\',\'I\',\'i\',\'I\',\'i\',\'I\',\'i\',\'I\',\'i\',\'?\',\'?\',\'J\',\'j\',\'K\',\'k\',\'L\',\'l\',\'L\',\'l\',\'L\',\'l\',\'?\',\'?\',\'L\',\'l\',\'N\',\'n\',\'N\',\'n\',\'N\',\'n\',\'?\',\'O\',\'o\',\'O\',\'o\',\'O\',\'o\',\'Œ\',\'œ\',\'R\',\'r\',\'R\',\'r\',\'R\',\'r\',\'S\',\'s\',\'S\',\'s\',\'S\',\'s\',\'Š\',\'š\',\'T\',\'t\',\'T\',\'t\',\'T\',\'t\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'W\',\'w\',\'Y\',\'y\',\'Ÿ\',\'Z\',\'z\',\'Z\',\'z\',\'Ž\',\'ž\',\'?\',\'ƒ\',\'O\',\'o\',\'U\',\'u\',\'A\',\'a\',\'I\',\'i\',\'O\',\'o\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'?\',\'?\',\'?\',\'?\',\'?\',\'?\');
    $b = array(\'A\',\'A\',\'A\',\'A\',\'A\',\'A\',\'AE\',\'C\',\'E\',\'E\',\'E\',\'E\',\'I\',\'I\',\'I\',\'I\',\'D\',\'N\',\'O\',\'O\',\'O\',\'O\',\'O\',\'O\',\'U\',\'U\',\'U\',\'U\',\'Y\',\'s\',\'a\',\'a\',\'a\',\'a\',\'a\',\'a\',\'ae\',\'c\',\'e\',\'e\',\'e\',\'e\',\'i\',\'i\',\'i\',\'i\',\'n\',\'o\',\'o\',\'o\',\'o\',\'o\',\'o\',\'u\',\'u\',\'u\',\'u\',\'y\',\'y\',\'A\',\'a\',\'A\',\'a\',\'A\',\'a\',\'C\',\'c\',\'C\',\'c\',\'C\',\'c\',\'C\',\'c\',\'D\',\'d\',\'D\',\'d\',\'E\',\'e\',\'E\',\'e\',\'E\',\'e\',\'E\',\'e\',\'E\',\'e\',\'G\',\'g\',\'G\',\'g\',\'G\',\'g\',\'G\',\'g\',\'H\',\'h\',\'H\',\'h\',\'I\',\'i\',\'I\',\'i\',\'I\',\'i\',\'I\',\'i\',\'I\',\'i\',\'IJ\',\'ij\',\'J\',\'j\',\'K\',\'k\',\'L\',\'l\',\'L\',\'l\',\'L\',\'l\',\'L\',\'l\',\'l\',\'l\',\'N\',\'n\',\'N\',\'n\',\'N\',\'n\',\'n\',\'O\',\'o\',\'O\',\'o\',\'O\',\'o\',\'OE\',\'oe\',\'R\',\'r\',\'R\',\'r\',\'R\',\'r\',\'S\',\'s\',\'S\',\'s\',\'S\',\'s\',\'S\',\'s\',\'T\',\'t\',\'T\',\'t\',\'T\',\'t\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'W\',\'w\',\'Y\',\'y\',\'Y\',\'Z\',\'z\',\'Z\',\'z\',\'Z\',\'z\',\'s\',\'f\',\'O\',\'o\',\'U\',\'u\',\'A\',\'a\',\'I\',\'i\',\'O\',\'o\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'U\',\'u\',\'A\',\'a\',\'AE\',\'ae\',\'O\',\'o\');
    return strtolower(preg_replace(array(\'/[^a-zA-Z0-9 -]/\',\'/[ -]+/\',\'/^-|-$/\'),array(\'\',\'-\',\'\'),str_replace($a,$b,$str)));
}

It works well, but I've found some cases in which it fails:

gen_slug('andrés') returns andras instead of andres.

Why? Any ideas on the preg_replace parameters?

Answer 1:

Instead of a lengthy replace, try this one:

public static function slugify($text)
{
  // replace runs of non-letters and non-digits with -
  $text = preg_replace('~[^\pL\d]+~u', '-', $text);

  // transliterate
  $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);

  // remove unwanted characters
  $text = preg_replace('~[^-\w]+~', '', $text);

  // trim
  $text = trim($text, '-');

  // remove duplicate -
  $text = preg_replace('~-+~', '-', $text);

  // lowercase
  $text = strtolower($text);

  if (empty($text)) {
    return 'n-a';
  }

  return $text;
}

This was based on the one in Symfony's Jobeet tutorial.



Answer 2:

How about...

$slug = strtolower(trim(preg_replace('/[^A-Za-z0-9-]+/', '-', $string), '-'));

?
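
To see what this one-liner does with accented input, here is a minimal sketch. The function name simple_slug is hypothetical and not part of the answer, and it trims dashes rather than whitespace. Because the pattern matches every byte outside A-Z, a-z, 0-9 and -, the two UTF-8 bytes of é collapse into a single dash, so the accented letter is dropped rather than transliterated.

```php
<?php
// Hypothetical helper wrapping the one-liner above; the name is an assumption.
function simple_slug(string $string): string
{
    // Collapse every run of characters outside [A-Za-z0-9-] into a single dash.
    $slug = preg_replace('/[^A-Za-z0-9-]+/', '-', $string);

    // Drop dashes left over at either end, then lowercase.
    return strtolower(trim($slug, '-'));
}

echo simple_slug('Hello, World!'), "\n"; // hello-world
echo simple_slug('andrés'), "\n";        // andr-s (the accented letter is lost)
```

This is fine for plain English titles; for names like andrés one of the transliterating answers is probably a better fit.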



Answer 3:

If you have the intl extension installed, you can use the transliterator_transliterate() function to create a slug easily.

You can replace spaces with dashes afterwards to make the result look more like a slug.

<?php
$string = "andrés";
$string = transliterator_transliterate("Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();", $string);
echo $string;
?>


Answer 4:

Note: I have taken this from WordPress and it works!

Use it like this:

echo sanitize('testing this link');

Code:

// taken from WordPress
function utf8_uri_encode( $utf8_string, $length = 0 ) {
    $unicode = '';
    $values = array();
    $num_octets = 1;
    $unicode_length = 0;

    $string_length = strlen( $utf8_string );
    for ($i = 0; $i < $string_length; $i++ ) {

        $value = ord( $utf8_string[ $i ] );

        if ( $value < 128 ) {
            if ( $length && ( $unicode_length >= $length ) )
                break;
            $unicode .= chr($value);
            $unicode_length++;
        } else {
            if ( count( $values ) == 0 ) $num_octets = ( $value < 224 ) ? 2 : 3;

            $values[] = $value;

            if ( $length && ( $unicode_length + ($num_octets * 3) ) > $length )
                break;
            if ( count( $values ) == $num_octets ) {
                if ($num_octets == 3) {
                    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
                    $unicode_length += 9;
                } else {
                    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
                    $unicode_length += 6;
                }

                $values = array();
                $num_octets = 1;
            }
        }
    }

    return $unicode;
}

//taken from wordpress
function seems_utf8($str) {
    $length = strlen($str);
    for ($i=0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) $n = 0; # 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}

//function sanitize_title_with_dashes taken from wordpress
function sanitize($title) {
    $title = strip_tags($title);
    // Preserve escaped octets.
    $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
    // Remove percent signs that are not part of an octet.
    $title = str_replace('%', '', $title);
    // Restore octets.
    $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);

    if (seems_utf8($title)) {
        if (function_exists('mb_strtolower')) {
            $title = mb_strtolower($title, 'UTF-8');
        }
        $title = utf8_uri_encode($title, 200);
    }

    $title = strtolower($title);
    $title = preg_replace('/&.+?;/', '', $title); // kill entities
    $title = str_replace('.', '-', $title);
    $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
    $title = preg_replace('/\s+/', '-', $title);
    $title = preg_replace('|-+|', '-', $title);
    $title = trim($title, '-');

    return $title;
}


Answer 5:

Here is another one; for example, " Title with strange characters ééé A X Z" becomes "title-with-strange-characters-eee-a-x-z".

/**
 * Function used to create a slug associated to an "ugly" string.
 *
 * @param string $string the string to transform.
 *
 * @return string the resulting slug.
 */
public static function createSlug($string) {

    $table = array(
            'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
            'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
            'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
            'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
            'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
            'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
            'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
            'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', '/' => '-', ' ' => '-'
    );

    // -- Collapse repeated whitespace, tabs and newlines into single spaces, then trim the ends
    $stripped = trim(preg_replace(array('/\s{2,}/', '/[\t\n]/'), ' ', $string));

    // -- Returns the slug (built from the cleaned-up string, not the raw input)
    return strtolower(strtr($stripped, $table));


}


Answer 6:

An updated version of @Imran Omar Bukhsh's code (from the latest WordPress (4.0) branch):

<?php

// Add methods to slugify taken from Wordpress:
// - https://github.com/WordPress/WordPress/blob/master/wp-includes/formatting.php 
// - https://github.com/WordPress/WordPress/blob/master/wp-includes/functions.php

/**
 * Set the mbstring internal encoding to a binary safe encoding when func_overload
 * is enabled.
 *
 * When mbstring.func_overload is in use for multi-byte encodings, the results from
 * strlen() and similar functions respect the utf8 characters, causing binary data
 * to return incorrect lengths.
 *
 * This function overrides the mbstring encoding to a binary-safe encoding, and
 * resets it to the users expected encoding afterwards through the
 * `reset_mbstring_encoding` function.
 *
 * It is safe to recursively call this function, however each
 * `mbstring_binary_safe_encoding()` call must be followed up with an equal number
 * of `reset_mbstring_encoding()` calls.
 *
 * @since 3.7.0
 *
 * @see reset_mbstring_encoding()
 *
 * @param bool $reset Optional. Whether to reset the encoding back to a previously-set encoding.
 *                    Default false.
 */
function mbstring_binary_safe_encoding( $reset = false ) {
  static $encodings = array();
  static $overloaded = null;

  if ( is_null( $overloaded ) )
    $overloaded = function_exists( 'mb_internal_encoding' ) && ( ini_get( 'mbstring.func_overload' ) & 2 );

  if ( false === $overloaded )
    return;

  if ( ! $reset ) {
    $encoding = mb_internal_encoding();
    array_push( $encodings, $encoding );
    mb_internal_encoding( 'ISO-8859-1' );
  }

  if ( $reset && $encodings ) {
    $encoding = array_pop( $encodings );
    mb_internal_encoding( $encoding );
  }
}

/**
 * Reset the mbstring internal encoding to a users previously set encoding.
 *
 * @see mbstring_binary_safe_encoding()
 *
 * @since 3.7.0
 */
function reset_mbstring_encoding() {
  mbstring_binary_safe_encoding( true );
}


/**
 * Checks to see if a string is utf8 encoded.
 *
 * NOTE: This function checks for 5-Byte sequences, UTF8
 *       has Bytes Sequences with a maximum length of 4.
 *
 * @author bmorel at ssi dot fr (modified)
 * @since 1.2.1
 *
 * @param string $str The string to be checked
 * @return bool True if $str fits a UTF-8 model, false otherwise.
 */
function seems_utf8($str) {
  mbstring_binary_safe_encoding();
  $length = strlen($str);
  reset_mbstring_encoding();
  for ($i=0; $i < $length; $i++) {
    $c = ord($str[$i]);
    if ($c < 0x80) $n = 0; # 0bbbbbbb
    elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
    elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
    elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
    elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
    elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
    else return false; # Does not match any model
    for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
      if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
        return false;
    }
  }
  return true;
}


/**
 * Encode the Unicode values to be used in the URI.
 *
 * @since 1.5.0
 *
 * @param string $utf8_string
 * @param int $length Max length of the string
 * @return string String with Unicode encoded for URI.
 */
function utf8_uri_encode( $utf8_string, $length = 0 ) {
  $unicode = '';
  $values = array();
  $num_octets = 1;
  $unicode_length = 0;

  mbstring_binary_safe_encoding();
  $string_length = strlen( $utf8_string );
  reset_mbstring_encoding();

  for ($i = 0; $i < $string_length; $i++ ) {

    $value = ord( $utf8_string[ $i ] );

    if ( $value < 128 ) {
      if ( $length && ( $unicode_length >= $length ) )
        break;
      $unicode .= chr($value);
      $unicode_length++;
    } else {
      if ( count( $values ) == 0 ) $num_octets = ( $value < 224 ) ? 2 : 3;

      $values[] = $value;

      if ( $length && ( $unicode_length + ($num_octets * 3) ) > $length )
        break;
      if ( count( $values ) == $num_octets ) {
        if ($num_octets == 3) {
          $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]);
          $unicode_length += 9;
        } else {
          $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]);
          $unicode_length += 6;
        }

        $values = array();
        $num_octets = 1;
      }
    }
  }

  return $unicode;
}


/**
 * Sanitizes a title, replacing whitespace and a few other characters with dashes.
 *
 * Limits the output to alphanumeric characters, underscore (_) and dash (-).
 * Whitespace becomes a dash.
 *
 * @since 1.2.0
 *
 * @param string $title The title to be sanitized.
 * @param string $raw_title Optional. Not used.
 * @param string $context Optional. The operation for which the string is sanitized.
 * @return string The sanitized title.
 */
function sanitize_title_with_dashes( $title, $raw_title = '', $context = 'display' ) {
  $title = strip_tags($title);
  // Preserve escaped octets.
  $title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title);
  // Remove percent signs that are not part of an octet.
  $title = str_replace('%', '', $title);
  // Restore octets.
  $title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title);

  if (seems_utf8($title)) {
    if (function_exists('mb_strtolower')) {
      $title = mb_strtolower($title, 'UTF-8');
    }
    $title = utf8_uri_encode($title, 200);
  }

  $title = strtolower($title);
  $title = preg_replace('/&.+?;/', '', $title); // kill entities
  $title = str_replace('.', '-', $title);

  if ( 'save' == $context ) {
    // Convert nbsp, ndash and mdash to hyphens
    $title = str_replace( array( '%c2%a0', '%e2%80%93', '%e2%80%94' ), '-', $title );

    // Strip these characters entirely
    $title = str_replace( array(
      // iexcl and iquest
      '%c2%a1', '%c2%bf',
      // angle quotes
      '%c2%ab', '%c2%bb', '%e2%80%b9', '%e2%80%ba',
      // curly quotes
      '%e2%80%98', '%e2%80%99', '%e2%80%9c', '%e2%80%9d',
      '%e2%80%9a', '%e2%80%9b', '%e2%80%9e', '%e2%80%9f',
      // copy, reg, deg, hellip and trade
      '%c2%a9', '%c2%ae', '%c2%b0', '%e2%80%a6', '%e2%84%a2',
      // acute accents
      '%c2%b4', '%cb%8a', '%cc%81', '%cd%81',
      // grave accent, macron, caron
      '%cc%80', '%cc%84', '%cc%8c',
    ), '', $title );

    // Convert times to x
    $title = str_replace( '%c3%97', 'x', $title );
  }

  $title = preg_replace('/[^%a-z0-9 _-]/', '', $title);
  $title = preg_replace('/\s+/', '-', $title);
  $title = preg_replace('|-+|', '-', $title);
  $title = trim($title, '-');

  return $title;
}

$title = '#PFW Alexander McQueen Spring/Summer 2015';
echo "title -> slug: \n". $title ." -> ". sanitize_title_with_dashes($title);
echo "\n\n";
$title = '«GQ»: Elyas M\'Barek gehört zu Männern des Jahres';
echo "title -> slug: \n". $title ." -> ". sanitize_title_with_dashes($title);




Answer 7:

Don't use preg_replace for this. There's a PHP function built just for the task: strtr() http://php.net/manual/en/function.strtr.php

Taken from the comments in the above link (and I tested it myself; it works):

function normalize ($string) {
    $table = array(
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r',
    );

    return strtr($string, $table);
}
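
One reason strtr() with an array suits this job: it scans the input once and never re-replaces text it has already substituted, whereas str_replace() with arrays applies each pair in turn to the result of the previous one. A minimal sketch with a made-up two-entry map (not taken from the table above):

```php
<?php
// Made-up map for illustration only: one key's replacement is itself another key.
$map = ['a' => 'b', 'b' => 'c'];

// strtr() translates each position of the original string exactly once.
$once = strtr('ab', $map);

// str_replace() runs the pairs sequentially, so the 'b' produced by the
// first pair is replaced again by the second pair.
$chained = str_replace(array_keys($map), array_values($map), 'ab');

echo $once, "\n";    // bc
echo $chained, "\n"; // cc
```

With a table that maps accented letters straight to ASCII the two functions usually agree, but strtr() stays predictable if a replacement ever contains another key.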


Answer 8:

I am using:

function slugify($text)
{
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    return strtolower(preg_replace('/[^A-Za-z0-9-]+/', '-', $text));
}

The only drawback is that Cyrillic characters are not converted, and I am still looking for a solution that is not a long str_replace for every single Cyrillic character.
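
For the Cyrillic case, one possible route (an assumption, not something this answer used) is the intl extension's transliterator_transliterate() with ICU's Any-Latin and Latin-ASCII transforms, which romanize other scripts before the ASCII filter runs. The function name slugify_intl is hypothetical, and the exact romanization depends on the installed ICU version, so no expected output is shown for the Cyrillic input.

```php
<?php
// Hypothetical helper; requires the intl extension.
function slugify_intl(string $text): string
{
    // Any-Latin romanizes non-Latin scripts, Latin-ASCII strips accents,
    // Lower() lowercases the result.
    $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $text);
    if ($ascii === false) {
        $ascii = strtolower($text); // transliteration failed; fall back to the raw input
    }

    // Turn every run of characters outside [a-z0-9] into one dash and trim the ends.
    return trim(preg_replace('/[^a-z0-9]+/', '-', $ascii), '-');
}

// Only call the helper when the intl extension is actually loaded.
if (function_exists('transliterator_transliterate')) {
    echo slugify_intl('Продукты для собак'), "\n";
}
```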



Answer 9:

It is always a good idea to use existing solutions that are maintained by many experienced developers. The most popular one is https://github.com/cocur/slugify. It supports more than one language and is actively updated.

If you do not want to use the whole package, you can copy just the part that you need.



Answer 10:

public static function slugify ($text) {

    $replace = [
        '&lt;' => '', '&gt;' => '', '&#039;' => '', '&amp;' => '',
        '&quot;' => '', 'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä'=> 'Ae',
        '&Auml;' => 'A', 'Å' => 'A', 'Ā' => 'A', 'Ą' => 'A', 'Ă' => 'A', 'Æ' => 'Ae',
        'Ç' => 'C', 'Ć' => 'C', 'Č' => 'C', 'Ĉ' => 'C', 'Ċ' => 'C', 'Ď' => 'D', 'Đ' => 'D',
        'Ð' => 'D', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E', 'Ē' => 'E',
        'Ę' => 'E', 'Ě' => 'E', 'Ĕ' => 'E', 'Ė' => 'E', 'Ĝ' => 'G', 'Ğ' => 'G',
        'Ġ' => 'G', 'Ģ' => 'G', 'Ĥ' => 'H', 'Ħ' => 'H', 'Ì' => 'I', 'Í' => 'I',
        'Î' => 'I', 'Ï' => 'I', 'Ī' => 'I', 'Ĩ' => 'I', 'Ĭ' => 'I', 'Į' => 'I',
        'İ' => 'I', 'IJ' => 'IJ', 'Ĵ' => 'J', 'Ķ' => 'K', 'Ł' => 'L', 'Ľ' => 'L',
        'Ĺ' => 'L', 'Ļ' => 'L', 'Ŀ' => 'L', 'Ñ' => 'N', 'Ń' => 'N', 'Ň' => 'N',
        'Ņ' => 'N', 'Ŋ' => 'N', 'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O',
        'Ö' => 'Oe', '&Ouml;' => 'Oe', 'Ø' => 'O', 'Ō' => 'O', 'Ő' => 'O', 'Ŏ' => 'O',
        'Œ' => 'OE', 'Ŕ' => 'R', 'Ř' => 'R', 'Ŗ' => 'R', 'Ś' => 'S', 'Š' => 'S',
        'Ş' => 'S', 'Ŝ' => 'S', 'Ș' => 'S', 'Ť' => 'T', 'Ţ' => 'T', 'Ŧ' => 'T',
        'Ț' => 'T', 'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'Ue', 'Ū' => 'U',
        '&Uuml;' => 'Ue', 'Ů' => 'U', 'Ű' => 'U', 'Ŭ' => 'U', 'Ũ' => 'U', 'Ų' => 'U',
        'Ŵ' => 'W', 'Ý' => 'Y', 'Ŷ' => 'Y', 'Ÿ' => 'Y', 'Ź' => 'Z', 'Ž' => 'Z',
        'Ż' => 'Z', 'Þ' => 'T', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
        'ä' => 'ae', '&auml;' => 'ae', 'å' => 'a', 'ā' => 'a', 'ą' => 'a', 'ă' => 'a',
        'æ' => 'ae', 'ç' => 'c', 'ć' => 'c', 'č' => 'c', 'ĉ' => 'c', 'ċ' => 'c',
        'ď' => 'd', 'đ' => 'd', 'ð' => 'd', 'è' => 'e', 'é' => 'e', 'ê' => 'e',
        'ë' => 'e', 'ē' => 'e', 'ę' => 'e', 'ě' => 'e', 'ĕ' => 'e', 'ė' => 'e',
        'ƒ' => 'f', 'ĝ' => 'g', 'ğ' => 'g', 'ġ' => 'g', 'ģ' => 'g', 'ĥ' => 'h',
        'ħ' => 'h', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ī' => 'i',
        'ĩ' => 'i', 'ĭ' => 'i', 'į' => 'i', 'ı' => 'i', 'ij' => 'ij', 'ĵ' => 'j',
        'ķ' => 'k', 'ĸ' => 'k', 'ł' => 'l', 'ľ' => 'l', 'ĺ' => 'l', 'ļ' => 'l',
        'ŀ' => 'l', 'ñ' => 'n', 'ń' => 'n', 'ň' => 'n', 'ņ' => 'n', 'ʼn' => 'n',
        'ŋ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'oe',
        '&ouml;' => 'oe', 'ø' => 'o', 'ō' => 'o', 'ő' => 'o', 'ŏ' => 'o', 'œ' => 'oe',
        'ŕ' => 'r', 'ř' => 'r', 'ŗ' => 'r', 'š' => 's', 'ù' => 'u', 'ú' => 'u',
        'û' => 'u', 'ü' => 'ue', 'ū' => 'u', '&uuml;' => 'ue', 'ů' => 'u', 'ű' => 'u',
        'ŭ' => 'u', 'ũ' => 'u', 'ų' => 'u', 'ŵ' => 'w', 'ý' => 'y', 'ÿ' => 'y',
        'ŷ' => 'y', 'ž' => 'z', 'ż' => 'z', 'ź' => 'z', 'þ' => 't', 'ß' => 'ss',
        'ſ' => 'ss', 'ый' => 'iy', 'А' => 'A', 'Б' => 'B', 'В' => 'V', 'Г' => 'G',
        'Д' => 'D', 'Е' => 'E', 'Ё' => 'YO', 'Ж' => 'ZH', 'З' => 'Z', 'И' => 'I',
        'Й' => 'Y', 'К' => 'K', 'Л' => 'L', 'М' => 'M', 'Н' => 'N', 'О' => 'O',
        'П' => 'P', 'Р' => 'R', 'С' => 'S', 'Т' => 'T', 'У' => 'U', 'Ф' => 'F',
        'Х' => 'H', 'Ц' => 'C', 'Ч' => 'CH', 'Ш' => 'SH', 'Щ' => 'SCH', 'Ъ' => '',
        'Ы' => 'Y', 'Ь' => '', 'Э' => 'E', 'Ю' => 'YU', 'Я' => 'YA', 'а' => 'a',
        'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 'ё' => 'yo',
        'ж' => 'zh', 'з' => 'z', 'и' => 'i', 'й' => 'y', 'к' => 'k', 'л' => 'l',
        'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's',
        'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch',
        'ш' => 'sh', 'щ' => 'sch', 'ъ' => '', 'ы' => 'y', 'ь' => '', 'э' => 'e',
        'ю' => 'yu', 'я' => 'ya'
    ];

    // make a human readable string
    $text = strtr($text, $replace);

    // replace runs of characters that are not letters, digits or dots with -
    $text = preg_replace('~[^\\pL\d.]+~u', '-', $text);

    // trim
    $text = trim($text, '-');

    // remove unwanted characters
    $text = preg_replace('~[^-\w.]+~', '', $text);

    $text = strtolower($text);

    return $text;
}


Answer 11:

You could have a look at Normalizer::normalize(). It only requires the intl extension for PHP to be loaded.
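
As a sketch of how Normalizer::normalize() could be used here (the helper name strip_accents is hypothetical): decompose the string to NFD so that each accented letter becomes a base letter followed by combining marks, then delete the marks with a Unicode-aware regex. Letters that have no decomposition, such as ø or đ, are left unchanged by this approach.

```php
<?php
// Hypothetical helper; requires the intl extension for the Normalizer class.
function strip_accents(string $text): string
{
    // NFD splits "é" into "e" followed by U+0301 (combining acute accent).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed (e.g. invalid UTF-8); leave the input untouched
    }

    // \p{Mn} matches nonspacing (combining) marks; remove them all.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

// Only call the helper when the intl extension is actually loaded.
if (class_exists('Normalizer')) {
    echo strip_accents('andrés'), "\n"; // andres
}
```

The result still contains spaces and uppercase letters, so it would be followed by the usual lowercase and dash-replacement steps.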



Answer 12:

What about using something that is already implemented in Core?

// Clean non-UTF-8 characters
Mage::getHelper('core/string')->cleanString($str)

Or one of the core URL / URL rewrite methods.



Answer 13:

There's a good solution here that deals with special characters as well.

Texto Fantástico => texto-fantastico

function slugify( $string, $separator = '-' ) {
    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = array( '&' => 'and', "'" => '');
    $string = mb_strtolower( trim( $string ), 'UTF-8' );
    $string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
    $string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
    $string = preg_replace("/[^a-z0-9]/u", "$separator", $string);
    $string = preg_replace("/[$separator]+/u", "$separator", $string);
    return $string;
}

Author: Natxet
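
The key step in this function is the round trip through HTML entities: htmlentities() turns an accented letter into a named entity whose name starts with the base letter, and the regex keeps only that captured letter. A minimal sketch of just that step, using the same pattern (the variable names $entities and $plain are illustrative):

```php
<?php
// Same pattern as in the function above: capture the base letter(s) of a named entity.
$accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';

// "á" becomes the named entity "&aacute;".
$entities = htmlentities('fantástico', ENT_QUOTES, 'UTF-8');

// Replace each matching entity with its captured base letter.
$plain = preg_replace($accents_regex, '$1', $entities);

echo $entities, "\n"; // fant&aacute;stico
echo $plain, "\n";    // fantastico
```

Characters without such a named entity are not matched by the pattern and are later replaced by the separator.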



Answer 14:

Since gTLDs and IDNs are becoming more and more widely used, I cannot see why a URL shouldn't contain Andrés.

Just rawurlencode() the URL you want instead. Most browsers show UTF-8 characters in URLs (except perhaps ancient ones such as IE6), and bit.ly / goo.gl can be used to shorten them for languages like Russian and Arabic if that is needed for advertising; alternatively, just write them in ads the way a user would type them into the browser's address bar.

The only difference is spaces (" "): it might be a good idea to replace them with "-", and to do the same for "/" if you don't want to allow those.

<?php
function slugify($url)
{
    $url = trim($url);

    $url = str_replace(" ","-",$url);
    $url = str_replace("/","-slash-",$url);
    $url = rawurlencode($url);

    // return the encoded slug (the original snippet returned nothing)
    return $url;
}
?>

URL as encoded: http://www.hurtta.com/RU/%D0%9F%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D1%8B/

URL as written: http://www.hurtta.com/RU/Продукты/
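
A small sketch of this approach, assuming UTF-8 input. The helper name encoded_slug is hypothetical; it follows the steps described in this answer and returns the encoded string. The first call reproduces the encoded path segment of the example URL above.

```php
<?php
// Hypothetical helper following the steps described above; the name is an assumption.
function encoded_slug(string $url): string
{
    $url = trim($url);
    $url = str_replace(' ', '-', $url);       // spaces become dashes
    $url = str_replace('/', '-slash-', $url); // slashes are spelled out

    // Percent-encode every byte except letters, digits and -_.~
    return rawurlencode($url);
}

echo encoded_slug('Продукты'), "\n";
// %D0%9F%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82%D1%8B
echo encoded_slug(' a b/c '), "\n";
// a-b-slash-c
```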



Answer 15:

I wrote this based on Maerlyn's response. It is meant to work regardless of the character encoding on the page. It also won't turn single quotes into dashes :)

function slugify ($string) {
    // utf8_encode() assumes ISO-8859-1 input and is deprecated as of PHP 8.2;
    // skip this step if $string is already UTF-8.
    $string = utf8_encode($string);
    $string = iconv('UTF-8', 'ASCII//TRANSLIT', $string);
    $string = preg_replace('/[^a-z0-9- ]/i', '', $string);
    $string = str_replace(' ', '-', $string);
    $string = trim($string, '-');
    $string = strtolower($string);

    if (empty($string)) {
        return 'n-a';
    }

    return $string;
}


Answer 16:

On my localhost everything was fine, but on the server what helped was calling setlocale() and passing "utf-8" to mb_strtolower().

<?php
setlocale( LC_ALL, "en_US.UTF8" );
function slug( $string )
{
    $string = iconv( "utf-8", "us-ascii//translit//ignore", $string ); // transliterate
    $string = str_replace( "'", "", $string );
    $string = preg_replace( "~[^\pL\d]+~u", "-", $string ); // replace non letter or non digits by "-"
    $string = preg_replace( "~[^-\w]+~", "", $string ); // remove unwanted characters
    $string = preg_replace( "~-+~", "-", $string ); // remove duplicate "-"
    $string = trim( $string, "-" ); // trim "-"
    $string = trim( $string ); // trim
    $string = mb_strtolower( $string, "utf-8" ); // lowercase
    $string = urlencode( $string ); // safe
    return $string;
}
?>
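
Because the result of iconv()'s //TRANSLIT can depend on the active LC_CTYPE locale and on the iconv implementation (glibc in particular), it can help to check that setlocale() actually succeeded: it returns false when none of the requested locales is installed. A minimal sketch; the list of locale names is an assumption and varies between systems.

```php
<?php
// Candidate locale names are assumptions; the spelling differs between systems.
$candidates = ['en_US.UTF-8', 'en_US.utf8', 'C.UTF-8'];

// setlocale() tries each name in order and returns the one it applied,
// or false if none of them is available.
$applied = setlocale(LC_CTYPE, $candidates);

if ($applied === false) {
    echo "No UTF-8 locale available; iconv transliteration may produce '?'\n";
} else {
    echo "Using locale: {$applied}\n";
}
```

Running `locale -a` on the server lists the names that are actually installed.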


Answer 17:

I think the most elegant way is to use Behat\Transliterator\Transliterator.

You need to extend this class with your own class because it is abstract, something like this:

<?php
use Behat\Transliterator\Transliterator;

class Urlizer extends Transliterator
{
}

And then, just use it:

$text = "Master Ápiu";
$urlizer = new Urlizer();
$slug = $urlizer->transliterate($text, "-");
echo $slug; // master-apiu

Of course, you should add the package to your Composer dependencies as well:

composer require behat/transliterator

More info here: https://github.com/Behat/Transliterator



Answer 18:

I have working code that I used on a Spanish-language website. Please see the code on my blog:

Function to generate clean URL slugs from a string, with duplication check



Answer 19:

I didn't know which one to use, so I ran a quick benchmark on phptester.net

<?php

// First test
// https://stackoverflow.com/a/42740874/10232729
function slugify(string $string, string $separator = '-'){

    $accents_regex = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';
    $special_cases = [ '&' => 'and', "'" => ''];
    $string = mb_strtolower( trim( $string ), 'UTF-8' );
    $string = str_replace( array_keys($special_cases), array_values( $special_cases), $string );
    $string = preg_replace( $accents_regex, '$1', htmlentities( $string, ENT_QUOTES, 'UTF-8' ) );
    $string = preg_replace('/[^a-z0-9]/u', $separator, $string);

    return preg_replace('/['.$separator.']+/u', $separator, $string);
}

// Second test
// https://stackoverflow.com/a/13331948/10232729
function slug(string $string, string $separator = '-'){

    $string = transliterator_transliterate('Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();', $string);

    return str_replace(' ', $separator, $string);
}

// Third test - My choice
// https://stackoverflow.com/a/38066136/10232729
function slugbis($text){

    $replace = [
        \'<\' => \'\', \'>\' => \'\', \'-\' => \' \', \'&\' => \'\',
        \'\"\' => \'\', \'À\' => \'A\', \'Á\' => \'A\', \'Â\' => \'A\', \'Ã\' => \'A\', \'Ä\'=> \'Ae\',
        \'Ä\' => \'A\', \'Å\' => \'A\', \'Ā\' => \'A\', \'Ą\' => \'A\', \'Ă\' => \'A\', \'Æ\' => \'Ae\',
        \'Ç\' => \'C\', \'Ć\' => \'C\', \'Č\' => \'C\', \'Ĉ\' => \'C\', \'Ċ\' => \'C\', \'Ď\' => \'D\', \'Đ\' => \'D\',
        \'Ð\' => \'D\', \'È\' => \'E\', \'É\' => \'E\', \'Ê\' => \'E\', \'Ë\' => \'E\', \'Ē\' => \'E\',
        \'Ę\' => \'E\', \'Ě\' => \'E\', \'Ĕ\' => \'E\', \'Ė\' => \'E\', \'Ĝ\' => \'G\', \'Ğ\' => \'G\',
        \'Ġ\' => \'G\', \'Ģ\' => \'G\', \'Ĥ\' => \'H\', \'Ħ\' => \'H\', \'Ì\' => \'I\', \'Í\' => \'I\',
        \'Î\' => \'I\', \'Ï\' => \'I\', \'Ī\' => \'I\', \'Ĩ\' => \'I\', \'Ĭ\' => \'I\', \'Į\' => \'I\',
        \'İ\' => \'I\', \'IJ\' => \'IJ\', \'Ĵ\' => \'J\', \'Ķ\' => \'K\', \'Ł\' => \'K\', \'Ľ\' => \'K\',
        \'Ĺ\' => \'K\', \'Ļ\' => \'K\', \'Ŀ\' => \'K\', \'Ñ\' => \'N\', \'Ń\' => \'N\', \'Ň\' => \'N\',
        \'Ņ\' => \'N\', \'Ŋ\' => \'N\', \'Ò\' => \'O\', \'Ó\' => \'O\', \'Ô\' => \'O\', \'Õ\' => \'O\',
        \'Ö\' => \'Oe\', \'Ö\' => \'Oe\', \'Ø\' => \'O\', \'Ō\' => \'O\', \'Ő\' => \'O\', \'Ŏ\' => \'O\',
        \'Œ\' => \'OE\', \'Ŕ\' => \'R\', \'Ř\' => \'R\', \'Ŗ\' => \'R\', \'Ś\' => \'S\', \'Š\' => \'S\',
        \'Ş\' => \'S\', \'Ŝ\' => \'S\', \'Ș\' => \'S\', \'Ť\' => \'T\', \'Ţ\' => \'T\', \'Ŧ\' => \'T\',
        \'Ț\' => \'T\', \'Ù\' => \'U\', \'Ú\' => \'U\', \'Û\' => \'U\', \'Ü\' => \'Ue\', \'Ū\' => \'U\',
        \'Ü\' => \'Ue\', \'Ů\' => \'U\', \'Ű\' => \'U\', \'Ŭ\' => \'U\', \'Ũ\' => \'U\', \'Ų\' => \'U\',
        \'Ŵ\' => \'W\', \'Ý\' => \'Y\', \'Ŷ\' => \'Y\', \'Ÿ\' => \'Y\', \'Ź\' => \'Z\', \'Ž\' => \'Z\',
        \'Ż\' => \'Z\', \'Þ\' => \'T\', \'à\' => \'a\', \'á\' => \'a\', \'â\' => \'a\', \'ã\' => \'a\',
        \'ä\' => \'ae\', \'ä\' => \'ae\', \'å\' => \'a\', \'ā\' => \'a\', \'ą\' => \'a\', \'ă\' => \'a\',
        \'æ\' => \'ae\', \'ç\' => \'c\', \'ć\' => \'c\', \'č\' => \'c\', \'ĉ\' => \'c\', \'ċ\' => \'c\',
        \'ď\' => \'d\', \'đ\' => \'d\', \'ð\' => \'d\', \'è\' => \'e\', \'é\' => \'e\', \'ê\' => \'e\',
        \'ë\' => \'e\', \'ē\' => \'e\', \'ę\' => \'e\', \'ě\' => \'e\', \'ĕ\' => \'e\', \'ė\' => \'e\',
        \'ƒ\' => \'f\', \'ĝ\' => \'g\', \'ğ\' => \'g\', \'ġ\' => \'g\', \'ģ\' => \'g\', \'ĥ\' => \'h\',
        \'ħ\' => \'h\', \'ì\' => \'i\', \'í\' => \'i\', \'î\' => \'i\', \'ï\' => \'i\', \'ī\' => \'i\',
        \'ĩ\' => \'i\', \'ĭ\' => \'i\', \'į\' => \'i\', \'ı\' => \'i\', \'ij\' => \'ij\', \'ĵ\' => \'j\',
        \'ķ\' => \'k\', \'ĸ\' => \'k\', \'ł\' => \'l\', \'ľ\' => \'l\', \'ĺ\' => \'l\', \'ļ\' => \'l\',
        \'ŀ\' => \'l\', \'ñ\' => \'n\', \'ń\' => \'n\', \'ň\' => \'n\', \'ņ\' => \'n\', \'ʼn\' => \'n\',
        \'ŋ\' => \'n\', \'ò\' => \'o\', \'ó\' => \'o\', \'ô\' => \'o\', \'õ\' => \'o\', \'ö\' => \'oe\',
        \'ö\' => \'oe\', \'ø\' => \'o\', \'ō\' => \'o\', \'ő\' => \'o\', \'ŏ\' => \'o\', \'œ\' => \'oe\',
        \'ŕ\' => \'r\', \'ř\' => \'r\', \'ŗ\' => \'r\', \'š\' => \'s\', \'ù\' => \'u\', \'ú\' => \'u\',
        \'û\' => \'u\', \'ü\' => \'ue\', \'ū\' => \'u\', \'ü\' => \'ue\', \'ů\' => \'u\', \'ű\' => \'u\',
        \'ŭ\' => \'u\', \'ũ\' => \'u\', \'ų\' => \'u\', \'ŵ\' => \'w\', \'ý\' => \'y\', \'ÿ\' => \'y\',
        \'ŷ\' => \'y\', \'ž\' => \'z\', \'ż\' => \'z\', \'ź\' => \'z\', \'þ\' => \'t\', \'ß\' => \'ss\',
        \'ſ\' => \'ss\', \'ый\' => \'iy\', \'А\' => \'A\', \'Б\' => \'B\', \'В\' => \'V\', \'Г\' => \'G\',
        \'Д\' => \'D\', \'Е\' => \'E\', \'Ё\' => \'YO\', \'Ж\' => \'ZH\', \'З\' => \'Z\', \'И\' => \'I\',
        \'Й\' => \'Y\', \'К\' => \'K\', \'Л\' => \'L\', \'М\' => \'M\', \'Н\' => \'N\', \'О\' => \'O\',
        \'П\' => \'P\', \'Р\' => \'R\', \'С\' => \'S\', \'Т\' => \'T\', \'У\' => \'U\', \'Ф\' => \'F\',
        \'Х\' => \'H\', \'Ц\' => \'C\', \'Ч\' => \'CH\', \'Ш\' => \'SH\', \'Щ\' => \'SCH\', \'Ъ\' => \'\',
        \'Ы\' => \'Y\', \'Ь\' => \'\', \'Э\' => \'E\', \'Ю\' => \'YU\', \'Я\' => \'YA\', \'а\' => \'a\',
        \'б\' => \'b\', \'в\' => \'v\', \'г\' => \'g\', \'д\' => \'d\', \'е\' => \'e\', \'ё\' => \'yo\',
        \'ж\' => \'zh\', \'з\' => \'z\', \'и\' => \'i\', \'й\' => \'y\', \'к\' => \'k\', \'л\' => \'l\',
        \'м\' => \'m\', \'н\' => \'n\', \'о\' => \'o\', \'п\' => \'p\', \'р\' => \'r\', \'с\' => \'s\',
        \'т\' => \'t\', \'у\' => \'u\', \'ф\' => \'f\', \'х\' => \'h\', \'ц\' => \'c\', \'ч\' => \'ch\',
        \'ш\' => \'sh\', \'щ\' => \'sch\', \'ъ\' => \'\', \'ы\' => \'y\', \'ь\' => \'\', \'э\' => \'e\',
        \'ю\' => \'yu\', \'я\' => \'ya\'
    ];

    // make a human readable string
    $text = strtr($text, $replace);

    // replace each run of characters that are not letters, digits or dots with "-"
    $text = preg_replace('~[^\pL\d.]+~u', '-', $text);

    // trim leading and trailing "-"
    $text = trim($text, '-');

    // remove any remaining characters that are not word characters, "-" or "."
    $text = preg_replace('~[^-\w.]+~', '', $text);

    return strtolower($text);
}

// Fourth test
// https://stackoverflow.com/a/2955521/10232729
function slugagain($string){

    $table = [
        'Š'=>'S', 'š'=>'s', 'Đ'=>'Dj', 'đ'=>'dj', 'Ž'=>'Z', 'ž'=>'z', 'Č'=>'C', 'č'=>'c', 'Ć'=>'C', 'ć'=>'c',
        'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
        'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O',
        'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss',
        'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 'è'=>'e', 'é'=>'e',
        'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o',
        'ô'=>'o', 'õ'=>'o', 'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'ý'=>'y', 'þ'=>'b',
        'ÿ'=>'y', 'Ŕ'=>'R', 'ŕ'=>'r', ' '=>'-'
    ];

    return strtr($string, $table);
}

// Fifth test
// https://stackoverflow.com/a/27396804/10232729
function slugifybis($url){
    $url = trim($url);

    $url = str_replace(' ', '-', $url);
    $url = str_replace('/', '-slash-', $url);

    return rawurlencode($url);
}

// Sixth and last test
// https://stackoverflow.com/a/39442034/10232729
setlocale(LC_ALL, "en_US.UTF8");
function slugifyagain($string){

    $string = iconv('utf-8', 'us-ascii//translit//ignore', $string); // transliterate
    $string = str_replace("'", '', $string); // drop apostrophes
    $string = preg_replace('~[^\pL\d]+~u', '-', $string); // replace non letter or non digits by "-"
    $string = preg_replace('~[^-\w]+~', '', $string); // remove unwanted characters
    $string = preg_replace('~-+~', '-', $string); // remove duplicate "-"
    $string = trim($string, '-'); // trim "-"
    $string = trim($string); // trim
    $string = mb_strtolower($string, 'utf-8'); // lowercase

    return urlencode($string); // safe
}

$string = $newString = "¿ Àñdréß l'affreux ğarçon & nøël en forêt !";

$max = 10000;

echo '<pre>';
echo 'Beginning :';
echo '<br />';
echo '<br />';
echo '> Slugging '.$max.' iterations of following :';
echo '<br />';
echo '>> ' . $string;
echo '<br />';
echo '<br />';
echo 'Output results :';
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){

    $newString = slugify($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> First test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){

    $newString = slug($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Second test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){

    $newString = slugbis($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Third test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){

    $newString = slugagain($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Fourth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){

    $newString = slugifybis($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Fifth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '<br />';
echo '<br />';

$start = microtime(true);

for($i = 0 ; $i < $max ; $i++){

    $newString = slugifyagain($string);
}

$time = (microtime(true) - $start) * 1000;

echo '> Sixth test passed in **' . round($time, 2) . 'ms**';
echo '<br />';
echo '>> Result : ' . $newString;
echo '</pre>';
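
The six timing blocks above all follow the same pattern: record a start time, call one function in a loop, then report the elapsed milliseconds and the last result. As a rough sketch, that pattern could be folded into one helper. The name `benchmark_slugger` and the stand-in `$demo` closure are hypothetical and only illustrate the idea; they are not part of the original script.

```php
<?php
// Hypothetical helper (not from the original script): calls $fn on $input
// $iterations times and returns [elapsed milliseconds, last result].
function benchmark_slugger(callable $fn, string $input, int $iterations): array
{
    $result = '';
    $start = microtime(true);

    for ($i = 0; $i < $iterations; $i++) {
        $result = $fn($input);
    }

    $elapsedMs = (microtime(true) - $start) * 1000;

    return [$elapsedMs, $result];
}

// Stand-in slugger so the sketch runs on its own; replace it with the
// name of a function under test, e.g. 'slugagain'.
$demo = function (string $s): string {
    return strtolower(str_replace(' ', '-', trim($s)));
};

[$ms, $slug] = benchmark_slugger($demo, ' Hello World ', 100);
echo 'passed in ' . round($ms, 2) . 'ms, result: ' . $slug . PHP_EOL;
```

Passing the function name as a string (or any other callable) keeps every measurement on the same code path, so the only thing that varies between runs is the function being timed.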

Beginning :

Slugging 10000 iterations of following :

¿ Àñdréß l'affreux ğarçon & nøël en forêt !

Output results :

First test passed in 120.78ms

Result : -iquest-andresz-laffreux-arcon-and-noel-en-foret-

Second test passed in 3883.82ms

Result : -andreß-laffreux-garcon--nøel-en-foret-

Third test passed in 56.83ms

Result : andress-l-affreux-garcon-noel-en-foret

Fourth test passed in 18.93ms

Result : ¿-AndreSs-l'affreux-ğarcon-&-noel-en-foret-!

Fifth test passed in 6.45ms

Result : %C2%BF-%C3%80%C3%B1dr%C3%A9%C3%9F-l%27affreux-%C4%9Far%C3%A7on-%26-n%C3%B8%C3%ABl-en-for%C3%AAt-%21

Sixth test passed in 112.42ms

Result : andress-laffreux-garcon-n-el-en-foret

Further tests needed.
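
Speed is only half of the comparison; the results above also differ in how clean the slug is. As a small sketch of one possible follow-up test, the check below assumes a deliberately strict definition of a clean slug: lowercase ASCII letters and digits, in groups separated by single hyphens. Both that definition and the function name `is_clean_slug` are assumptions made for illustration.

```php
<?php
// Hypothetical check (assumed definition of "clean"): one or more groups of
// lowercase ASCII letters/digits, joined by single hyphens, nothing else.
function is_clean_slug(string $slug): bool
{
    return preg_match('~^[a-z0-9]+(?:-[a-z0-9]+)*\z~', $slug) === 1;
}

// Results copied from the 10000-iteration run above.
$results = [
    'first'  => '-iquest-andresz-laffreux-arcon-and-noel-en-foret-',
    'second' => '-andreß-laffreux-garcon--nøel-en-foret-',
    'third'  => 'andress-l-affreux-garcon-noel-en-foret',
    'fourth' => "¿-AndreSs-l'affreux-ğarcon-&-noel-en-foret-!",
    'fifth'  => '%C2%BF-%C3%80%C3%B1dr%C3%A9%C3%9F-l%27affreux-%C4%9Far%C3%A7on-%26-n%C3%B8%C3%ABl-en-for%C3%AAt-%21',
    'sixth'  => 'andress-laffreux-garcon-n-el-en-foret',
];

foreach ($results as $name => $slug) {
    echo $name . ': ' . (is_clean_slug($slug) ? 'clean' : 'not clean') . PHP_EOL;
}
```

Under this definition only the third and sixth results pass. The fifth is still URL-safe because it is percent-encoded, so whether it counts as acceptable depends on what the slug is used for.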

Edit: the same test with fewer iterations

Beginning :

Slugging 100 iterations of following :

¿ Àñdréß l'affreux ğarçon & nøël en forêt !

Output results :

First test passed in 1.72ms

Result : -iquest-andresz-laffreux-arcon-and-noel-en-foret-

Second test passed in 48.59ms

Result : -andreß-laffreux-garcon--nøel-en-foret-

Third test passed in 0.91ms

Result : andress-l-affreux-garcon-noel-en-foret

Fourth test passed in 0.3ms

Result : ¿-AndreSs-l'affreux-ğarcon-&-noel-en-foret-!

Fifth test passed in 0.14ms

Result : %C2%BF-%C3%80%C3%B1dr%C3%A9%C3%9F-l%27affreux-%C4%9Far%C3%A7on-%26-n%C3%B8%C3%ABl-en-for%C3%AAt-%21

Sixth test passed in 1.4ms

Result : andress-laffreux-garcon-n-el-en-foret



Tags: php regex string