How does Stack Overflow generate its SEO-friendly URLs?

Posted 2018-12-31 06:39

What is a good complete regular expression or some other process that would take the title:

How do you change a title to be part of the URL like Stack Overflow?

and turn it into

how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow

that is used in the SEO-friendly URLs on Stack Overflow?

The development environment I am using is Ruby on Rails, but if there are some other platform-specific solutions (.NET, PHP, Django), I would love to see those too.

I am sure I (or another reader) will come across the same problem on a different platform down the line.

I am using custom routes, and I mainly want to know how to alter the string so that all special characters are removed, it is all lowercase, and all whitespace is replaced.

20 answers
梦寄多情
#2 · 2018-12-31 07:22

The Stack Overflow solution is great, but modern browsers (excluding IE, as usual) now handle UTF-8 encoding nicely.

So I upgraded the proposed solution:

public static string ToFriendlyUrl(string title, bool useUTF8Encoding = false)
{
    ...

        else if (c >= 128)
        {
            int prevlen = sb.Length;
            if (useUTF8Encoding)
            {
                sb.Append(HttpUtility.UrlEncode(c.ToString(CultureInfo.InvariantCulture), Encoding.UTF8));
            }
            else
            {
                sb.Append(RemapInternationalCharToAscii(c));
            }
    ...
}

Full Code on Pastebin

Edit: Here's the code for the RemapInternationalCharToAscii method (which is missing from the Pastebin).
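Neither the Pastebin code nor the remap method is reproduced here. As a rough illustration of the two branches in Ruby (the platform the question asks about), here is a minimal sketch. The names `to_friendly_url` and `ASCII_REMAP` are hypothetical, the remap table is deliberately tiny, and this is not the C# implementation above: `ERB::Util.url_encode` emits uppercase hex, so its output may differ in case from what `HttpUtility.UrlEncode` produces.

```ruby
require "erb"

# Hypothetical, deliberately tiny remap table; a real one would cover far more characters.
ASCII_REMAP = { "é" => "e", "ü" => "u", "ñ" => "n", "ß" => "ss" }.freeze

# Hypothetical helper sketching the two branches: percent-encode non-ASCII
# characters as UTF-8, or remap them to an ASCII approximation.
def to_friendly_url(title, use_utf8_encoding: false)
  slug = +""
  title.downcase.each_char do |c|
    if c.match?(/[a-z0-9]/)
      slug << c
    elsif c.match?(/[\s\-_]/)
      # Emit at most one hyphen per run of separators, and never a leading one.
      slug << "-" unless slug.empty? || slug.end_with?("-")
    elsif c.ord >= 128
      slug << (use_utf8_encoding ? ERB::Util.url_encode(c) : ASCII_REMAP.fetch(c, ""))
    end
    # Any other ASCII punctuation is dropped.
  end
  slug.chomp("-")
end

puts to_friendly_url("Café Münchén")
puts to_friendly_url("Café Münchén", use_utf8_encoding: true)
```

With the table above, the first call yields `cafe-munchen` and the second yields `caf%C3%A9-m%C3%BCnch%C3%A9n`, which a browser that handles UTF-8 will typically display as the accented text.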

千与千寻千般痛.
#3 · 2018-12-31 07:25

I don't know much about Ruby or Rails, but in Perl, this is what I would do:

use strict;
use warnings;

my $title = "How do you change a title to be part of the url like Stackoverflow?";

my $url = lc $title;      # Lowercase a copy of the title.
$url =~ s/[^\w\s-]//g;    # Remove anything that is not a word character, whitespace, or hyphen.
$url =~ s/^\s+|\s+$//g;   # Trim leading and trailing whitespace.
$url =~ s/[\s-]+/-/g;     # Collapse each run of whitespace or hyphens into a single hyphen.

print "$title\n$url\n";

I just did a quick test and it seems to work. Hopefully this is relatively easy to translate to Ruby.
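Since the question targets Ruby on Rails, a Ruby version of the same idea might look like the sketch below. It is not what Stack Overflow runs, and `slugify` is a hypothetical name. Punctuation is stripped before whitespace is collapsed so that a title such as "Ruby & Rails" does not end up with a double hyphen.

```ruby
# Hypothetical helper: lowercase, strip punctuation, trim, then hyphenate.
def slugify(title)
  title.downcase
       .gsub(/[^\w\s\-]/, "")  # keep only word characters, whitespace, and hyphens
       .strip                  # trim leading and trailing whitespace
       .gsub(/[\s\-]+/, "-")   # collapse runs of whitespace/hyphens into one hyphen
end

puts slugify("How do you change a title to be part of the url like Stackoverflow?")
```

In a Rails application, ActiveSupport's `String#parameterize` covers similar ground and may be the simpler choice.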
