URL Routing: Handling Spaces and Illegal Characters

Posted 2019-03-08 12:21

I've seen a lot of discussion on URL Routing, and LOTS of great suggestions... but in the real world, two things I haven't seen discussed are:

  1. Creating Friendly URLs with Spaces and illegal characters
  2. Querying the DB

Say you're building a Medical site, which has Articles with a Category and an optional Subcategory (one-to-many). (I could've used any example, but the medical field has lots of long words.)


Example Categories/Sub/Article Structure:

  1. Your General Health (Category)
    • Natural Health (Subcategory)
      1. Your body's immune system and why it needs help. (Article)
      2. Are plants and herbs really the solution?
      3. Should I eat fortified foods?
    • Homeopathic Medicine
      1. What's homeopathic medicine?
    • Healthy Eating
      1. Should you drink 10 cups of coffee per day?
      2. Are Organic Vegetables worth it?
      3. Is Burger King® evil?
      4. Is "French café" or American coffee healthier?
  2. Diseases & Conditions (Category)
    • Auto-Immune Disorders (Subcategory)
      1. The #1 killer of people is some disease
      2. How to get help
    • Genetic Conditions
      1. Preventing Spina Bifida before pregnancy.
      2. Are you predisposed to live a long time?
  3. Dr. FooBar's personal suggestions (Category)
    1. My thoughts on Herbal medicine & natural remedies (Article - no subcategory)
    2. Why should you care about your health?
    3. It IS possible to eat right and have a good diet.
    4. Has bloodless surgery come of age?

In a structure like this, you're going to have some LOOONG URLs if you go: /{Category}/{subcategory}/{Article Title}

In addition, there are numerous illegal characters, like # ! ? ' é " etc.

SO, the QUESTION(S) ARE:

  1. How would you handle illegal characters and Spaces? (Pros and Cons?)
  2. How would you handle getting this from the Database?
    • In other words, would you trust the DB to find the Item by passing it the title, or would you pull all the titles, find the matching key in code, and then pass that key to the Database (two calls to the database)?

note: I always see nice pretty examples like /products/beverages/Short-Product-Name/ how about handling some ugly examples ^_^

11 Answers
smile是对你的礼貌
Reply #2 · 2019-03-08 12:48

When cleaning URLs, here's a method I'm using to replace accented characters:

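// Public so the extension method can be called from outside its static class.
public static string anglicized(this string urlpart) {
        // Each character in "before" maps to the character at the same index in "after".
        string before = "àÀâÂäÄáÁéÉèÈêÊëËìÌîÎïÏòÒôÔöÖùÙûÛüÜçÇ’ñ";
        string  after = "aAaAaAaAeEeEeEeEiIiIiIoOoOoOuUuUuUcC'n";

        string cleaned = urlpart;

        for (int i = 0; i < before.Length; i++ ) {
            // Replace within "cleaned" (not "urlpart") so earlier replacements are kept.
            cleaned = Regex.Replace(cleaned, before[i].ToString(), after[i].ToString());
        }

        // Here's some for Spanish : ÁÉÍÑÓÚÜ¡¿áéíñóúü

        return cleaned;
}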

Don't know if it's the most efficient Regex, but it is certainly effective. It's an extension method, so to call it you simply put the method in a static class and do something like this:

// requires: using System.Text.RegularExpressions;
string articleTitle = "My Article about café and the letters àâäá";
string cleaned = articleTitle.anglicized();

// strip all illegal characters like punctuation
// (the hyphen goes last in the class so it is read as a literal)
cleaned = Regex.Replace( cleaned, "[^A-Za-z0-9 -]", "");

// replace spaces with dashes
cleaned = Regex.Replace( cleaned, " +", "-").ToLower();

// returns "my-article-about-cafe-and-the-letters-aaaa"

Of course, you could combine it into one method called "CleanUrl" or something but that's up to you.
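
For anyone who wants the combined version, here is a minimal sketch of what such a "CleanUrl" method could look like. The class name UrlCleaner is made up for illustration, and the sketch swaps the per-character Regex calls for string.Replace and also collapses runs of spaces and hyphens, which the steps above do not do. Treat it as one possible arrangement, not the definitive one.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is an assumption, not from the answer above.
public static class UrlCleaner
{
    // Same lookup table as the anglicized() method: before[i] maps to after[i].
    private const string Before = "àÀâÂäÄáÁéÉèÈêÊëËìÌîÎïÏòÒôÔöÖùÙûÛüÜçÇ’ñ";
    private const string After  = "aAaAaAaAeEeEeEeEiIiIiIoOoOoOuUuUuUcC'n";

    public static string CleanUrl(this string title)
    {
        string cleaned = title;

        // 1. Replace accented characters with plain ASCII equivalents.
        for (int i = 0; i < Before.Length; i++)
        {
            cleaned = cleaned.Replace(Before[i], After[i]);
        }

        // 2. Strip everything that is not a letter, digit, space or hyphen.
        cleaned = Regex.Replace(cleaned, "[^A-Za-z0-9 -]", "");

        // 3. Collapse runs of spaces/hyphens into one hyphen and trim stray ones.
        cleaned = Regex.Replace(cleaned, "[ -]+", "-").Trim('-');

        return cleaned.ToLowerInvariant();
    }
}
```

With this sketch, "Is Burger King® evil?".CleanUrl() gives "is-burger-king-evil", and "The #1 killer of people is some disease".CleanUrl() gives "the-1-killer-of-people-is-some-disease".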

地球回转人心会变
Reply #3 · 2019-03-08 12:48

I solved this problem by adding an additional column in the database (e.g., UrlTitle alongside the Title column) and saving a title stripped of all illegal characters, with '&' symbols replaced with 'and' and spaces replaced by underscores. Then you can look up the record via the UrlTitle and use the real title in the page title or wherever.
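
A rough sketch of that idea, with an in-memory dictionary standing in for the table. The Article and ArticleStore names are hypothetical, and the dictionary is only an assumption to keep the example self-contained; in a real database you would presumably want a unique index on UrlTitle, since two different titles can clean down to the same value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical record: Title is shown on the page, UrlTitle is used for lookups.
public class Article
{
    public int Id;
    public string Title;
    public string UrlTitle;
}

public static class ArticleStore
{
    // Stand-in for a table with a unique index on UrlTitle (assumption).
    private static readonly Dictionary<string, Article> ByUrlTitle =
        new Dictionary<string, Article>(StringComparer.OrdinalIgnoreCase);

    // '&' becomes "and", illegal characters are dropped, spaces become underscores.
    public static string ToUrlTitle(string title)
    {
        string s = title.Replace("&", "and");
        s = Regex.Replace(s, "[^A-Za-z0-9 ]", "");
        return Regex.Replace(s.Trim(), " +", "_");
    }

    // Compute UrlTitle once, at save time. Add() throws on a duplicate key,
    // much like a unique constraint would reject a duplicate row.
    public static Article Save(int id, string title)
    {
        var article = new Article { Id = id, Title = title, UrlTitle = ToUrlTitle(title) };
        ByUrlTitle.Add(article.UrlTitle, article);
        return article;
    }

    // Single lookup by the cleaned title; returns null when nothing matches.
    public static Article FindByUrlTitle(string urlTitle)
    {
        Article found;
        return ByUrlTitle.TryGetValue(urlTitle, out found) ? found : null;
    }
}
```

For example, saving "Diseases & Conditions" stores the UrlTitle "Diseases_and_Conditions", and looking that value up returns the record with the original title intact.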

来,给爷笑一个
Reply #4 · 2019-03-08 12:50

Solution 2 would be my recommendation. I'm not the world's biggest SEO expert, but I believe it's pretty much the 'standard' way to get good rankings anyway.

对你真心纯属浪费
Reply #5 · 2019-03-08 12:53

In case anyone is interested, this is the route (oooh... punny) I'm taking:

Route r = new Route("{country}/{lang}/Article/{id}/{title}/", new NFRouteHandler("OneArticle"));
Route r2 = new Route("{country}/{lang}/Section/{id}-{subid}/{title}/", new NFRouteHandler("ArticlesInSubcategory"));
Route r3 = new Route("{country}/{lang}/Section/{id}/{title}/", new NFRouteHandler("ArticlesByCategory"));

This offers me the ability to do URLs like so:

  • site.com/ca/en/Article/123/my-life-and-health
  • site.com/ca/en/Section/12-3/Health-Issues
  • site.com/ca/en/Section/12/
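
The nice property of URLs like these is that only the id has to match the database; the title segment is decoration. Here is a minimal sketch of that behaviour using a plain Regex rather than the routing classes above. The ArticleUrl class, its method names, and the two-letter country/lang restriction are all assumptions made for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper that mirrors "{country}/{lang}/Article/{id}/{title}/".
public static class ArticleUrl
{
    // Assumes two-letter country and language codes; the title part is optional.
    private static readonly Regex Pattern = new Regex(
        @"^/?(?<country>[a-z]{2})/(?<lang>[a-z]{2})/Article/(?<id>\d+)(/(?<title>[^/]*))?/?$",
        RegexOptions.IgnoreCase);

    // Builds a link from the numeric key plus an already-cleaned title.
    public static string Build(string country, string lang, int id, string slug)
    {
        return "/" + country + "/" + lang + "/Article/" + id + "/" + slug + "/";
    }

    // Extracts the id; whatever is in the title segment is ignored.
    public static bool TryGetId(string path, out int id)
    {
        id = 0;
        Match m = Pattern.Match(path);
        return m.Success && int.TryParse(m.Groups["id"].Value, out id);
    }
}
```

Because the lookup uses the integer key, a renamed article or a mangled title segment still resolves to the same record, and the database is queried by primary key instead of by string.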
淡お忘
Reply #6 · 2019-03-08 12:59

As a client user, not a Web designer, I find Firefox sometimes breaks the URL when it tries to replace "illegal" characters with usable ones. For example, FF replaces ~ with %7E, and that never loads for me. I can't understand why the HTML editors and browsers don't simply agree not to accept characters other than A-Z and 0-9. If certain scripts need %, ?, and such, change the scripting applications so they will work with alphanumeric characters.
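
For context, %7E is simply the percent-encoded form of ~, so a server that decodes the path should end up with the same string either way. A small sketch using the standard System.Uri helpers (the PercentEncodingDemo wrapper class is made up for illustration):

```csharp
using System;

// Hypothetical wrapper around the built-in System.Uri escaping helpers.
public static class PercentEncodingDemo
{
    // Turns percent-encoded sequences back into characters, e.g. "%7E" into "~".
    public static string Decode(string segment)
    {
        return Uri.UnescapeDataString(segment);
    }

    // Percent-encodes characters that are not safe in a URL segment,
    // e.g. a space becomes "%20".
    public static string Encode(string segment)
    {
        return Uri.EscapeDataString(segment);
    }
}
```

Here Decode("%7Euser") yields "~user" and Encode("a b") yields "a%20b". If a page fails to load with %7E in the address, that would suggest the server is comparing the raw path without decoding it first.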
