How to replace underscores with dashes in Nginx

Published 2019-01-08 01:20

Question:

I'm using Nginx for the first time ever and have basically no knowledge of it.

I need to replace "_" with "-" in 100+ URLs. I figured there must be an easy way to do this with Nginx, but I can't find anything on Google.

Thanks!

Edit:

My URLs look like this, for example: http://www.mywebsite.com/this_category/page1.php

I need this to become: http://www.mywebsite.com/this-category/page1.php

Answer 1:

Both existing answers to this question, from 2013-04 and 2015, are rather suboptimal and ugly: one relies on too much copy-paste and has unclear error handling and reporting, and the other makes the client handle an unbounded number of unnecessary 301 Moved Permanently redirects.

There's a better way, hidden in plain sight in a Q&A pair from 2013-02, only a couple of months before this very question from 2013-04. It relies on the last flag of the rewrite directive (http://nginx.org/r/rewrite). When a rewrite with last matches, nginx stops processing the current rewrite directives and searches for a "new" location matching the modified $uri. Since that can be the same location again, the result is an internal redirect loop within nginx that may run for up to 10 cycles (i.e., 10 internal redirects, as per http://nginx.org/r/internal). Exceeding the limit of 10 cycles produces a 500 Internal Server Error.

In a sense, this answer is similar to the original one; you just get an extra factor of 10 for free, so less copy-paste is required.

# Replace maximum of 3 or 1 underscores per internal redirect,
# produce 500 Internal Server Error after 10 internal redirects, 
# supporting at least 28 underscores (9*3 + 1*1) and at most 30 (10*3).
location ~ _ {
    rewrite "^([^_]*)_([^_]*)_([^_]*)_(.*)$" $1-$2-$3-$4 last;
    rewrite "^([^_]*)_(.+)$" $1-$2 last;
    return 301 $uri;
}
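The arithmetic in the comment above (28 guaranteed, 29 failing, 30 working again) can be checked with a small simulation. This is only a sketch in Python, not nginx itself: it assumes Python's re module treats these two patterns the same way PCRE does, and the helper names internal_redirects_needed and uri_with are made up for the illustration.

```python
import re

# Python stand-ins for the two rewrite rules in the location block.
RULES = [
    (re.compile(r"^([^_]*)_([^_]*)_([^_]*)_(.*)$"), r"\1-\2-\3-\4"),
    (re.compile(r"^([^_]*)_(.+)$"), r"\1-\2"),
]
MAX_INTERNAL_REDIRECTS = 10  # nginx's documented per-request limit


def internal_redirects_needed(uri):
    """Return how many `last` rewrites clear all underscores, or None
    when they cannot be cleared within the limit."""
    cycles = 0
    while "_" in uri:  # mirrors `location ~ _`
        if cycles == MAX_INTERNAL_REDIRECTS:
            return None  # an 11th cycle would be needed: nginx answers 500
        for pattern, replacement in RULES:
            match = pattern.match(uri)
            if match:
                uri = match.expand(replacement)
                break
        else:
            # Neither rule matched (underscore only at the very end).
            # nginx would fall through to `return 301 $uri`; not modelled.
            return None
        cycles += 1
    return cycles


def uri_with(n):
    """Build a hypothetical URI containing exactly n underscores."""
    return "/" + "_".join(["x"] * (n + 1))


if __name__ == "__main__":
    for n in (3, 28, 29, 30):
        print(n, internal_redirects_needed(uri_with(n)))
```

Under these assumptions, 28 and 30 underscores both take exactly 10 cycles, while 29 would need 11 (nine passes of three plus two passes of one) and so hits the limit.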


Answer 2:

No, there's not an easy way to do this, but the rewrite engine can nonetheless be coerced into doing it, assuming you can put a reasonable cap on the number of underscores you need to convert in a single URL (or even if you can't; see the end of the answer).

Here's how I'd do it (tested code):

rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6-$7-$8-$9;
rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5;
rewrite ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3;
rewrite ^([^_]*)_(.*)$ $1-$2;

The four rewrites respectively translate the first 8, 4, 2, and 1 underscores in the URL to dashes. The numbers of underscores handled by the rules are decreasing powers of 2 on purpose: since each rule either matches or does not, the 16 possible combinations cover every count from 0 to 15 underscores in a single URL, which makes this block the most efficient set of rules for that range.

You will also notice that I used [^_]* on every group except the last one, in every rule. This avoids having the regexp engine perform unneeded backtracking in the case of non matches. Basically, having nine universal stars .* in a regexp causes O(n^9) complexity (which is quite bad) in the "worst case", which is a non match, which would actually be your most frequent case. (I can recommend this book for those who wish to really understand how a regexp is actually executed by the underlying library.)

For this reason, if you can put a smaller limit than 15 on the number of underscores, I would recommend taking away the first rule, or the first two. The last three rules alone will translate up to 7 underscores; the last two will translate up to 3.
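The 15 / 7 / 3 limits can be verified with a short simulation. This is a sketch in Python rather than nginx, assuming Python's re module handles these patterns like PCRE; rule, apply_rules, max_handled and uri_with are hypothetical helper names introduced only for this check.

```python
import re


def rule(k):
    """Build (pattern, replacement) for the rule replacing the first k underscores."""
    pattern = "^" + "([^_]*)_" * k + "(.*)$"
    replacement = "-".join("\\" + str(i) for i in range(1, k + 2))
    return re.compile(pattern), replacement


def apply_rules(uri, sizes):
    """Run the rules in order; each one sees the result of the previous one."""
    for k in sizes:
        pattern, replacement = rule(k)
        match = pattern.match(uri)
        if match:
            uri = match.expand(replacement)
    return uri


def uri_with(n):
    """Build a hypothetical URI containing exactly n underscores."""
    return "/" + "_".join(["x"] * (n + 1))


def max_handled(sizes, limit=40):
    """Largest n such that every count from 0 to n is fully converted."""
    n = 0
    while n <= limit and "_" not in apply_rules(uri_with(n), sizes):
        n += 1
    return n - 1


if __name__ == "__main__":
    print(max_handled([8, 4, 2, 1]))  # all four rules
    print(max_handled([4, 2, 1]))     # last three rules
    print(max_handled([2, 1]))        # last two rules
```

With these definitions the three calls give 15, 7 and 3, matching the numbers quoted above; a URI with 16 underscores comes out of the full set with one underscore left.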

Finally, you didn't mention redirecting the user to the new url. (As opposed to just serving the content both at the underscored url and at the correct one, which is usually frowned upon by the search engine nuts. Just FYI.) If that's what you need, you will have to put those rewrites into a special location that is triggered on the presence of an underscore in the url, and that redirects the user to the new url at the end of the four rewrites:

location ~ _ {
  rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6-$7-$8-$9;
  rewrite ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5;
  rewrite ^([^_]*)_([^_]*)_(.*)$ $1-$2-$3;
  rewrite ^([^_]*)_(.*)$ $1-$2;
  rewrite ^ $uri permanent;
}

This also adds the benefit of translating an unlimited number of underscores in a single URL, at the expense of more than one redirect to the user's browser.
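To get a feel for how many redirects a client would see, here is a rough Python model of that location block. It is a sketch under assumptions: Python's re stands in for PCRE, each loop iteration represents one 301 round trip, any redirect limit in the client is ignored, and one_pass and follow_redirects are names invented for the example.

```python
import re

SIZES = (8, 4, 2, 1)  # underscores replaced by each of the four rewrites


def one_pass(uri):
    """Apply the four rewrites once, as a single visit to `location ~ _` would."""
    for k in SIZES:
        pattern = "^" + "([^_]*)_" * k + "(.*)$"
        match = re.match(pattern, uri)
        if match:
            # Joining the k + 1 captured pieces with "-" is what
            # the replacement $1-$2-...-$(k+1) does.
            uri = "-".join(match.groups())
    return uri


def follow_redirects(uri):
    """Return (final_uri, number_of_301s) for a client following every redirect."""
    redirects = 0
    while "_" in uri:  # the location only triggers while an underscore remains
        uri = one_pass(uri)
        redirects += 1
    return uri, redirects


if __name__ == "__main__":
    print(follow_redirects("/this_category/page1.php"))
    print(follow_redirects("/" + "_".join(["x"] * 41))[1])  # 40 underscores
```

In this model the URL from the question needs a single redirect, and a URL with 40 underscores needs three (15, 15, then 10 underscores per pass).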

HTH ;-P



Answer 3:

This comes past due time, but I must point out that the answer above needs to be corrected: using n different rewrites, where n is the number of underscores present in the URL, is totally unnecessary. This problem can be solved using 3 different location directives and redirects, while considering the following scenarios in their regular expressions:

  1. There are one or more underscores at the END of the URL.
  2. There are one or more underscores at the START of the URL.
  3. There are one or more underscores in the MIDDLE of the URL.

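# Underscores at the start of the URL.
location ~* ^/(?<t1>_+)(?<t2>[a-zA-Z0-9\-]*)$ {
    return 301 $scheme://$host/-$t2;
}

# Underscores at the end of the URL.
location ~* ^/(?<t2>[a-zA-Z0-9_\-]*)(?<t1>_+)$ {
    return 301 $scheme://$host/$t2-;
}

# Underscores in the middle of the URL. The last group also accepts
# underscores, so URLs with several separate underscores still match.
location ~* ^/(?<t2>[a-zA-Z0-9\-]*)(?<t1>_+)(?<t3>[a-zA-Z0-9_\-]*)$ {
    return 301 $scheme://$host/$t2-$t3;
}

These three directives replace the underscores with '-' one redirect at a time, until none are left. Note that the character classes only accept letters, digits, dashes and underscores, so for paths that contain "/" or "." (like the example in the question) those characters have to be added to the classes.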
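The chain of redirects these locations produce can be traced with a small Python model. Treat it as a sketch under assumptions: Python's (?P<name>...) groups and re.IGNORECASE stand in for nginx's named captures and ~*, the third pattern here lets its final group accept underscores (which a URI with several separated underscores, such as /a_b_c, requires), and redirect_chain is a name made up for the example.

```python
import re

# (pattern, redirect template) pairs, tried in order like regex locations.
LOCATIONS = [
    (re.compile(r"^/(?P<t1>_+)(?P<t2>[a-z0-9\-]*)$", re.IGNORECASE),
     "/-{t2}"),
    (re.compile(r"^/(?P<t2>[a-z0-9_\-]*)(?P<t1>_+)$", re.IGNORECASE),
     "/{t2}-"),
    (re.compile(r"^/(?P<t2>[a-z0-9\-]*)(?P<t1>_+)(?P<t3>[a-z0-9_\-]*)$",
                re.IGNORECASE),
     "/{t2}-{t3}"),
]


def redirect_chain(uri):
    """Return every URI the client visits, starting with the original one."""
    chain = [uri]
    while True:
        for pattern, template in LOCATIONS:
            match = pattern.match(uri)
            if match:
                # Each match drops at least one underscore, so this terminates.
                uri = template.format(**match.groupdict())
                chain.append(uri)
                break
        else:
            return chain  # no location matched: the request is served as-is


if __name__ == "__main__":
    print(redirect_chain("/a_b_c"))
    print(redirect_chain("/this_category/page1.php"))
```

In this model /a_b_c is redirected twice (to /a-b_c, then /a-b-c), whereas /this_category/page1.php matches none of the patterns because "/" and "." are outside the character classes.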

-BeWilled