Improving regex for parsing YouTube / Vimeo URLs

Posted 2019-01-22 03:50

I've made a function (in JavaScript) that takes a URL from either YouTube or Vimeo. It figures out the provider and ID for that particular video (demo: http://jsfiddle.net/csjwf/).

function parseVideoURL(url) {

    var provider = url.match(/http:\/\/(:?www.)?(\w*)/)[2],
        id;

    if(provider == "youtube") {

        id = url.match(/http:\/\/(?:www.)?(\w*).com\/.*v=(\w*)/)[2];
    } else if (provider == "vimeo") {

        id = url.match(/http:\/\/(?:www.)?(\w*).com\/(\d*)/)[2];
    } else {
        throw new Error("parseVideoURL() takes a YouTube or Vimeo URL");    
    }
    return {
        provider : provider,
        id : id
    }
}

It works; however, as a regex novice, I'm looking for ways to improve it. The input I'm dealing with typically looks like this:

http://vimeo.com/(id)
http://youtube.com/watch?v=(id)&blahblahblah.....

1) Right now I'm doing three separate matches. Would it make sense to try to do everything in one single expression? If so, how?

2) Could the existing matches be more concise? Are they unnecessarily complex, or perhaps insufficient?

3) Are there any YouTube or Vimeo URLs that would fail to parse? I've tried quite a few and so far it seems to work pretty well.

To summarize: I'm simply looking for ways to improve the above function. Any advice is greatly appreciated.

8 answers
chillily
#2 · 2019-01-22 04:42

Regarding sawa's answer, here is a small update to the second regex:

/http:\/\/(?:www\.)?(vimeo|youtube)\.com\/(?:watch\?v=)?(.*?)(?:$|&)/

(escaping the dots prevents matching URLs of the form www_vimeo_com/…, and $ has been added…)

Here is the same idea for matching the embed URLs:

/http:\/\/(?:www\.|player\.)?(vimeo|youtube)\.com\/(?:embed\/|video\/)?(.*?)(?:$|\?)/
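
As a rough sketch of how a pattern like the first one could answer question 1 (one match instead of three), something along these lines might work. The function name parseVideoURLSingle is hypothetical, and accepting https as well as http is an assumption that goes beyond the patterns above.

```javascript
// Hypothetical single-match variant of the asker's parseVideoURL().
// Assumption: "https" is accepted in addition to "http".
function parseVideoURLSingle(url) {
    // Group 1: provider name. Group 2: the ID, ending at "&" or end of string.
    var re = /^https?:\/\/(?:www\.)?(vimeo|youtube)\.com\/(?:watch\?v=)?(.*?)(?:$|&)/;
    var m = re.exec(url);
    if (!m) {
        throw new Error("parseVideoURLSingle() takes a YouTube or Vimeo URL");
    }
    return { provider: m[1], id: m[2] };
}

// Example calls:
// parseVideoURLSingle("http://youtube.com/watch?v=Bo_deCOd1HU&feature=related")
//   returns { provider: "youtube", id: "Bo_deCOd1HU" }
// parseVideoURLSingle("http://vimeo.com/86164897")
//   returns { provider: "vimeo", id: "86164897" }
```

Note that this sketch only handles YouTube URLs where v= is the first query parameter; a URL such as watch?feature=...&v=(id) would not yield the right ID.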
#3 · 2019-01-22 04:43

Regex is wonderfully terse but can quickly get complicated.

http://jsfiddle.net/8nagx2sk/

function parseYouTube(str) {
    // link : //youtube.com/watch?v=Bo_deCOd1HU
    // share : //youtu.be/Bo_deCOd1HU
    // embed : //youtube.com/embed/Bo_deCOd1HU

    var re = /\/\/(?:www\.)?youtu(?:\.be|be\.com)\/(?:watch\?v=|embed\/)?([a-z0-9_\-]+)/i; 
    var matches = re.exec(str);
    return matches && matches[1];
}

function parseVimeo(str) {
    // embed & link: http://vimeo.com/86164897

    var re = /\/\/(?:www\.)?vimeo\.com\/([0-9a-z\-_]+)/i;
    var matches = re.exec(str);
    return matches && matches[1];
}
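
To get back the { provider, id } shape the question asks for, the two per-provider patterns could be wrapped roughly as follows. This is only a sketch: the parsers table and the parseVideo name are hypothetical, and the regexes are repeated here (with the dot in vimeo.com escaped) so the snippet runs on its own.

```javascript
// Per-provider patterns, repeated from the functions above.
var parsers = {
    youtube: /\/\/(?:www\.)?youtu(?:\.be|be\.com)\/(?:watch\?v=|embed\/)?([a-z0-9_\-]+)/i,
    vimeo: /\/\/(?:www\.)?vimeo\.com\/([0-9a-z\-_]+)/i
};

// Hypothetical wrapper: tries each provider in turn and
// returns null when no pattern matches.
function parseVideo(str) {
    for (var provider in parsers) {
        var matches = parsers[provider].exec(str);
        if (matches) {
            return { provider: provider, id: matches[1] };
        }
    }
    return null;
}

// parseVideo("https://youtu.be/Bo_deCOd1HU")
//   returns { provider: "youtube", id: "Bo_deCOd1HU" }
// parseVideo("http://vimeo.com/86164897")
//   returns { provider: "vimeo", id: "86164897" }
```

Supporting another provider then means adding one entry to the table rather than another branch.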

Sometimes simple code is nicer to your fellow developers.

https://jsfiddle.net/1dzb5ag1/

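// Protocol- and www-neutral: strips the scheme and "www.", then returns
// whatever follows the first matching prefix (including any trailing
// query parameters), or undefined if no prefix matches.
function getVideoId(url, prefixes) {
  var cleaned = url.replace(/^(https?:)?\/\/(www\.)?/, '');
  for (var i = 0; i < prefixes.length; i++) {
    if (cleaned.indexOf(prefixes[i]) === 0) {
      return cleaned.slice(prefixes[i].length);
    }
  }
  return undefined;
}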

function getYouTubeId(url) {
  return getVideoId(url, [
    'youtube.com/watch?v=',
    'youtu.be/',
    'youtube.com/embed/',
    'youtube.googleapis.com/v/'
  ]);
}

function getVimeoId(url) {
  return getVideoId(url, [
    'vimeo.com/',
    'player.vimeo.com/video/'
  ]);
}

Which do you prefer to update?
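
One caveat with the prefix-based getVideoId above is that it returns everything after the matched prefix, so a URL like youtube.com/watch?v=(id)&blahblahblah keeps its trailing parameters. A minimal sketch of trimming the result, assuming IDs never contain ?, &, # or / (the helper name trimVideoId is hypothetical):

```javascript
// Hypothetical helper: cut an extracted ID at the first "?", "&", "#" or "/".
// Falsy input (undefined, empty string) is passed through unchanged.
function trimVideoId(id) {
    return id ? id.split(/[?&#\/]/)[0] : id;
}

// trimVideoId("Bo_deCOd1HU&feature=related") returns "Bo_deCOd1HU"
// trimVideoId("86164897")                    returns "86164897"
// trimVideoId(undefined)                     returns undefined
```

It could be applied to whatever getYouTubeId or getVimeoId returns, which keeps the prefix lists themselves unchanged.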
