Parsing youtube url

I've written a Ruby YouTube URL parser. It takes a YouTube URL in one of the following forms (these are the URL structures I could find; there may be more):

http://youtu.be/sGE4HMvDe-Q
http://www.youtube.com/watch?v=Lp7E973zozc&feature=relmfu
http://www.youtube.com/p/A0C3C1D163BE880A?hl=en_US&fs=1

The aim is to save just the id of the clip or playlist so that it can be embedded: for a clip, 'sGE4HMvDe-Q'; for a playlist, 'p/A0C3C1D163BE880A'.

The parser I wrote works for these URLs, but it seems brittle and long-winded. Could someone suggest a nicer Ruby approach to this problem?

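def parse_youtube(url)
    # Drop the scheme, then split host and path into segments.
    a = url.split('//').last.split('/')
    # Take the last segment and strip 'watch?v=' and any query parameters.
    b = a.last.split('watch?v=').last.split('?').first.split('&').first
    if a[1] == 'p'
        "p/#{b}"
    else
        b
    end
end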

Tags: ruby-on-rails ruby url

3 Answers

家丑人穷心不美

#2 · 2019-03-11 09:46

Using the Addressable gem, you can save yourself some work. There's also a URI module in stdlib, but Addressable is more powerful.

require 'addressable/uri'

uri = Addressable::URI.parse(youtube_url)
if uri.path == "/watch"
  uri.query_values["v"] if uri.query_values
else
  uri.path
end

EDIT | Removed madness. Didn't notice Addressable provides #query_values already.


聊天终结者

#3 · 2019-03-11 09:58

require 'uri'
require 'cgi'

urls = %w[http://youtu.be/sGE4HMvDe-Q 
          http://www.youtube.com/watch?v=Lp7E973zozc&feature=relmfu
          http://www.youtube.com/p/A0C3C1D163BE880A?hl=en_US&fs=1]

def parse_youtube url
  u = URI.parse url
  if u.path =~ /watch/
    p CGI::parse(u.query)["v"].first
  else
    p u.path
  end
end

urls.each { |url| parse_youtube url }
#=> "/sGE4HMvDe-Q"
#=> "Lp7E973zozc"
#=> "/p/A0C3C1D163BE880A"
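
Note that the path-based results above keep their leading slash, while the question asks for 'sGE4HMvDe-Q' and 'p/A0C3C1D163BE880A'. A minimal sketch of one way to normalise that, using only the stdlib URI module. The helper name youtube_embed_id is made up for illustration, and it swaps CGI::parse for URI.decode_www_form, which is my substitution rather than part of the answer:

```ruby
require 'uri'

# Hypothetical helper (not from the answer above): returns the id without
# a leading slash, or nil when the URL cannot be parsed or has no id.
def youtube_embed_id(url)
  uri = URI.parse(url)
  if uri.path == '/watch'
    # decode_www_form yields [key, value] pairs; pick the pair for "v".
    URI.decode_www_form(uri.query.to_s).assoc('v')&.last
  else
    # Strip the single leading slash from the path.
    id = uri.path.to_s.sub(%r{\A/}, '')
    id.empty? ? nil : id
  end
rescue URI::InvalidURIError
  nil
end
```

Returning nil instead of raising is a design choice for this sketch; callers that prefer an exception could drop the rescue clause.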


啃猪蹄的小仙女

#4 · 2019-03-11 10:00

def parse_youtube url
   regex = /(?:.be\/|\/watch\?v=|\/(?=p\/))([\w\/\-]+)/
   url.match(regex)[1]
end

urls = %w[http://youtu.be/sGE4HMvDe-Q 
          http://www.youtube.com/watch?v=Lp7E973zozc&feature=relmfu
          http://www.youtube.com/p/A0C3C1D163BE880A?hl=en_US&fs=1]

urls.each {|url| puts parse_youtube url }
# sGE4HMvDe-Q
# Lp7E973zozc
# p/A0C3C1D163BE880A

Depending on how you use this, you might want a better validation that the URL is indeed from youtube.
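
For the validation mentioned above, here is a rough sketch of a host check done before the regex is applied. The constant YOUTUBE_HOSTS and the method youtube_url? are names invented for this example, and the host list is an assumption that may well be incomplete:

```ruby
require 'uri'

# Assumed allow-list; the real set of YouTube hosts may be larger.
YOUTUBE_HOSTS = %w[youtu.be youtube.com www.youtube.com].freeze

# Hypothetical helper: true only for http(s) URLs whose host is on the list.
def youtube_url?(url)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so both schemes pass this check.
  uri.is_a?(URI::HTTP) && YOUTUBE_HOSTS.include?(uri.host.to_s.downcase)
rescue URI::InvalidURIError
  false
end
```

A URL such as http://evil.com/youtu.be/abc would still match the regex on its own, since the regex only looks for '.be/' somewhere in the string; a host check like this rejects it.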

UPDATE:

Coming back to this a few years later. I've always been annoyed by how sloppy the original answer was. Since the validity of the Youtube domain wasn't validated anyway, I've removed some of the slop.

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
    be                       'be'
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
    watch                    'watch'
--------------------------------------------------------------------------------
    \?                       '?'
--------------------------------------------------------------------------------
    v=                       'v='
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \/                       '/'
--------------------------------------------------------------------------------
    (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
      p                        'p'
--------------------------------------------------------------------------------
      \/                       '/'
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [\w\/\-]+                any character of: word characters (a-z,
                             A-Z, 0-9, _), '\/', '\-' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
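
The breakdown above can also be kept next to the pattern itself by writing the regex in extended (x) mode. This is only a sketch: the constant YOUTUBE_ID and the method parse_youtube_or_nil are made-up names, the dot before 'be' is escaped here (an assumption about the intent, the original matches any character there), and String#[] is used so that a non-matching URL gives nil rather than raising NoMethodError the way url.match(regex)[1] does.

```ruby
# Extended mode ignores whitespace and allows comments inside the pattern.
YOUTUBE_ID = %r{
  (?: \.be/          # short link: youtu.be/<id> (dot escaped, an assumption)
    | /watch\?v=     # regular watch page
    | /(?=p/)        # playlist: consume the slash, leave 'p/' for the capture
  )
  ([\w/\-]+)         # the id: word characters, '/' and '-'
}x

# Hypothetical helper: returns capture group 1, or nil if nothing matches.
def parse_youtube_or_nil(url)
  url[YOUTUBE_ID, 1]
end
```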
