How do I strip a URL from a string and place it in an array?

Posted 2019-06-14 14:05

I'm working on building a small script that searches for the 5 most recent pictures tweeted by a service, isolates the URL and puts that URL into an array.

def grabTweets(linkArray) #brings in empty array
  tweets = Twitter.search("[pic] "+" url.com/r/", :rpp => 2, :result_type => "recent").map do |status|
    tweets = "#{status.text}" #class = string

    url_regexp = /http:\/\/\w/ #isolates link
    url = tweets.split.grep(url_regexp).to_s #chops off link, turns link to string from an array

    #add link to url array
    #print linkArray #prints []

    linkArray.push(url)
    print linkArray

  end
end

x = []
timelineTweets = grabTweets(x)

The function is returning things like this: ["[\"http://t.co/6789\"]"]["[\"http://t.co/12345\"]"]

I'm trying to get it to return ["http://t.co/6789", "http://t.co/12345"], but I can't get it to do that.

Any help here would be appreciated. I'm not sure what I'm doing wrong.

3 Answers
倾城 Initia
#2 · 2019-06-14 14:47

To strip a URL out of a string and push it into a urls array, you can do:

urls = []
if mystring =~ /(http:\/\/[^\s]+)/
  urls << $1
end
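As a rough sketch, here is the same =~ / $1 pattern applied to several strings. The tweet texts are hypothetical stand-ins for what Twitter.search would return. Note that this captures at most one URL per string and only matches http://, not https://.

```ruby
# Hypothetical tweet texts standing in for Twitter.search results.
tweets = [
  "[pic] check this out http://t.co/6789",
  "[pic] another one http://t.co/12345 via url.com/r/",
  "no link in this tweet"
]

urls = []
tweets.each do |text|
  # =~ returns the match index (or nil); $1 holds the first capture group.
  if text =~ /(http:\/\/[^\s]+)/
    urls << $1
  end
end

p urls # prints ["http://t.co/6789", "http://t.co/12345"]
```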
【Aperson】
#3 · 2019-06-14 14:50

grep returns an array:

grep(pattern) → array
grep(pattern) {| obj | block } → array

Returns an array of every element in enum for which Pattern === element.

So your odd output is coming from the to_s call that follows your grep. You're probably looking for this:

linkArray += tweets.split.grep(url_regexp)

or if you only want the first URL:

url = tweets.split.grep(url_regexp).first
linkArray << url if(url)
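One caveat with the += form: += builds a new array and rebinds the local variable, so when linkArray is a method parameter (as in the question), the caller's array is left unchanged. concat (like <<) mutates the array in place. A minimal sketch; the helper names add_with_plus and add_with_concat are hypothetical:

```ruby
URL_REGEXP = %r{https?://\S+}

# Hypothetical helper: += creates a NEW array, so the caller's array is untouched.
def add_with_plus(link_array, text)
  link_array += text.split.grep(URL_REGEXP)
  link_array
end

# Hypothetical helper: concat appends to the array that was passed in.
def add_with_concat(link_array, text)
  link_array.concat(text.split.grep(URL_REGEXP))
end

text = "[pic] http://t.co/6789 and http://t.co/12345"

a = []
add_with_plus(a, text)
p a # prints []

b = []
add_with_concat(b, text)
p b # prints ["http://t.co/6789", "http://t.co/12345"]
```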

You could also skip the split.grep and use scan:

# \S+ should be good enough for this sort of thing.
linkArray += tweets.scan(%r{https?://\S+})
# or
url = tweets.scan(%r{https?://\S+}).first
linkArray << url if(url)
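Putting the pieces together, here is a sketch that reproduces the odd to_s output and then collects one URL per tweet into a flat array with scan. Twitter.search is replaced by a hypothetical array of tweet texts, and grab_urls is a hypothetical name, so this runs without any API access.

```ruby
# Hypothetical tweet texts standing in for Twitter.search results.
TWEET_TEXTS = [
  "[pic] url.com/r/ http://t.co/6789",
  "[pic] url.com/r/ http://t.co/12345",
  "[pic] no link here"
].freeze

# The original bug: grep returns an Array, and Array#to_s renders it
# as source-like text, brackets and quotes included.
broken = TWEET_TEXTS.first.split.grep(%r{http://\w}).to_s
p broken # prints "[\"http://t.co/6789\"]"

# Hypothetical replacement for grabTweets: take the first URL from each
# text, drop texts that have none, and return one flat array.
def grab_urls(texts)
  texts.map { |text| text.scan(%r{https?://\S+}).first }.compact
end

p grab_urls(TWEET_TEXTS) # prints ["http://t.co/6789", "http://t.co/12345"]
```

Returning the array from the method, rather than printing inside the map block, also means the caller's variable (timelineTweets in the question) actually receives the URLs.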
孤傲高冷的网名
#4 · 2019-06-14 14:51

The easiest way to grab URLs in Ruby is to use the URI::extract method. It's a pre-existing wheel that works:

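require 'uri'
require 'open-uri'

# URI.open (from open-uri) fetches the page; plain Kernel#open no longer opens URLs in Ruby 3.0+.
body = URI.open('http://www.example.com').read

urls = URI.extract(body)
puts urls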

Which prints:

http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
http://www.w3.org/1999/xhtml
http://www.icann.org/
mailto:iana@iana.org?subject=General%20website%20feedback

Once you have the array, you can filter it for what you want, or you can pass URI.extract a list of schemes to extract.
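A small sketch of the scheme filter on a plain string instead of a fetched page; the text is made up for illustration. Newer Ruby versions treat URI.extract as obsolete and may print a warning about it when run with warnings enabled (-w), so it is best seen as a convenience rather than a long-term API.

```ruby
require 'uri'

# Made-up text mixing web links and a mailto: URI.
text = "[pic] http://t.co/6789 mail me at mailto:someone@example.com or see https://t.co/12345"

# With no scheme list, every URI-looking token is returned, mailto: included.
all_uris = URI.extract(text)

# With a scheme list, only URIs using those schemes are returned.
web_urls = URI.extract(text, ["http", "https"])

p all_uris
p web_urls # prints ["http://t.co/6789", "https://t.co/12345"]
```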
