Question:

I'm working on building a small script that searches for the 5 most recent pictures tweeted by a service, isolates the URL and puts that URL into an array.

def grabTweets(linkArray) #brings in empty array
  tweets = Twitter.search("[pic] "+" url.com/r/", :rpp => 2, :result_type => "recent").map do |status|
  tweets = "#{status.text}" #class = string

  url_regexp = /http:\/\/\w/ #isolates link
  url = tweets.split.grep(url_regexp).to_s #chops off link, turns link to string from an array

  #add link to url array
  #print linkArray #prints []

  linkArray.push(url)
  print linkArray

  end
end

x = []
timelineTweets = grabTweets(x)

The function is returning things like this: ["[\"http://t.co/6789\"]"]["[\"http://t.co/12345\"]"]

I'm trying to get it to return ["http://t.co/6789", "http://t.co/12345"], but it isn't doing that.

Any help here would be appreciated. I'm not sure what I'm doing wrong.

Answer 1:

The easiest way to grab URLs in Ruby is to use the URI::extract method. It's a pre-existing wheel that works:

require 'uri'
require 'open-uri'

# On Ruby 3.0 and later, a bare open() no longer fetches URLs; use URI.open.
body = URI.open('http://www.example.com').read

urls = URI::extract(body)
puts urls

Which returns:

http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
http://www.w3.org/1999/xhtml
http://www.icann.org/
mailto:iana@iana.org?subject=General%20website%20feedback

Once you have the array you can filter for what you want, or you can give it a list of schemes to extract.
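
As a rough sketch of the scheme filter mentioned above: the second argument to URI.extract is a list of schemes to keep. The sample text below is made up for illustration and is not from the question. Note that newer Ruby versions mark URI.extract as obsolete and print a warning when run with -w, so treat this as one option rather than the recommended one.

```ruby
require 'uri'

# Hypothetical tweet text, made up for this example.
text = 'new pic http://t.co/6789 contact mailto:someone@example.com mirror https://t.co/12345'

# With no scheme list, every URI-looking token is returned, mailto included.
all_urls = URI.extract(text)

# Passing a list of schemes keeps only the web links.
web_urls = URI.extract(text, %w[http https])

puts web_urls
```

Filtering the full array afterwards (for example with select) would give the same result; the scheme list just saves that step.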

Answer 2:

To pull a URL out of a string and push it into a urls array, you can do:

urls = []
if mystring =~ /(http:\/\/[^\s]+)/
  urls << $1
end
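
A small runnable version of the snippet above, with a made-up value for mystring, may help show its limits: =~ stops at the first match, so only one URL is captured per string, and the pattern as written only covers http:// links.

```ruby
# Hypothetical input string, made up for this example.
mystring = 'first http://t.co/6789 second http://t.co/12345'

urls = []
# =~ finds the first match only; $1 holds the first capture group.
if mystring =~ /(http:\/\/[^\s]+)/
  urls << $1
end

p urls # the second link in mystring is not collected
```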

Answer 3:

grep returns an array:

grep(pattern) → array
grep(pattern) {| obj | block } → array

Returns an array of every element in enum for which Pattern === element.

So your odd output is coming from the to_s call that follows your grep: it turns the whole array into a single string, brackets and quotes included. You're probably looking for this:

linkArray += tweets.split.grep(url_regexp)

or if you only want the first URL:

url = tweets.split.grep(url_regexp).first
linkArray << url if(url)
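
To see where the extra brackets and quotes come from, here is a minimal reproduction. It assumes Ruby 1.9 or later, where Array#to_s gives the inspect form; the tweet text and the link_array/fixed names are made up for illustration.

```ruby
# Hypothetical tweet text, made up for this example.
tweet = 'new pic http://t.co/6789'
url_regexp = /http:\/\/\w/

matches = tweet.split.grep(url_regexp) # an Array holding the matching word
as_string = matches.to_s               # the Array's printed form, as one String

# Pushing the string form nests the brackets and quotes inside the element.
link_array = []
link_array.push(as_string)
p link_array

# Appending the matches themselves keeps plain URL strings.
fixed = []
fixed.concat(matches)
p fixed
```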

You could also skip the split.grep and use scan:

# \S+ should be good enough for this sort of thing.
linkArray += tweets.scan(%r{https?://\S+})
# or
url = tweets.scan(%r{https?://\S+}).first
linkArray << url if(url)
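
Putting the scan approach together, a sketch of the whole function might look like the following. The tweet_texts array is a hypothetical stand-in for the text of the search results (no Twitter call is made), and grab_links is an illustrative name, not part of any library. One detail worth knowing: inside a method, linkArray += ... builds a new array and rebinds the local name, so the array the caller passed in stays empty; << and concat modify it in place, which is why the sketch uses <<.

```ruby
# Hypothetical stand-in for the text of each search result.
tweet_texts = [
  'new pic http://t.co/6789',
  'no link in this one',
  'another pic https://t.co/12345 enjoy'
]

# Appends the first URL found in each tweet to link_array and returns it.
def grab_links(tweet_texts, link_array = [])
  tweet_texts.each do |text|
    url = text.scan(%r{https?://\S+}).first
    link_array << url if url # skip tweets without a link
  end
  link_array
end

x = []
grab_links(tweet_texts, x)
p x
```

Because the method mutates the array it is given and also returns it, either the x variable or the return value can be used afterwards.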