I'm working on building a small script that searches for the 5 most recent pictures tweeted by a service, isolates the URL and puts that URL into an array.
def grabTweets(linkArray) #brings in empty array
  tweets = Twitter.search("[pic] "+" url.com/r/", :rpp => 2, :result_type => "recent").map do |status|
    tweets = "#{status.text}" #class = string
    url_regexp = /http:\/\/\w/ #isolates link
    url = tweets.split.grep(url_regexp).to_s #chops off link, turns link to string from an array
    #add link to url array
    #print linkArray #prints []
    linkArray.push(url)
    print linkArray
  end
end
x = []
timelineTweets = grabTweets(x)
The function is returning things like this: ["[\"http://t.co/6789\"]"]["[\"http://t.co/12345\"]"]
I'm trying to get it to return ["http://t.co/6789", "http://t.co/12345"], but it isn't doing that.
Any help here would be appreciated. I'm not sure what I'm doing wrong.
The easiest way to grab URLs in Ruby is to use the URI::extract method. It's a pre-existing wheel that works:
require 'uri'
require 'open-uri'
# Kernel#open no longer opens URLs on Ruby 3.0+; URI.open comes from open-uri.
body = URI.open('http://www.example.com').read
urls = URI::extract(body)
puts urls
Which returns:
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
http://www.w3.org/1999/xhtml
http://www.icann.org/
mailto:iana@iana.org?subject=General%20website%20feedback
Once you have the array you can filter for what you want, or you can give it a list of schemes to extract.
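As a minimal sketch of the "list of schemes" option, the second argument to URI.extract limits the results to the schemes you name. The sample text below is made up for illustration. Note that newer Ruby versions mark URI.extract as obsolete, so treat this as one option rather than the only way to do it.

```ruby
require 'uri'

# Made-up sample text that mixes several URI schemes.
text = 'See http://t.co/6789 and https://example.com/pic or mail mailto:someone@example.com'

# Passing a list of schemes restricts the matches to those schemes,
# so the mailto: link is skipped.
web_urls = URI.extract(text, %w[http https])

puts web_urls
```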
To pull a URL out of a string and push it into a urls array, you can do:
urls = []
if mystring =~ /(http:\/\/[^\s]+)/
  urls << $1
end
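Here is the same idea as a runnable sketch, with a made-up value for mystring (the tweet text is an assumption for illustration). Keep in mind that =~ with a single capture group only picks up the first URL in the string.

```ruby
urls = []

# Hypothetical tweet text, made up for this example.
mystring = '[pic] new photo http://t.co/6789 via url.com/r/'

# When the match succeeds, $1 holds the text of the first capture group.
if mystring =~ /(http:\/\/[^\s]+)/
  urls << $1
end

p urls
```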
grep returns an array:
grep(pattern) → array
grep(pattern) {| obj | block } → array
Returns an array of every element in enum for which Pattern === element.
So your odd output is coming from the to_s call that follows your grep. You're probably looking for this:
# concat modifies the array that was passed in; += would only rebind the local variable.
linkArray.concat(tweets.split.grep(url_regexp))
or if you only want the first URL:
url = tweets.split.grep(url_regexp).first
linkArray << url if(url)
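To see where the extra brackets and quotes come from, here is a small sketch using a made-up tweet text. On Ruby 1.9 and later, calling to_s on an array gives the same string as inspect, so pushing that string stores one element that merely looks like an array.

```ruby
# Made-up tweet text for illustration.
words = '[pic] http://t.co/6789'.split
matches = words.grep(/http:\/\/\w/)  # an Array: ["http://t.co/6789"]

# What the original code did: turn the whole array into one string, then push it.
wrong = []
wrong.push(matches.to_s)

# Adding the matched elements themselves keeps plain URL strings.
right = []
right.concat(matches)

p wrong
p right
```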
You could also skip the split.grep and use scan:
# \S+ should be good enough for this sort of thing.
# concat modifies the array that was passed in; += would only rebind the local variable.
linkArray.concat(tweets.scan(%r{https?://\S+}))
# or
url = tweets.scan(%r{https?://\S+}).first
linkArray << url if(url)
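Putting the scan approach together, a sketch of the whole method might look like the following. The name grab_links and the sample_texts array are hypothetical: the array stands in for the status.text values you would get from Twitter.search, so the example runs without the Twitter gem.

```ruby
# Collects every http/https URL found in the given tweet texts.
# `texts` is a hypothetical stand-in for the status.text values from Twitter.search.
def grab_links(texts)
  # scan returns an array per text; flat_map merges them into one flat array.
  texts.flat_map { |text| text.scan(%r{https?://\S+}) }
end

# Made-up sample data for illustration.
sample_texts = [
  '[pic] first http://t.co/6789',
  'no link here',
  '[pic] second http://t.co/12345'
]

links = grab_links(sample_texts)
p links
```

Returning a new array instead of filling one that is passed in avoids the question of whether the caller's array actually gets modified.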