I have a string in Rails, e.g. "This is a Twitter message. #books War & Peace by Leo Tolstoy. I love this book!", and I want to parse the text and extract only certain phrases, like "War & Peace by Leo Tolstoy".
Is this a matter of using a regex and lifting the text between "#books" and the next "."?
What if there's no structure to the message, like "This is a Twitter message #books War & Peace by Leo Tolstoy I love this book!" or "This is a Twitter message. I love the book War & Peace by Leo Tolstoy #books"? How can I reliably pull the phrase "War & Peace by Leo Tolstoy" without knowing the phrase in advance?
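For the structured case, a minimal Ruby sketch of that idea might look like the following. It assumes the phrase itself contains no period, and the method name `phrase_after_books` is made up for illustration:

```ruby
# Hypothetical helper: returns the text between "#books" and the next ".",
# or nil when the message does not follow that pattern.
# Assumes the phrase itself contains no period.
def phrase_after_books(message)
  # String#[] with a regex and a capture index returns that capture group.
  message[/#books\s+([^.]+)\./, 1]
end

msg = "This is a Twitter message. #books War & Peace by Leo Tolstoy. I love this book!"
puts phrase_after_books(msg)
```

This only covers messages where a period actually follows the phrase, so it does not help with the unstructured examples.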
Are there any gems, methods, etc. that can help me do this?
At the very least, what would you call what I'm trying to do? It will help me search for a solution on Google. I've tried a few searches on "parsing" with no luck.
--- edit --- Based on @rogeliog's suggestion, I will add the following:
I can live with the garbage text that comes after #books, but nothing before it. I tried "match(/#books.*/)"; results here: www.rubular.com/r/gM7oSZxF5M.
But how can I capture Result #6 (i.e., when someone puts #books at the end of the sentence)?
Is there a way for me to do an if-then with regex? Something like:
if [#books is at the end of the message],
then [take the last 10 words preceding #books],
else [match(/#books.*/)]
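In Ruby, that if-then could be sketched roughly as below. The method name `extract_book_text`, the `tag` and `window` parameters, and the choice to drop the tag itself from the result are all assumptions for illustration, not something I have working:

```ruby
# Hypothetical helper sketching the if-then above.
# tag:    the hashtag to look for (assumed "#books")
# window: how many words to keep when the tag ends the message (assumed 10)
def extract_book_text(message, tag: "#books", window: 10)
  escaped = Regexp.escape(tag)

  if message.match?(/#{escaped}\s*\z/)
    # Tag is at the end: keep the last `window` words before it.
    before = message.sub(/\s*#{escaped}\s*\z/, "")
    before.split.last(window).join(" ")
  else
    # Tag is elsewhere: keep everything after it (nil if the tag is absent).
    message[/#{escaped}\s*(.*)/m, 1]
  end
end

puts extract_book_text("This is a Twitter message. I love the book War & Peace by Leo Tolstoy #books")
puts extract_book_text("This is a Twitter message #books War & Peace by Leo Tolstoy I love this book!")
```

Either branch still returns extra words around the title, which is the garbage text I said I can live with.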
If you offer a regex, please post your solution as a rubular.com permalink.