I have HTML source containing about 1000 microblog posts (one tweet per line). Most of the tweets look like the one below. I loaded them into a Delphi memo and tried to strip the HTML markup with the Pos and Delete functions, but failed.
<div id='tweetText'> RT <a onmousedown="return touch(this.href,0)" href="http://twitter.com/HighfashionUK">@HighfashionUK</a> RT: Surprise goody bag up 4 grabs, Ok. <a onmousedown="return touch(this.href,0)" href="http://plixi.com/p/57846587">http://plixi.com/p/57846587</a> when we get 150</div>
I want to strip the HTML markup and keep only:
RT: Surprise goody bag up 4 grabs, Ok. http://plixi.com/p/57846587 when we get 150
How can I extract such text in Delphi?
Thank you very much in advance.
Update:
Cosmin Prund is right. I mistakenly skipped a part. What I want is:
RT @HighfashionUK RT: Surprise goody bag up 4 grabs, Ok. http://plixi.com/p/57846587 when we get 150
Cosmin Prund is great.
Since all HTML markup sits between < and >, a routine to strip it is trivial to write: skip everything from a < up to the next > and keep the rest. Hopefully this is what you want because, as you can see in my comment, there's an issue with @HighfashionUK: your example skipped it, and I don't know why.
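As a rough illustration of that idea, here is a minimal sketch, not necessarily the routine originally posted. The names StripHtmlMarkup, StripMarkupDemo and SampleLine are made up for this example, and the final Trim is an assumption added so the leading space left behind by the div tag is dropped. It only tracks whether the current character is inside a tag, so it does not decode entities such as &amp;amp; and it would be confused by a > inside an attribute value.

```pascal
program StripMarkupDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: copies every character that is not between '<' and '>'.
function StripHtmlMarkup(const Source: string): string;
var
  i, Count: Integer;
  InTag: Boolean;
  Ch: Char;
begin
  // The result can never be longer than the input, so allocate once.
  SetLength(Result, Length(Source));
  Count := 0;
  InTag := False;
  for i := 1 to Length(Source) do
  begin
    Ch := Source[i];
    if InTag then
    begin
      // Inside a tag: drop characters until the closing '>'.
      if Ch = '>' then
        InTag := False;
    end
    else if Ch = '<' then
      InTag := True
    else
    begin
      // Plain text: keep the character.
      Inc(Count);
      Result[Count] := Ch;
    end;
  end;
  SetLength(Result, Count);
  // Assumption: surrounding whitespace is not wanted.
  Result := Trim(Result);
end;

const
  // Shortened version of the tweet line from the question.
  SampleLine = '<div id=''tweetText''> RT <a href="http://twitter.com/HighfashionUK">' +
    '@HighfashionUK</a> RT: Surprise goody bag up 4 grabs, Ok.</div>';

begin
  // Prints: RT @HighfashionUK RT: Surprise goody bag up 4 grabs, Ok.
  WriteLn(StripHtmlMarkup(SampleLine));
end.
```

With a memo, the same function could be applied line by line, for example by looping over the memo's Lines and replacing each entry with the stripped text.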