I'm looking for a PHP preg_replace() solution to find links to images and replace them with the respective image tags.
Find:
<a href="http://www.domain.tld/any/valid/path/to/imagefile.ext">This will be ignored.</a>
Replace with:
<img src="http://www.domain.tld/any/valid/path/to/imagefile.ext" alt="imagefile" />
Where the protocol MUST be http://, the .ext MUST be a valid image format (.jpg, .jpeg, .gif, .png, .tif), and the base file name becomes the alt="" value.
I know preg_replace() is the right function for the job, but I suck with regex, so any help is greatly appreciated! THANKS!
I would suggest using this more flexible non-greedy regex:
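A minimal sketch of such a pattern, following the rules stated in the question (http:// only, a fixed list of extensions, base file name as the alt text). The function name `linksToImages` and the exact pattern are illustrative assumptions, and the pattern only handles double-quoted href values:

```php
<?php
// Hypothetical helper: swaps <a href="http://.../name.ext">...</a> for an <img> tag.
function linksToImages(string $html): string
{
    // Group 1: the full URL. Group 2: the base file name without its extension.
    // .*? is non-greedy so the match stops at the first closing </a>.
    $pattern = '~<a\s+[^>]*?href="(http://[^"]*/([^"/]+)\.(?:jpe?g|gif|png|tif))"[^>]*>.*?</a>~is';

    return preg_replace($pattern, '<img src="$1" alt="$2" />', $html);
}

echo linksToImages(
    '<a href="http://www.domain.tld/any/valid/path/to/imagefile.jpg">This will be ignored.</a>'
);
// prints: <img src="http://www.domain.tld/any/valid/path/to/imagefile.jpg" alt="imagefile" />
```

Links that use another protocol or a non-image extension are left untouched, because the pattern simply fails to match them.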
And a more complex regex (including PHP test code) to hopefully please Gumbo :)
Congratulations, you are the one millionth customer to ask Stack Overflow how to parse HTML with regex!
[X][HT]ML is not a regular language and cannot reliably be parsed with regex. Use an HTML parser. PHP itself gives you DOMDocument, or you may prefer simplehtmldom.
Incidentally, you cannot tell what type a file is by looking at its URL. There is no reason a JPEG has to have ‘.jpeg’ as its extension, and indeed no guarantee that a file with a ‘.jpeg’ extension will actually be a JPEG. The only way to be certain is to fetch the resource (e.g. using a HEAD request) and look at the Content-Type header.
Ahh, my daily DOM practice. You should use DOM to parse HTML and regex to parse strings such as HTML attributes.
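A rough sketch of that split, assuming the input is an HTML fragment: DOMDocument finds the anchors, and a small regex only inspects each href value. The function name `replaceImageLinks`, the wrapper `<div>`, and the pattern are illustrative assumptions:

```php
<?php
// Hypothetical helper: replaces <a> elements that link to http:// image URLs with <img> elements.
function replaceImageLinks(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in one <div> so the document has a single root element.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list to an array first; replacing nodes while
    // iterating the live list would skip elements.
    $links = iterator_to_array($doc->getElementsByTagName('a'));
    foreach ($links as $link) {
        $href = $link->getAttribute('href');
        // The regex only looks at the attribute string, not at HTML.
        if (preg_match('~^http://\S*/([^/\s]+)\.(?:jpe?g|gif|png|tif)$~i', $href, $m)) {
            $img = $doc->createElement('img');
            $img->setAttribute('src', $href);
            $img->setAttribute('alt', $m[1]);
            $link->parentNode->replaceChild($img, $link);
        }
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Keep in mind that loadHTML() treats its input as ISO-8859-1 unless told otherwise, so non-ASCII content may need an encoding hint before it goes in.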
Note: I have some basic regexes that could surely be improved upon by some wizards :)
Note #2: Though it adds overhead, you could use something like curl to check thoroughly whether the href is an actual image by sending a HEAD request and looking at the Content-Type. Checking the extension alone, though, should work in 80-90% of cases.
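For the HEAD-request check, something along these lines could work, assuming the curl extension is available. The helper names `isImageContentType` and `urlIsImage` and the 5-second timeout are illustrative assumptions:

```php
<?php
// Hypothetical helper: true when a Content-Type header value names an image type.
function isImageContentType(?string $contentType): bool
{
    return is_string($contentType) && stripos(trim($contentType), 'image/') === 0;
}

// Hypothetical helper: sends a HEAD request and inspects the reported Content-Type.
function urlIsImage(string $url): bool
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request: headers only, no body
        CURLOPT_RETURNTRANSFER => true,  // do not echo the response
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects to the final resource
        CURLOPT_TIMEOUT        => 5,     // assumed limit, in seconds
    ]);
    $ok = curl_exec($ch) !== false;
    $type = $ok ? curl_getinfo($ch, CURLINFO_CONTENT_TYPE) : null;

    // curl_getinfo() may report no Content-Type at all; treat that as "not an image".
    return isImageContentType($type ?: null);
}
```

This costs one network round trip per link, so it is probably best kept as an optional second pass after the cheaper extension check.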