I'm asking for help because I have spent more than five hours looking for an answer online and can't find a proper solution.
My project requires that I scrape the titles of external web pages, but some of these pages are encoded in ISO-8859-1.
Since my own page displays the scraped titles as UTF-8, I get � instead of characters such as é, à, ê, ô ...
So I need a way to convert the titles from ISO-8859-1 to UTF-8 when necessary. Can you help me?
I'm scripting with Google Apps Script, i.e. I write JavaScript code to enhance a Google spreadsheet using the API provided.
To scrape the external web pages, I use this code:
var result = UrlFetchApp.fetch( url );
var wholePage = result.getContentText();
var scrap = wholePage.match( /<title>(.*?)<\/title>/ );
var title = scrap[1];
It works perfectly if the scraped page is encoded in UTF-8, but not for this URL (as an example): http://www.lexpress.fr/actualite/medias/cannes-pierre-lescure-et-jerome-clement-pressentis-pour-succeder-a-gilles-jacob_1254608.html
This is the result I get for this example:
Cannes: Pierre Lescure et J�r�me Cl�ment pressentis pour succ�der � Gilles Jacob - L'EXPRESS
(yes, I'm French).
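One direction that might work, as an untested sketch rather than a confirmed fix: `getContentText` accepts an optional charset name, so the page could be decoded with whatever charset it declares. The helper names `pickCharset` and `fetchTitle` are hypothetical, and I am assuming that the charset is declared either in the `Content-Type` response header or in a `<meta>` tag, and that Apps Script accepts the charset name in lowercase.

```javascript
// Hypothetical helper: guess a page's charset from its Content-Type header,
// then from a <meta> declaration in the HTML, and fall back to UTF-8.
function pickCharset(contentType, html) {
  var fromHeader = /charset\s*=\s*["']?([\w-]+)/i.exec(contentType || '');
  if (fromHeader) return fromHeader[1].toLowerCase();
  var fromMeta = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(html || '');
  if (fromMeta) return fromMeta[1].toLowerCase();
  return 'utf-8';
}

// Sketch of use inside Apps Script (UrlFetchApp only exists there).
function fetchTitle(url) {
  var result = UrlFetchApp.fetch(url);
  var headers = result.getHeaders();
  var contentType = headers['Content-Type'] || headers['content-type'] || '';
  // First decode with the default charset: the <meta> tag itself is plain
  // ASCII, so it stays readable even when accented characters are garbled.
  var charset = pickCharset(contentType, result.getContentText());
  // Decode again with the detected charset before extracting the title.
  var wholePage = result.getContentText(charset);
  var scrap = wholePage.match(/<title>([\s\S]*?)<\/title>/i);
  return scrap ? scrap[1] : null;
}
```

With this approach the title would be read from a correctly decoded string, so no separate ISO-8859-1 to UTF-8 conversion step should be needed afterwards.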
Can someone help me with this? I'd be really grateful. I tried to give as much information as I could, since many other encoding questions on StackOverflow are said to lack real context. Tell me if you need more, I'll answer quickly.