Hi everyone.
I'm trying to get all the image URLs of the current page in a UIWebView.
Here is my code:
- (void)webViewDidFinishLoad:(UIWebView*)webView {
    NSString *firstImageUrl = [self.webView stringByEvaluatingJavaScriptFromString:@"var images = document.getElementsByTagName('img');images[0].src.toString();"];
    NSString *imageUrls = [self.webView stringByEvaluatingJavaScriptFromString:@"var images= document.getElementsByTagName('img');var imageUrls = "";for(var i = 0; i < images.length; i++){var image = images[i];imageUrls += image.src;imageUrls += \\’,\\’;}imageUrls.toString();"];
    NSLog(@"firstUrl : %@", firstImageUrl);
    NSLog(@"images : %@",imageUrls);
}
The first NSLog prints the correct image src, but the second NSLog prints nothing:
2013-01-25 00:51:23.253 WebDemo[3416:907] firstUrl: https://www.paypalobjects.com/en_US/i/scr/pixel.gif
2013-01-25 00:51:23.254 WebDemo[3416:907] images :
I don't know why.
Please help me...
Thanks.
Perrohunter pointed out one NSRegularExpression solution, which is great. If you don't want to enumerate the array of matches, you can use the block-based enumerateMatchesInString method, too:
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img\\s[\\s\\S]*?src\\s*?=\\s*?['\"](.*?)['\"][\\s\\S]*?>)+?"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
[regex enumerateMatchesInString:yourHTMLSourceCodeString
                        options:0
                          range:NSMakeRange(0, [yourHTMLSourceCodeString length])
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    // Capture group 2 holds the value of the src attribute.
    NSString *img = [yourHTMLSourceCodeString substringWithRange:[result rangeAtIndex:2]];
    NSLog(@"img src %@", img);
}];
I've also updated the regex pattern to deal with the following issues:
- there can be attributes between the start of the img tag and the src attribute;
- there can be attributes after the src attribute and before the >;
- there can be newline characters in the middle of an img tag (the . captures everything except a newline character);
- the src attribute value can be quoted with ' as well as "; and
- there can be spaces between src and the = as well as between the = and the subsequent value.
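To make the list above concrete, here is a minimal sketch of the same pattern ported to a JavaScript regex literal and run against tags that exercise those cases. The helper name extractImgSrcs and the sample HTML are made up for illustration; only the pattern itself comes from the answer.

```javascript
// JavaScript port of the pattern above; capture group 2 is the src value.
const IMG_SRC_PATTERN = /(<img\s[\s\S]*?src\s*?=\s*?['"](.*?)['"][\s\S]*?>)+?/gi;

// extractImgSrcs is a hypothetical helper name, not part of any library.
function extractImgSrcs(html) {
  return Array.from(html.matchAll(IMG_SRC_PATTERN), (match) => match[2]);
}

// Sample input: attributes before and after src, single quotes,
// spaces around "=", upper-case tag name, and newlines inside the tag.
const sampleHtml = [
  '<img class="logo" src="one.png" alt="first">',
  "<IMG",
  "  id='second'",
  "  src = 'two.jpg'",
  ">",
].join("\n");

console.log(extractImgSrcs(sampleHtml)); // [ 'one.png', 'two.jpg' ]
```

One caveat with this sketch: the pattern looks for the text src anywhere inside the tag, so an attribute such as data-src placed before the real src would be picked up instead.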
I freely recognize that reading regex patterns is painful for the uninitiated, and perhaps other solutions might make more sense (the JSON suggestion by Joris, using scanners, etc.). But if you wanted to use regex, the above pattern might cover a few more permutations of the img tag, and enumerateMatchesInString might be ever so slightly more efficient than matchesInString.
I don't like regular expressions, so here's my answer without them.
Here is the JavaScript, indented for clarity:
// javascript to execute:
(function() {
    var images = document.querySelectorAll("img");
    var imageUrls = [];
    [].forEach.call(images, function(el) {
        imageUrls[imageUrls.length] = el.src;
    });
    return JSON.stringify(imageUrls);
})()
You'll notice I return a JSON string here. To read this back in Objective-C:
NSString *imageURLString = [self.webview stringByEvaluatingJavaScriptFromString:@"(function() {var images=document.querySelectorAll(\"img\");var imageUrls=[];[].forEach.call(images, function(el) { imageUrls[imageUrls.length] = el.src;}); return JSON.stringify(imageUrls);})()"];
// parse json back into an array
NSError *jsonError = nil;
NSArray *urls = [NSJSONSerialization JSONObjectWithData:[imageURLString dataUsingEncoding:NSUTF8StringEncoding] options:0 error:&jsonError];
if (!urls) {
    NSLog(@"JSON error: %@", jsonError);
    return;
}
NSLog(@"Images : %@", urls);
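As a rough illustration of why a JSON string is a convenient return format, the sketch below runs the same collection logic against a stand-in document object, so it can be tried outside a web view. The fakeDocument object, its example.com URLs, and the function name collectImageUrls are assumptions made up for this example.

```javascript
// fakeDocument is a hypothetical stand-in for the page's document object.
const fakeDocument = {
  querySelectorAll(selector) {
    return selector === "img"
      ? [
          { src: "https://example.com/a.png" },
          { src: "https://example.com/b.png?size=1,2" },
        ]
      : [];
  },
};

// Same logic as the snippet above, with the document passed in as a parameter.
function collectImageUrls(doc) {
  const imageUrls = [];
  Array.prototype.forEach.call(doc.querySelectorAll("img"), (el) => {
    imageUrls.push(el.src);
  });
  return JSON.stringify(imageUrls);
}

const json = collectImageUrls(fakeDocument);
// JSON.parse plays the role NSJSONSerialization plays on the Objective-C side.
const urls = JSON.parse(json);

console.log(urls.length);            // 2
console.log(json.split(",").length); // 3, because one URL contains a comma
```

Splitting on commas miscounts as soon as a URL contains one, whereas the JSON round trip returns exactly the two entries that went in.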
You could achieve this by running a regex on the HTML source of the loaded web view:
NSString *yourHTMLSourceCodeString = [webView stringByEvaluatingJavaScriptFromString:@"document.body.innerHTML"];
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(<img src=\"(.*?)\">)+?"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
NSArray *matches = [regex matchesInString:yourHTMLSourceCodeString
                                  options:0
                                    range:NSMakeRange(0, [yourHTMLSourceCodeString length])];
// -count returns an NSUInteger, so cast it and use %lu rather than %d.
NSLog(@"total matches %lu", (unsigned long)[matches count]);
for (NSTextCheckingResult *match in matches) {
    // Capture group 2 holds the value of the src attribute.
    NSString *img = [yourHTMLSourceCodeString substringWithRange:[match rangeAtIndex:2]];
    NSLog(@"img src %@", img);
}
This is a pretty basic regex that only handles img tags where src is the first attribute. It would need more work if your images have other attributes such as class or id.
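To show where the basic pattern stops working, here is a small sketch that ports it to a JavaScript regex literal and runs it over three tags. The helper name basicImgSrcs and the sample tags are invented for illustration.

```javascript
// JavaScript port of the basic pattern; capture group 2 is meant to be the src value.
const BASIC_PATTERN = /(<img src="(.*?)">)+?/gi;

// basicImgSrcs is a hypothetical helper name, not part of any library.
function basicImgSrcs(html) {
  return Array.from(html.matchAll(BASIC_PATTERN), (match) => match[2]);
}

const html =
  '<img src="a.png">' +           // matched as intended
  '<img class="x" src="b.png">' + // skipped: an attribute precedes src
  '<img src="c.png" alt="y">';    // matched, but the capture runs on to the final quote

console.log(basicImgSrcs(html));
// [ 'a.png', 'c.png" alt="y' ]
```

The second tag is missed entirely, and the third produces a capture that includes the trailing attribute, which is the kind of case a more permissive pattern or an HTML parser has to handle.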
If you already have the HTML as a string, you can use the SwiftSoup library. Using Swift 3:
do {
    let doc: Document = try SwiftSoup.parse(html)
    let srcs: Elements = try doc.select("img[src]")
    let srcsStringArray: [String?] = srcs.array().map { try? $0.attr("src").description }
    // do something with srcsStringArray
} catch Exception.Error(_, let message) {
    print(message)
} catch {
    print("error")
}