I get data (an HTML string) from a website and want to extract all the links. I wrote a function that works, but it is very slow.
Can you help me optimize it? Which standard functions can I use? The function's logic: find the "http://" string in the text, then read the string character by character until I reach a double quote ("\"").
    extension String {
        subscript (i: Int) -> Character {
            return self[advance(self.startIndex, i)]
        }
        subscript (i: Int) -> String {
            return String(self[i] as Character)
        }
        subscript (r: Range<Int>) -> String {
            return substringWithRange(Range(start: advance(startIndex, r.startIndex), end: advance(startIndex, r.endIndex)))
        }
    }

    func extractAllLinks(text: String) -> Array<String> {
        var stringArray = Array<String>()
        var find = "http://" as String
        for (var i = countElements(find); i < countElements(text); i++)
        {
            var ch: Character = text[i - Int(countElements(find))]
            if (ch == find[0])
            {
                var j = 0
                while (ch == find[j])
                {
                    var ch2: Character = find[j]
                    if (countElements(find) - 1 == j)
                    {
                        break
                    }
                    j++
                    i++
                    ch = text[i - Int(countElements(find))]
                }
                i -= j
                if (j == (countElements(find) - 1))
                {
                    var str = ""
                    for (; text[i - Int(countElements(find))] != "\""; i++)
                    {
                        str += text[i - Int(countElements(find))]
                    }
                    stringArray.append(str)
                }
            }
        }
        return stringArray
    }
Very helpful thread! Here's an example that worked in Swift 1.2, based on Victor Sigler's answer.
I wonder if you realise that every single time you call countElements, a complex function runs that has to scan all the Unicode characters in your string, extract the extended grapheme clusters from them, and count them. Even if you don't know what an extended grapheme cluster is, you should be able to imagine that this isn't cheap and is major overkill here.
Just convert it to an NSString*, call rangeOfString and be done with it.
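As a rough sketch of that suggestion in current Swift: with Foundation imported, String exposes range(of:options:range:), the bridged counterpart of NSString's rangeOfString, so no explicit conversion is needed. The function name extractLinksWithRangeOf is made up for illustration, and stopping at a missing closing quote is an assumption (the original loop would run past the end of the string instead).

```swift
import Foundation

// Hypothetical helper (not from the thread): collects every substring that
// starts with "http://" and runs up to, but not including, the next quote.
func extractLinksWithRangeOf(_ text: String) -> [String] {
    var links: [String] = []
    var searchStart = text.startIndex

    // Each search resumes where the previous match ended, so the text is
    // walked once instead of being re-counted on every iteration.
    while let prefix = text.range(of: "http://", range: searchStart..<text.endIndex) {
        guard let quote = text.range(of: "\"", range: prefix.upperBound..<text.endIndex) else {
            break // assumption: no closing quote means no usable link
        }
        links.append(String(text[prefix.lowerBound..<quote.lowerBound]))
        searchStart = quote.upperBound
    }
    return links
}
```

This keeps the question's logic (and its limitations: no https, no case handling), only replacing the hand-written character loop with library searches.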
Obviously what you do is totally unsafe, because http:// doesn't mean there is a link. You can't just look for strings in HTML and hope it works; it doesn't. And then there is https, Http, hTtp, htTp, httP and so on and so on and so on. But that's all easy; for the real horror, follow the link in Uttam Sinha's comment.
Like AdamPro13 said above, you can easily get all the URLs using NSDataDetector. If you use a guard statement for this, remember that it must be inside a function or loop. I hope this helps.
And that is the answer for Swift 4.0
There's actually a class called NSDataDetector that will detect the link for you. You can find an example of it on NSHipster here: http://nshipster.com/nsdatadetector/
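For readers who want something concrete, here is a minimal sketch of the NSDataDetector approach in current Swift. It is not the code from any answer in this thread; the function name detectLinks is hypothetical, and it assumes an Apple platform where Foundation ships a working NSDataDetector.

```swift
import Foundation

// Hypothetical helper: returns the URLs that NSDataDetector finds in `text`.
func detectLinks(in text: String) -> [String] {
    let types = NSTextCheckingResult.CheckingType.link.rawValue

    // guard needs an enclosing scope to exit from, hence the function.
    guard let detector = try? NSDataDetector(types: types) else {
        return []
    }

    // NSDataDetector works on NSRange, so convert the String range first.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return detector.matches(in: text, options: [], range: fullRange)
        .compactMap { $0.url?.absoluteString }
}
```

Unlike a literal search for "http://", the detector decides for itself what counts as a link, so the results may include other schemes or bare host names.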
As others have pointed out, you are better off using regexes, data detectors or a parsing library. However, as specific feedback on your string processing:
The key with Swift strings is to embrace their forward-only nature. More often than not, integer indexing and random access are not necessary. As @gnasher729 pointed out, every time you call count you are iterating over the string. Similarly, the integer indexing extensions are linear, so if you use them in a loop, you can easily and accidentally create a quadratic or cubic-complexity algorithm.
But in this case, there's no need to do all that work to convert string indices to random-access integers. Here is a version that I think performs similar logic (look for a prefix, then look from there for a " character, ignoring that this doesn't cater for https, upper/lower case etc.) using only native string indices:
Even this could be further optimized (the advance(idx, count()) is a little inefficient) if there were other helpers such as findFromIndex etc., or a willingness to do without string slices and hand-roll the search for the end character.
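The forward-only idea can be sketched in current Swift roughly as follows. This is not the answerer's original code; extractLinksForwardOnly and its prefix parameter are names invented here, and taking the rest of the text when no closing quote exists is an assumption.

```swift
// Hypothetical helper: walks the text once, front to back, using a
// Substring view and native indices instead of integer offsets.
func extractLinksForwardOnly(_ text: String, prefix: String = "http://") -> [String] {
    var links: [String] = []
    var rest = text[...] // Substring view over the whole text, no copy

    while !rest.isEmpty {
        if rest.hasPrefix(prefix) {
            // Take everything up to the next double quote, or to the end
            // of the text if there is none (assumption).
            let end = rest.firstIndex(of: "\"") ?? rest.endIndex
            links.append(String(rest[..<end]))
            rest = rest[end...]
        } else {
            // Advance by one character; no counting, no index arithmetic.
            rest = rest.dropFirst()
        }
    }
    return links
}
```

Each step either consumes one character or jumps past a whole link, so the work grows linearly with the length of the text rather than quadratically.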