Swift iOS Cache WKWebView content for offline view

Posted 2020-01-30 08:16

Question:

We're trying to save the content (HTML) of a WKWebView in persistent storage (NSUserDefaults, Core Data, or a file on disk), so that the user can see the same content when re-entering the application without an internet connection. WKWebView doesn't use NSURLProtocol like UIWebView does (see post here).

Although I have seen posts that "The offline application cache is not enabled in WKWebView." (Apple dev forums), I know that a solution exists.

I've learned of two possibilities, but I couldn't make them work:

1) If I open a website in Safari for Mac and select File >> Save As, the option shown in the image below appears. Mac apps can use [[[webView mainFrame] dataSource] webArchive], but UIWebView and WKWebView have no such API. However, if I load a .webarchive file (like the one I obtained from Mac Safari) into a WKWebView in an Xcode project, the content is displayed correctly (HTML, external images, video previews) even without an internet connection. The .webarchive file is actually a plist (property list). I tried a Mac framework that creates .webarchive files, but its output was incomplete.

2) I obtained the HTML in webView:didFinishNavigation:, but that doesn't capture external images, CSS, or JavaScript:

func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
    // Returns only the serialized DOM; external images, CSS, and scripts are not included.
    webView.evaluateJavaScript("document.documentElement.outerHTML") { html, error in
        if let html = html as? String {
            print(html)
        }
    }
}

We've been struggling with this for over a week, and it is a key feature for us. Any ideas are really appreciated.

Thank you!

Answer 1:

I know I'm late, but I have recently been looking for a way to store web pages for offline reading, and I still couldn't find any reliable solution that doesn't depend on the page itself and doesn't use the deprecated UIWebView. A lot of people write that one should use the existing HTTP caching, but WebKit seems to do a lot of work out-of-process, making it virtually impossible to enforce complete caching (see here or here). However, this question pointed me in the right direction. Tinkering with the web archive approach, I found that it's actually quite easy to write your own web archive exporter.

As written in the question, web archives are just plist files, so all it takes is a crawler that extracts the required resources from the HTML page, downloads them all, and stores them in one big plist file. This archive file can later be loaded into the WKWebView via loadFileURL(_:allowingReadAccessTo:).

I created a demo app that allows archiving from and restoring to a WKWebView using this approach: https://github.com/ernesto-elsaesser/OfflineWebView

The implementation only depends on Fuzi for HTML parsing.
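
To make the plist idea concrete, here is a minimal sketch of how such an archive could be modeled and encoded with Foundation's PropertyListEncoder. The type names (WebArchive, WebArchiveResource) and the helper encodeWebArchive are hypothetical and not taken from the demo app. The key names are the ones I understand Safari to write, so it is worth comparing against an archive saved from Safari; a real archive may need further keys.

```swift
import Foundation

// One entry of the archive: the page itself or a downloaded subresource.
// Key names are an assumption based on archives written by Safari.
struct WebArchiveResource: Codable {
    let url: String
    let data: Data
    let mimeType: String
    var textEncodingName: String? = nil

    enum CodingKeys: String, CodingKey {
        case url = "WebResourceURL"
        case data = "WebResourceData"
        case mimeType = "WebResourceMIMEType"
        case textEncodingName = "WebResourceTextEncodingName"
    }
}

// The archive: one main resource (the HTML) plus everything it references.
struct WebArchive: Codable {
    let mainResource: WebArchiveResource
    var subresources: [WebArchiveResource] = []

    enum CodingKeys: String, CodingKey {
        case mainResource = "WebMainResource"
        case subresources = "WebSubresources"
    }
}

// Serializes the archive as a binary property list.
func encodeWebArchive(_ archive: WebArchive) throws -> Data {
    let encoder = PropertyListEncoder()
    encoder.outputFormat = .binary
    return try encoder.encode(archive)
}
```

The resulting Data would then be written to a file with a .webarchive extension and handed to the web view as described above. Downloading the subresources is the part the crawler has to supply.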



Answer 2:

I would recommend investigating the feasibility of using App Cache, which is now supported in WKWebView as of iOS 10: https://stackoverflow.com/a/44333359/233602



Answer 3:

I'm not sure whether you just want to cache pages that have already been visited or whether you have specific requests that you'd like to cache. I'm currently working on the latter, so I'll speak to that. My URLs are dynamically generated from an API request. From that response I set requestPaths to the non-image URLs, then make a request for each URL and cache the response. For the image URLs, I use the Kingfisher library to cache the images. I've already set up my shared cache, urlCache = URLCache.shared, in my AppDelegate and allotted the memory I need: urlCache = URLCache(memoryCapacity: <setForYourNeeds>, diskCapacity: <setForYourNeeds>, diskPath: "urlCache"). Then just call startRequest(for:completionWithErrorCallback:) for each of the URLs in requestPaths. (This can be done in the background if the content isn't needed right away.)

import Foundation
import Kingfisher  // ImagePrefetcher; the DDLog* calls come from CocoaLumberjack

class URLCacheManager {

    static let timeout: TimeInterval = 120
    static var requestPaths = [String]()

    class func startRequest(for url: URL, completionWithErrorCallback: @escaping (_ error: Error?) -> Void) {

        let urlRequest = URLRequest(url: url, cachePolicy: .returnCacheDataElseLoad, timeoutInterval: timeout)

        WebService.sendCachingRequest(for: urlRequest) { (response) in

            if let error = response.error {
                DDLogError("Error: \(error.localizedDescription) from cache response url: \(String(describing: response.request?.url))")
                completionWithErrorCallback(error)
                return
            }

            // Store the response explicitly so it is available offline.
            // (The original only re-stored a response that was already in the cache.)
            if let data = response.data,
                let urlResponse = response.response,
                let request = response.request {
                urlCache.storeCachedResponse(CachedURLResponse(response: urlResponse, data: data), for: request)
            }

            completionWithErrorCallback(nil)
        }
    }

    // Prefetches PNG images into Kingfisher's image cache.
    class func startCachingImageURLs(_ urls: [URL]) {

        let imageURLs = urls.filter { $0.pathExtension.contains("png") }

        let prefetcher = ImagePrefetcher(urls: imageURLs, options: nil, progressBlock: nil, completionHandler: { (skipped, failed, completed) in
            DDLogError("Skipped resources: \(skipped.count)\nFailed: \(failed.count)\nCompleted: \(completed.count)")
        })

        prefetcher.start()
    }

    // Requests every non-image URL so that its response lands in urlCache.
    class func startCachingPageURLs(_ urls: [URL]) {

        let pageURLs = urls.filter { !$0.pathExtension.contains("png") }

        for url in pageURLs {

            DispatchQueue.main.async {
                startRequest(for: url, completionWithErrorCallback: { (error) in

                    if let error = error {
                        DDLogError("There was an error while caching request: \(url) - \(error.localizedDescription)")
                    }
                })
            }
        }
    }
}
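
For reference, the cache setup and the cache-first request described above can be sketched with Foundation alone. The helper names (makeURLCache, prefetchRequest) and the capacities used in the usage note are illustrative assumptions, not values from my project.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLCache and URLRequest live here on Linux
#endif

// Builds a cache with the given capacities (in megabytes).
// "urlCache" is the disk path used in the text above.
func makeURLCache(memoryMB: Int, diskMB: Int) -> URLCache {
    return URLCache(memoryCapacity: memoryMB * 1024 * 1024,
                    diskCapacity: diskMB * 1024 * 1024,
                    diskPath: "urlCache")
}

// Builds a request that is answered from the cache when possible
// and goes to the network only on a cache miss.
func prefetchRequest(for url: URL, timeout: TimeInterval = 120) -> URLRequest {
    return URLRequest(url: url,
                      cachePolicy: .returnCacheDataElseLoad,
                      timeoutInterval: timeout)
}
```

In the AppDelegate this might be used as let urlCache = makeURLCache(memoryMB: 20, diskMB: 100), followed by URLCache.shared = urlCache if other sessions should use the same cache.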

I'm using Alamofire for the network request with a cachingSessionManager configured with the appropriate headers. So in my WebService class I have:

typealias URLResponseHandler = ((DataResponse<Data>) -> Void)

static let cachingSessionManager: SessionManager = {

    let configuration = URLSessionConfiguration.default
    configuration.httpAdditionalHeaders = cachingHeader
    configuration.urlCache = urlCache

    return SessionManager(configuration: configuration)
}()

private static let cachingHeader: HTTPHeaders = {

    var headers = SessionManager.defaultHTTPHeaders
    headers["Accept"] = "text/html"
    headers["Authorization"] = <token>
    return headers
}()

@discardableResult
static func sendCachingRequest(for request: URLRequest, completion: @escaping URLResponseHandler) -> DataRequest {

    return cachingSessionManager.request(request).responseData(completionHandler: completion)
}

Then in the webview delegate method I load the cachedResponse. I use a variable handlingCacheRequest to avoid an infinite loop.

func webView(_ webView: WKWebView, decidePolicyFor navigationAction: WKNavigationAction, decisionHandler: @escaping (WKNavigationActionPolicy) -> Void) {

    // Offline and not already serving a cached page: try to load from the cache.
    if let reach = reach, !reach.isReachable(), !handlingCacheRequest {

        var request = navigationAction.request
        guard let url = request.url else {
            decisionHandler(.cancel)
            return
        }

        request.cachePolicy = .returnCacheDataDontLoad

        guard let cachedResponse = urlCache.cachedResponse(for: request),
            let htmlString = String(data: cachedResponse.data, encoding: .utf8),
            cacheComplete else {
                showNetworkUnavailableAlert()
                decisionHandler(.allow)
                handlingCacheRequest = false
                return
        }

        modify(htmlString, completedModification: { modifiedHTML in
            // Set the flag first so the navigation triggered by loadHTMLString is allowed through.
            self.handlingCacheRequest = true
            webView.loadHTMLString(modifiedHTML, baseURL: url)
        })

        decisionHandler(.cancel)
        return
    }

    handlingCacheRequest = false
    DDLogInfo("Currently requesting url: \(String(describing: navigationAction.request.url))")
    decisionHandler(.allow)
}
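
The cache lookup at the heart of that delegate method can be isolated as a small helper. This is only a sketch; cachedHTML is a hypothetical name, and it assumes the cached body is UTF-8 encoded HTML, as the delegate code does.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLCache and URLRequest live here on Linux
#endif

// Returns the cached body for the request as a string, or nil when
// nothing is cached or the data is not valid UTF-8.
func cachedHTML(for request: URLRequest, in cache: URLCache) -> String? {
    var offlineRequest = request
    // Mirror the delegate code: never fall back to the network.
    offlineRequest.cachePolicy = .returnCacheDataDontLoad

    guard let cached = cache.cachedResponse(for: offlineRequest) else {
        return nil
    }
    return String(data: cached.data, encoding: .utf8)
}
```

A nil result corresponds to the branch above that shows the network-unavailable alert.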

Of course you'll want to handle it if there is a loading error as well.

func webView(_ webView: WKWebView, didFail navigation: WKNavigation!, withError error: Error) {

    DDLogError("Request failed with error \(error.localizedDescription)")

    if let reach = reach, !reach.isReachable() {
        showNetworkUnavailableAlert()
        handlingCacheRequest = true
    }
    webView.stopLoading()
    loadingIndicator.stopAnimating()
}

I hope this helps. The only thing I'm still trying to figure out is that the image assets aren't being loaded offline. I'm thinking I'll need to make a separate request for those images and keep a reference to them locally. Just a thought, but I'll update this when I have that worked out.

UPDATE: Images now load offline with the code below. I used the Kanna library to parse the HTML string from my cached response and find the URL embedded in the background-image declaration of the div's style attribute. I then used a regex to extract the URL (which is also the key for the image cached by Kingfisher), fetched the cached image, modified the CSS to use the image data (based on this article: https://css-tricks.com/data-uris/), and loaded the web view with the modified HTML. (Phew!) It was quite the process, and maybe there is an easier way, but I haven't found it. My code is updated to reflect all these changes. Good luck!

func modify(_ html: String, completedModification: @escaping (String) -> Void) {

    guard let doc = HTML(html: html, encoding: .utf8) else {
        DDLogInfo("Couldn't parse HTML with Kanna")
        completedModification(html)
        return
    }

    let imageDiv = doc.at_css("div[class='<your_div_class_name>']")

    guard let currentStyle = imageDiv?["style"],
        let currentURL = urlMatch(in: currentStyle)?.first else {

            DDLogDebug("Failed to find URL in div")
            completedModification(html)
            return
    }

    DispatchQueue.main.async {

        self.replaceURLWithCachedImageData(inHTML: html, withURL: currentURL, completedCallback: { modifiedHTML in

            completedModification(modifiedHTML)
        })
    }
}

func urlMatch(in text: String) -> [String]? {

    do {
        let urlPattern = "\\((.*?)\\)"
        let regex = try NSRegularExpression(pattern: urlPattern, options: .caseInsensitive)
        let nsString = NSString(string: text)
        let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))

        return results.map { nsString.substring(with: $0.range) }
    }
    catch {
        DDLogError("Couldn't match urls: \(error.localizedDescription)")
        return nil
    }
}

func replaceURLWithCachedImageData(inHTML html: String, withURL key: String, completedCallback: @escaping (String) -> Void) {

    // Strip the surrounding parentheses captured by the regex
    let start = key.index(after: key.startIndex)
    let end = key.index(before: key.endIndex)

    let url = String(key[start..<end])

    ImageCache.default.retrieveImage(forKey: url, options: nil) { (cachedImage, _) in

        guard let cachedImage = cachedImage,
            let data = UIImagePNGRepresentation(cachedImage) else {
                DDLogInfo("No cached image found")
                completedCallback(html)
                return
        }

        let base64String = "data:image/png;base64,\(data.base64EncodedString())"
        let modifiedHTML = html.replacingOccurrences(of: url, with: base64String)

        completedCallback(modifiedHTML)
    }
}
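
Stripped of Kanna and Kingfisher, the data-URI substitution itself comes down to a few lines of Foundation. The following is a sketch with hypothetical helper names (dataURI, inlineResource); it assumes the URL appears in the HTML exactly as it was used as the cache key.

```swift
import Foundation

// Builds a data URI such as "data:image/png;base64,...." from raw bytes.
func dataURI(for data: Data, mimeType: String) -> String {
    return "data:\(mimeType);base64,\(data.base64EncodedString())"
}

// Replaces every occurrence of the remote URL in the HTML with the
// inlined data URI, so the page no longer needs the network for it.
func inlineResource(in html: String, url: String, data: Data, mimeType: String) -> String {
    return html.replacingOccurrences(of: url, with: dataURI(for: data, mimeType: mimeType))
}
```

Keep in mind that base64 inflates the payload by about a third, so this suits a handful of images better than a media-heavy page.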


Answer 4:

The easiest way to load a web page from the cache in Swift 4.0 is as follows:

/* Where isCacheLoad = true (Offline load data) & isCacheLoad = false (Normal load data) */

internal func loadWebPage(fromCache isCacheLoad: Bool = false) {

    guard let url = url else { return }
    let request = URLRequest(url: url, cachePolicy: (isCacheLoad ? .returnCacheDataElseLoad : .reloadRevalidatingCacheData), timeoutInterval: 50)
    DispatchQueue.main.async { [weak self] in
        self?.webView.load(request)
    }
}