How can I read Chrome Cache files?

2020-02-07 16:45发布

A forum I frequent was down today, and upon restoration, I discovered that the last two days of forum posting had been rolled back completely.

Needless to say, I'd like to get back what data I can from the forum loss, and I am hoping I have at least some of it stored in the cache files that Chrome created.

I face two problems -- the cache files have no filetype, and I'm unsure how to read them in an intelligent manner (trying to open them in Chrome itself seems to "redownload" them in a .gz format), and there are a ton of cache files.

Any suggestions on how to read and sort these files? (A simple string search should fit my needs)

11条回答
混吃等死
2楼-- · 2020-02-07 17:08

Joachim Metz provides some documentation of the Chrome cache file format with references to further information.

For my use case, I only needed a list of cached URLs and their respective timestamps. I wrote a Python script to get these by parsing the data_* files under C:\Users\me\AppData\Local\Google\Chrome\User Data\Default\Cache\:

import datetime
with open('data_1', 'rb') as datafile:
    data = datafile.read()

for ptr in range(len(data)):
    fourBytes = data[ptr : ptr + 4]
    if fourBytes == b'http':

        # Found the string 'http'. Hopefully this is a Cache Entry
        endUrl = data.index(b'\x00', ptr)
        urlBytes = data[ptr : endUrl]
        try:
            url = urlBytes.decode('utf-8')
        except:
            continue

        # Extract the corresponding timestamp
        try:
            timeBytes = data[ptr - 72 : ptr - 64]
            timeInt = int.from_bytes(timeBytes, byteorder='little')
            secondsSince1601 = timeInt / 1000000
            jan1601 = datetime.datetime(1601, 1, 1, 0, 0, 0)
            timeStamp = jan1601 + datetime.timedelta(seconds=secondsSince1601)
        except:
            continue

        print('{} {}'.format(str(timeStamp)[:19], url))
查看更多
我只想做你的唯一
3楼-- · 2020-02-07 17:09

EDIT: The below answer no longer works see here


Chrome stores the cache as a hex dump. OSX comes with xxd installed, which is a command line tool for converting hex dumps. I managed to recover a jpg from my Chrome's HTTP cache on OSX using these steps:

  1. Goto: chrome://cache
  2. Find the file you want to recover and click on it's link.
  3. Copy the 4th section to your clipboard. This is the content of the file.
  4. Follow the steps on this gist to pipe your clipboard into the python script which in turn pipes to xxd to rebuild the file from the hex dump: https://gist.github.com/andychase/6513075

Your final command should look like:

pbpaste | python chrome_xxd.py | xxd -r - image.jpg

If you're unsure what section of Chrome's cache output is the content hex dump take a look at this page for a good guide: http://www.sparxeng.com/blog/wp-content/uploads/2013/03/chrome_cache_html_report.png

Image source: http://www.sparxeng.com/blog/software/recovering-images-from-google-chrome-browser-cache

More info on XXD: http://linuxcommand.org/man_pages/xxd1.html

Thanks to Mathias Bynens above for sending me in the right direction.

查看更多
The star\"
4楼-- · 2020-02-07 17:10

EDIT: The below answer no longer works see here


In Chrome or Opera, open a new tab and navigate to chrome://view-http-cache/

Click on whichever file you want to view. You should then see a page with a bunch of text and numbers. Copy all the text on that page. Paste it in the text box below.

Press "Go". The cached data will appear in the Results section below.

查看更多
该账号已被封号
5楼-- · 2020-02-07 17:15

I've made short stupid script which extracts JPG and PNG files:

#!/usr/bin/php
<?php
 $dir="/home/user/.cache/chromium/Default/Cache/";//Chrome or chromium cache folder. 
 $ppl="/home/user/Desktop/temporary/"; // Place for extracted files 

 $list=scandir($dir);
 foreach ($list as $filename)
 {

 if (is_file($dir.$filename))
    {
        $cont=file_get_contents($dir.$filename);
        if  (strstr($cont,'JFIF'))
        {
            echo ($filename."  JPEG \n");
            $start=(strpos($cont,"JFIF",0)-6);
            $end=strpos($cont,"HTTP/1.1 200 OK",0);
            $cont=substr($cont,$start,$end-6);
            $wholename=$ppl.$filename.".jpg";
            file_put_contents($wholename,$cont);
            echo("Saving :".$wholename." \n" );


                }
        elseif  (strstr($cont,"\211PNG"))
        {
            echo ($filename."  PNG \n");
            $start=(strpos($cont,"PNG",0)-1);
            $end=strpos($cont,"HTTP/1.1 200 OK",0);
            $cont=substr($cont,$start,$end-1);
            $wholename=$ppl.$filename.".png";
            file_put_contents($wholename,$cont);
            echo("Saving :".$wholename." \n" );


                }
        else
        {
            echo ($filename."  UNKNOWN \n");
        }
    }
 }
?>
查看更多
小情绪 Triste *
6楼-- · 2020-02-07 17:18

It was removed on purpose and it won't be coming back.

Both chrome://cache and chrome://view-http-cache have been removed starting chrome 66. They work in version 65.

Workaround

You can check the chrome://chrome-urls/ for complete list of internal Chrome URLs.

The only workaround that comes into my mind is to use menu/more tools/developer tools and having a Network tab selected.

The reason why it was removed is this bug:

The discussion:

查看更多
登录 后发表回答