Django StaticFiles and Amazon S3: How to detect mo

Posted 2019-01-17 17:54

Question:

I'm using django staticfiles + django-storages and Amazon S3 to host my data. All is working well except that every time I run manage.py collectstatic the command uploads all files to the server.

It looks like the management command compares timestamps from Storage.modified_time() which isn't implemented in the S3 storage from django-storages.

How do you guys determine if an S3 file has been modified?

I could store file paths and last modified data in my database. Or is there an easy way to pull the last modified data from Amazon?

Another option: it looks like I can assign arbitrary metadata with python-boto where I could put the local modified date when I upload the first time.

Anyways, it seems like a common problem so I'd like to ask what solution others have used. Thanks!

Answer 1:

The latest version of django-storages (1.1.3) handles file modification detection through S3 Boto.

pip install django-storages and you're good now :) Gotta love open source!

Update: set the AWS_PRELOAD_METADATA option to True in your settings file to get very fast syncs if you use the S3Boto class. If you use the S3 class instead, use the PreloadedS3 class.
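For reference, a minimal settings sketch. It assumes the boto-based backend lives at `storages.backends.s3boto.S3BotoStorage` (as it did in the 1.1.x releases); the bucket name is a placeholder, so adjust both to your install:

```python
# settings.py (fragment)

# Assumption: boto-based backend path from django-storages 1.1.x.
STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

# Hypothetical bucket name, replace with your own.
AWS_STORAGE_BUCKET_NAME = 'my-static-bucket'

# Load the bucket listing once and answer metadata lookups from memory
# instead of issuing one request per file.
AWS_PRELOAD_METADATA = True
```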


Update 2: It's still extremely slow to run the command.


Update 3: I forked the django-storages repository to fix the issue and added a pull request.

The problem is in the modified_time method: the fallback value is computed even when it isn't used, because Python evaluates the arguments to get before making the call. I moved the fallback into an if block so it runs only if get returns None.

    entry = self.entries.get(name, self.bucket.get_key(self._encode_name(name)))

Should be

    entry = self.entries.get(name)
    if entry is None:
        entry = self.bucket.get_key(self._encode_name(name))

With this change, 1000 requests take under 0.5s, down from about 100s.
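To see why the one-liner is slow, here is a small self-contained sketch. `slow_lookup` is a made-up stand-in for the bucket.get_key call and `entries` is a plain dict, so nothing here touches S3:

```python
calls = []

def slow_lookup(name):
    """Hypothetical stand-in for bucket.get_key(): logs each simulated request."""
    calls.append(name)
    return "remote:" + name

# Pretend this is the preloaded metadata cache.
entries = {"css/site.css": "cached:css/site.css"}

def eager(name):
    # slow_lookup(name) runs before .get() is called, even on a cache hit.
    return entries.get(name, slow_lookup(name))

def lazy(name):
    # The lookup only runs when the cache has no entry.
    entry = entries.get(name)
    if entry is None:
        entry = slow_lookup(name)
    return entry

eager_result = eager("css/site.css")
eager_calls = len(calls)

calls.clear()
lazy_result = lazy("css/site.css")
lazy_calls = len(calls)
```

Both versions return the cached entry, but the eager one still pays for a simulated request on every call.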


Update 4:

For syncing 10k+ files, I believe boto has to make multiple requests since S3 paginates results, which causes a 5-10 second sync time. This will only get worse as we get more files.

I'm thinking a solution is a custom management command or a django-storages update that stores one file on S3 holding the metadata of all the other files, and updates it any time a file is uploaded via the collectstatic command.

It won't detect files uploaded via other means, but that won't matter if the sole entry point is the management command.
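A rough sketch of that manifest idea, kept independent of S3 so it runs anywhere. The JSON layout and the function names are my own invention, not anything django-storages provides; in practice the manifest string would be read from and written back to a single S3 key:

```python
import json

def files_to_upload(local_mtimes, manifest_json):
    """Return the paths whose local mtime is newer than the manifest's record."""
    manifest = json.loads(manifest_json) if manifest_json else {}
    return sorted(
        path for path, mtime in local_mtimes.items()
        # Paths missing from the manifest always count as changed.
        if mtime > manifest.get(path, float("-inf"))
    )

def updated_manifest(local_mtimes, manifest_json):
    """Merge the local mtimes into the manifest and serialize it again."""
    manifest = json.loads(manifest_json) if manifest_json else {}
    manifest.update(local_mtimes)
    return json.dumps(manifest, sort_keys=True)

# Hypothetical data: the manifest as last stored, and the current local files.
old_manifest = json.dumps({"a.css": 100.0, "b.js": 100.0})
local = {"a.css": 100.0, "b.js": 150.0, "c.png": 120.0}

changed = files_to_upload(local, old_manifest)
new_manifest = updated_manifest(local, old_manifest)
```

This way a sync costs one download and one upload of the manifest no matter how many files the bucket holds.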



Answer 2:

I answered the same question here: https://stackoverflow.com/a/17528513/1220706 . Check out https://github.com/FundedByMe/collectfast . It's a pluggable Django app that caches the ETag of remote S3 files and compares the cached checksum instead of performing a lookup every time. Follow the installation instructions and run collectstatic as normal. It took me from an average of around 1m30s to about 10s per deploy.
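The general idea can be sketched like this. This is not collectfast's actual code, just an illustration; it assumes the objects were uploaded in a single part without KMS encryption, since only then is the S3 ETag the MD5 hex digest of the content. The `etag_cache` dict stands in for whatever cache backend you use:

```python
import hashlib

def md5_hex(data):
    """MD5 hex digest of the file content (bytes)."""
    return hashlib.md5(data).hexdigest()

def needs_upload(path, data, etag_cache):
    """Return True if the file is unknown or its content differs from the cached ETag."""
    cached = etag_cache.get(path)
    if cached is None:
        return True
    # S3 returns ETags wrapped in double quotes, so strip them before comparing.
    return cached.strip('"') != md5_hex(data)

# Hypothetical cache, filled as if a.css had been uploaded earlier.
etag_cache = {"a.css": '"%s"' % md5_hex(b"body{}")}

unchanged = needs_upload("a.css", b"body{}", etag_cache)
modified = needs_upload("a.css", b"body { }", etag_cache)
unknown = needs_upload("new.js", b"", etag_cache)
```

Comparing checksums rather than timestamps also avoids re-uploading files whose mtime changed but whose content didn't, for example after a fresh checkout.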