I have a project that will store large amounts of media content once it is deployed. The project uses Python and Django, and is run via Gunicorn and Supervisor.
For static files I will use nginx.
I made the basic setup by following this article, but I have a question: how can I store content more dynamically? To start, I have one machine with 4 hard drives of 2 TB each; later more drives will be bought, as well as new machines (currently I have only one).
The site is located at site.com, and nginx is located at the subdomain i.site.com, which has 2 folders in its root: /static for storing CSS, JS, SVG and other design elements, and /media where the media content will be stored.
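For reference, in Django this layout corresponds roughly to the following settings (the filesystem paths are only my assumptions about where nginx serves i.site.com from, not something taken from the article):

    # settings.py -- sketch of the layout described above (paths are assumed)
    STATIC_ROOT = "/srv/i.site.com/static"   # CSS, JS, SVG and other design elements
    STATIC_URL = "http://i.site.com/static/"

    MEDIA_ROOT = "/srv/i.site.com/media"     # uploaded media content
    MEDIA_URL = "http://i.site.com/media/"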
The problem is setting up nginx so that media is written to every hard drive and every machine gets used.
For speed I need to write every new file to a different hard drive (like a rotation/loop). For example, I save file1 and it is written to machine1/hdd1; then I save file2 and it is written to machine1/hdd2; ... file4 goes to machine1/hdd4, file5 to machine2/hdd1 (as I mentioned, I currently have only one machine, but there will be more in the future).
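On a single machine, one way I could get this rotation without touching nginx is to do it in Django itself. Below is a rough sketch of an upload_to callable that cycles over per-drive subdirectories of MEDIA_ROOT (the directory names hdd1-hdd4 and the model are my own assumptions; each directory would be a mount point or symlink pointing at one of the drives):

    import itertools
    import os

    from django.db import models

    # Assumed subdirectories of MEDIA_ROOT, each one a mount point (or symlink)
    # pointing at a different physical drive on machine1.
    _drives = itertools.cycle(["hdd1", "hdd2", "hdd3", "hdd4"])

    def rotating_upload_to(instance, filename):
        """Place every new upload on the next drive in the rotation."""
        return os.path.join(next(_drives), filename)

    class MediaItem(models.Model):
        file = models.FileField(upload_to=rotating_upload_to)

Each Gunicorn worker keeps its own position in the cycle, so the rotation is only approximate, but over many uploads the files still spread fairly evenly across the drives. It does not solve the multi-machine part, though.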
So, does anyone have experience with or ideas about how I can do that? I'm fairly sure nginx can at least write to multiple hard drives, but I'm not sure what to do if I need to write media data to multiple machines.
If you have any other ideas, please suggest them as well.
You can find an example nginx config in the mentioned article, or by following this link. I have also been looking at the nginx upstream module, but I'm not sure whether I can configure this with it.
Update: Previously, I wrote the answer without giving much thought to the actual problem that you're trying to solve. Your comment below brings out some interesting problems that I had ignored previously. I've now re-written my answer. Hopefully, this will be helpful.
In the previous version of this answer I mentioned Load Balancing. But clearly your problem is more about file storage than managing load.
What you're looking for is called a Distributed File System. A distributed file system allows you to plug in many disks and it can scale out to multiple machines.
A DFS groups together all the disks and machines and gives you access to them as if they were a single disk. Not only that, the DFS software can also take care of file replication for you, if you want.
I've no experience with using any DFS, but I've read a little bit about GlusterFS. I hear it is good, but feel free to do your research.
Let me try and explain how GlusterFS works. Look at this diagram:
                                              / Disk 1
              / Machine 1 [Gluster Server] --|
             /                                \ Disk 2
Nginx -> Gluster Client --|
             \                                / Disk 1
              \ Machine 2 [Gluster Server] --|
                                              \ Disk 2
Without getting into too much detail, the Gluster client will allow your nginx server to access both Machine 1 and Machine 2 from a single directory, like /media. Inside this /media directory you can access both machines just as if all the data were stored inside /media, even though it actually lives on different machines.
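Assuming the Gluster volume is mounted at /media on the machine running nginx and Django, the application does not need to know anything about the individual machines or disks; it just reads and writes ordinary local paths, for example:

    # Sketch only: /media is assumed to be the Gluster volume's mount point.
    # GlusterFS decides which machine and disk the file physically lands on.
    uploaded_bytes = b"..."  # whatever content the user uploaded
    with open("/media/uploads/file1.jpg", "wb") as fh:
        fh.write(uploaded_bytes)

Distribution (and, if configured, replication) across the disks is handled by the Gluster servers, so nginx keeps serving /media exactly as before.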
Guides and tutorials:
- Setting up GlusterFS on Ubuntu - a very good tutorial that explains a lot of the basics. Although it's written for Ubuntu 12.04, I'm sure you can adapt it to your OS version.
- Common DFS architectures explained - some DFS architecture patterns that you might find helpful later on.