I want to create a Web app which would allow the user to upload some C code, and see the results of its execution (the code would be compiled on the server). The users are untrusted, which obviously has some huge security implications.
So I need to create some kind of sandbox for the apps. At the most basic level, I'd like to restrict access to the file system to some specified directories. I cannot use chroot jails directly, since the web app is not running as a privileged user. I guess a suid executable which sets up the jail would be an option.
The uploaded programs would be rather small, so they should execute quickly (a couple of seconds at most). Hence, I can kill the process after a preset timeout, but how do I ensure that it doesn't spawn new processes? Or if I can't, is killing the entire pgid a reliable method?
What would be the best way to go about this - other than "don't do it at all"? :) What other glaring security problems have I missed?
FWIW, the web app will be written in Python.
I think your solutions must concentrate on analyzing the source code. I don't know any tools, and I think this would be pretty hard with
C
, but, for example, aPascal
program which doesn't include any modules would be pretty harmless in my opinion.Spawning a new VM under KVM or qemu to compile and run the code looks like the way to go. Running the code under jail/LXC can compromise the machine if it exploits the unsecured parts of the OS like networking code. Advantage of running under a VM are obvious. One can only hack the VM but not the machine itself. But the side effect is you need lots of resources (CPU and Memory) to spawn a VM for each request.
Along with the other sugestions you might find this useful.
http://www.eelis.net/geordi/
This is from http://codepad.org/about, codepad.org's about page.
I'd say this is extremely dangerous on many levels. You're essentially opening yourself up to any exploit that can be found on your system (whereas you're normally limited to the ones people can exploit remotely). I'd say don't do it if you can avoid it.
If you do want to do it, you might want to use some kind of virtual machine to run the user's code. Using something like KVM it's possible to set up a number of virtual machines using the same base image (you can even store a snapshot in an already-booted state, though I'm not sure how it will handle being cloned). You can then create the VMs on demand, run the user's code, return the results, and then kill off the VM. If you keep the VMs isolated from each other and the network, the users can wreak any havoc they want and it won't hurt your physical server. The only danger you're exposing yourself to under these conditions would be some kind of exploit that allows them to escape from the VM... those are extremely rare, and will be more rare as hardware virtualization improves.
About the only chance you have is running a VirtualMachine and those can have vulnerabilities. If you want your machine hacked in the short term just use permissions and make a special user with access to maybe one directory. If you want to postpone the hacking to some point in the future then run a webserver inside a virtual machine and port forward to that. You'll want to keep a backup of that because you'll probably have that hacked in under an hour and want to restart a fresh copy every few hours. You'll also want to keep an image of the whole machine to just reimage the whole thing once a week or so in order to overcome the weekly hackings. Don't let that machine talk to any other machine on your network. Blacklist it everywhere. I'm talking about the virtual machine and the physical machine IP addresses. Do regular security audits on any other machines on your other machines on the network. Please rename the machines IHaveBeenHacked1 and IHaveBeenHacked2 and prevent access to those in your hosts lists and firewalls.
This way you might stave off your level of hackage for a while.
I guess libsandbox serves your purpose. Its core library is written for C/C++, but it also has a wrapper for Python programs. It provides options to customize which system calls can be allowed, how much memory can be used, how long can the guest program be run, etc. It is already being used in a couple of online judges such as HOJ.