PHP runs in a shared-nothing environment, which in this context means that every web request is run in a clean environment. You can not access another request's data except through a separate persistence layer (filesystem, database, etc.).
What about Ruby on Rails? I just read a blog post stating that separate requests might access the same class variable.
It has occurred to me that this probably depends on the web server. Mongrel's FAQ states that Mongrel uses one thread per request - suggesting a shared-nothing environment. The FAQ goes on to say that RoR is not thread safe, which further suggests that RoR would not exist in a shared environment unless a new request reuses the in-memory objects created from the previous request.
Obviously this has huge security ramifications. So I have two questions:
- Is the RoR environment shared-nothing?
- If RoR runs in (or might run in some circumstances) a shared environment, what variables and other data storage should I be paranoid about?
Update: I'll clarify further. In a Java servlet container you can have objects which persist across multiple requests. This is typically done for caching data which multiple users would have access to, database connections, etc.. In PHP this can not be done at the application layer, it must be done in a separate persistence layer like Memcached. So the twofold question is: which scenario is RoR like (PHP or Java) and if like Java, which data types persist across multiple requests?
In short:
- No, Rails never runs in a shared-nothing environment.
- Be paranoid about class variables and class instance variables.
The longer version:
Rails processes start their life cycle by loading the framework and application. They will typically run only a single thread, which will process many requests during its lifetime. The requests will therefore be dispatched strictly sequentially.
Nevertheless, all classes persist across requests. This means any object referenced from your classes and metaclasses (such as class variables and class instance variables) will be shared across requests. This may bite you, for example, if you try to memoize methods (@var ||= expensive_calculation
) in your class methods, expecting it will only persist during the current request. In reality, the calculation will only be performed on the first request.
On the surface, it may seem nice to implement caching, or other behaviour that depends on persistence across requests. Typically, it isn't. This is because most deployment strategies will use several Rails processes to counter their own single-threaded nature. It is simply not cool to block all requests while waiting for a slow database query, so the easy way out is to spawn more processes. Naturally, these processes do not share anything (except some memory perhaps, which you won't notice). This may bite you if you save stuff in your class variables or class instance variables during requests. Then, somehow, sometimes the stuff appears to be present, and sometimes it appears to be gone. (In reality, of course, the data may or may not be present in some process, and absent in others).
Some deployment configurations (most notably JRuby + Glassfish) are in fact multithreaded.
Rails is thread safe, so it can deal with it. But your application may not be thread safe. All controller instances are thrown away after each request, but as we know, the classes are shared. This may bite you if you pass information around in class variables or in class instance variables. If you do not properly use synchronisation methods, you may very well end up in race condition hell.
As a side note: Rails is typically run in single-threaded processes because Ruby's thread implementation sucks ass. Luckily, things are a little better in Ruby 1.9. And a lot better in JRuby.
With both these Ruby implementations gaining in popularity, it seems likely that multithreaded Rails deployment strategies will also gain in popularity and number. It is a good idea to write your application with multithreaded request dispatching in mind already.
Here is a relatively simple example that illustrates what can happen if you are not careful about modifying shared objects.
Create a new Rails project: rails test
Create a new file lib/misc.rb
and put in it this:
class Misc
@xxx = 'Hello'
def Misc.contents()
return @xxx
end
end
- Create a new controller:
ruby script/generate controller Posts index
Change app/views/posts/index.html.erb
to contain this code:
<%
require 'misc'; y = Misc.contents() ; y << ' (goodbye) '
%>
<pre><%= y %></pre>
(This is where we modify the implicitly shared object.)
- Add RESTful routes to
config/routes.rb
.
- Start the server
ruby script/server
and load the page /posts
several times. You will see the number of ( goodbye)
strings increasing by one on each successive page reload.
In your average deployment using Passenger, you probably have multiple app processes that share nothing between them but classes within each process that maintain their (static) state from request to request. Each request, though, makes a new instance of your controllers.
You might call this a cluster of distinct shared-state environments.
To use your Java analogy, you can do the caching and have it work from request to request, you just can't assume that it will be available on every request.
Shared-nothing is sometimes a good idea. But not when you have to load a large application framework and a large domain model and a large amount of configuration on every request.
For efficiency, Rails keeps some data available in memory to be shared among all requests for the lifetime of an application. Most of this data is read-only, so you shouldn't be worried.
When you write your app, stay away from writing to shared objects (excluding the database, for example, which comes out-of-the-box with good concurrency control) and you should be fine.