Question:
Specifically, is there some risk that data can be lost? I'm looking at running an intensive transaction processing system where it is critical that nothing be lost. Are there any examples of NoSQL being used in mission-critical applications such as banking transaction processing?
Answer 1:
Without being flippant, the absence of ACID means you get no guarantees of atomicity, consistency, isolation, or durability.
Without atomicity, you get no guarantee that multiple actions that must either succeed or fail together do so. For instance, if your transactions require you to debit one account and credit another in one go, in the absence of atomic transactions, you either have to roll your own solution, or accept that it's possible for you to debit one account, without making the corresponding credit.
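Here is a minimal sketch of what atomicity buys you. It uses Python's built-in `sqlite3` module as a stand-in ACID store. The `accounts` table and the `make_bank`, `transfer` and `balances` helpers are hypothetical. The credit is deliberately applied before the debit. When the debit then violates the non-negative balance check, you can see the already-applied credit being rolled back instead of left behind.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """Create an in-memory bank with two accounts (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)",
        [("alice", 100), ("bob", 50)],
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Credit dst and debit src as one unit: either both happen or neither does."""
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if an exception escapes it.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        # If this debit would make src negative, the CHECK constraint raises
        # sqlite3.IntegrityError and the credit above is rolled back too.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )


def balances(conn: sqlite3.Connection) -> dict:
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

After `transfer(conn, "alice", "bob", 30)`, a second call such as `transfer(conn, "bob", "alice", 500)` raises `sqlite3.IntegrityError` and leaves both balances exactly as they were. Without a transaction, you would have to write that undo logic yourself.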
Without consistency, there's no guarantee that "side effects" of your transactions work - in relational databases, that's things like the firing of a trigger, or the cascading of a foreign key relationship. So, if you need some kind of auto-incrementing unique identifier for your transaction, there's no guarantee that you will get one.
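The sketch below makes the "side effects" just described concrete, again using Python's `sqlite3` as an example relational engine. Table and helper names are hypothetical. The database hands out unique ids for transaction rows. It rejects a transaction that points at a missing account. It also cascades the deletion of an account to that account's transactions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this is switched on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " owner TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE txns ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " account_id INTEGER NOT NULL REFERENCES accounts(id) ON DELETE CASCADE,"
        " amount INTEGER NOT NULL)"
    )
    return conn


def add_account(conn: sqlite3.Connection, owner: str) -> int:
    with conn:
        return conn.execute(
            "INSERT INTO accounts (owner) VALUES (?)", (owner,)
        ).lastrowid


def record_txn(conn: sqlite3.Connection, account_id: int, amount: int) -> int:
    """Insert a transaction row; the database assigns its unique id."""
    with conn:
        return conn.execute(
            "INSERT INTO txns (account_id, amount) VALUES (?, ?)",
            (account_id, amount),
        ).lastrowid


def delete_account(conn: sqlite3.Connection, account_id: int) -> None:
    # ON DELETE CASCADE removes the account's transactions in the same statement.
    with conn:
        conn.execute("DELETE FROM accounts WHERE id = ?", (account_id,))
```

Recording a transaction against a non-existent account id raises `sqlite3.IntegrityError`. Note that SQLite needs `PRAGMA foreign_keys = ON` for this. Many other relational databases enforce foreign keys by default.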
Without isolation, there's no way to guarantee that two processes don't affect the data at the same time. For instance, one process might be incrementing the value of a field, and a second one might be decrementing it - who wins?
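This small Python sketch makes the "who wins?" question concrete. All names are hypothetical. The first function replays, by hand, one interleaving of two read-modify-write updates with no isolation. The class then uses a lock to serialize the updates. The lock stands in for the isolation a database provides; real engines use locking, multiversioning, or both.

```python
import threading


def lost_update(start: int) -> int:
    """Replay, step by step, one interleaving of two unisolated updates."""
    value = start
    a_seen = value       # process A reads the field in order to increment it
    b_seen = value       # process B reads it before A has written back
    value = a_seen + 1   # A writes its result
    value = b_seen - 1   # B overwrites it: A's increment is silently lost
    return value


class IsolatedCounter:
    """A counter whose read-modify-write is made indivisible with a lock."""

    def __init__(self, value: int = 0):
        self.value = value
        self._lock = threading.Lock()

    def add(self, delta: int) -> None:
        # Only one thread at a time can run the read-modify-write below,
        # so concurrent increments and decrements are serialized.
        with self._lock:
            self.value += delta


def hammer(counter: IsolatedCounter, delta: int, times: int) -> None:
    for _ in range(times):
        counter.add(delta)


def run_concurrently(counter: IsolatedCounter, times: int = 10_000) -> int:
    """Run an incrementing and a decrementing thread against one counter."""
    threads = [
        threading.Thread(target=hammer, args=(counter, +1, times)),
        threading.Thread(target=hammer, args=(counter, -1, times)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Starting from 10, `lost_update(10)` returns 9, whereas running the two updates one after the other would give 10. By contrast, `run_concurrently(IsolatedCounter(10))` returns 10.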
Without durability, hardware failure could leave the database in a different state than you expect - for instance, you may believe that a change was written to the data store, but it was actually queued in some internal memory buffer and could disappear into thin air if there's a power failure.
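Here is a minimal Python sketch of the application-side half of durability, assuming a simple append-only log file. The `durable_append` name is hypothetical. The function does not return until `os.fsync` has asked the OS to move the data to stable storage. Most relational databases do something equivalent with their transaction log before acknowledging a commit.

```python
import os


def durable_append(path: str, record: bytes) -> None:
    """Append record to path and return only after asking the OS to persist it."""
    with open(path, "ab") as f:
        f.write(record)
        # write() may leave the data in Python's own buffer; flush() hands it
        # to the operating system, which may still hold it in its page cache...
        f.flush()
        # ...so fsync() asks the OS to push it to the storage device before
        # returning. Only after this would it be safe to report "committed".
        os.fsync(f.fileno())
```

Even `fsync` is only as good as the hardware beneath it. A drive with a volatile write cache that ignores flush requests can still lose acknowledged writes. On POSIX systems, a newly created file also needs its containing directory fsync'd for the file itself to survive a crash.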
It's probably possible to build a solution on NoSQL which works around the absence of ACID compliance, but the level of effort would be huge, and you almost certainly won't do as good a job as the guys who write relational databases....
Answer 2:
It's a paradox that every RDBMS guy thinks the sky would fall without ACID, but most NoSQL guys happily deploy and support end-user applications without ever thinking "my application would be better with ACID".
(Note: this answer is similar to what I gave to the very similar question: What Applications Don't Need ACID? )
The vast majority of NoSQL databases have the 'D' (durability) property: an unexpected loss of power will leave you with committed transactions being evident in the database, with the caveat that NoSQL transactions are 'small' in some sense. So "No": a typical NoSQL db does not lose data.
In most NoSQL databases you get to use limited versions of atomicity & isolation etc., but it takes an exponential amount of effort to implement transactions of arbitrary complexity. So there is no reason why you can't implement a bank system using a non-ACID database: most NoSQL databases would let you use micro-transactions which deduct money from one account and add it to another, with a 0% chance of the total amount of money in the system changing. (However, as a counter-example, I believe a banking application could not be written in Google AppEngine because their transactions only work within one 'complex object' which would be a single user's set of bank accounts).
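Both the "micro-transaction" idea and the AppEngine limitation can be sketched with a toy document store in Python. Everything here, including `SingleDocStore`, is hypothetical and is not AppEngine's or any other product's API. Updates are atomic only within one document. So moving money between two accounts held in the same user's document is safe, while the total for that user cannot change.

```python
import threading
from copy import deepcopy


class SingleDocStore:
    """Toy key-value store: each update is atomic, but only within ONE document."""

    def __init__(self):
        self._docs = {}
        self._locks = {}
        self._guard = threading.Lock()

    def put(self, key, doc):
        with self._guard:
            self._docs[key] = deepcopy(doc)
            self._locks.setdefault(key, threading.Lock())

    def get(self, key):
        return deepcopy(self._docs[key])

    def update(self, key, fn):
        """Apply fn to a copy of the document and install the result atomically.

        If fn raises, the stored document is left untouched.
        """
        with self._locks[key]:
            new_doc = fn(deepcopy(self._docs[key]))
            self._docs[key] = new_doc


def transfer_within_user(store, user, src, dst, amount):
    """Move money between two accounts stored in the same user's document."""
    def apply(doc):
        if doc[src] < amount:
            raise ValueError("insufficient funds")
        doc[src] -= amount
        doc[dst] += amount
        return doc

    store.update(user, apply)
```

A transfer from one user to another would touch two documents. In this model, it cannot be made atomic with a single `update` call. You would need extra machinery on top, which is where the effort the answer mentions comes in.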
In order to discuss this question in the context of real-world examples, I'll describe our application. My company sells software to high schools, primarily for timetabling but also roll-call, managing teacher absences/replacements, excursions and room bookings. Our software is based on an in-house developed non-ACID database engine called Mrjb (only available internally) which has limitations which are typical of NoSQL databases.
An example of the difference between ACID and NoSQL as relevant to the end user is that if 2 users try to mark the same roll at exactly the same time, there is a (very) small chance that the final result will be a combination of data submitted by both users. An ACID database would guarantee that the final result is either one user's data or the other's, or possibly that one user's update will fail and return an error-message to the user.
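The roll-marking race can be replayed deterministically. In this hypothetical Python model, a store without isolation writes each student's status as a separate step. Two interleaving submissions can then leave a roll that matches neither user's submission, even though every individual status came from one of them. A whole-record replace, standing in for an isolated update, always ends with one complete submission.

```python
def write_fields_interleaved(roll, sub_a, sub_b, order):
    """Model a store with no isolation between two users' roll submissions.

    Each step in `order` is ("a" or "b", student): that user's status for that
    student is written on its own, so steps from both users can interleave.
    """
    result = dict(roll)
    submissions = {"a": sub_a, "b": sub_b}
    for who, student in order:
        result[student] = submissions[who][student]
    return result


def write_whole_record(roll, submission):
    """Model an isolated update: the entire roll is swapped in one step."""
    updated = dict(roll)
    updated.update(submission)
    return updated
```

Take a roll `{"Ann": "unmarked", "Ben": "unmarked"}`, user A submitting `{"Ann": "absent", "Ben": "present"}` and user B submitting `{"Ann": "present", "Ben": "absent"}`. The order `[("a", "Ann"), ("b", "Ann"), ("b", "Ben"), ("a", "Ben")]` then gives `{"Ann": "present", "Ben": "present"}`. That result is a mixture, but never a status that neither user entered.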
In this case I don't think our users would care about whether the individual students' "absence" statuses are all consistent with one user's update or a mixture of both, although they would be concerned if we assigned absence statuses which are contrary to both users' inputs. This example should not occur in practice, and if it does then it's a "race condition" where there's essentially no right answer about which user we believe.
A question was raised in relation to our Mrjb database about whether we're able to implement constraints such as "must not allow a Student object to exist without a corresponding Family object". (The 'C' in 'ACID' = Consistency). In fact we can and do maintain this constraint - another example of a micro-transaction.
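Here is a minimal sketch of such a constraint-preserving micro-transaction, assuming an in-memory object store guarded by a single lock. The `SchoolStore` class and its methods are hypothetical and are not Mrjb's actual API. The existence check and the insert happen under the same lock, so no other update can remove the family between them.

```python
import threading


class SchoolStore:
    """Toy in-memory object store (hypothetical; not Mrjb's real API)."""

    def __init__(self):
        self.families = {}
        self.students = {}
        self._lock = threading.Lock()

    def add_family(self, family_id, surname):
        with self._lock:
            self.families[family_id] = {"surname": surname}

    def add_student(self, student_id, name, family_id):
        # Check and insert under one lock: a micro-transaction, so the family
        # cannot be removed between the check and the insert.
        with self._lock:
            if family_id not in self.families:
                raise ValueError(f"student {student_id!r}: no family {family_id!r}")
            self.students[student_id] = {"name": name, "family": family_id}

    def remove_family(self, family_id):
        # The reverse direction of the same constraint: refuse to orphan students.
        with self._lock:
            if any(s["family"] == family_id for s in self.students.values()):
                raise ValueError(f"family {family_id!r} still has students")
            del self.families[family_id]
```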
Another example is when uploading a new version of the cyclical school timetable (typically a 2-week cycle) upon which the daily timetable is based. We would be hard pressed to make this update transaction atomic or to allow other transactions to execute in isolation from this update. So we basically have a choice to either "stop the world" while this major transaction occurs, which takes about 2 seconds, or allow the possibility that a student prints off a timetable containing a combination of pre-update and post-update data (there's probably a 100ms window in which this could occur). The "stop the world" option is probably the better option, but in fact we do the latter. You could argue that a mixed timetable is worse than a pre-update timetable, but in both cases we need to rely on the school having a process to notify students that the timetable has changed - a student working off an out-of-date timetable is a big problem either way.
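The two timetable-upload options can be modelled with a hypothetical `Timetable` class in Python. This is not the author's code, and the slot keys and subject names are made up. `upload_in_place` corresponds to what the answer says is done in practice: slots are overwritten one at a time with no lock, so a read that lands mid-upload can see a mixture. `upload_stop_the_world` holds a lock for the whole upload, so readers wait and then see only the complete new timetable. The `after_each_write` hook exists only so the example can trigger a read partway through.

```python
import threading


class Timetable:
    """Toy cyclical timetable: maps (week, day, period) -> subject (hypothetical)."""

    def __init__(self, slots):
        self._slots = dict(slots)
        self._lock = threading.Lock()

    def read(self):
        """What a student printing a timetable would see."""
        with self._lock:
            return dict(self._slots)

    def upload_in_place(self, new_slots, after_each_write=None):
        """No lock is held, so a read that lands mid-upload can see a mix of
        pre-update and post-update slots."""
        for key, value in new_slots.items():
            self._slots[key] = value
            if after_each_write is not None:
                after_each_write()

    def upload_stop_the_world(self, new_slots, after_each_write=None):
        """The lock is held for the whole upload, so readers block until it is
        finished and then see only the complete new timetable."""
        with self._lock:
            for key, value in new_slots.items():
                self._slots[key] = value
                if after_each_write is not None:
                    after_each_write()
```

In the stop-the-world version, a read must come from another thread: calling `read()` directly from the hook would try to take the lock the upload already holds.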
For the record, our company is described here: http://edval.com.au and our NoSQL technology is described here (described as a technique): http://www.edval.biz/memory-resident-programming-object-databases .
Answer 3:
Yes, there is a risk that data will be lost, and only a fool would use it for financial transactions.