In trying to understand the correlation between program objects and data in tables (here:OO program and SQL database), which I still don't really understand, a strong difference of opinion was revealed over whether it's ok to use a SQL db to store data for an OO program.
For a new programmer, is it ok/recommended, or not?
If not, is the alternative to an OO program a procedural program?
Also I don't understand Object-relational impedance mismatch, which is what people mean when they say that it's "not OK to use an SQL DB to store data for an OO program": can someone summarize it, or is there a simple example that illustrates it?
For a new programmer, is it ok/recommended, or not?
That question implies that the system design/components/architecture are chosen/intended to benefit the programmer.
Instead I prefer to choose the design/components/architecture to benefit the system (and the system's owners, users, and operators), and ensure that the programmers know (which may require some learning or training on their part) what they need in order to develop that system.
The facts are:
- OO is often a good way to develop software
- SQL DBs are often a good way to store data
- Designing the mapping from SQL to OO may be non-trivial
If not, is the alternative to an OO program a procedural program?
Well, maybe, yes: one of the features of OO is subclassing, and subclasses/inheritance is one of the things that's problematic to model/store in a SQL database.
For example, given a OOD like this ...
class Animal
{
int id;
string name;
abstract void eat();
abstract void breed();
}
class Dog : Animal
{
bool pedigree;
override void eat() {...}
override void breed() {...}
}
class Bird : Animal
{
bool carnivore;
int numberOfEggs;
void fly() {...}
override void eat() {...}
override void breed() {...}
}
... it isn't obvious whether to store this data using 2 SQL tables, or 3. Whereas if you take the subclassing away:
class Dog
{
int id;
string name;
bool pedigree;
void eat() {...}
void breed() {...}
}
class Bird
{
int id;
string name;
bool carnivore;
int numberOfEggs;
void fly() {...}
void eat() {...}
void breed() {...}
}
... then it's easier/more obvious/more 1-to-1/more accurate to model this data using exactly two tables.
Also I don't understand Object-relational impedance mismatch
Here's an article that's longer and more famous than the Wikipedia article; maybe it's easier to understand: The Vietnam of Computer Science
Note that one of the solutions, which is proposes to the problem, is:
"Manual mapping. Developers simply
accept that it's not such a hard
problem to solve manually after all,
and write straight relational-access
code to return relations to the
language, access the tuples, and
populate objects as necessary."
In other words, it's not such a hard problem in practice. It's a hard problem in theory, i.e. it's hard to write a tool which creates the mapping automatically and without your having to think about it; but that's true of many aspects of programming, not only this one.
I would say it if definitly OK to use an SQL Database to store data for an Object-oriented program (I do this almost all the time -- and it works quite fine)
Procedural vs Object-Oriented is quite a debate ; I don't think you should choose depending on the storage type you're using : the choice should be more about your code and your application.
OOP comes with many advantages, one of which is (in my opinion) better maintenability ; better isolation, encapsulation, and all that is true too. Actually, after having programmed in OOP for a couple of years, I would find hard (and I actually do...) to do procedural programming ; the ability of having data and related-methods to manipulate them in a single class, used a an independant entity, is really great.
One problem with having one class = one table is that of normalization. There are also relationships between classes which don't map easily one to one. So my recommendation is - when doing the design for the database, do a proper schema; when doing the object oriented design, do it so without any dependency on the DB (hence the DB could be XML, flat-files, or even hand-coded array), and then write another class which would fetch the values from the form of the database which you have chosen.
First, using OO with SQL is fine. Many developers do it all the time and frankly, there's no other option available today that I think is even half as good.
That said, that are still some problems. The central one IMHO is that the Designers/Architects want Business Classes to map 1-1 to DB Tables and they just don't. It starts out well enough in high-level design, but the real problem comes you try to implement sub-classing: it just doesn't map very well to SQL and relational concepts.
So some means of mediating this slight incompatibilty (the "impedance mismatch") has to be brought into the development. There are lots of different "means" of doing this (tools, language features, DB types and options, as well as design & development disciplines) and they all have different advantages and disadvantages.
The upshot that you CAN do it, and you SHOULD do it, indeed many of us MUST do it, but there are definitely some dirty parts in the middle.
SQL databases are relational databases. Relational databases are currently the king of the hill with respect to data storage and retrieval (including program objects), except for the noSQL movement, which deals primarily with extremely large datasets.
To assist in the translation process between objects and relational tables, most programmers use some sort of Object Relational Mapper like Linq to SQL or nHibernate. This greatly reduces the effort required.
There are object oriented databases that can read and write object instances directly, but these have not caught on in the mainstream because of difficulties in reporting and retrieval of data across many objects.
To Chris W.,
This is Chris (asker of this question). I guess my reboot killed the cookie that the site used to remember me as the author of this question. (and I won't be able to comment now until I get 50 points again, I guess)
In response to your last comment to me, I understand what you're saying about compile time type-checking and sql not having that luxury. Is your point is that manually mapping doesn't have the "checking" luxury that using a mapper has? Can you use a mapper and then manually map on the same program if/when necessary?
Thanks again for all your help!
Chris W.'s response is excellent, one of the best I've read on this subject. I just want to throw in my two cents about one aspect: the problem of expressing subclassing in SQL table design.
If you do a web search on "generalization specialization relational modeling" you will get several good articles on the subject of subclassing. It's a problem that comes up repeatedly in conceptual data modeling. It's a pity that courses for database neophytes don't include this topic more often.