Possible Duplicate:
Schema for a multilanguage database
I'm working on a web application that I plan to make available in multiple languages. As I design the database, I'm going back and forth between two different ways to store localized descriptions and whatnot in the database.
First option is the well-known table_name, table_name_ml type option:
TABLE Category (
ID int,
ParentID int,
Name varchar(50),
Description varchar(255)
)
TABLE Category_ML (
ID int,
CategoryID int,
LocaleID int,
Name varchar(50),
Description varchar(255)
)
The second option would be to not store the text in the base tables at all, but instead store a token that could be used to lookup the actual localized text elsewhere, like this:
TABLE Category (
ID int,
ParentID int,
NameToken varchar(50),
DescriptionToken varchar(50),
)
// Tables in a separate content management type system
TABLE Content (
ID int,
Token varchar(50)
)
TABLE Translation (
ID int
ContentID int,
LocaleID int,
Value text
)
The idea here is that the Content and Translation tables would hold the localized text for many different entities in the database. The service layer would return the base objects with just the tokens and the view layer would look up the actual text values using the Content/Translation tables - which would be heavily cached. The Content/Translation tables would also be used for storing other CMS type content (static text on web pages, etc.)
I like the first option because it's tried and true, but the second option seems to have so many other advantages:
- All my text/localized content is in one place (makes translating easier).
- Service layer doesn't really need to care about locales.
- Simplifies queries by not having to join in a bunch of ML type tables.
Since I've never seen a design like this before, I assume I must be missing something. Any good reasons not to design it this way? Or maybe there's a better option that I haven't thought of?
There is an extra option and i think i will bet on this !
REASONS (PROS):
CONS:
I will first say that I have no dealt with localization before so this is really just my opinion and not based on experience.
I like your 2nd option. As far as the DB goes.. its data and a way to access/manipulate the data. In this case all the data is there and you will be mostly reading it and have a good way of getting to it. You can answer the same problem in both scenarios. I would prefer the 2nd option myself because it reduces crazy tables everywhere. You are keeping a table around for the specific purpose of translation. You can reuse it (no creating more tables just for upgrades later) and it maintains integrity. You could even reuse names if it makes sense somewhere. Like if you had 'Mantequilla' as a Category and as a Favorite somewhere else.
I like to put related data in one table when possible and not have data related to 'translating' in multiple places.
The only place this may fail is if you have more than just Name and Description for something that needs translation. Maybe you have Name, Description, Code, Magic Word, Silly Nickname, etc for an item. Although you could get around this by adding more NameTokens in that relevant table and reusing Name, but this is a bit of a hack.
Just make sure the model meets your needs everyone and it should work fine. You can always throw in a special translation table if needed later for a specific table. This wouldn't be to different from creating lots of table although a hybrid solution could be confusing. It is best to find one way and try to stick to it.