Looking for a scalable, flexible and fast database design for 'Build your own form' style website - e.g Wufoo.
Rules:
- User has only 1 Form they can build
- User can create their own fields or choose from 'standard' fields
- User's 1 Form has as many fields as the user wants
- Values can be the sibling of another value E.g A photo value could have name, location, width, height as sibling values
Special Rules:
- User can submit their form a maximum of 5 times a day
- Value's Date is important
- Flexibility to report on values (for single user, across all users, 1 field, many fields) is very important -- data visualization (most will be chronologically based e.g. all photos for July 2009 for all users).
Table "users"
uid
Table "field_user" - assign a field to a users form
fid
uid
weight - int - used to order the fields on the users form
Table "fields"
fid
creator_uid - int - the field 'creator'
label - varchar - e.g. Email
value_type - varchar - used to determine what field in the 'values' table will be filled in (e.g. if 'int' then values of this field will submit data into the values.type_int field - and all other .type_x fields will be NULL).
field_type - varchar - e.g. 'email' - used for special conditions e.g. validation rules
Table "values"
vid
parent_vid
fid
uid
date - date
date_group - int - value 1-5 (user may submit max of 5 forms per day)
type_varchar - varchar
type_text - text
type_int - int
type_float - float
type_bool - bool
type_date - date
type_timestamp - timestamp
I understand that this approach will mean records in the 'Value' table will only have 1 piece of data with other .type_x fields containing NULL's... but from my understanding this design will be the 'fastest' solution (less queries, less join tables)
At OSCON yesterday, Josh Berkus gave a good tutorial on DB design, and he spent a good fraction of it mercilessly tearing into such "EAV"il tables; you should be able to find his slides on the OSCON site soon, and eventually the audio recording of his whole tutorial online (the latter will probably take a while).
You'll need a join per attribute (multiple instances of the values
table, one per attribute you're fetching or updating) so I don't know what you mean by "less join tables". Joining many instances of the same table isn't a particularly fast operation, and your design makes indices nearly unfeasible and unusable.
At least as a minor improvement use per-type separate tables for your attributes' values (maybe some indexing might be applicable in that case, though with MySQL's limitation to one index per query per table even that's somewhat dubious).
You should really look into schema-free dbs like CouchDB, problems like this are exactly those these types of DBs want to solve.
y'know, create table, alter, add a column, etc are operations you can do at run time in many modern rdbms implementations. Why be EAVil? Especially if you are using dynamic sql.
It's not for the fainthearted. I recall an implementation at Boeing which resulted in 70,000 tables in a database.
Obviously there are pitfalls in dynamic table creation, but they also exist for EAV tables. Things like two attributes for the same key expressing the same fact. Or transitive dependencies and other normalization gotchas. So why not at least leverage the power of the RDBMS on your behalf?
I agree with john owen.
dynamically creating a query from the schema is a small price to pay compared to querying EVA tables. Especially if the tables are large.
Usually table columns are considered an "interface". A design that relies on a dynamically changing interface is usually bad, but EAV data is a special case where you don't have many options. You have to choose between slow unintuitive queries or dynamic schema.