I have a database that holds real estate MLS (Multiple Listing Service) data. Currently, I have a single table that holds all the listing attributes (price, address, sqft, etc.). There are several different property types (residential, commercial, rental, income, land, etc.), and the property types share a majority of the attributes, but each has a few that are unique to it.
My concern is that the shared attributes number more than 250 fields, which seems like too many to have in a single table. My thought is that I could break them out into an EAV (Entity-Attribute-Value) format, but I've read many bad things about that, and it would make running queries a real pain, since any of the 250 fields could be searched on. If I were to go that route, I'd literally have to pull all the data out of the EAV table, grouped by listing ID, merge it on the application side, then run my query against the in-memory object collection. This also does not seem very efficient.
I am looking for some ideas or recommendations on which way to proceed. Perhaps the 250+ field table is the only way to proceed.
Just as a note, I'm using SQL Server 2012, .NET 4.5 with Entity Framework 5, and C#, and the data is passed to an ASP.NET web application via a WCF service.
Thanks in advance.
Let's consider the pros and cons of the alternatives:
One table for all listings + attributes:
One table for all listings, one table for attribute types and one for (listing IDs + attribute IDs +) values (EAV):
or: (compare performance on actual DB)
and then:
Compromise option - one table for all listings and one table per group of attributes including values (assuming you can divide attributes into groups):
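As a rough illustration of this compromise, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The grouping into listing_interior and listing_lot, and every column other than price, address and sqft, are hypothetical names chosen for the example, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Core table: attributes every listing has.
CREATE TABLE listing (
    listing_id INTEGER PRIMARY KEY,
    price      REAL NOT NULL,
    address    TEXT NOT NULL
);
-- One table per attribute group (hypothetical grouping).
-- A row exists only when that group applies to the listing.
CREATE TABLE listing_interior (
    listing_id INTEGER PRIMARY KEY REFERENCES listing(listing_id),
    sqft       INTEGER,
    bedrooms   INTEGER
);
CREATE TABLE listing_lot (
    listing_id INTEGER PRIMARY KEY REFERENCES listing(listing_id),
    acres      REAL,
    zoning     TEXT
);
""")
conn.execute("INSERT INTO listing VALUES (1, 250000, '1 Main St')")
conn.execute("INSERT INTO listing VALUES (2, 90000, 'Lot 7, Rural Rd')")
conn.execute("INSERT INTO listing_interior VALUES (1, 1800, 3)")
conn.execute("INSERT INTO listing_lot VALUES (2, 5.0, 'AG')")


def search_by_min_sqft(min_sqft):
    """Return IDs of listings whose interior group has sqft >= min_sqft."""
    rows = conn.execute(
        """
        SELECT l.listing_id
        FROM listing AS l
        JOIN listing_interior AS i ON i.listing_id = l.listing_id
        WHERE i.sqft >= ?
        ORDER BY l.listing_id
        """,
        (min_sqft,),
    ).fetchall()
    return [row[0] for row in rows]
```

A search only joins the groups whose columns it filters on, so a query touching one group costs one join, and listings without that group simply have no row there instead of a run of NULL columns.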
Weigh the pros and cons against your own data (how sparse the attributes are) and your requirements and maintenance plan (e.g. how often attribute types are added or changed), then decide.
What I would probably do:
I would first create a lookup table for the 250 fields, holding an ID and a FieldName for each field.
This table would also be hard-coded in my code as an enum and used in queries.
Then the main table has two columns that work together: the first holds the field ID, taken from the table above, and the second holds the value for that field.
The issue here is that you may need at least two value columns, one for numbers and one for strings.
This is just a proposal of course.
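The proposal above could be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's sqlite3 module instead of SQL Server, and the table names field and listing_value, the column names, and the three sample fields are all hypothetical.

```python
import sqlite3
from enum import IntEnum


class Field(IntEnum):
    """Hard-coded mirror of the field lookup table (sample fields only)."""
    PRICE = 1
    CITY = 2
    SQFT = 3


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE field (
    field_id   INTEGER PRIMARY KEY,
    field_name TEXT NOT NULL UNIQUE
);
-- One row per (listing, field); exactly one of the value columns is used.
CREATE TABLE listing_value (
    listing_id INTEGER NOT NULL,
    field_id   INTEGER NOT NULL REFERENCES field(field_id),
    num_value  REAL,
    text_value TEXT,
    PRIMARY KEY (listing_id, field_id)
);
""")
conn.executemany(
    "INSERT INTO field VALUES (?, ?)", [(int(f), f.name) for f in Field]
)
conn.executemany(
    "INSERT INTO listing_value VALUES (?, ?, ?, ?)",
    [
        (1, int(Field.PRICE), 250000, None),
        (1, int(Field.CITY), None, "Austin"),
        (2, int(Field.PRICE), 400000, None),
        (2, int(Field.CITY), None, "Austin"),
    ],
)


def listings_in_city_under(city, max_price):
    """Each searched attribute needs its own self-join on listing_value."""
    rows = conn.execute(
        """
        SELECT c.listing_id
        FROM listing_value AS c
        JOIN listing_value AS p ON p.listing_id = c.listing_id
        WHERE c.field_id = ? AND c.text_value = ?
          AND p.field_id = ? AND p.num_value <= ?
        ORDER BY c.listing_id
        """,
        (int(Field.CITY), city, int(Field.PRICE), max_price),
    ).fetchall()
    return [row[0] for row in rows]
```

The search can stay in SQL rather than in application memory, but every additional attribute in the filter adds another self-join, which is the query cost the question is worried about.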
I would create a listing table which contains only the shared attributes. This table would have listingId as the primary key. It would have a column that stores the listing type so you know if it's a residential listing, land listing, etc.
Then, for each of the subtypes, create an extra table. So you would have tables for residential_listing, land_listing, etc. The primary key for all of these tables would also be listingId. This column is also a foreign key to listing.
When you wish to operate on the shared data, you can do this entirely from the listing table. When you are interested in specific data, you join in the specific table. Some queries may be able to run entirely on the specific table if all the data is there.
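A minimal sketch of this layout, again using Python's sqlite3 module as a stand-in for SQL Server. The table names listing, residential_listing and land_listing and the key listingId come from the description above; listingType, price, bedrooms and acres are assumed columns added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Shared attributes live here; listingType says which subtype table applies.
CREATE TABLE listing (
    listingId   INTEGER PRIMARY KEY,
    listingType TEXT NOT NULL,
    price       REAL NOT NULL
);
-- Each subtype table reuses listingId as both primary key and foreign key.
CREATE TABLE residential_listing (
    listingId INTEGER PRIMARY KEY REFERENCES listing(listingId),
    bedrooms  INTEGER
);
CREATE TABLE land_listing (
    listingId INTEGER PRIMARY KEY REFERENCES listing(listingId),
    acres     REAL
);
""")
conn.execute("INSERT INTO listing VALUES (1, 'residential', 250000)")
conn.execute("INSERT INTO listing VALUES (2, 'land', 90000)")
conn.execute("INSERT INTO residential_listing VALUES (1, 3)")
conn.execute("INSERT INTO land_listing VALUES (2, 5.0)")


def cheaper_than(max_price):
    """Shared-attribute query: touches only the listing table."""
    rows = conn.execute(
        "SELECT listingId FROM listing WHERE price <= ? ORDER BY listingId",
        (max_price,),
    ).fetchall()
    return [row[0] for row in rows]


def residential_with_bedrooms(min_bedrooms):
    """Subtype query: joins the residential table to the shared one."""
    return conn.execute(
        """
        SELECT l.listingId, l.price
        FROM listing AS l
        JOIN residential_listing AS r ON r.listingId = l.listingId
        WHERE r.bedrooms >= ?
        ORDER BY l.listingId
        """,
        (min_bedrooms,),
    ).fetchall()
```

Because listingId is both the primary key and a foreign key in each subtype table, a listing can have at most one row per subtype and cannot exist there without a parent row in listing.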