To Normalize or Not To Normalize

2019-04-29 10:08发布

问题:

I am designing a System which has different types of Addresses. For example, a Person Address, A Hotel Address, An Airport Address, An Office Address.

I am involved in a discussion where I am of the opinion that as the addresses are different(of different entities Hotel, Airport etc.) the addresses should be stored in separate tables. I think this would improve performance.

There is another opinion to have all the addresses in the same table.

I am using PostgreSQL and am looking at over 10 million records.

What do you think is the better design?

I look forward to your opinions.

Regards,

Shardul

回答1:

I would recommend to keep the addresses in the same table, and have a type field do indicate what type of address it is.

10 million records isn't an unmanageable amount if you have the correct indexes and udpated statistics.

By having them in the same table, you ensure scalability. What happens if there is another type of address added? The change to code would be drastic for another added table, but it would be minimal if you are just having another address type in your existing table.



回答2:

Since your addresses are not different, that is, they have the same format for whatever entity they are attached to, I don't see a good reason to separate them out, at least not without any operational data to support such a decision.

By all means, if you find that you are having a bottleneck with the addresses, use multiple tables for specific entity addresses, but not before.



回答3:

Personally, I think an address is an address, so they should be in the same table. Any extra information about what type of address it is could be stored with the links to the owner. For example, a company could have 5 addresses, and the CompanyAddress table could have an 'IsHeadOffice' column along with the CompanyId and AddressId and so on.



回答4:

For 10 million records you'd better put addresses into separate table.

Furthermore, if you'll want to search for address you could further normalize Address table. I worked on several systems with a design of something like this:

Address table

  • address_id
  • city_id -- this is redundant but speeds up search
  • street_id
  • street_number (varchar, it could be anything like "32/c 2nd floor 15" )

Street table

  • street_id
  • city_id
  • street_name

City table

  • city_id
  • city_name

And Person, Hotel, Office, etc tables will have an address_id



回答5:

It is always difficult answer this kind of question as it really depends by your business. Anyway the classic approach would be create a Table address that contains all the addresses and create a relationship with all the entities that needs to have an address

Hotel ---> Address

AirPort -->Address

etc

in this way you will be able to understand also the typology of address by the entity is related to it(or if you prefer you can even add an address type table)

If in your business you don't need to consider an address as an entity but you are just interested in the value it got(you identify an address by its state rather than by its identity id) you could consider an address as a value object(immutable). In this case you can add the address attribute to each "main entity": Hotel,Airport,etc

Have a look at the Enric Envas book and DDD concept:

http://lostechies.com/jimmybogard/2008/05/21/entities-value-objects-aggregates-and-roots/



回答6:

I don't think you've given enough information about how the addresses are used and how they are retrieved or updated.

If it was just addresses of different types with no other entities, I would store them in the same table.

Relational database performance is designed to be scalable with table-size, given appropriate indexing.



回答7:

To determine the tables can be the trickiest step in the database design process. That is because the results you want from your database (e.g. the reports you want to print, the forms you want to use, the questions you want answered) don't necessarily provide clues about the structure of the tables that produce them. In fact, it may be better to sketch out and rework your design on paper first. When you design your tables, divide up pieces of information by keeping following fundamental design principles in mind:

  • A table should not contain duplicate information, and information should not be duplicated between tables (e.g. Store each customer address and phone number once, in one table).

  • When each piece of information is stored in only one table, you update it in one place. This is more efficient, and also eliminates the possibility of duplicate entries that contain different information.

  • Each table should contain information about one subject. When each table contains facts about only one subject, you can maintain information about each subject independently from other subjects (e.g. You would store customer addresses in a different table from the customers' orders, so that you could delete one order and still maintain the customer information).



回答8:

Normalize - because it's the correct logical design - and then use horizontal partitioning if/when necessary to split the physical design from the logical.