可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I'm interested in learning about tools that operate like this:

Given a database schema and some data, suggest whether the schema is likely structured in any particular normal form, and then tell how the schema might be factored to produce further normalization.

Basically, a static analysis tool for database schema design.

Like other static analysis tools, such a database tool need not generate perfect results (I doubt such a "perfect" tool is computationally feasible), or applicable to all database systems, or free/open source, or anything else. The tool need not be stand-alone; it could be bundled as some sophisticated IDE feature. I'm just wondering what's out there.

回答1:

It can be done, and there are at least 2 commecially available tools that can do normalization for you: GeneXus and DeKlarit. They use a process named NormalizationBySynthesis

回答2:

A tool like you describe, that tried to analyze your data and metadata and advise you of possible non-normalized structure, would give wrong suggestions so frequently that it would make Clippy, the Microsoft assistant seem like an indispensable writer's aid.

The process of normalization involves mapping software requirements to a logical data model. An analysis tool cannot know your data requirements any better than you do. So it can't infer from an incorrect database design which parts of it are wrong.

I understand you have qualified the question and you have limited expectations for the tool's capability. But you would want it to be useful for ordinary, everyday tasks -- but it would not be reliable even for the most simple cases.

Compare to static code analysis tools. Suppose you write an application and deliver it to your client, and the client says "why can't I send email from this app?" How would a static code analysis tool tell you that you omitted a desired feature? It can't know those requirements.

Likewise, how would a database normalization analysis tool know whether it's appropriate for the UserAccount table to have a single MobilePhoneNumber attribute, or if it would be more appropriate to separate phone numbers into another table so a single user could have multiple phones listed?

回答3:

I have thought about this problem too. It is theoritically possible and there are some research papers on this topic. there used to be pretty cool tool at www.dbtools.cs.cornell.edu. This was developed by the famous author Raghu Ramakrishnan. He is currently at Yahoo Research. You can refer to the following papers for more information

Diederich,T. and Miton,J.,(1988),"New Methods and Fast Algorithms for Database Normalization ",ACM Transactions on Database Systems, 13(3),339-365.
Bernstein, P. A.(1986),"Synthesizing Third Normal Form Relations from Functional Dependencies",ACM Transactions on Database Systems , Vol.1. No. 4, pp 277-298.
JMathNorm: A Database Normalization Tool Using Mathematica, Lecture Notes In Computer Science; Vol. 4488, Proceedings of the 7th international conference on Computational Science, Part II, Ali Yazici, Ziya Karakaya

The third link is very interesting. Here is the abstract of the paper:

This paper is about designing a complete interactive tool, named JMathNorm, for relational database (RDB) normalization using Mathematica. It is an extension of the prototype developed by the same authors [1] with the inclusion of Second Normal Form (2NF), and Boyce-Codd Normal Form (BCNF) in addition to the existing Third normal Form (3NF) module. The tool developed in this study is complete and can be used for real-time database design as well as an aid in teaching fundamental concepts of DB normalization to students with limited mathematical background. JMathNorm also supports interactive use of modules for experimenting the fundamental set operations such as closure, and full closure together with modules to obtain the minimal cover of the functional dependency set and testing an attribute for a candidate key. JMathNorm's GUI interface is written in Java and utilizes Mathematica's JLink facility to drive the Mathematica kernel.

I am also very interested in an answer to this question. If anyone has come across a tool, please let us know!

回答4:

The open-source tool SchemaSpy detects some "Anomalies", like "Tables without indexes" or "Columns that are flagged as both 'nullable' and 'must be unique'". One of these anomalies is "Tables with incrementing column names, potentially indicating denormalization"

http://schemaspy.sourceforge.net/

回答5:

I think that's hard to implement.

Like, a table

Id | Name | Surname | SSN

is in 1NF, and

Id | Name | Surname | Mobile

is not, but you can say it neither from the design nor from data, only from the field name.

I've seen once a database for ultrasonography that actually had both GENDER and LMP (last menstrual period) in one table.

回答6:

It can't be done.

Normalization is defined in terms of functional dependencies which

Cannot be expressed in SQL
Cannot be inferred from the data

You can look at a table such as

A | B
--+--
1 | 1
1 | 2

and deduce that B is not dependent on A (because it has two distinct values for a single value of A), but you can never deduce that a dependency does exist, e.g. A may or may not depend on B.