What are some of your most useful database standar

2019-01-20 21:45发布

问题:

I have some ideas, some that I have accumulated over time, but I really want to know what makes things go smoothly for you when modeling database:

  1. Table name matches Primary Key name and description key
  2. Schemas are by functional area
  3. Avoid composite primary keys where possible (use unique constraints)
  4. Camel Case table names and field names
  5. Do not prefix tables with tbl_, or procs with SP_ (no hungarian notation)
  6. OLTP databases should be atleast in BCNF / 4NF

回答1:

  • Name similarly targetted stored procs with the same prefix, for instance if you've got 3 stored procedures for Person. That way everything for person is grouped in one place and you can find them easily without having to look through all your procs to find them.
    • PersonUpdate
    • PersonDelete
    • PersonCreate
  • Do similar things for tables when you have groups of tables with related data. For instance:
    • InvoiceHeaders
    • InvoiceLines
    • InvoiceLineDetails
  • If you have the option of schemas within your database, use them. It's much nicer to see:
    • Invoice.Header
    • Invoice.Line.Items
    • Invoice.Line.Item.Details
    • Person.Update
    • Person.Delete
    • Person.Create
  • Don't use triggers unless there's no other reasonable approach to achieve that goal.
  • Give field names a meaningful prefix so you can tell what table they come from without someone needing to explain. That way when you see a field name referenced, you can easily tell which table it's from.
  • Use consistent data types for fields containing similar data, i.e. don't store phone number as numeric in one table and varchar in another. In fact, don't store it as numeric, if I come across a negative phone number I'll be mad.
  • Don't use spaces or other obscure characters in table/field names. They should be entirely alphanumeric - or if I had my druthers, entirely alphabetic with the exception of the underscore. I'm currently working on an inherited system where table and field names contain spaces, question marks and exclamation marks. Makes me want to kill the designer on a daily basis!
  • Don't use syntax keywords as object names it'll cause headaches trying to retrieve data from them. I hate having to wrap object names as [index] that's two needless chars I didn't need to type damn you!


回答2:

One thing I haven't seen mentioned yet:

Never use database keywords as object names. You do not want to have to qualify them every time you use them

If you misspell something when you create it, fix it as soon as you notice it. Don't spend years having to remember that in this table UserName is really Usernmae. It's a whole lot easier to fix when there isn't much code written against it.

Never use implied joins (the comma syntax), always specify the joins.



回答3:

Putting everybody's input together into one list.

Naming Standards

  • Schemas are named by functional area (Products, Orders, Shipping)
  • No Hungarian Notation: No type names in object names (no strFirstName)
  • Do not use registered keywords for object names
  • No spaces or any special characters in object names (Alphanumber + Underscore are the only things allowed)
  • Name objects in a natural way (FirstName instead of NameFirst)
  • Table name should match Primary Key Name and Description field (SalesType – SalesTypeId, SalesTypeDescription)
  • Do not prefix with tbl_ or sp_
  • Name code by object name (CustomerSearch, CustomerGetBalance)
  • CamelCase database object names
  • Column names should be singular
  • Table names may be plural
  • Give business names to all constraints (MustEnterFirstName)

Data Types

  • Use same variable type across tables (Zip code – numeric in one table and varchar in another is not a good idea)
  • Use nNVarChar for customer information (name, address(es)) etc. you never know when you may go multinational

In code

  • Keywords always in UPPERCASE
  • Never use implied joins (Comma syntax) - always use explicit INNER JOIN / OUTER JOIN
  • One JOIN per line
  • One WHERE clause per line
  • No loops – replace with set based logic
  • Use short forms of table names for aliases rather than A, B, C
  • Avoid triggers unless there is no recourse
  • Avoid cursors like the plague (read http://www.sqlservercentral.com/articles/T-SQL/66097/)

Documentation

  • Create database diagrams
  • Create a data dictionary

Normalization and Referential Integrity

  • Use single column primary keys as much as possible. Use unique constraints where required.
  • Referential integrity will be always enforced
  • Avoid ON DELETE CASCADE
  • OLTP must be at least 4NF
  • Evaluate every one-to-many relationship as a potential many-to-many relationship
  • Non user generated Primary Keys
  • Build Insert based models instead of update based
  • PK to FK must be same name (Employee.EmployeeId is the same field as EmployeeSalary.EmployeeId)
  • Except when there is a double join (Person.PersonId joins to PersonRelation.PersonId_Parent and PersonRelation.PersonId_Child)

Maintenance : run periodic scripts to find

  • Schema without table
  • Orphaned records
  • Tables without primary keys
  • Tables without indexes
  • Non-deterministic UDF
  • Backup, Backup, Backup

Be good

  • Be Consistent
  • Fix errors now
  • Read Joe Celko's SQL Programming Style (ISBN 978-0120887972)


回答4:

My standards for Oracle are:

  • Keywords are always in UPPERCASE;
  • Database object names are always in lowercase;
  • Underscores will replace spaces (ie there won't be any camel case conventions that are common on, say, SQL Server);
  • Primary keys will pretty much always be named 'id';
  • Referential integrity will be enforced;
  • Integer values (including table ids) will generally always be NUMBER(19,0). The reason for this is that this will fit in a 64-bit signed integer thus allowing the Java long type to be used instead of the more awkward BigInteger;
  • Despite the misnomer of appending "_number" to some column names, the type of such columns will be VARCHAR2 not a number type. Number types are reserved for primary keys and columns you do arithmetic on;
  • I always use a technical primary keys; and
  • Each table will have its own sequence for key generation. The name of that sequence will be _seq.

With SQL Server, the only modification is to use camel case for database object names (ie PartyName instead of party_name).

Queries will tend to be written multi-line with one clause or condition per line:

SELECT field1, field2, field2
FROM tablename t1
JOIN tablename2 t2 ON t1.id = t2.tablename_id
WHERE t1.field1 = 'blah'
AND t2.field2 = 'foo'

If the SELECT clause is sufficiently long I'll split it out one field per line.



回答5:

  • Name all constraints


回答6:

don't forget to back up your databases on a regular basis.



回答7:

  1. Don't use type names in the field names. The older guys will remember the old MS standard of lpszFieldName and the stupidity that ensued.

  2. Use descriptive field names That follow normal language conventions. For example "FirstName" instead of "NameFirst"

  3. Each word in the field name is capitalized

  4. No underscores

  5. Do not use normal keywords such as "Index"

  6. Do not prefix ANYTHING with the object type. For example we do NOT use tblCustomers or spCustomersGet. These don't allow for good sorting and provide zero value.

  7. Use schemas to define separate areas of the database. Such as sales.Customers and hr.Employees. This will get rid of most of the prefixes people use.

  8. Loops of any kind should be viewed with suspicion. There's usually a better set based way.

  9. Use views for complicated joins.

  10. Avoid complicated joins when possible. It may be more astheticaly pleasing to have a CustomerPhoneNumbers table; but honestly, how many phone numbers do we really need to store? Just add the fields to the Customers table. Your DB queries will be faster and it's much easier to understand.

  11. If one table calls a field "EmployeeId" then EVERY SINGLE TABLE that references it should use that name. It doesn't need to be called CustomerServiceRepId just because it's in an extension table.

  12. Almost all tables have the "s" ending. For example: Customers, Orders, etc. After all the table holds many records...

  13. Evaluate your queries, indexes and foreign key relationships with an analysis tool. Even those that may be generated for you. You might be surprised.

  14. Linking tables which support many to many relationships have both linked tables in the name. For example, SchoolsGrades. It's very easy to tell by the table name what it does.

  15. Be CONSISTENT. If you start down one path with your conventions, don't change horses halfway unless you are willing to refactor all of the previous work. This should put the brakes on any "wouldn't it be great if.." ideas that end up causing confusion and vast amounts of rework.

  16. Think before you type. Do you really need that table, field, sproc, or view? Are you sure it isn't covered somewhere else? Get concensus before adding it. And if for some reason you have to take it out, talk to your team first. I've been at places where the DBA's make daily breaking changes without regard for the devs. This isn't fun.



回答8:

If a database is for a particular application, have a version table so that the database releases can be checked against the code releases (amongst other reasons).



回答9:

I always try not to use the type in the field name - "sFirstName", "sLastName", or "iEmployeeID". While they match at first, if something changes, they'll be out of sync, and it's a huge headache to change those names later, since you have to change the dependant objects as well.

Intellisense and the GUI tools make it trivial to find out what type a column is, so I don't feel this is necessary.



回答10:

The WITH clause really helps break queries down into manageable parts.

It also really helps for efficiency on the execution plans of the queries.



回答11:

Ensure that every varchar/nvarchar choice is appropriate.

Ensure that every NULLable column choice is appropriate - avoid NULLable columns where possible - allowing NULL should be the justifiable position.

Regardless of any other rules you might use from the suggestions here, I would create a stored procedure in the database that can be run on a regular basis to determine system health for any rules or standards you do have (some of this is a little SQL-Server specific):

  • Look for orphaned records in any cases where the DBMS system's referential integrity cannot be used for some reason (in my system I have a table of processes and a table of tests - so my system_health SP looks for processes without tests, since I only have a one-way FK relationship)

  • Look for empty schemas

  • Look for tables without primary keys

  • Look for tables without any indexes

  • Look for database objects without documentation (we use SQL Server Extended properties to put the documentation in the database - this documentation can be as granular as the column).

  • Look for system-specific issues - tables which need to be archived, exceptions which are not part of normal monthly or daily processing, certain common column names with or without defaults (CreateDate, say).

  • Look for non-deterministic UDFs

  • Look for TODO comments to ensure that code in the DB does not somehow have untested or pre-release code.

All this can be automated to give you an overall picture of system health.



回答12:

Everyone writes SQL queries (views, stored procedures, etc) in the same basic format. It really helps development/maintenance efforts down the road.



回答13:

Consistent naming standards. Having everyone on the same page, using the same format (whether it be Camel Case, specific prefixes, etc..) helps in being able to maintain a system accurately.



回答14:

A few likes and dislikes.

My opinion is prefixes are horrible in every aspect. I currently work on a system where the tables are prefixed, and the columns within the tables are prefixed with 2 letter table name acronyms, I waste at least 30 mins each day working on this database because the acronym isn't logical. If you want to denote something with a prefix use a schema owner instead.

Using NVarchar from the start of a project if there is even a slight hint that down the line the text data will need to support multi lingual chars. Upgrading large databases because of lack of forward planning and thinking is a pain and wastes time.

Splitting each condition within a where clause onto a new line for readability (in and not in statements wrapped in brackets and tabbed in.) I think this is the important standard for me.

I worked at one company where a standard was that comma's must always be placed at the start of a line when performing parameter or variable declarations. This apparently made it more readable however I found it a complete nightmare.



回答15:

In addition to normalization to 3NF or BCNF (more about that in this question), I have found the following to be useful:

  • Name tables as plural nouns
  • Name columns as sigular

So a "People" table has a "PersonID" column.

  • There is nothing wrong with composite keys, so long as the rules of 3NF or BCNF still hold. In many cases (such as the "many-to-many" case) this is entirely desirable.
  • Avoid repeating the table name in the column names. peoplePersonID is better written as table.column anyway, and much more readable and therefore self-documenting. People.PersonID is better, to me at least.
  • ON DELETE CASCADE should be used very carefully.
  • Remember that NULL means one of two things: Either it's unknown or it's not applicable.
  • Remember also that NULLs have interesting affects on joins, so practice your LEFT, RIGHT, and FULL outer joins.


回答16:

Some others (albeit small) comments to throw against the wall...

SQL Server database schemas can be useful for both organizing tables and stored procedures as well as controlling security.

Every transactional table should always track both who and when created the record as well as updated the record in separate columns. I've seen implementation that simply used "update date" which can lead to auditing challenges in the future.

Use GUID's for row identifiers for all rows for projects with offline/synchronization requirements.



回答17:

Good database design and normalization.



回答18:

  • Tables are named in the singular, lowercase, no underscores, no prefix
  • Fields also lowercase, no underscores, no prefix
  • Stored procedures prefixed with "st_" (sorts nicely)
  • Views that are treated like tables have no prefix
  • Views created for special reports, etc. have a "v" prefix
  • Indexed views created for performance have an "ixv" prefix
  • All indexes have purposeful names (no auto-naming)
  • Strongly prefer uniqueidentifier (with sequential increment) over int IDENTITY for surrogate keys
  • Don't artificially limit VARCHAR/NVARCHAR fields to 100 or 255. Give them room to breath. This isn't the 1980s, fields are not stored padded to their max length.
  • 3NF minimum standard
  • Prefer joining tables to column-level foreign keys: many 1:m assumptions are challenged as a system grows over time.
  • Always use surrogate keys, not natural keys, as the primary key. All assumptions about "natural" keys (SSNs, usernames, phone numbers, internal codes, etc.) will eventually be challenged.


回答19:

Tabular formatted SQL.

select a.field1, b.field2
from       any_table   a
inner join blah        b on b.a_id       = a.a_id
inner join yet_another y on y.longer_key = b.b_id
where a.field_3         > 7
and   b.long_field_name < 2;

Part of this is to use uniformly long alias names (in the example, here, a, b, and y are all length 1).

With this kind of formatting, I can more quickly answer common questions like, "what table is aliased by 'a'?" and "which fields join table T into the query?" The structure doesn't take long to apply or to update, and I find that it saves a lot of time. We spend more time reading code than writing it.



回答20:

Document everything; wiki type documentation is easy to setup and the software is free.

Make sure you understand the interface first and design the database second. Most of the time its a lot better to know how the data you are going to use needs to work and then engineer the database. Most bad DB design happens as things evolve not upfront.

Then define the database standard and version you are going to work to. Define standards for the code elements (views, functions etc), database naming; naming conventions for columns, tables; type conventions for columns; coding templates.

Spend time considering how you define types having standard database types for fields or bespoke types are a good thing to sort out upfront.

As part of your documentation include a list of don'ts as well as dos for the application which include your prefered hated functionality cursors, triggers.

Review it regularly.



回答21:

13- Evaluate your queries

Thats true. Sometimes you don't get what you wanted.

For me, it's always useful to name the tables and fields with their exact content and (for us) in clear spanish and using Upper Camel Case, with no whitespaces:

User Name: NombreUsuario

First Last Name: ApellidoPaterno

Second Last Name: ApellidoMaterno

etc etc



回答22:

Taking "database" to mean "SQL product", my answer is, "Too many to mention. You could write a whole book on the subject." Happily, someone has.

We use Joe Celko's SQL Programming Style (ISBN 978-0120887972): "this book is a collection of heuristics and rules, tips, and tricks that will help you improve SQL programming style and proficiency, and for formatting and writing portable, readable, maintainable SQL code."

Advantages of this approach is include:

  • the guy knows more about this kind of thing than me (is there another book on SQL heuristics?!);
  • the work has already been done e.g. I can give the book to someone on the team to read and refer to;
  • if someone doesn't like my coding style I can blame someone else;
  • I recently got a load of rep on SO by recommending another Celko book :)

In practice we do deviate from the prescriptions of The Book but surprisingly rarely.



回答23:

In MS-SQL, I've always had objects owned by dbo., and I prefix calls to those objects with dbo.

Too many times I've seen our devs wonder why they can't call their objects that they inadvertainly owned.



回答24:

Avoid silly abbreviation conventions, such as comprehensive dictionaries of abbreviations that actively encourage monstrosities like EMP_ID_CONV_FCTR_WTF_LOL_WAK_A_WAK_HU_HU. This rule is inspired a real set of guidelines I've seen before.



回答25:

MVP Aaron Bertrand's "My stored procedure "best practices" checklist"



回答26:

Table name matches Primary Key name and description key

I have just recently, after years of agreeing with this, jumped ship, and now have an "ID" column on every table.

Yes I know, when linking tables it's abiguous! But so is linking ProductID to ProductID, so uhh, why the extra typing?

This:

SELECT p.Name, o.Quantity FROM Products p, Orders o WHERE o.ProductID = p.ID

Is slightly better than this:

SELECT p.Name, o.Quantity FROM Products p, Orders o WHERE o.ProductID = p.ProductID

Note that both will require table or alias prefixes. But not only am I typing slightly less (multiply that across dozens of tables with long descriptive names and it adds up fast in a data intensive application) but it also makes it easier to know which table is the parent table in every join, which, when joining 8-10 tables in a query, can help quite a bit.



回答27:

I follow a lot of the same conventions as others here, but I wanted to say a few things that haven't been said yet.

Regardless of whether you like plural names or singular names for your tables, be consistent. Choose one or the other, but don't use both.

The primary key in a table has the same name as the table, with the suffix _PK. Foreign keys have their same name as their corresponding primary key, but with a suffix of _FK. For example, the Product table's primary key is called Product_PK; in the Order table the corresponding foreign key is Product_FK. I picked this habit up from another DBA friend of mine and so far I'm liking it.

Whenever I do an INSERT INTO...SELECT, I alias all the columns in the SELECT portion to match the names of the columns from the INSERT INTO portion to make it easier to maintain and see how things match up.



回答28:

The most important standard is: don't have a database by default. I find too many developers grabbing a database for projects where life would have been much easier without one (at least yet). It is just a tool in the toolbox, and not every problem is a nail.

Inappropriate use of a database leads to anemic domain models, badly testable code and unneeded performance problems.



回答29:

I agree with just about everything you have put there except for #5. I often use prefixes for tables and stored procedures because the systems that we develop have lots of different functional areas, so I will tend to prefix the tables and sprocs with an identifier that will allow for them to group nicely in Management Studio based on what area they belong to.

Example: cjso_Users, cjso_Roles, and then you have routing_Users, routing_Roles. This may sound like replication of data, but in reality the two different user/roles tables are for completely separate functions of the system (cjso would be for the customer-based ecommerce app while the routing would stand for employees and distributors who use the routing system).



回答30:

I like our table naming convention:

People Table
PEO_PersonID
PEO_FirstName 
...

Which helps make larger querys a bit more readable. and joins make a bit more sense:

Select * -- naughty!
From People
Join Orders on PEO_PersonID = ORD_PersonID
--...

i guess rather than what the naming convention is, is the consistency of the naming.