NCHAR(1) vs BIT

Posted 2019-04-29 21:29

Question:

I'm refactoring a database schema (SQL Server 2008) and gathering arguments for changing NCHAR(1) columns (which hold Y|N values) to BIT. Everybody understands that the change is necessary, and nobody knows why the columns were defined this way, but it affects a production database, so weighty arguments are required. The table holds an address catalog (up to 1 million records).

The first argument I found: each nchar(1) field takes 2 bytes, whereas up to 8 bit fields share 1 byte (the next 8 take 1 additional byte).
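The arithmetic behind that argument can be sketched in a few lines. This is only a rough estimate under the figures stated above (2 bytes per nchar(1) value, 1 byte per group of up to 8 bit columns); it ignores row overhead, the null bitmap, and any index copies, and the function names are made up for illustration.

```python
import math

NCHAR1_BYTES = 2  # one nchar(1) value, as stated in the question


def nchar1_columns_bytes(n_flag_columns: int) -> int:
    """Bytes per row used by n flag columns stored as nchar(1)."""
    return n_flag_columns * NCHAR1_BYTES


def bit_columns_bytes(n_flag_columns: int) -> int:
    """Bytes per row used by n flag columns stored as bit (8 share a byte)."""
    return math.ceil(n_flag_columns / 8)


def estimated_saving_bytes(n_flag_columns: int, n_rows: int) -> int:
    """Approximate data-page saving for the whole table."""
    per_row = nchar1_columns_bytes(n_flag_columns) - bit_columns_bytes(n_flag_columns)
    return per_row * n_rows


if __name__ == "__main__":
    # Hypothetical case: 3 flag columns in a 1,000,000-row table.
    print(estimated_saving_bytes(3, 1_000_000), "bytes saved (rough estimate)")
```

For a table of this size the saving is a few megabytes at most, which is worth keeping in mind when weighing it against the cost of the change.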

What's next? Are there index performance issues, perhaps?

Answer 1:

A bit field helps your logic by automatically enforcing what is currently an implicit business rule (i.e., this column can only contain 'Y' or 'N'). If you're enforcing that rule programmatically, you can save by eliminating that overhead. Indexing a bit column on its own has little value due to the low cardinality, but it could be useful as part of a composite index.

See also:

  • Bit vs. Char(1) in SQL Server
  • Should I index a bit field in SQL Server?


Answer 2:

I would hesitate to provide any arguments for such a change unless you have a good reason to make it. That is, you have to weigh what you personally would have done or would prefer against the cost of actually implementing the change and the benefits it brings.

Have you checked whether the use of nchar(1) is hurting performance, or are you falling into the trap of premature optimization? You are only talking about 1 million records here.

For the minor storage / IO cost you think you are incurring, compare the total man-hours needed to change, retest and upgrade the system, multiplied by the hourly rate, with the cost of just buying a faster disk. I suspect the disk will be far cheaper, and it will benefit every aspect of the system.



Answer 3:

One common reason to find NCHAR(1) instead of bit is that Oracle did not support a bit type. If you had an Oracle or Oracle-trained developer, or a database that used to run on Oracle, you're going to see this a lot. In SQL Server, there's really no need for it.

However, I've found that in most places where I have a bit field (or NCHAR(1) in Oracle), what I really want is a datetime that indicates not so much the value of the flag as exactly when it became true. This isn't always the case, but when I think back on old code I've written, I'd guess that 4 out of 5 times I used a bit field I should have used a datetime.



Answer 4:

Create the bit field, and add a computed column that emulates the nchar(1) for now.

Why not to use nchar:

  • Y vs y vs some unicode Y
  • Overhead of checking Y or N
  • Not natively "true" or "false" (e.g. won't map directly to a .NET boolean)
  • Y and N are English. Ja/Nein, Oui/Non etc
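The first two points in the list above can be made concrete with a small application-side sketch. It assumes the flag arrives in client code as a string; `flag_is_yes` is a hypothetical helper, and how SQL Server itself compares such values depends on the column's collation, which this sketch does not model.

```python
import unicodedata

FULLWIDTH_Y = "\uff39"  # "Ｙ", a different code point that looks like "Y"


def flag_is_yes(value: str) -> bool:
    """Interpret a Y/N string flag, tolerating case and look-alike forms."""
    # NFKC folds compatibility characters such as fullwidth letters
    # to their plain equivalents before the comparison.
    normalized = unicodedata.normalize("NFKC", value).strip().upper()
    return normalized == "Y"


if __name__ == "__main__":
    for candidate in ["Y", "y", FULLWIDTH_Y, "N"]:
        print(repr(candidate), candidate == "Y", flag_is_yes(candidate))
```

Every reader of the column has to agree on this kind of normalization, whereas a bit value needs none.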

You shouldn't index this anyway so it comes down to efficient storage and use. bit is

  • smaller
  • datatype safe (eg no CHECK needed)
  • maps to client meaning directly
  • independent of region

Saying that, we use a smalldatetime "WhenInactive" field as a substitute for "IsActive" field. NULL = active.
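A minimal sketch of the "WhenInactive" idea follows. It uses Python's built-in sqlite3 module purely as a stand-in so that the example runs anywhere; SQLite has no smalldatetime type, so the timestamp is stored as text, and the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE address (
        id            INTEGER PRIMARY KEY,
        street        TEXT NOT NULL,
        when_inactive TEXT            -- NULL means the row is still active
    )
    """
)
conn.executemany(
    "INSERT INTO address (street, when_inactive) VALUES (?, ?)",
    [("1 Main St", None), ("2 Oak Ave", "2019-04-01 09:30:00")],
)

# "IsActive" becomes a NULL test on the timestamp column.
active = [
    row[0]
    for row in conn.execute(
        "SELECT street FROM address WHERE when_inactive IS NULL ORDER BY id"
    )
]

# The same column also records when each row was deactivated.
inactive_since = dict(
    conn.execute(
        "SELECT street, when_inactive FROM address WHERE when_inactive IS NOT NULL"
    )
)

print(active)
print(inactive_since)
```

The single column answers both "is it active?" and "since when is it not?", which a plain flag cannot do.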



Answer 5:

If you are using LINQ2SQL or Entity Framework a BIT column will translate into a bool, but NCHAR(1) will translate into a string.



Answer 6:

Is the field used extensively in queries of the form WHERE fld = 'Y'?

If so, I would consider running a test to see whether changing it to bit affects performance.

Changing it now, on a table of 1m+ records, just because a column storing boolean values ought to be a bit field doesn't sound like a good idea to me either, and I'd go with @Andrew's answer.



Answer 7:

Use Bit:

  • Logical representation / expressiveness of intent - boolean states can't always be expressed consistently as Yes or No, so with character flags you would need to be either inconsistent in how you model them or non-intuitive, e.g. True/False (T/F), On/Off (?O/F), Open/Closed (O/C) etc.

  • Referential integrity - non-nullable bit can be restricted to only 0 or 1. Unless you add constraints, your *char(1) could be Y,N, X or .

  • Bits can be packed, so could have smaller storage.

  • Re: Performance : Indexing of bit (or few-state CHAR) columns is usually a waste, unless there is high selectivity of either 0 or 1 in the data. In this case, a filtered index on the selective value would be a good idea.
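The integrity point in the list above can be demonstrated with a short sketch. Python's built-in sqlite3 module is used here only as a runnable stand-in for SQL Server, and the table names are hypothetical; the idea being illustrated is that a character flag accepts any value unless a CHECK constraint is added by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A character flag with no constraint accepts any value.
conn.execute("CREATE TABLE loose_flags (flag TEXT NOT NULL)")
conn.execute("INSERT INTO loose_flags (flag) VALUES ('X')")

# The same flag with an explicit CHECK constraint.
conn.execute(
    "CREATE TABLE strict_flags (flag TEXT NOT NULL CHECK (flag IN ('Y', 'N')))"
)


def try_insert(value: str) -> bool:
    """Return True if the constrained table accepts the value."""
    try:
        conn.execute("INSERT INTO strict_flags (flag) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        # Raised when the CHECK constraint rejects the row.
        return False


loose_count = conn.execute("SELECT COUNT(*) FROM loose_flags").fetchone()[0]

if __name__ == "__main__":
    print("unconstrained rows holding 'X':", loose_count)
    print("constrained accepts 'Y':", try_insert("Y"))
    print("constrained accepts 'X':", try_insert("X"))
```

With a non-nullable bit column this restriction comes from the type itself, so no such constraint has to be written or maintained.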

(Migrated from deleted answer here)



Answer 8:

I had a few occasions where we wanted a bit field but couldn't know for sure there would never be the need for a third or fourth value in that field. We therefore structured it as a string field containing Y or N. Of course, we only did this in very unique situations.