Is VARCHAR like totally 1990s? [closed]

2020-02-20 05:56发布

  1. VARCHAR does not store Unicode characters.
  2. NVARCHAR does store Unicode characters.
  3. Today's applications should always be Unicode compatible.
  4. NVARCHAR takes twice the amount of space to store it.
  5. Point 4 doesn't matter because storage space is extremely inexpensive.

Ergo: When designing SQL Server databases today, one should always use NVARCHAR.

Is this sound reasoning? Does anyone disagree with any of the premises? Are there any reasons to choose VARCHAR over NVARCHAR today?

14条回答
小情绪 Triste *
2楼-- · 2020-02-20 06:14

Your point 3 is invalid. Systems that are designed only for a single country's use don't have to worry about unicode, and some languages/products in use don't support unicode either at all or only partially. For example, TurboTax is only for the U.S. (and even with a Canadian version with French is still just LATIN-1), so they wouldn't need or have to worry about unicode and probably don't support it (I don't know if they do or not, but even if they do, it's just an example).

"Today's applications should always be Unicode compatible."

is probably more valid expressed as:

"Today's applications should always be Unicode compatible if nothing special needs to occur to handle Unicode properly, and a previously existing codebase or any other piece of the application does not need to be updated specifically to support it"

查看更多
The star\"
3楼-- · 2020-02-20 06:15

You match the datatype with the data that will be stored in the column. By a similar argument you could say why not store all data in NVARCHAR columns, because numbers and dates can be represented as strings of digits.

If the best match for the data that will be stored in the column is VARCHAR, then use it.

查看更多
Bombasti
4楼-- · 2020-02-20 06:15

Storage is less expensive than it's ever been historically, but still if you can store twice as much data on a given hard drive, that's attractive, isn't it?

Also there's RAM for caching, and solid-state drives, which are both a lot more expensive than hard drives. It's beneficial to use more compact data formats when you have millions of rows.

查看更多
狗以群分
5楼-- · 2020-02-20 06:18

My leaning is "use NVARCHAR" as a default... but @CadeRoux has a good point: if you are SURE the data will never hold anything but ASCII -- like a US license plate -- VARCHAR might save you a tiny bit of cost.

I'd say the flip side of his well-put statement is "DO use NVARCHAR" for anything that will have names (people, streets, places) or natural language text (email, chat, articles, blog postings, photo captions). Otherwise, your "firstname" column will not be able to encode "François" or "José" correctly, and your text columns will not allow text with "foreign" diacritcal marks, or -- for that matter -- very common US characters like cent-mark "¢", paragraph mark "¶", a bullet "•". (Because none of those are ASCII characters, and there is no good, standard way to put them in to a VARCHAR field. Trust me: you'll hurt yourself.)

On ANY project I've worked on, I've NEVER been scolded for using NVARCHAR because I was "squandering too much company money on disk space". And if I had to rework code or the DB schema (especially on a live, production system), the cost spent in the re-fit would EASILY outweigh the "savings" from buying a disk that was 50% smaller.

To really understand this question you really have to understand ASCII, Unicode, and Unicode's typical encodings (like UCS-2 and UTF-8).

查看更多
smile是对你的礼貌
6楼-- · 2020-02-20 06:19

Is there a way for your database server to use UTF-8 as an encoding? You then get the benefits of low storage for mostly ASCII loads, and the ability to store anything in the range of Unicode so that expansion is possible.

I would ask your database vendor to support UTF-8 as an encoding for the VARCHAR SQL type, as well. I don't know how other DB servers do it, but I do know that you can use UTF-8 in VARCHAR and TEXT fields in at least MySQL and PostgreSQL.

All that having been said though, the only reason to not use a UTF-16 encoded field is if you have to interact with applications which will break on UTF-16 input. This would be most legacy applications which were designed to handle ASCII or ISO-8815 text encodings, which would be better off processing UTF-8.

查看更多
我想做一个坏孩纸
7楼-- · 2020-02-20 06:24

Point 4 doesn't matter because storage space is extremely inexpensive.

it is not just storage, but bandwidth - cpu, memory, backup, recovery, transfer. Conserve.

查看更多
登录 后发表回答