If I have fields of NVARCHAR (or NTEXT) data type in a Microsoft SQL Server database, what would be the equivalent data type in a PostgreSQL database?
标签:
postgresql
相关问题
- Django distinct is not working
- PostgreSQL: left outer join syntax
- Connecting Python to a Heroku PostgreSQL DB?
- PostgreSQL - Deleting data that are older than an
- Does PLV8 support making http calls to other serve
相关文章
- postgresql 关于使用between and 中是字符串的问题
- postgresql 月份差计算问题
- Using boolean expression in order by clause
- Table valued Parameter Equivalent in Postgresql
- in redshift postgresql can I skip columns with the
- Oracle equivalent of PostgreSQL INSERT…RETURNING *
- PostgreSQL field data type for IPv4 addresses
- Using prepared statement in stored function
It's varchar and text, assuming your database is in UNICODE encoding. If your database is in a non-UNICODE encoding, there is no special datatype that will give you a unicode string - you can store it as a bytea stream, but that will not be a string.
I'm pretty sure postgres varchar is the same as Oracle/Sybase/MSSQL nvarchar even though it is not explicit in the manual:
http://www.postgresql.org/docs/7.4/static/datatype-character.html
Encoding conversion functions are here:
http://www.postgresql.org/docs/current/static/functions-string.html http://www.postgresql.org/docs/current/static/functions-string.html#CONVERSION-NAMES
Example:
Also, there is this response to a similar question from a Postgresql rep:
Standard TEXT datatype is perfectly fine for it.
Short answer: There is no PostgreSQL equivalent to SQL Server NVARCHAR.
The types of NVARCHAR(N) on different database are not equivalent. The standard allows for a wide choice of character collations and encodings/character sets. When dealing with unicode PostgreSQL and SQLServer fall into different camps and no equivalence exists.
These differ w.r.t.
Thus moving data from one DB system (or encoding/character set) to another can lead to truncation/content loss.
Specifically there is no equivalent between a PostgreSQL (9.1) character type and SQL Server NVARCHAR.
You may migrate the data to a PostgreSQL binary type, but would then loose text querying capabilities.
(Unless PostgreSQL starts supporting a UTF-16 based unicode character set)
1) Length semantics
N is interpreted differently (Characters, Bytes, 2*N = Bytes) depending on database and encoding.
Microsoft SQL Server uses UCS2 encoding with the VARCHAR length interpreted as UCS-2 points, that is length*2 = bytes length ( https://docs.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2017 ):
their NVARCHAR(1) can store 1 UCS2 Characters (2 bytes of UCS2). Oracle UTF-encoding has the same semantics ( and internal CESU-8 storage).
Postgres 9.1 only has a Unicode UTF-8 character set (https://www.postgresql.org/docs/9.1/multibyte.html) , which, like Oracle (in AL32UTF8 or AL16UTF16 encoding) can store 1 full UCS32 codepoints. That is potentially up to 4 bytes (See e.g http://www.oracletutorial.com/oracle-basics/oracle-nvarchar2/ which explicitly state the NVARCHAR2(50) column may take up to 200 bytes).
The difference becomes significant when dealing with characters outside the basic multilingual plane which count as one "char unit" in utf8 ucs32 (go, char, char32_t, PostgreSQL ), but are represented as surrogate pairs in UTF-16 which count as two units ( Java, Javascript, C#, ABAP, wchar_t , SQLServer).
e.g. U+1F60A SMILING FACE WITH SMILING EYES will use up all space in SQL Server NVARCHAR(2). But only one character unit in PostgreSQL.
Classical enterprise grade DBs will offer at least a choice with UTF-16 like semantics (SAP HANA (CESU-8), DB 2 with collation, SQL Anywhere (CESU8BIN), ...) E.g. Oracle also offers what they misleadingly call an UTF-8 Collation, which is effectivly CESU-8. This has the same length semantics, representable content as UTF-16 (=Microsoft SQL Server) and is a suitable collation used by UTF-16 based enterprise systems ( e.g. SAP R/3 ) or under a Java application server.
Note that some databases may still interpret NVARCHAR(N) as a length in byte limitation, even with a variable length unicode encoding ( Example SAP IQ ).
2) Unrepresentable content
UTF-16 / CESU-8 based system can represent half surrogate pairs, while UTF-8/UTF-32 based system can not. This content is unrepresentable in this character set, but are a frequent occurrence in UTF-16 based enterprise systems. (e.g. Windows pathnames may contain such non-utf-8 representable characters, see e.g. https://github.com/rust-lang/rust/issues/12056). Thus UTF-16 is a "superset" of UTF-8/UTF-16 which is typically a killer-criteria when dealing with data from enterprise/os-systems based on this encoding ( SAP, Windows, Java, JavaScript ). Note that Javascript JSON encoding took specific care to be able to represent these characters (https://tools.ietf.org/html/rfc8259#page-10 ).
(2) and (3) are more relevant when migration queries, but not for data migration.
3) Binary sort order:
Note that binary sort order of CESU-8/UTF-16 is different than UTF-8/UTF-32.
UTF-16/CESU-8/Java/JavaScript/ABAP sort order:
UTF-8 / UCS-32 (go) sort order:
4) Padding semantics
Padding semantics differ on databases esp. when comparing VARCHAR with CHAR content.