SELECT FOR XML AUTO and return datatypes

2020-07-05 05:55发布

问题:

During playing with sys.dm_exec_describe_first_result_set I get to this point:

CREATE TABLE #tab(col INT, x XML );
INSERT INTO #tab(col,x) VALUES (1,NULL), (2,NULL), (3,'<a>x</a>');

SELECT 'Simple XML' AS description, name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
   N'SELECT col
     FROM #tab
     FOR XML AUTO', NULL, 0)  
UNION ALL
SELECT 'Wrapped with subquery', name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
   N'SELECT(SELECT col
            FROM #tab
            FOR XML AUTO) AS wrapped_subquery', NULL, 0)
UNION ALL 
SELECT 'XML column', name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
   N'SELECT x FROM #tab ', NULL, 0)
UNION ALL
SELECT 'Casted XML', name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
   N'SELECT CAST(''<o>O</o>'' AS XML) AS x', NULL, 0)
UNION ALL
SELECT 'Wrapped Casted XML', name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
   N'SELECT (SELECT CAST(''<o>O</o>'' AS XML) AS x) AS wrapped', NULL, 0)
UNION ALL
SELECT 'Text value', name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
N'SELECT CAST(''aaa'' AS NTEXT) AS text_string', NULL, 0)
UNION ALL
SELECT 'Wrapped Text Value', name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
  N'SELECT (SELECT CAST(''aaa'' AS NTEXT)) AS text_string_wrapped', NULL, 0)

LiveDemo

Output:

╔═══════════════════════╦═════════════════════════════════════════╦══════════════════╗
║      Description      ║                   name                  ║ system_type_name ║
╠═══════════════════════╬═════════════════════════════════════════╬══════════════════╣
║ Simple XML            ║ XML_F52E2B61-18A1-11d1-B105-00805F49916 ║ ntext            ║
║ Wrapped with subquery ║ wrapped_subquery                        ║ nvarchar(max)    ║
║ XML column            ║ x                                       ║ xml              ║
║ Casted XML            ║ x                                       ║ xml              ║
║ Wrapped Casted XML    ║ wrapped                                 ║ xml              ║
║ Text value            ║ text_string                             ║ ntext            ║
║ Wrapped Text Value    ║ text_string_wrapped                     ║ ntext            ║
╚═══════════════════════╩═════════════════════════════════════════╩══════════════════╝

And:

SELECT col        -- SSMS result grid - XML column
FROM #tab
FOR XML AUTO

SELECT(SELECT col  -- text column
       FROM #tab
       FOR XML AUTO) AS wrapped_subquery

Questions:

  1. Why FOR XML AUTO doesn't return XML/NVARCHAR(MAX) datatype but ntext (deprecated datatype!)?
  2. How wrapping with subquery changes datatype from ntext to nvarchar(max)?
  3. Why the same rules don't apply to XML/NTEXT columns?

I know my questions may be technical and internal operations, but I would be grateful for any insight or documentation in MSDN/Connect?

EDIT:

Funny thing is when I use normal table(not temporary) it returns all ntext:

╔════════════════════════╦═══════════════════════════════════════╦══════════════════╗
║      description       ║                   name                ║ system_type_name ║
╠════════════════════════╬═══════════════════════════════════════╬══════════════════╣
║ Simple XML             ║ XML_F52E2B61-18A1-11d1-B105-00805F499 ║ ntext            ║
║ Wrapped with subquery  ║ wrapped_subquery                      ║ ntext            ║
║ XML column             ║ x                                     ║ ntext            ║
║ Casted XML             ║ x                                     ║ ntext            ║
║ Wrapped Casted XML     ║ wrapped                               ║ ntext            ║
║ Text value             ║ text_string                           ║ ntext            ║
║ Wrapped Text Value     ║ text_string_wrapped                   ║ ntext            ║
╚════════════════════════╩═══════════════════════════════════════╩══════════════════╝

SqlFiddleDemo

According to TYPE directive:

SQL Server support for the xml (Transact-SQL) enables you to optionally request that the result of a FOR XML query be returned as xml data type by specifying the TYPE directive.

SQL Server returns XML data type instance data to the client as a result of different server-constructs such as FOR XML queries that use the TYPE directive, or where the xml data type is used to return XML instance data values from SQL table columns and output parameters. In client application code, the ADO.NET provider requests this XML data type information to be sent in a binary encoding from the server. However, if you are using FOR XML without the TYPE directive, the XML data comes back as a string type.

And:

SELECT 'Simple XML' AS description, name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
   N'SELECT col AS col
     FROM #tab
     FOR XML AUTO, TYPE', NULL, 0)  
UNION ALL
SELECT 'Wrapped with subquery', name, system_type_name
FROM sys.dm_exec_describe_first_result_set(
   N'SELECT(SELECT col
            FROM #tab
            FOR XML AUTO,TYPE) AS wrapped_subquery', NULL, 0);

LiveDemo

  1. Why ntext not nvarchar(max) as in quote the XML data comes back as a string type and where is the difference normal/temp table?

回答1:

FOR XML was introduced in SQL Server 2000.

SQL Server 2000 did not have MAX datatypes or the XML datatype. Nor was it possible to use FOR XML in a sub query.

The article What does server side FOR XML return? explains

In SQL Server 2000 ... FOR XML... was implemented in the layer of code between the query processor and the data transport layer ... the query processor produces the result the same way as without FOR XML and then FOR XML code formats the rowset as XML. For maximum XML publishing performance FOR XML does steaming XML formatting of the resulting rowset and directly sends its output to the server side TDS code in small chunks without buffering whole XML in the server space. The chunk size is 2033 UCS-2 characters. Thus, XML larger than 2033 UCS-2 characters is sent to the client side in multiple rows each containing a chunk of the XML. SQL Server uses a predefined column name for this rowset with one column of type NTEXT - “XML_F52E2B61-18A1-11d1-B105-00805F49916B” – to indicate chunked XML rowset in UTF-16 encoding.

So it appears that this is still implemented the same way for top level FOR XML in later versions too.

SQL Server 2005 introduced the ability to use FOR XML in subqueries (meaning that these now need to be handled by the query processor rather than a layer outside it whilst streaming the results to the client)

The same article explains that these will be typed as NVARCHAR(MAX) or XML dependant on the presence or not of a type directive.

As well as the datatype difference this does mean the additional SELECT wrapper can make a drastic difference in performance if #tab is big.

/*Can be streamed straight out to client without using server storage*/
SELECT col
FROM #tab
FOR XML AUTO

/*XML constructed in its entirety in tempdb first*/
SELECT(SELECT col
FROM #tab
FOR XML AUTO) AS wrapped_subquery

It is possible to see the different approaches in the call stacks as well as execution plans.

Directly streamed

sqllang.dll!CXMLExecContext::AddTagAndAttributes()  + 0x5a9 bytes                   
sqllang.dll!CXMLExecContext::AddXMLRow()  + 0x2b7 bytes                 
sqltses.dll!CEsExec::FastMoveEval()  + 0x9c bytes                   
sqllang.dll!CXStmtQuery::ErsqExecuteQuery()  + 0x280 bytes                  
sqllang.dll!CXStmtXMLSelect::WrapExecute()  + 0x2d7 bytes                   
sqllang.dll!CXStmtXMLSelect::XretDoExecute()  + 0x355 bytes                 
sqllang.dll!CXStmtXMLSelect::XretExecute()  + 0x46 bytes                    
sqllang.dll!CMsqlExecContext::ExecuteStmts<1,1>()  + 0x368 bytes                    
sqllang.dll!CMsqlExecContext::FExecute()  + 0x6cb bytes                 
sqllang.dll!CSQLSource::Execute()  + 0x3ee bytes                    
sqllang.dll!process_request()  + 0x757 bytes    

With sub query

sqllang.dll!CXMLExecContext::AddTagAndAttributes()  + 0x5a9 bytes
sqllang.dll!CXMLExecContext::AddXMLRow()  + 0x2b7 bytes
sqllang.dll!CForXmlSerialize::ProcessRow()  + 0x19 bytes
sqllang.dll!CUDXR_Base::PushRow()  + 0x30 bytes
sqlmin.dll!CQScanUdx::Open()  + 0xd5 bytes
sqlmin.dll!CQueryScan::StartupQuery()  + 0x170 bytes
sqllang.dll!CXStmtQuery::SetupQueryScanAndExpression()  + 0x391 bytes
sqllang.dll!CXStmtQuery::InitForExecute()  + 0x34 bytes
sqllang.dll!CXStmtQuery::ErsqExecuteQuery()  + 0x217 bytes
sqllang.dll!CXStmtSelect::XretExecute()  + 0xed bytes
sqllang.dll!CMsqlExecContext::ExecuteStmts<1,1>()  + 0x368 bytes
sqllang.dll!CMsqlExecContext::FExecute()  + 0x6cb bytes
sqllang.dll!CSQLSource::Execute()  + 0x3ee bytes
sqllang.dll!process_request()  + 0x757 bytes

Both end up calling the same underlying XML code but the "unwrapped" version doesn't have any XML iterators in the plan itself, the result is achieved by replacing method calls from CXStmtSelect with CXStmtXMLSelect instead (represented in the plan as an XML Select root node rather than a plain old Select).


On SQL Server 2016 CTP3 I still see ntext for top level FOR XML. However top level FOR JSON shows up as nvarchar(max)

At least in the CTP the JSON special column name still contains the GUID F52E2B61-18A1-11d1-B105-00805F49916B despite the fact that the origin of this is the IXMLDocument Interface.

The plans look much the same though the XML Select is replaced with a JSON Select


BTW: On build Microsoft SQL Server 2014 - 12.0.4213.0 (X64) I don't see any difference in behaviour between temp tables and permanent tables. This is probably down to the different @@Version between the environments your question uses http://sqlfiddle.com/ (12.0.2000.8) and https://data.stackexchange.com/ (12.0.4213.0).

Maybe a bug was fixed in sys.dm_exec_describe_first_result_set between the two 2014 builds.

On 2012 I get the same results as Shnugo on 11.0.5343.0 (with NULL in the first three rows) but after installing SP3 11.0.6020.0 I get the same as your initial results shown in the question.



回答2:

Interesting question! The NTEXT is really strange!

I have an idea about the sub-query: When you return an XML it is ever transfered as string, unless you specify TYPE (You surely know this from nested XML with CROSS APPLY or from string concatenation with STUFF, where you somethimes see this with TYPEand a follwing .value() - and sometimes "naked".

I could not really reproduce your results (SQL Server 2012 too). Simple copy'n'paste comes back with (I wanted to test with declared table variable and return value of a function):

Simple XML              NULL                  NULL
Wrapped with subquery   NULL                  NULL
XML column              NULL                  NULL
Casted XML              x                     xml
Wrapped Casted XML      wrapped               xml
Text value              text_string           ntext
Wrapped Text Value      text_string_wrapped   ntext

EDIT: There was a new observation I thought was unclear, but it was my mistake... Took it away...