guarantee order of table valued function results

2019-03-07 01:57发布

问题:

PREMISE: The application code cannot be changed. The conditions are very specific. I am looking for something off the books, a last resort workaround if you may.

I have a table-valued function (inline) that produces between 2 and 7 records. At times it could be only 1 or up to 15 (but rarely).

The function is only used by an application in this way, without any ORDER BY.

select * from dbo.myfunction(...)

Is there any way at all, in your experience, to guarantee ensure (as far as you have ever observed using a particular technique) that the results are returned ordered by the second column? Columns are: varchar(3), datetime, varchar(50). Don't get me started on select *, it is INTENTIONAL so that the front end will display however many columns I make the function display in the future.

From experience, with a single index (clustered PK) to traverse the data, any current version of SQL Server and SP level should always perform a simple INDEX SCAN on <20 records without parallelism, thereby giving me ordered results in the application select.

Your thoughts? I would prefer to keep theory out of the discussion. If you can stick to practical experience and keep sermons about best practice at home, I would also appreciate it.


UPDATED This is what it looks like now

create function dbo.myfunction(....)
returns @RES table
    (
    [#] int identity primary key clustered,
    [Varchar3Col] varchar(3),
    [DateTimeCol] datetime,
    [Varchar50Col] varchar(50)
    ) as
BEGIN
declare @RES2 table
    (
    rn int,
    [Varchar3Col] varchar(3),
    [DateTimeCol] datetime,
    [Varchar50Col] varchar(50)
    )

insert @RES2
select rn=row_number() over (order by action_time),
    [Varchar3Col]
    [DateTimeCol]
    [Varchar50Col]
from (.....)
inner join (.....) ON (.....)

declare @i int
set @i = 0
while @@rowcount > 0 begin
    set @i=@i+1
    insert @RES
    select [Varchar3Col], [DateTimeCol], [Varchar50Col]
    from @RES2
    where rn=@i
end
return
END
GO
  • If you look at the above, the population of @RES is done sequentially in the order desired, manually.
  • @RES has a clustered PK representing the order inserted.
  • the columns are small enough that 20 rows should always fit in a single 8K page

Would this work (with the straightforward SELECT from the application layer)?

回答1:

For an inline TVF nothing will really work. Not only that, the inline TVF may even return more rows than you believe it should, and the rows will be trimmed after the TVF executed (basically a predicate in the TVF definition can be pulled out of the TVF and moved somewhere else in the query tree). See T-SQL functions do no imply a certain order of execution for an example of this happening.

Converting the inline TVF to a multi statement one will introduce some procedural order, since the statements cannot be executed out of order, but the TVF result may be re-ordered, sorted, split, spooled, basically mangled by the optimizer generated plan and in the end break your assumption about output order. I'm afraid if you must have a certain order of execution, cursors are your best friend.



回答2:

Perhaps the question should have been:

What's the best I can do to encourage ordered results given that I cannot change the application to add an ORDER BY clause?"

Asking for a guarantee (as the original question does) should only ever produce the correct response: there is no guarantee without an outer ORDER BY clause.

One way to illustrate this lack of a guarantee is to look at the clustered index scan of the returned table variable in the query plan. Without an ORDER BY clause, the Ordered property shows False. This means the the query execution and storage engines are free to produce rows in any order. For example, it might be occasionally convenient for the storage engine to traverse the clustered index in reverse key order at runtime. The scan direction attribute is not present when the Ordered property is False.

Adding an ORDER BY clause would change the plan, minimally to set the Ordered property on the index scan to True, and Scan Direction to FORWARD. This, together with other guarantees and invariants in the code, act to ensure that rows are returned to the client ordered in the specified way.

These guarantees might affect everything from how order is preserved once it has been established, whether the index is scanned forward or backward, whether asynchronous or synchronous methods are used, and how rows are packaged up for transmission over the network.

For clarity, Ordered:True on an index scan is not sufficient on its own. The query plan is an abstraction of the code structure that actually executes - it should not be taken literally. No-one without source code access and the necessary skill can make any guarantees over and above what is documented.

So, leaving aside the fact the question is flawed, have you done everything you can to encourage a preserved ordering at the client? Probably. Possibly. Who knows?



回答3:

Is it just that you don't like the answer you are hearing? Truth is, order is only guaranteed with an order by clause. It's not an opinion, it is fact. There is no alternative if it is a guarantee you are looking for.



回答4:

If only a single app is using the function, in this particular way as you mentioned ( so not joined/applied to another object) this should have been a sproc ! Add an oder by clause to the data returning query to receive the result set in a particular order.

There is always a risk your fn will be used by another dev, but not in the way you intended it to be used, causing issues of all kind. Performance being the one I would be worried about.



回答5:

You'll have a better shot at a predictable query plan if you use a single-statement TVF instead of a multistatement TVF. ROW_NUMBER OVER should enfore the ordering you want in your RES2 query, and if it doesn't, just put it inside a CTE and order by your row number column. See below.

CREATE FUNCTION [dbo].[MyFunction]
(
/*
Parameters
*/
)
RETURNS TABLE
RETURN
WITH res2
(
    rn,
    Varchar3Col,
    DateTimeCol,
    Varchar50Col
)
AS
(
    SELECT
        ROW_NUMBER() OVER (ORDER BY action_time) AS rn,
        Varchar3Col,
        action_time AS DateTimeCol,
        Varchar50Col
    FROM
/*
        [...from statement...]
*/      
)
SELECT
    rn,
    Varchar3Col,
    DateTimeCol,
    Varchar50Col
FROM
    res2
ORDER BY
    rn;