Using Polybase to load data into an existing table

Using CTAS we can leverage the parallelism that Polybase provides to load data into a new table in a highly scalable and performant way.

Is there a way to use a similar approach to load data into an existing table? The table might even be empty.

Creating an external table and using INSERT INTO ... SELECT * FROM ... - I would assume that this goes through the head node and is therefore not in parallel?

I know that I could also drop the table and use CTAS to recreate it but then I have to deal with all the metadata again (column names, data types, distributions, ...).

标签： azure-sql-database azure-sqldw azure-sql-server

1条回答

甜甜的少女心

2楼-- · 2020-03-08 04:05

You could use partition switching to do this, although remember not to use too many partitions with Azure SQL Data Warehouse. See 'Partition Sizing Guidance' here.

Bear in mind check constraints are not supported so the source table has to use the same partition scheme as the target table.

Full example with partitioning and switch syntax:

-- Assume we have a file with the values 1 to 100 in it.

-- Create an external table over it; will have all records in
IF NOT EXISTS ( SELECT * FROM sys.schemas WHERE name = 'ext' )
EXEC ( 'CREATE SCHEMA ext' )
GO


-- DROP EXTERNAL TABLE ext.numbers
IF NOT EXISTS ( SELECT * FROM sys.external_tables WHERE object_id = OBJECT_ID('ext.numbers') )
CREATE EXTERNAL TABLE ext.numbers (
    number          INT             NOT NULL
    )
WITH (
    LOCATION = 'numbers.csv',
    DATA_SOURCE = eds_yourDataSource, 
    FILE_FORMAT = ff_csv
);
GO

-- Create a partitioned, internal table with the records 1 to 50
IF OBJECT_ID('dbo.numbers') IS NOT NULL DROP TABLE dbo.numbers

CREATE TABLE dbo.numbers
WITH (
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED INDEX ( number ), 
    PARTITION ( number RANGE LEFT FOR VALUES ( 50, 100, 150, 200 ) )
    )
AS 
SELECT * 
FROM ext.numbers
WHERE number Between 1 And 50;
GO

-- DBCC PDW_SHOWPARTITIONSTATS ('dbo.numbers')


-- CTAS the second half of the external table, records 51-100 into an internal one.
-- As check contraints are not available in SQL Data Warehouse, ensure the switch table
-- uses the same scheme as the original table.
IF OBJECT_ID('dbo.numbers_part2') IS NOT NULL DROP TABLE dbo.numbers_part2

CREATE TABLE dbo.numbers_part2
WITH (
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED INDEX ( number ),
    PARTITION ( number RANGE LEFT FOR VALUES ( 50, 100, 150, 200 ) )
    )
AS 
SELECT *
FROM ext.numbers
WHERE number > 50
GO


-- Partition switch it into the original table
ALTER TABLE dbo.numbers_part2 SWITCH PARTITION 2 TO dbo.numbers PARTITION 2;


SELECT *
FROM dbo.numbers
ORDER BY 1;

0人赞添加讨论(0) 举报

Using Polybase to load data into an existing table

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间