Difference between MIN(__$start_lsn) and fn_cdc_ge

2019-08-30 07:50发布

Using CDC on SQL Server 2012.

I have a table (MyTable) which is CDC enabled. I thought the following two queries would always return the same value:

SELECT MIN(__$start_lsn) FROM cdc.dbo_MyTable_CT;
SELECT sys.fn_cdc_get_min_lsn('dbo_MyTable');

But they don't seem to do so: in my case the first one returns 0x00001EC6000000DC0003 and the second one 0x00001E31000000750001, so the absolute minimum in the table is actually greater than the value returned by fn_cdc_get_min_lsn.

My questions:

  1. Why are the results different?
  2. Is there any problem with using the value from the first query as the first parameter on fn_cdc_get_all_changes_dbo_MyTable? (all examples I've seen use the value from the second query)

2条回答
姐就是有狂的资本
2楼-- · 2019-08-30 08:28

sys.fn_cdc_get_min_lsn returns the minimum available lsn for a change captured table.

Like @Ben says, this can be different (earlier) from the earliest change actually captured, for example when a table is first added to CDC and there haven't been any changes yet.

As per the MSDN doco you should always use this to validate your query ranges prior to execution because change data will eventually get cleaned up. So you will not only use this once - you will check it every time.

You should use this rather than getting the min LSN other ways because

  • it'll be faster (as Ben pointed out). Much faster potentially.
  • it's the documented API for doing so. The implementation of the backing tables might change in future versions etc...

Workflow is generally:

  • load your previous LSN from (your state)
  • query current LSN
  • query minimum available for the table
  • if prev > min available load changes only
  • otherwise load whole table and handle it (somehow)
  • save current LSN to (your state)
查看更多
▲ chillily
3楼-- · 2019-08-30 08:32

My understanding is that the first one returns the oldest LSN for the data that's currently in the CDC table and the latter reflects when the table was added to CDC. I will note though that you'll only want to use the minimum (whichever method you go with) once so you don't process duplicate records. Also, since the second method gets its result from sys.cdc_tables (which very likely has far fewer rows than your CDC table does), it's going to be more efficient.

查看更多
登录 后发表回答