I have read that enabling Change Data Capture obviously has an impact on database performance.
Would this performance loss only effect the tables that have CDC enabled or would it effect all actions across the database
In my situation I am using SSIS and have a large amount of data moving into and out of a staging database. I also have some lookup tables in the system that are used for transforms. I am hoping to use CDC as a means to audit changes to these reference tables only (not the imported data). The major queries I run as part of the ETL access these reference tables but don't change them so I am trying to figure out if there will still be a noticeable performance hit?
Thanks
The answer to that question would be both yes and no.
No because
When a table is enabled for change data capture, an associated capture
instance is created to support the
dissemination of the change data in
the source table. The capture instance
consists of a change table and up to
two query functions.
Untracked tables seem not to be not involved in tracking data.
Yes because
The source of change data for change
data capture is the SQL Server
transaction log. As inserts, updates,
and deletes are applied to tracked
source tables, entries that describe
those changes are added to the log.
The log serves as input to the change
data capture capture process. This
reads the log and adds information
about changes to the tracked table’s
associated change table.
As the source of changes come from the transaction log, dissemination of the the changes requires the capture instances to read and interprete the transaction log (disclaimer: my interpretation of things). Merely enabling CDC has a performance impact on your entire database.
Recommendations
Storage:
- When planning change data capture architecture, take a significant increase in log size and log volume I/O operations into account.
- Consider specifying a filegroup in sys.sp_cdc_enable_table.
- Consider changing the default filegroup for the database before you execute sys.sp_cdc_enble_db so that change data capture metadata and especially cdc.lsn_time_mappings are located on a different filegroup than PRIMARY.
Workload behavior:
- Try to avoid scenarios where a row needs to be updated immediately after insert.
- Try to avoid using change data capture to capture changes to tables that have frequent large update transactions.
Change data capture parameters:
- Always reduce the list of columns captured by change data capture to only the columns you really need to track.
- If you do not require support for net changes, set @ to 0.
- Use to see whether change data capture can keep up with your workload.
- If change data capture cannot keep up with your workload, modify scan job parameters and restart the scan job.
Conclusion
If your server currently has no problems keeping up with its load, I very much doubt you'll notice any performance problems enabling CDC for infrequent changed tables.
Sources
- Basics of Change Data Capture
- Tuning the Performance of Change Data Capture