I may be wrong, but it seems to be different opinions in the interwebs about what these are. SQL Server, MySQL, DB2, and PostgreSQL show different definitions for these tables.
After reading a ton from different vendors (database manuals, user posts, etc.) I was able to distinguish three types of tables of interest (there are many, many more types of no interest for this question). Please bear with me:
Heap Table:
- All rows are stored (probably unordered) in the heap table.
- Each row has an internal ROWID that identifies it.
- Indexes are optional. If added, they include the indexed columns as the index key, plus the ROWID (to eventually access the real rows in the heap).
- Note: this case is of no interest for this question, but I added it here to make a difference with the third case below.
Pure Index Table: <-- Is this a Clustered Index Table?
- There's one main index that includes the key columns, as well as the non-key columns in it. All the data is stored in the index.
- The data follows the main index order, so it's by definition sorted by the main index.
- There's no need for a heap table to store the rows. All data is already in the index. There's no ROWID whatsoever, since there's no heap table.
- SQL Server tables (typically) fall by default in this category.
- MySQL InnoDB tables seem to also fall in this category since they don't seem to have a heap table at all.
Index + Sorted Heap Table: <-- Is this a Clustered Index Table?
- There's one main "clustered index".
- There's a heap table where the rows are stored in the order defined by the clustered index.
- Each row in the heap table has a ROWID.
- The clustered index does not include non-key columns, but a ROWID to access the real row in the heap table.
- DB2 seems to be able to "Cluster" tables.
- PostgreSQL seems to also call these tables as "Clustering Index" tables.
Now, which ones of these #2 or #3 is a "Clustered Index Table"? Who's telling the truth and who's lying? :D
In other words, is the term "Clustered Index Table" a commercial term that each vendor freely defines as he/she pleases, or is there an official unique definition according to some official database theory?