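I have a large (> 100M rows) Postgres table with the structure {integer, integer, integer, timestamp without time zone}. I expected a row to take 3 * integer + 1 * timestamp = 3 * 4 + 1 * 8 = 20 bytes.
In reality, the row size is pg_relation_size(tbl) / count(*)
= 52 bytes. Why?
(No deletes have been run against the table: pg_relation_size(tbl, 'fsm')
~= 0)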
Calculation of row size is much more complex than that.
Storage is typically partitioned into 8 kB data pages. Each page has a small fixed overhead and may have a remainder too small to fit another tuple. On top of that, a table can carry dead rows or a percentage of each page initially reserved with the FILLFACTOR
setting.
More importantly, there is overhead per row (tuple): the HeapTupleHeader
of 23 bytes, plus alignment padding. The start of the tuple header as well as the start of the tuple data are aligned at a multiple of MAXALIGN
, which is 8 bytes on a typical 64-bit machine. Some data types require alignment to the next multiple of 2, 4 or 8 bytes.
Quoting the manual on the system table pg_type
:
typalign
is the alignment required when storing a value of this type. It applies to storage on disk as well as most representations of the value inside PostgreSQL. When multiple values are stored consecutively, such as in the representation of a complete row on disk, padding is inserted before a datum of this type so that it begins on the specified boundary. The alignment reference is the beginning of the first datum in the sequence.
Possible values are:
c
=char
alignment, i.e., no alignment needed.
s
=short
alignment (2 bytes on most machines).
i
=int
alignment (4 bytes on most machines).
d
=double
alignment (8 bytes on many machines, but by no means all).
Read about the basics in the manual here.
This results in 4 bytes of padding after your 3 integer
columns, because the timestamp
column requires double
alignment and needs to start at the next multiple of 8 bytes.
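The padding rule can be checked with a short Python sketch. It is only an illustration, not PostgreSQL code: the names TYPALIGN_BYTES, align_up and column_offsets are made up here, the byte values assume a typical 64-bit platform, and only fixed-length columns are modelled.

```python
# Alignment in bytes for each typalign code (assumed: typical 64-bit platform).
TYPALIGN_BYTES = {"c": 1, "s": 2, "i": 4, "d": 8}


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def column_offsets(columns):
    """Return (start_offset, padding_before) for each (size, typalign) column.

    Offsets are relative to the start of the tuple data area.
    Only fixed-length columns are handled in this sketch.
    """
    layout = []
    end = 0  # first free byte after the previous column
    for size, typalign in columns:
        start = align_up(end, TYPALIGN_BYTES[typalign])
        layout.append((start, start - end))
        end = start + size
    return layout


# Three integers (4 bytes, 'i') followed by a timestamp (8 bytes, 'd').
columns = [(4, "i"), (4, "i"), (4, "i"), (8, "d")]
print(column_offsets(columns))
```

For the table in the question, the integers land at offsets 0, 4 and 8 with no padding, and the timestamp starts at offset 16 after 4 bytes of padding.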
So, one row occupies:
23 -- heaptupleheader
+ 1 -- padding or NULL bitmap
+ 12 -- 3 * integer (no alignment padding here)
+ 4 -- padding after 3rd integer
+ 8 -- timestamp
+ 0 -- no padding since tuple ends at multiple of MAXALIGN
Finally, there is an ItemIdData
(item pointer) per tuple in the page header (as pointed out by @A.H. in the comment) that occupies 4 bytes:
+ 4 -- item pointer in page header
------
= 52 bytes
So we arrive at the observed 52 bytes.
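The same arithmetic as a small Python sketch, for readers who want to try other column layouts. The constants mirror the figures above and assume MAXALIGN = 8; the function names are hypothetical, and the sketch ignores NULL bitmaps longer than one byte and variable-length types.

```python
MAXALIGN = 8       # assumed: typical 64-bit build
TUPLE_HEADER = 23  # fixed part of HeapTupleHeader, in bytes
ITEM_POINTER = 4   # one ItemIdData per tuple, in bytes


def maxalign(n: int) -> int:
    """Round n up to the next multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN


def row_footprint(data_bytes: int) -> int:
    """Bytes one row occupies in a page, item pointer included.

    data_bytes is the size of the column data, including any padding
    between columns (12 + 4 + 8 = 24 for the table in the question).
    """
    data_start = maxalign(TUPLE_HEADER)          # 23 -> 24
    tuple_len = maxalign(data_start + data_bytes)  # tuple ends at a MAXALIGN boundary
    return tuple_len + ITEM_POINTER


print(row_footprint(24))
```

Here row_footprint(24) evaluates to 52, matching the observed size per row.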
The calculation pg_relation_size(tbl) / count(*)
is a pessimistic estimate. pg_relation_size(tbl)
includes bloat (dead rows) and space reserved by fillfactor
, as well as overhead per data page and per table. (And we didn't even mention compression for long varlena data in TOAST tables, since it doesn't apply here.)
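To see how page overhead and fillfactor push the average above the per-row footprint, here is a deliberately simplified Python model. It assumes the default 8192-byte page with a 24-byte page header, treats fillfactor as a flat reservation of page space, and ignores dead rows entirely; the function names are hypothetical.

```python
PAGE_SIZE = 8192   # assumed: default block size, in bytes
PAGE_HEADER = 24   # assumed: fixed page header size, in bytes


def rows_per_page(row_footprint: int, fillfactor: int = 100) -> int:
    """Rows that fit in one page under a simplified fillfactor model.

    row_footprint is the tuple size including its 4-byte item pointer.
    """
    reserved = PAGE_SIZE * (100 - fillfactor) // 100  # space kept free for updates
    usable = PAGE_SIZE - PAGE_HEADER - reserved
    return usable // row_footprint


def bytes_per_row(row_footprint: int, fillfactor: int = 100) -> float:
    """Average page bytes consumed per row, i.e. page size / rows per page."""
    return PAGE_SIZE / rows_per_page(row_footprint, fillfactor)


print(rows_per_page(52), round(bytes_per_row(52), 2))
print(rows_per_page(52, 90), round(bytes_per_row(52, 90), 2))
```

With 52-byte rows this model fits 157 rows in a full page, so the average is slightly above 52 bytes per row. At fillfactor 90 it fits 141 rows, and the average rises to roughly 58 bytes.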
You can install the additional module pgstattuple and call SELECT * FROM pgstattuple('tbl_name');
for more information on table and tuple size.
Every row has metadata associated with it. The correct formula is (assuming naive alignment):
3 * 4 + 1 * 8 == your data
24 bytes == row overhead
total size per row: 23 + 20
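or about 53 bytes. I actually wrote PostgreSQL-varint specifically to help with this problem, for this exact use case. You may want to look at a similar post for more details re: tuple overhead.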