Redshift Querying: error xx000 disk full redshift

2019-07-10 18:03发布

I executed the below query

select employee_name, max(employee_dept) as dept
from employeeDB 
where employee_name is not null and employee_name != ''
group by employee_name
order by employee_name asc
limit 1000

and received the error ERROR: XX000: Disk Full.

upon investigation by executing the below query i found that i have 941 GB free space and 5000 GB used space.

select
  sum(capacity)/1024 as capacity_gbytes, 
  sum(used)/1024 as used_gbytes, 
  (sum(capacity) - sum(used))/1024 as free_gbytes 
from 
  stv_partitions where part_begin=0;

Can anyone suggest how to mitigate this challenge so that i can get the desired results?

2条回答
何必那么认真
2楼-- · 2019-07-10 18:43
+-------+              +-------+
|-------|              |-------|
||10 kb||              ||25 kb||
+-------+              +-------+
|xxxxxxx|              |xxxxxxx|
|xxxxxxx|              |xxxxxxx|
|xxxxxxx+------------->+xxxxxxx|
+-------+              |xxxxxxx|
||10 kb||              |xxxxxxx|
+-------+              |xxxxxxx|
|xxxxxxx|              |xxxxxxx|
|xxxxxxx|              |xxxxxxx|
+-------+              |xxxxxxx|
||05 kb||              |xxxxxxx|
+-------+              +-------+

Look at the representation above. Let us assume that xxxxxxx represents the occupied space on the disk while the numbers represent the empty space available.

Both the scenarios represent a vacant space of 25 kb. But in case 1, if you have to insert (or perform operations) that would require a contiguous memory allocation of, say 15 kb, you won't be able to do that. Although a space of 25 kb is available, but since that isn't contiguous, you might get a Memory / Disk Full Error and thus either the space will go waste or will be assigned for tasks that are very low on memory requirement.

In case 2, a block of contiguous memory is available. A task requiring ~25kb of memory can easily be executed,

This isn't only with Redshift or DBMS; it holds true with anything that remotely involves memory management, including Operating Systems.

What causes such memory partitions (called Fragmentation)?

Fragmentation is caused by continuously creating and deleting (modifying) files on disk. When a file occupying a space is removed, it creates a gaping memory hole there. A file of size less than the memory hole can occupy that space or the space will go waste.

What should be done?

Defragment! In your specific case, Amazon Redshift provides the provision to VACUUM tables and/or schema. You might be having enough disk space, but not enough contiguous memory that the engine would be able to allocate to the task executed by you.

查看更多
再贱就再见
3楼-- · 2019-07-10 18:47

Free disk space is important for query execution on Redshift. That's why the VACUUM process is important and should be executed regularly, especially for tables where deletions happen often.

Have you VACUUMed your tables lately?

Check the VACUUM documentation and also look at the Amazon Redshift at 100% disk usage due to VACUUM query question on StackOverflow.

查看更多
登录 后发表回答