I executed the below query
select employee_name, max(employee_dept) as dept
from employeeDB
where employee_name is not null and employee_name != ''
group by employee_name
order by employee_name asc
limit 1000
and received the error ERROR: XX000: Disk Full
.
upon investigation by executing the below query i found that i have 941 GB free space and 5000 GB used space.
select
sum(capacity)/1024 as capacity_gbytes,
sum(used)/1024 as used_gbytes,
(sum(capacity) - sum(used))/1024 as free_gbytes
from
stv_partitions where part_begin=0;
Can anyone suggest how to mitigate this challenge so that i can get the desired results?
Look at the representation above. Let us assume that
xxxxxxx
represents the occupied space on the disk while the numbers represent the empty space available.Both the scenarios represent a vacant space of 25 kb. But in case 1, if you have to insert (or perform operations) that would require a contiguous memory allocation of, say
15 kb
, you won't be able to do that. Although a space of 25 kb is available, but since that isn't contiguous, you might get aMemory / Disk Full Error
and thus either the space will go waste or will be assigned for tasks that are very low on memory requirement.In case 2, a block of contiguous memory is available. A task requiring
~25kb
of memory can easily be executed,This isn't only with Redshift or DBMS; it holds true with anything that remotely involves memory management, including Operating Systems.
Fragmentation is caused by continuously creating and deleting (modifying) files on disk. When a file occupying a space is removed, it creates a gaping memory hole there. A file of size less than the memory hole can occupy that space or the space will go waste.
Defragment! In your specific case, Amazon Redshift provides the provision to VACUUM tables and/or schema. You might be having enough disk space, but not enough contiguous memory that the engine would be able to allocate to the task executed by you.
Free disk space is important for query execution on Redshift. That's why the VACUUM process is important and should be executed regularly, especially for tables where deletions happen often.
Have you VACUUMed your tables lately?
Check the VACUUM documentation and also look at the Amazon Redshift at 100% disk usage due to VACUUM query question on StackOverflow.