I have a Hive table (with compression) defined like this:
create table temp1 (col1 string, col2 int)
partitioned by (col3 string, col4 string)
row format delimited
fields terminated by ','
escaped by '\\'
lines terminated by '\n'
stored as sequencefile;
When I do a simple select-and-insert from another Hive table into this one (a map-only job, no reducers), I see a peculiar pattern: the compressed data gets split across a very large number of small files. In Table 1, for instance, about 1 GB of data is at times spread over 200-300 files, inflating the number of HDFS blocks consumed when it should have spanned only ~16 blocks; the high file count in turn spawns a very high number of map tasks whenever I query this new table. In another table the file size never goes beyond 245 MB (Table 2).

Is there a setting to restrict the output file size to 64 MB (or a multiple of 64 MB, or even a single file)? My block size is 64 MB, so that would prevent excess blocks from being created.
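For context, the insert is along these lines; the compression settings and the source table name below are illustrative, not my exact script:

set hive.exec.compress.output=true;
set mapred.output.compression.type=BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- map-only: a plain projection, no joins or aggregations, so no reducers run
insert overwrite table temp1 partition (col3, col4)
select col1, col2, col3, col4
from source_table;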
TABLE 1
Name     | Type | Size     | Block Size
000000_0 | file | 30.22 MB | 64 MB
000001_0 | file | 26.19 MB | 64 MB
000002_0 | file | 25.19 MB | 64 MB
000003_0 | file | 24.74 MB | 64 MB
000004_0 | file | 24.54 MB | 64 MB
...
000031_0 | file | 0.9 MB   | 64 MB
TABLE 2
Name     | Type | Size      | Block Size
000000_0 | file | 245.02 MB | 64 MB
000001_0 | file | 245.01 MB | 64 MB
000002_0 | file | 244.53 MB | 64 MB
000003_0 | file | 244.4 MB  | 64 MB
000004_0 | file | 198.21 MB | 64 MB
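For illustration, this is the kind of knob I am hoping for; I believe the hive.merge.* family is intended to merge small output files, but I could not confirm whether it applies to a compressed SequenceFile written by a map-only insert like mine:

set hive.merge.mapfiles=true;               -- merge small files from map-only jobs
set hive.merge.mapredfiles=true;            -- merge small files from map-reduce jobs
set hive.merge.size.per.task=67108864;      -- target merged file size: 64 MB
set hive.merge.smallfiles.avgsize=67108864; -- trigger a merge pass when the average output file is smaller than this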