Hive skewed tables
Rajesh Balamohan 2013-11-14, 01:05
I have the following skewed table "addresses_1"
select id, count(*) c from addresses_1 group by id order by c desc limit 10;
I was able to create the following table with the skew information, and I
was able to load the data into it as well:
CREATE TABLE skew_addresses_1(
) PARTITIONED BY (dateTS string) SKEWED BY (id) ON (142624653, 198477395,
102641838, 138947865, 156483436, 96411677, 210082076, 800174765, 139116901,
stored as rcfile;
select id,count(*) c from skew_addresses_1 where id=142624653 group by id
order by c limit 10;
However, when I run the select query, the entire dataset is
scanned. I thought only the relevant data (the part covered by the skew
information) would be scanned. Am I missing anything here? Any help will be appreciated. I am
using Hive 0.10.x.
I have set hive.optimize.skewjoin.compiletime=true, and I can see the
skew information populated in SKEWED_COL_NAMES in the metadata, but there is
no information in the SKEWED_COL_VALUE_LOC_MAP table.
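
For reference, the checks described above could be sketched roughly as follows. This is only an illustration, assuming the statements are run from the Hive CLI. The DESCRIBE FORMATTED and EXPLAIN EXTENDED steps are not part of the original post; they are added here as one possible way to look at the table's skew metadata and at the input the query plans to read.

```sql
-- Setting mentioned in the post.
set hive.optimize.skewjoin.compiletime=true;

-- Assumption (not in the original post): inspect the table definition.
-- The storage information section should list the skewed columns and values.
DESCRIBE FORMATTED skew_addresses_1;

-- Assumption (not in the original post): show the query plan, including
-- the input paths, to see how much of the table would be read.
EXPLAIN EXTENDED
select id, count(*) c from skew_addresses_1 where id=142624653 group by id
order by c limit 10;
```

Note that SKEWED_COL_NAMES and SKEWED_COL_VALUE_LOC_MAP are tables in the metastore's backing database, so they are queried there directly rather than through Hive.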