Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive >> mail # user >> Hive skewed tables


Copy link to this message
-
Hive skewed tables
Hi All,

I have the following skewed table "addresses_1"

select id, count(*) c from addresses_1 group by id order by c desc limit 10;
142624653    1554806
198477395    958492
102641838    220181
138947865    211331
156483436    193429
96411677    179771
210082076    168033
800174765    152421
139116901    141207
704352025    137263

I was able to create the following table with the skew information.  And I
was able to load the data into the table as well.

 CREATE  TABLE skew_addresses_1(
  id bigint,
  address_id bigint,
  address_lines string,
  city string,
  state string,
  postal_code string,
  country string,
  latitude string,
  longitude string,
  ) PARTITIONED BY (dateTS string) SKEWED BY (id) ON (142624653, 198477395,
102641838, 138947865, 156483436, 96411677, 210082076, 800174765, 139116901,
704352025)
   stored as rcfile;

select id,count(*) c from skew_addresses_1 where id=142624653 group by id
order by c limit 10;

*However, at the time of running select query,  entire dataset is
scanned. *I thought only the relevant dataset (with skew information
will be
scanned).  Am I missing anything here?  Any help will be appreciated.  I am
using Hive 10.x

I have enabled hive.optimize.skewjoin.compiletime=true and I can see the
skew information populated in SKEWED_COL_NAMES in metadata.  But there is
no information in SKEWED_COL_VALUE_LOC_MAP table.
--
~Rajesh.B
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB