Thanks Nitin. I have only one partition in this table for testing. I
thought within the partition it will scan only certain files based on
skewed fields. However it is scanning the entire data within the
On Nov 14, 2013 9:38 AM, "Nitin Pawar" <[EMAIL PROTECTED]> wrote:
> In my understanding,
> when you are saying scanning entire dataset it is looking at all your
> partitions because your data has been partitioned by the date column.
> A skewed table is a table where there will be different files created for
> all your skewed keys in all the partitions.
> So for your query it will look at all partitions.
> The setting you have kept is only applicable to join queries as it clearly
> says skewjoin. Non join queries it does not have an affect.
> On Thu, Nov 14, 2013 at 6:35 AM, Rajesh Balamohan <
> [EMAIL PROTECTED]> wrote:
>> Hi All,
>> I have the following skewed table "addresses_1"
>> select id, count(*) c from addresses_1 group by id order by c desc limit
>> 142624653 1554806
>> 198477395 958492
>> 102641838 220181
>> 138947865 211331
>> 156483436 193429
>> 96411677 179771
>> 210082076 168033
>> 800174765 152421
>> 139116901 141207
>> 704352025 137263
>> I was able to create the following table with the skew information. And
>> I was able to load the data into the table as well.
>> CREATE TABLE skew_addresses_1(
>> id bigint,
>> address_id bigint,
>> address_lines string,
>> city string,
>> state string,
>> postal_code string,
>> country string,
>> latitude string,
>> longitude string,
>> ) PARTITIONED BY (dateTS string) SKEWED BY (id) ON (142624653,
>> 198477395, 102641838, 138947865, 156483436, 96411677, 210082076, 800174765,
>> 139116901, 704352025)
>> stored as rcfile;
>> select id,count(*) c from skew_addresses_1 where id=142624653 group by id
>> order by c limit 10;
>> *However, at the time of running select query, entire dataset is
>> scanned. * I thought only the relevant dataset (with skew information
>> will be scanned). Am I missing anything here? Any help will be
>> appreciated. I am using Hive 10.x
>> I have enabled hive.optimize.skewjoin.compiletime=true and I can see the
>> skew information populated in SKEWED_COL_NAMES in metadata. But there is
>> no information in SKEWED_COL_VALUE_LOC_MAP table.
> Nitin Pawar