Hive >> mail # user >> Partition by directory


Re: Partition by directory
Erik,
Did you find out the answer to this? I would be curious to hear what
the problem is.

BTW, I would check the Hive logs (/var/log/apps/hive or /var/log/hive
or similar on EMR). Try increasing the log level (for example, by starting
the CLI with "hive --hiveconf hive.root.logger=DEBUG,console") and see if
that helps.

Given that EMR comes with its own distribution of Hive (which, the
last I saw, was 0.8*), it would be interesting to see how Shark's Hive 0.9
plays with EMR's version of Hive. FWIW, commands like
"ALTER TABLE RECOVER PARTITIONS" are only available in EMR Hive, so I
would not expect them to work in the stock Hive 0.9 that ships with Shark.

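Since RECOVER PARTITIONS is EMR-specific, one fallback in stock Hive is to
register each partition explicitly with ALTER TABLE ... ADD PARTITION. Below
is a minimal sketch, not a tested recipe: the helper name
add_partition_statements is hypothetical, the directory names are hard-coded
from your listing rather than read from S3, and it only builds the statements
as text for you to paste into the Hive CLI.

```python
def add_partition_statements(table, base_location, partition_dirs):
    """Build one ALTER TABLE ... ADD PARTITION statement per 'key=value' directory.

    Hypothetical helper: it only formats HiveQL text and never talks to Hive or S3.
    """
    statements = []
    base = base_location.rstrip("/")
    for name in partition_dirs:
        # Directory names are assumed to follow Hive's 'key=value' layout.
        key, value = name.split("=", 1)
        statements.append(
            "ALTER TABLE {t} ADD PARTITION ({k}='{v}') LOCATION '{b}/{n}';".format(
                t=table, k=key, v=value, b=base, n=name
            )
        )
    return statements


if __name__ == "__main__":
    # Table name, bucket, and directories are taken from the original message.
    for stmt in add_partition_statements(
        "wiki", "s3n://varickTest3/nn", ["ds=2012-10-25", "ds=2012-10-26"]
    ):
        print(stmt)
```

This is admittedly a script, which you wanted to avoid, but with only two
partitions the two statements could also be typed by hand.
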
Keep us posted!
Mark

On Mon, Dec 10, 2012 at 1:46 PM, Erik Thorson <[EMAIL PROTECTED]> wrote:
> Hello All,
>
> I have been using the AWS setup for EMR for some time now and I am currently
> in the process of implementing Spark/Shark on my own cluster. I am
> installing from
> https://github.com/downloads/mesos/spark/spark-0.6.0-sources.tar.gz, which
> includes Hive 0.9.0. I am using this with S3 and am unable to recover
> partitions from a directory that contains a series of other directories
> (partitions). I want to have two partitions, 2012-10-25 and 2012-10-26, which
> contain their respective files. For example, I have the following files
> located at s3://varickTest3/nn/:
>
>
> drwxrwxrwx   -          0 1970-01-01 00:00 /nn/ds=2012-10-25
> -rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-25/part-00000
> -rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-25/part-00001
> drwxrwxrwx   -          0 1970-01-01 00:00 /nn/ds=2012-10-26
> -rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-26/part-00000
> -rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-26/part-00001
>
>
> When I run the following statements in Hive (not Shark):
>
>
> CREATE EXTERNAL TABLE wiki(id BIGINT, title STRING, last_modified STRING,
>   xml STRING, text STRING)
> PARTITIONED BY (ds STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LOCATION 's3n://varickTest3/nn';
>
> ALTER TABLE wiki RECOVER PARTITIONS;
>
>
> the result is an empty table.
>
>
> I have tried many variations of this, and nothing has worked so far. These
> include adding:
>
> MSCK REPAIR TABLE wiki;
>
> using s3 rather than s3n (credentials for both schemes are set in
> core-site.xml), and setting the options:
>
> SET hive.exec.dynamic.partition=true;
> SET hive.exec.dynamic.partition.mode=nonstrict;
>
>
> However, if I use:
>
> LOCATION 's3n://varickTest3/nn/*';
>
> the table has content, but I am still unable to recover partitions.
>
>
> Is there any way, using settings or the data layout (rather than writing a
> script), to partition the table by its directories as I can on AWS?
>
>
> Thank you for any help anyone can give me.