Partition by directory
Hello All,

I have been using AWS EMR for some time and am now setting up Spark/Shark on my own cluster, installing from https://github.com/downloads/mesos/spark/spark-0.6.0-sources.tar.gz, which bundles Hive 0.9.0. I am using this with S3 and am unable to recover partitions from a directory that contains a series of partition subdirectories. I want two partitions, ds=2012-10-25 and ds=2012-10-26, each containing its respective files. For example, I have the following files located at s3://varickTest3/nn/:
drwxrwxrwx   -          0 1970-01-01 00:00 /nn/ds=2012-10-25
-rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-25/part-00000
-rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-25/part-00001
drwxrwxrwx   -          0 1970-01-01 00:00 /nn/ds=2012-10-26
-rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-26/part-00000
-rwxrwxrwx   1   49696432 2012-12-10 20:55 /nn/ds=2012-10-26/part-00001
When I run the following statements in Hive (not Shark):
CREATE EXTERNAL TABLE wiki(id BIGINT, title STRING, last_modified STRING, xml STRING, text STRING)
PARTITIONED BY (ds STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://varickTest3/nn';

ALTER TABLE wiki RECOVER PARTITIONS;
This results in an empty table. I have tried many variations and nothing has worked so far, including adding (as far as I understand, MSCK REPAIR TABLE is the stock Apache Hive equivalent of EMR's RECOVER PARTITIONS):

MSCK REPAIR TABLE wiki;

I have also tried s3:// rather than s3n:// URIs (credentials for both schemes are set in core-site.xml), and setting the options:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
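In every case the table ends up with no partitions; a quick way to confirm (assuming SHOW PARTITIONS behaves the same way for an external table) is:

SHOW PARTITIONS wiki;
-- returns no rows after each of the attempts above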
Although if I use:

LOCATION 's3n://varickTest3/nn/*'

the table does have content, but I am still unable to recover partitions.
Is there any way, via settings or data layout (rather than writing a script), to have the table partitioned from the directory names, as I can on AWS? For reference, the per-partition workaround I would like to avoid is sketched below.
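A minimal sketch of that manual workaround, assuming stock ALTER TABLE ... ADD PARTITION syntax against the same bucket and the two dates above:

-- register each partition directory by hand
ALTER TABLE wiki ADD PARTITION (ds='2012-10-25') LOCATION 's3n://varickTest3/nn/ds=2012-10-25';
ALTER TABLE wiki ADD PARTITION (ds='2012-10-26') LOCATION 's3n://varickTest3/nn/ds=2012-10-26';

This would register each directory explicitly, but it has to be re-run (or scripted) for every new date, which is what I am hoping to avoid.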
Thank you for any help anyone can give me.