Hive >> mail # user >> msck repair table not adding partitions which contains data.


msck repair table not adding partitions which contain data
Hi All,
I have created a partitioned Hive external table as follows:

CREATE EXTERNAL TABLE test_part (key int, val int)
PARTITIONED BY (part int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/test/';

I have the following folders and files in /test:

/test/part=1
/test/part=1/1.txt
/test/part=2
/test/part=2/1.txt
/test/part=3
/test/part=4

Now I try to add the partitions to the table automatically using the 'msck repair' command:
hive> msck repair table test_part;
OK
Partitions not in metastore:    test_part:part=3        test_part:part=4
Repair: Added partition to metastore test_part:part=3
Repair: Added partition to metastore test_part:part=4
Time taken: 0.685 seconds

As you can see, only the partitions that do not contain any data have been added. The part=1 and part=2 folders have been ignored.
Is this by design? Am I using this correctly?

The only alternative I have found is to explicitly add the partitions using 'alter table add partition', once for every subfolder.
Is there a simpler way to achieve this?
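
For reference, the explicit workaround could be scripted roughly as below. This is only a sketch under some assumptions: gen_add_partitions is a hypothetical helper name, the partition directories are fed in one path per line, the partition value is numeric (as with part int here, so it needs no quoting), and the Hive version in use accepts IF NOT EXISTS in ADD PARTITION.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hive): reads directory paths on stdin,
# one per line, and prints one ALTER TABLE statement per key=value directory.
gen_add_partitions() {
    table="$1"
    while IFS= read -r dir; do
        dir="${dir%/}"        # drop a single trailing slash, if any
        spec="${dir##*/}"     # last path component, e.g. "part=1"
        case "$spec" in
            *=*) ;;           # looks like a partition directory
            *) continue ;;    # skip files and anything else
        esac
        printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s) LOCATION '%s';\n" \
            "$table" "$spec" "$dir"
    done
}

# Example with the two folders that msck repair skipped:
printf '%s\n' /test/part=1 /test/part=2 | gen_add_partitions test_part
```

The generated statements could be saved to a file and run with hive -f. This still issues one 'alter table add partition' per subfolder, so it only automates the workaround rather than replacing it.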

Thanks in advance
Suri
Mark Grover 2013-02-07, 19:53
Krishnappa, Suresh 2013-02-07, 21:17