MapReduce >> mail # user >> Parallel Load Data into Two partitions of a Hive Table


Parallel Load Data into Two partitions of a Hive Table
Hi All,

I need to load a month's worth of processed data into a Hive table. The table
has 10 partitions. Each day has many files to load, each file consistently
takes two seconds, and I have ~3000 files. So it will take days to complete
for 30 days' worth of data.

I plan to load each day's data in parallel into its respective partition so
that I can complete it in a short time.

But I need clarification before proceeding.

Question:

1. Will loading in parallel into different partitions of the same Hive table
cause data loss or corruption?

For example, assume I am doing the following:

Table     : processedlogs
Partition : logdate

Running the commands below in parallel:
LOAD DATA INPATH '/logs/processed/2013-04-01' OVERWRITE INTO TABLE
processedlogs PARTITION(logdate='2013-04-01');
LOAD DATA INPATH '/logs/processed/2013-04-02' OVERWRITE INTO TABLE
processedlogs PARTITION(logdate='2013-04-02');
LOAD DATA INPATH '/logs/processed/2013-04-03' OVERWRITE INTO TABLE
processedlogs PARTITION(logdate='2013-04-03');
LOAD DATA INPATH '/logs/processed/2013-04-04' OVERWRITE INTO TABLE
processedlogs PARTITION(logdate='2013-04-04');
.....
LOAD DATA INPATH '/logs/processed/2013-04-30' OVERWRITE INTO TABLE
processedlogs PARTITION(logdate='2013-04-30');
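
The commands above could be generated and launched concurrently from a small bash script. What follows is only a minimal sketch of that idea, not a tested procedure: it assumes the hive CLI is on the PATH and that every day's directory follows the /logs/processed/<date> pattern shown above. MAX_PARALLEL, DRY_RUN, load_stmt and all_stmts are hypothetical names introduced for illustration. It does not settle whether concurrent loads are safe, which is the question being asked.

```shell
#!/usr/bin/env bash
# Sketch only: builds one LOAD DATA statement per day of April 2013 and,
# when DRY_RUN is not 1, runs them in batches through the hive CLI.
# Table, partition column and base path come from the example above;
# MAX_PARALLEL and DRY_RUN are hypothetical knobs, not Hive settings.

TABLE="processedlogs"
BASE_PATH="/logs/processed"
MAX_PARALLEL="${MAX_PARALLEL:-4}"   # assumed batch size
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the statements

# Print the LOAD DATA statement for one date (YYYY-MM-DD).
load_stmt() {
  local day="$1"
  printf "LOAD DATA INPATH '%s/%s' OVERWRITE INTO TABLE %s PARTITION(logdate='%s');\n" \
    "$BASE_PATH" "$day" "$TABLE" "$day"
}

# Print one statement per day, 2013-04-01 through 2013-04-30.
all_stmts() {
  local d
  for ((d = 1; d <= 30; d++)); do
    load_stmt "$(printf '2013-04-%02d' "$d")"
  done
}

if [ "$DRY_RUN" = "1" ]; then
  all_stmts
else
  running=0
  while IFS= read -r stmt; do
    hive -e "$stmt" &               # one hive process per partition
    running=$((running + 1))
    if [ "$running" -ge "$MAX_PARALLEL" ]; then
      wait                          # let the current batch finish first
      running=0
    fi
  done < <(all_stmts)
  wait                              # wait for the last, possibly partial, batch
fi
```

By default the script only prints the 30 statements so they can be reviewed; setting DRY_RUN=0 would start the loads in batches of MAX_PARALLEL. Trying it first with a couple of non-critical partitions seems prudent.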

Thanks
Selva
Yanbo Liang 2013-05-03, 09:08
selva 2013-05-03, 09:41