Re: Parallel Load Data into Two partitions of a Hive Table
Thanks Yanbo. My doubt is clarified now.
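
For anyone finding this later, here is a minimal sketch of how the per-day loads quoted below could be generated and run concurrently. It assumes the `hive` CLI is on the PATH and that each day's files sit under /logs/processed/<date>, as in my example. The helper names (`load_statement`, `days_in_range`, `run_load`, `load_all`) and the default of 4 workers are illustrative only and are not something from this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

TABLE = "processedlogs"        # table name used in the thread
BASE_PATH = "/logs/processed"  # HDFS layout used in the thread


def load_statement(day):
    """Build the LOAD DATA statement for one day's partition."""
    d = day.isoformat()  # e.g. "2013-04-01"
    return (
        f"LOAD DATA INPATH '{BASE_PATH}/{d}' OVERWRITE INTO TABLE "
        f"{TABLE} PARTITION(logdate='{d}');"
    )


def days_in_range(start, end):
    """Return every date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def run_load(day):
    """Run one load through the Hive CLI (assumed installed); return its exit code."""
    return subprocess.run(["hive", "-e", load_statement(day)]).returncode


def load_all(start, end, workers=4):
    """Run one load per day, at most `workers` at a time (hypothetical default)."""
    days = days_in_range(start, end)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(run_load, days))
    return dict(zip(days, codes))  # maps each day to its exit code
```

Calling `load_all(date(2013, 4, 1), date(2013, 4, 30))` would issue one statement per day. Each statement targets its own partition and its own source directory, which is the case Yanbo describes as safe. Since LOAD DATA INPATH moves the source files, each source path should appear in only one statement.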
On Fri, May 3, 2013 at 2:38 PM, Yanbo Liang <[EMAIL PROTECTED]> wrote:

> Loading data into different partitions in parallel is OK, because it is
> equivalent to writing to different files on HDFS.
>
>
> 2013/5/3 selva <[EMAIL PROTECTED]>
>
>> Hi All,
>>
>> I need to load a month's worth of processed data into a Hive table. The
>> table has 10 partitions. Each day has many files to load, each file takes
>> a constant two seconds, and I have ~3000 files. So it will take days to
>> complete 30 days' worth of data.
>>
>> I plan to load each day's data in parallel into its respective partition
>> so that I can finish in a short time.
>>
>> But I need clarification before proceeding.
>>
>> Question:
>>
>> 1. Will loading in parallel into different partitions of the same Hive
>> table cause data loss or corruption?
>>
>> For example, assume I am doing the following:
>>
>> Table     : processedlogs
>> Partition : logdate
>>
>> Running the commands below in parallel:
>> LOAD DATA INPATH '/logs/processed/2013-04-01' OVERWRITE INTO TABLE
>> processedlogs PARTITION(logdate='2013-04-01');
>> LOAD DATA INPATH '/logs/processed/2013-04-02' OVERWRITE INTO TABLE
>> processedlogs PARTITION(logdate='2013-04-02');
>> LOAD DATA INPATH '/logs/processed/2013-04-03' OVERWRITE INTO TABLE
>> processedlogs PARTITION(logdate='2013-04-03');
>> LOAD DATA INPATH '/logs/processed/2013-04-04' OVERWRITE INTO TABLE
>> processedlogs PARTITION(logdate='2013-04-04');
>> .....
>> LOAD DATA INPATH '/logs/processed/2013-04-30' OVERWRITE INTO TABLE
>> processedlogs PARTITION(logdate='2013-04-30');
>>
>> Thanks
>> Selva
--
-- selva