Hive, mail # user - Merging different HDFS file for HIVE


Re: Merging different HDFS file for HIVE
Stephen Sprague 2013-07-26, 23:37
i like #2.

so you have, say, three external tables representing your three feed files.
After the third and final file is loaded, join 'em all together into the
final table - maybe partition that table with one partition per day.
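
A minimal sketch of what one of those external tables might look like. The column types, the tab delimiter, and the HDFS path /data/feeds/ff1 are assumptions for illustration only; the column list follows Feed file 1 from the original question, and FF2 and FF3 would be declared the same way with their own columns and locations.

```sql
-- Hypothetical external table over the directory where feed file 1 lands.
-- Types, delimiter, and LOCATION are assumed, not taken from the thread.
CREATE EXTERNAL TABLE IF NOT EXISTS FF1 (
  EMP_ID STRING,
  f1     STRING,
  f2     STRING,
  f6     STRING,
  f9     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/feeds/ff1';
```

Because the table is external, dropping it later removes only the metadata and leaves the feed file in HDFS.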

for example:

alter table final add partition (datekey=YYYYMMDD);
insert overwrite table final partition (datekey=YYYYMMDD)
select a.EMP_ID,f1,...,f10
from FF1 a
join FF2 b on (a.EMP_ID=b.EMP_ID)
join FF3 c on (b.EMP_ID=c.EMP_ID);
Or a variation on #3: make a view over the three tables that looks
just like the select statement above.
What do you want to optimize for?
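
The view variation mentioned above could be sketched roughly as follows. The view name final_v is hypothetical, and the assignment of f1 through f10 to FF1, FF2, and FF3 assumes the column layout listed in the original question (fields 1, 2, 6, 9 in feed 1; fields 5, 7, 10 in feed 2; fields 3, 4, 8 in feed 3).

```sql
-- Hypothetical view that presents the three feed tables as one table.
-- Inner joins mean an EMP_ID shows up only once all three feeds contain it.
CREATE VIEW IF NOT EXISTS final_v AS
SELECT a.EMP_ID,
       a.f1, a.f2, c.f3, c.f4, b.f5,
       a.f6, b.f7, c.f8, a.f9, b.f10
FROM FF1 a
JOIN FF2 b ON (a.EMP_ID = b.EMP_ID)
JOIN FF3 c ON (b.EMP_ID = c.EMP_ID);
```

With a view nothing is materialized, so every query against it pays for the joins again; the insert into a partitioned table pays that cost once per load instead.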
On Fri, Jul 26, 2013 at 5:30 AM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> Option 1) Use Pig or Oozie: write a workflow that joins the files into a
> single file.
> Option 2) Create a temp table for each of the different files, join
> them into a single table, and then delete the temp tables.
> Option 3) Don't do anything; change your queries to read from the three
> different files as needed.
>
> Wait for others to give better suggestions :)
>
>
> On Fri, Jul 26, 2013 at 4:22 PM, Ramasubramanian Narayanan <
> [EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Please help with a solution for the problem below. This scenario
>> is applicable in banking, at least.
>>
>> I have a HIVE table with the below structure...
>>
>> Hive Table:
>> Field1
>> ...
>> Field 10
>>
>>
>> For the above table, I will get the values for each feed in a different
>> file. You can imagine that these files belong to the same branch and can
>> arrive at any time interval. I have to load into the table only if I get
>> all 3 files for the same branch. (Assume that we have a common field in
>> all the files to join on.)
>>
>> *Feed file 1 :*
>> EMP ID
>> Field 1
>> Field 2
>> Field 6
>> Field 9
>>
>> *Feed File2 :*
>> EMP ID
>> Field 5
>> Field 7
>> Field 10
>>
>> *Feed File3 :*
>> EMP ID
>> Field 3
>> Field 4
>> Field 8
>>
>> Now the question is:
>> what is the best way to merge all these files into a single file
>> so that it can be placed under the HIVE structure?
>>
>> regards,
>> Rams
>>
>
>
>
> --
> Nitin Pawar
>