Hive >> mail # user >> Merging different HDFS file for HIVE


Ramasubramanian Narayanan... 2013-07-26, 10:52
Nitin Pawar 2013-07-26, 12:30
Re: Merging different HDFS file for HIVE
I like #2.

So you have three, say, external tables representing your three feed files.
After the third and final file is loaded, join 'em all together. Maybe
make the final table partitioned, with one partition per day.

for example:

alter table final add partition (datekey=YYYYMMDD);
insert overwrite table final partition (datekey=YYYYMMDD)
select a.EMP_ID, f1,...,f10
from FF1 a
join FF2 b on (a.EMP_ID = b.EMP_ID)
join FF3 c on (b.EMP_ID = c.EMP_ID);
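
To make that concrete, here is a rough sketch of the table definitions the insert statement above relies on. The column-to-file mapping follows the feed layouts in the original question; everything else (the STRING column types, the comma delimiter, the /data/feeds/... locations, and the INT type for datekey) is an assumption for illustration, so adjust to your actual files.

```sql
-- Hypothetical DDL: types, delimiter, and HDFS paths are assumptions.
-- One external table per feed file, keyed on the common EMP_ID field.
CREATE EXTERNAL TABLE FF1 (
  EMP_ID STRING, f1 STRING, f2 STRING, f6 STRING, f9 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/feeds/ff1';

CREATE EXTERNAL TABLE FF2 (
  EMP_ID STRING, f5 STRING, f7 STRING, f10 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/feeds/ff2';

CREATE EXTERNAL TABLE FF3 (
  EMP_ID STRING, f3 STRING, f4 STRING, f8 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/feeds/ff3';

-- Target table holding all ten fields, one partition per load date.
CREATE TABLE final (
  EMP_ID STRING,
  f1 STRING, f2 STRING, f3 STRING, f4 STRING, f5 STRING,
  f6 STRING, f7 STRING, f8 STRING, f9 STRING, f10 STRING
)
PARTITIONED BY (datekey INT);
```

Because the feed tables are external, dropping them later leaves the underlying HDFS files in place.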

Or a variation on #3: make a view over the three tables that looks
just like the select statement above.
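
A minimal sketch of such a view, assuming the three feed tables are named FF1, FF2, and FF3 with columns f1 through f10 split across them as in the original question. The view name final_v is made up for this example.

```sql
-- Hypothetical view: final_v is an illustrative name, not from the thread.
-- Columns are taken from whichever feed table carries them.
CREATE VIEW final_v AS
SELECT a.EMP_ID,
       a.f1, a.f2, c.f3, c.f4, b.f5,
       a.f6, b.f7, c.f8, a.f9, b.f10
FROM FF1 a
JOIN FF2 b ON (a.EMP_ID = b.EMP_ID)
JOIN FF3 c ON (b.EMP_ID = c.EMP_ID);
```

Since these are inner joins, an EMP_ID shows up in the view only once it is present in all three feed tables. Nothing is materialized, so every query against the view pays for the joins.
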
What do you want to optimize for?
On Fri, Jul 26, 2013 at 5:30 AM, Nitin Pawar <[EMAIL PROTECTED]> wrote:

> Option 1) Use Pig or Oozie: write a workflow and join the files into a
> single file
> Option 2) Create a temp table for each of the different files, then
> join them into a single table and delete the temp tables
> Option 3) Don't do anything; change your queries to look at the three
> different files as needed
>
> Wait for others to give better suggestions :)
>
>
> On Fri, Jul 26, 2013 at 4:22 PM, Ramasubramanian Narayanan <
> [EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Please help with a solution to the problem below... this scenario
>> applies in banking at least...
>>
>> I have a HIVE table with the below structure...
>>
>> Hive Table:
>> Field 1
>> ...
>> Field 10
>>
>>
>> For the above table, I will get the values for each feed in a different
>> file. You can imagine that these files belong to the same branch and can
>> arrive at any time. I have to load into the table only once I have all 3
>> files for the same branch. (Assume that we have a common field in all the
>> files to join on.)
>>
>> *Feed file 1 :*
>> EMP ID
>> Field 1
>> Field 2
>> Field 6
>> Field 9
>>
>> *Feed File2 :*
>> EMP ID
>> Field 5
>> Field 7
>> Field 10
>>
>> *Feed File3 :*
>> EMP ID
>> Field 3
>> Field 4
>> Field 8
>>
>> Now the question is:
>> what is the best way to combine all these files into a single file
>> so that it can be placed under the Hive structure?
>>
>> regards,
>> Rams
>>
>
>
>
> --
> Nitin Pawar
>
Sanjay Subramanian 2013-07-27, 01:23
Sanjay Subramanian 2013-07-27, 01:30