Hive, mail # user - HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop


Re: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop
Roberto Congiu 2009-09-30, 07:07
Hi Namit,
that's what I thought. Unfortunately, we can't migrate to 0.20 right now.
I realize we lose data locality, but as you said, it would still be
considerably better than what we have now.

I had a look at the shim code; it shouldn't be difficult, since it would
basically be mimicking CombineFileInputFormat.
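
As a rough sketch of the grouping idea only (not the actual shim, and not
Hadoop's own split logic), packing many small files into a bounded number
of splits without regard to locality could look something like the code
below. It is plain Java with no Hadoop dependency; the class SplitPacker
and its pack method are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: not part of Hive or Hadoop.
class SplitPacker {

    // Groups files (given by their lengths in bytes) into at most numSplits
    // contiguous groups of roughly equal total size. Each group holds the
    // indexes of the files that one mapper would read. Locality is ignored.
    static List<List<Integer>> pack(long[] fileLengths, int numSplits) {
        List<List<Integer>> splits = new ArrayList<>();
        if (fileLengths.length == 0 || numSplits <= 0) {
            return splits;
        }

        long total = 0;
        for (long len : fileLengths) {
            total += len;
        }
        // Target bytes per split, rounded up so a split always makes progress.
        long target = Math.max(1, (total + numSplits - 1) / numSplits);

        List<Integer> current = new ArrayList<>();
        long currentSize = 0;
        for (int i = 0; i < fileLengths.length; i++) {
            current.add(i);
            currentSize += fileLengths[i];
            // Close the group once it reaches the target, but keep the last
            // slot open so the total never exceeds numSplits groups.
            if (currentSize >= target && splits.size() < numSplits - 1) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long[] lengths = {10, 10, 10, 10, 10, 10};
        System.out.println(pack(lengths, 3));
    }
}
```

With one mapper per group instead of one per file, the number of map tasks
is bounded by numSplits rather than by the number of input files.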

Once I add the appropriate logic to the shim, I have to set
hive.input.format to
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat to have hive
actually use it, right?

Roberto

2009/9/29 Namit Jain <[EMAIL PROTECTED]>:
> Hi Roberto,
>
> Talked with Raghu and Dhruba: it is possible to do so using
> MultiFileInputFormat, but the performance will not be very good since
> MultiFileInputFormat does not provide any locality. However, it will still
> be much better than the problem you are running into right now.
>
> Can you move to hadoop-0.20? That might be simpler.
>
> If not, you can definitely implement the shim using MultiFileInputFormat
> for 0.19 (which should work even with 0.17). Do you need some help in
> understanding the current shim code?
>
> Thanks,
> -namit
>
> On 9/29/09 10:53 AM, "Namit Jain" <[EMAIL PROTECTED]> wrote:
>
> Just checked: CombineFileInputFormat and a lot of other related stuff went
> into hadoop 0.20, so it would be very difficult to add this for 0.19.
>
>
>
> From: Namit Jain [mailto:[EMAIL PROTECTED]]
> Sent: Monday, September 28, 2009 10:30 PM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop
>
> I am not sure whether CombineFileInputFormat (in hadoop) is available in
> 0.19. If it is, we can add it; otherwise it will be very difficult.
>
>
>
> On 9/28/09 7:06 PM, "Raghu Murthy" <[EMAIL PROTECTED]> wrote:
> Can we add MultiFileInputFormat as the CombineFileInputFormatShim for
> hadoop-0.19?
>
> On 9/28/09 6:57 PM, "Roberto Congiu" <[EMAIL PROTECTED]> wrote:
>
>> Hi guys,
>> I've been working on integrating hive with a legacy file format we use
>> here. I wrote the appropriate InputFormat and SerDe and everything
>> works, but it's painfully slow.
>> The reason is that I am reading many files and hive uses one
>> mapper for every file.
>> I saw the HIVE-74 patches, but those use CombineFileInputFormat, which
>> is available in hadoop 0.20, and we use 0.19. Is there any reason the
>> same goal could not be achieved using the deprecated (but present before
>> 0.20) MultiFileInputFormat?
>>
>> Thanks,
>> Roberto
>
>
>