

Re: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop
Hi Namit,
that's what I thought. Right now unfortunately we can't migrate to 0.20.
I realize we lose data locality but as you said, it would still be
considerably better than now.

I had a look at the shim code, shouldn't be difficult since it would
be basically mimicking CombineFileInputFormat.

Once I add the appropriate logic to the shim, I have to set
hive.input.format to
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat to have Hive
actually use it, right?

Roberto

2009/9/29 Namit Jain <[EMAIL PROTECTED]>:
> Hi Roberto,
>
> Talked with Raghu and Dhruba – it is possible to do so using
> MultiFileInputFormat, but the performance will not be very good since
> MultiFileInputFormat does not provide any locality. However, it will still
> be much better than the problem you are running into right now.
>
> Can you move to hadoop-0.20? That might be simpler.
>
> If not, you can definitely implement the shim using MultiFileInputFormat for
> 0.19 (which should work even with 0.17). Do you need some help in
> understanding the current shim code?
>
> Thanks,
> -namit
>
>
>
>
>
> On 9/29/09 10:53 AM, "Namit Jain" <[EMAIL PROTECTED]> wrote:
>
> Just checked – CombineFileInputFormat and a lot of other related stuff went
> to hadoop 0.20.
> So, it would be very difficult to add this for 0.19.
>
>
>
> From: Namit Jain [mailto:[EMAIL PROTECTED]]
> Sent: Monday, September 28, 2009 10:30 PM
> To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop
>
> I am not sure whether CombineFileInputFormat (in hadoop) is available in
> 0.19. If it is, we can add it; otherwise it will be very difficult.
>
>
>
> On 9/28/09 7:06 PM, "Raghu Murthy" <[EMAIL PROTECTED]> wrote:
> Can we add MultiFileInputFormat as the CombineFileInputFormatShim for
> hadoop-0.19?
>
> On 9/28/09 6:57 PM, "Roberto Congiu" <[EMAIL PROTECTED]> wrote:
>
>> Hi guys,
>> I've been working on integrating hive with a legacy file format we use
>> here. I wrote the appropriate InputFormat and SerDe and everything
>> works, but it's painfully slow.
>> The reason is that the files I am reading are many and hive uses one
>> mapper for every file.
>> I saw the HIVE-74 patches but those use CombineFileInputFormat, which
>> is available on hadoop 0.20... but we use 0.19. Is there any reason the
>> same goal could not be achieved using the deprecated (but present before
>> 0.20) MultiFileInputFormat?
>>
>> Thanks,
>> Roberto
>
>
>
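For readers following the thread: the idea discussed above is to pack many small files into a few splits so that Hive no longer starts one mapper per file, accepting that the grouping ignores data locality. The sketch below illustrates that grouping in plain Java. It is not the Hive shim and does not use any Hadoop API; the class name SmallFileGrouper, the method group, and the greedy size-based strategy are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive or Hadoop: groups files into at most
// maxSplits splits of roughly equal total size, ignoring where blocks live.
final class SmallFileGrouper {

    // fileSizes[i] is the length in bytes of file i.
    // Returns one list of file indices per split, in input order.
    static List<List<Integer>> group(long[] fileSizes, int maxSplits) {
        if (maxSplits < 1) {
            throw new IllegalArgumentException("maxSplits must be >= 1");
        }
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        // Target bytes per split, rounded up; at least 1 so empty files still get grouped.
        long target = Math.max(1, (total + maxSplits - 1) / maxSplits);

        List<List<Integer>> splits = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentSize = 0;
        for (int i = 0; i < fileSizes.length; i++) {
            current.add(i);
            currentSize += fileSizes[i];
            // Close the split once it reaches the target, but keep the last
            // slot open so the remaining files always have somewhere to go.
            if (currentSize >= target && splits.size() < maxSplits - 1) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }
}
```

With six files of equal size and maxSplits set to 3, this yields three splits of two files each, so three mappers would run instead of six. A real shim would additionally have to wrap each group in an InputSplit and provide a record reader that iterates over the files in the group.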