MapReduce, mail # user - Which InputFormat to use?


Re: Which InputFormat to use?
Harsh J 2013-07-05, 10:35
Whichever you pick, both are supported right now and pretty much offer
the same functionality. FWIW though, Pig and HBase both use the new
APIs.
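
To illustrate the two design points quoted below (abstract classes being
easier to evolve than interfaces, and context objects connecting user code to
the framework), here is a minimal plain-Java sketch. It deliberately uses no
Hadoop classes: OldStyleMapper, NewStyleMapper, SketchContext, UpperCaseMapper
and runSketch are hypothetical names made up for this example, only loosely
modelled on the shape of the real APIs, so treat it as an illustration of the
pattern rather than as Hadoop's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ApiEvolutionSketch {

    // Old-API style (hypothetical stand-in): a plain interface, with the
    // output sink passed as a separate argument. Before Java 8 default
    // methods, adding a method here would break every existing implementation.
    interface OldStyleMapper {
        void map(String line, List<String> collector);
    }

    // New-API style (hypothetical stand-in): one context object carries what
    // the framework hands to user code, so it can gain features later without
    // changing the signature of map().
    static class SketchContext {
        private final List<String> output = new ArrayList<>();

        void write(String record) {
            output.add(record);
        }

        List<String> getOutput() {
            return output;
        }
    }

    // An abstract class can add hooks with default bodies (setup, cleanup)
    // in a later release; subclasses written earlier still compile and run.
    static abstract class NewStyleMapper {
        protected void setup(SketchContext context) { }

        protected abstract void map(String line, SketchContext context);

        protected void cleanup(SketchContext context) { }

        final void run(List<String> input, SketchContext context) {
            setup(context);
            for (String line : input) {
                map(line, context);
            }
            cleanup(context);
        }
    }

    // User code overrides only the method it cares about.
    static class UpperCaseMapper extends NewStyleMapper {
        @Override
        protected void map(String line, SketchContext context) {
            context.write(line.toUpperCase(Locale.ROOT));
        }
    }

    // Drives the mapper over the input and returns what it wrote.
    static List<String> runSketch(List<String> input) {
        SketchContext context = new SketchContext();
        new UpperCaseMapper().run(input, context);
        return context.getOutput();
    }

    public static void main(String[] args) {
        System.out.println(runSketch(Arrays.asList("old api", "new api")));
    }
}
```

The same trade-off is why the two real APIs are not interchangeable: a class
written against one set of types cannot simply be passed to the other.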

On Fri, Jul 5, 2013 at 11:30 AM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
> Use the InputFormat under the mapreduce package; the mapred package is the
> older one. In general, you can extend FileInputFormat under the
> o.a.h.mapreduce.lib.input package.
>
>
> On Fri, Jul 5, 2013 at 1:23 PM, Devaraj k <[EMAIL PROTECTED]> wrote:
>>
>> Hi Ahmed,
>>
>>
>>
>> Hadoop 0.20.0 introduced the new MapReduce API, sometimes referred to as
>> the "context objects" API. It is designed to make the API easier to evolve
>> in the future. There are some differences between the new and old APIs:
>>
>>
>>
>> > The new API favours abstract classes over interfaces, since
>> > abstract classes are easier to evolve.
>>
>> > The new API uses context objects such as MapContext and ReduceContext
>> > to connect user code to the MapReduce framework.
>>
>> > The old API has a special JobConf object for job configuration; in the
>> > new API, job configuration is done through Configuration.
>>
>>
>>
>> You can find the new API in the org.apache.hadoop.mapreduce.* package and
>> its subpackages (input formats are under
>> org.apache.hadoop.mapreduce.lib.input.*), and the old API in the
>> org.apache.hadoop.mapred.* package and its subpackages.
>>
>>
>>
>> The new API is type-incompatible with the old one, so jobs need to be
>> rewritten to make use of these advantages.
>>
>>
>>
>> Based on these points, you can decide which API to use.
>>
>>
>>
>> Thanks
>>
>> Devaraj k
>>
>>
>>
>> From: Ahmed Eldawy [mailto:[EMAIL PROTECTED]]
>> Sent: 05 July 2013 00:00
>>
>>
>> To: [EMAIL PROTECTED]
>> Subject: Which InputFormat to use?
>>
>>
>>
>> Hi, I'm developing a new set of InputFormats for a project I'm working
>> on. I found that there are two ways to create a new InputFormat:
>>
>> 1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat
>>
>> 2- Implement the interface org.apache.hadoop.mapred.InputFormat
>>
>> I don't know why there are two versions which are incompatible. I found
>> out that for each one, there is a whole set of interfaces for different
>> classes such as InputSplit, RecordReader and MapReduce job. Unfortunately,
>> each set of classes is not compatible with the other one. This means that I
>> have to choose one of the interfaces and go with it till the end. I have two
>> questions basically.
>>
>> 1- Which of these two interfaces should I go with? I didn't find any
>> deprecation in either of them, so they both seem legitimate. Is there any
>> plan to retire one of them?
>>
>> 2- I already have some classes implemented in one of the formats. Is it
>> worth refactoring these classes to use the other interface, in case I used
>> the old format?
>>
>> Thanks in advance for your help.
>>
>>
>>
>>
>> Best regards,
>> Ahmed Eldawy
>
>

--
Harsh J