HDFS, mail # user - RE: Which InputFormat to use?


Otto Mok 2013-07-05, 03:28
Re: Which InputFormat to use?
Azuryy Yu 2013-07-05, 06:00
Use the InputFormat classes under the mapreduce package; the mapred package
is the old API. In general, you can extend FileInputFormat under the
o.a.h.mapreduce package (o.a.h.mapreduce.lib.input.FileInputFormat).
On Fri, Jul 5, 2013 at 1:23 PM, Devaraj k <[EMAIL PROTECTED]> wrote:

> Hi Ahmed,
>
> Hadoop 0.20.0 introduced the new MapReduce API, which is sometimes
> referred to as the "context objects" API. It is designed to make the API
> easier to evolve in the future. There are some differences between the new
> and old APIs:
>
> - The new API favours abstract classes over interfaces, since abstract
>   classes are easier to evolve.
>
> - The new API uses context objects such as MapContext and ReduceContext to
>   connect user code to the framework.
>
> - The old API configures jobs through a special JobConf object; in the new
>   API, job configuration is done using Configuration.
>
> You can find the new API under the org.apache.hadoop.mapreduce.* package
> and its subpackages (its input formats are in
> org.apache.hadoop.mapreduce.lib.input), and the old API under the
> org.apache.hadoop.mapred.* package and its subpackages.
>
> The new API is type-incompatible with the old one, so existing jobs must be
> rewritten to take advantage of these changes.
>
> Based on these points, you can decide which API to use.
>
> Thanks,
> Devaraj k
>
> *From:* Ahmed Eldawy [mailto:[EMAIL PROTECTED]]
> *Sent:* 05 July 2013 00:00
> *To:* [EMAIL PROTECTED]
> *Subject:* Which InputFormat to use?
>
> Hi, I'm developing a new set of InputFormats for a project I'm working on.
> I found that there are two ways to create a new InputFormat:
>
> 1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat
>
> 2- Implement the interface org.apache.hadoop.mapred.InputFormat
>
> I don't know why there are two incompatible versions. I found that each
> one comes with its own set of related types, such as InputSplit,
> RecordReader and the MapReduce job classes. Unfortunately, the two sets are
> not compatible with each other, which means that I have to choose one API
> and stay with it to the end. I basically have two questions.
>
> 1- Which of these two APIs should I go with? I didn't find any
> deprecation in either of them, so they both seem legitimate. Is there any
> plan to retire one of them?
>
> 2- I already have some classes implemented against one of the APIs. Is it
> worth refactoring these classes to use the other API, in case I used
> the old one?
>
> Thanks in advance for your help.
>
> Best regards,
> Ahmed Eldawy
>
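
The two design points raised in Devaraj's reply (abstract base classes rather than interfaces, and context objects that connect user code to the framework) can be illustrated with a small self-contained sketch. The classes below are hypothetical stand-ins written only for illustration; they are not Hadoop's Mapper or MapContext and use nothing beyond the Java standard library. The sketch assumes a setup() hook was added to the base class in a later release, and shows that user code written before that hook existed still compiles and runs unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ApiEvolutionSketch {

    /** Hypothetical stand-in for a context object; NOT Hadoop's MapContext. */
    static class SketchContext {
        private final List<String> output = new ArrayList<>();

        // User code emits results through the context instead of a
        // separate collector passed as an extra method argument.
        void write(String key, int value) {
            output.add(key + "=" + value);
        }

        List<String> getOutput() {
            return output;
        }
    }

    /** Hypothetical stand-in for an abstract base class in the new-API style. */
    static abstract class SketchMapper {
        // Assumed to be added in a later release. Because it has a default
        // (empty) body, subclasses written earlier still compile.
        void setup(SketchContext context) { }

        abstract void map(String line, SketchContext context);
    }

    /** User code written against the first release; it never mentions setup(). */
    static class WordLengthMapper extends SketchMapper {
        @Override
        void map(String line, SketchContext context) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    context.write(word, word.length());
                }
            }
        }
    }

    /** Tiny driver playing the role of the framework. */
    static List<String> run(SketchMapper mapper, List<String> lines) {
        SketchContext context = new SketchContext();
        mapper.setup(context);
        for (String line : lines) {
            mapper.map(line, context);
        }
        return context.getOutput();
    }

    public static void main(String[] args) {
        // prints [new=3, api=3, old=3]
        System.out.println(run(new WordLengthMapper(), Arrays.asList("new api", "old")));
    }
}
```

Had SketchMapper been an interface on a Java version without default methods (as was the case in 2013), adding setup() would have broken every existing implementation, which is the evolvability argument made in the reply.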