

Re: Which InputFormat to use?
Use the InputFormat from the mapreduce package; the mapred package is the
old API. Generally you can extend FileInputFormat, which lives under the
o.a.h.mapreduce package (in o.a.h.mapreduce.lib.input).
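
One reason the abstract-class style of the mapreduce package is easier to live with is the point made in the quoted message below: abstract classes are easier to evolve than interfaces. Here is a minimal plain-Java sketch of that idea. It does not use Hadoop at all; SketchInputFormat, CommaInputFormat and the isSplitable() method shown here are hypothetical stand-ins, not the real Hadoop types. Keep in mind that at the time of this thread Java interfaces could not carry default method bodies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an abstract-class-based API type (NOT a Hadoop class).
abstract class SketchInputFormat {
    // Original contract: every subclass must say how to split its input.
    abstract List<String> getSplits(String input);

    // Imagine this method was added in a later release of the API.
    // Because it ships with a default body, subclasses written before
    // it existed keep compiling and running unchanged.
    boolean isSplitable() {
        return true;
    }
}

// A user subclass written against the first version of the API.
// It never mentions isSplitable() and is unaffected by its addition.
class CommaInputFormat extends SketchInputFormat {
    @Override
    List<String> getSplits(String input) {
        List<String> splits = new ArrayList<>();
        for (String part : input.split(",")) {
            splits.add(part.trim());
        }
        return splits;
    }
}

class AbstractClassEvolutionDemo {
    public static void main(String[] args) {
        SketchInputFormat format = new CommaInputFormat();
        System.out.println(format.getSplits("a, b, c")); // [a, b, c]
        System.out.println(format.isSplitable());         // true
    }
}
```

Had SketchInputFormat been an interface in the Java of that era, adding isSplitable() would have broken every existing implementation until each one was updated.
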
On Fri, Jul 5, 2013 at 1:23 PM, Devaraj k <[EMAIL PROTECTED]> wrote:

> Hi Ahmed,
>
> Hadoop 0.20.0 introduced the new MapReduce API, sometimes referred to as
> the "context objects" API. It is designed to make the API easier to evolve
> in the future. There are some differences between the new and old APIs:
>
> - The new API favours abstract classes over interfaces, since abstract
>   classes are easier to evolve.
>
> - The new API uses context objects such as MapContext and ReduceContext to
>   connect to user code.
>
> - The old API configures a job through a special JobConf object; in the new
>   API, job configuration is done through Configuration.
>
> You can find the new API in the org.apache.hadoop.mapreduce package and its
> subpackages (input formats are in org.apache.hadoop.mapreduce.lib.input),
> and the old API in the org.apache.hadoop.mapred package and its subpackages.
>
> The new API is type-incompatible with the old one, so jobs need to be
> rewritten to make use of these advantages.
>
> Based on these points you can choose which API to use.
>
> Thanks
> Devaraj k
>
> From: Ahmed Eldawy [mailto:[EMAIL PROTECTED]]
> Sent: 05 July 2013 00:00
> To: [EMAIL PROTECTED]
> Subject: Which InputFormat to use?
>
> Hi, I'm developing a new set of InputFormats for a project I'm working on.
> I found that there are two ways to create a new InputFormat:
>
> 1- Extend the abstract class org.apache.hadoop.mapreduce.InputFormat
>
> 2- Implement the interface org.apache.hadoop.mapred.InputFormat
>
> I don't know why there are two incompatible versions. I found that each
> one comes with its own set of companion types, such as InputSplit,
> RecordReader and the MapReduce job classes. Unfortunately, each set of
> classes is not compatible with the other. This means that I have to choose
> one of the interfaces and stick with it to the end. I basically have two
> questions.
>
> 1- Which of these two interfaces should I go with? I didn't find any
> deprecation in either of them, so they both seem legitimate. Is there any
> plan to retire one of them?
>
> 2- I already have some classes implemented in one of the formats. Is it
> worth refactoring these classes to use the other interface, in case I used
> the old format?
>
> Thanks in advance for your help.
>
> Best regards,
> Ahmed Eldawy