Hadoop, mail # user - DataCreator


Re: DataCreator
Ted Dunning 2011-02-16, 15:18
Sounds like Pig.  Or Cascading.  Or Hive.

Seriously, isn't this already available?

On Wed, Feb 16, 2011 at 7:06 AM, Guy Doulberg <[EMAIL PROTECTED]> wrote:

>
> Hey all,
> I want to consult with you Hadoopers about a Map/Reduce application I want
> to build.
>
> I want to build a map/reduce job that reads files from HDFS, performs some
> sort of transformation on the file lines, and stores them in several
> partitions depending on the source of the file or its data.
>
> I want this application to be as configurable as possible, so I designed
> interfaces to parse, decorate and partition (on HDFS) the data.
>
> I want to be able to configure different data flows, with different
> parsers, decorators and partitioners, using a config file.
>
> Do you think you would use such an application? Does it fit an open-source
> project?
>
> Now, I have some technical questions:
> I was thinking of using reflection to load all the classes I would need,
> according to the configuration, during the setup process of the Mapper.
> Do you think it is a good idea?
>
> Is there a way to send the Mapper objects or interfaces from the Job
> declaration?
>
>
>
>  Thanks,
>
>
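
On the reflection question, here is a minimal sketch of the idea in plain Java. It is not taken from this thread: the `Parser` and `Partitioner` interfaces, the example implementations and the `flow.*` keys are all hypothetical, and `java.util.Properties` stands in for the job configuration so the example runs without Hadoop on the classpath. In a real job the same pattern is usually written with `Configuration.getClass(...)` and `ReflectionUtils.newInstance(...)` inside the Mapper's `setup()`. Since the job configuration travels to the tasks as key/value strings, the usual way to "send" a component from the Job declaration is to send its class name rather than a live object.

```java
import java.util.Properties;

public class FlowLoader {

    /** Hypothetical interface: turns a raw input line into fields. */
    public interface Parser {
        String[] parse(String line);
    }

    /** Hypothetical interface: chooses an output partition for a record. */
    public interface Partitioner {
        String partitionFor(String[] fields);
    }

    /** Example parser: splits the line on tabs. */
    public static class TabParser implements Parser {
        @Override
        public String[] parse(String line) {
            return line.split("\t");
        }
    }

    /** Example partitioner: uses the first field as the partition name. */
    public static class FirstFieldPartitioner implements Partitioner {
        @Override
        public String partitionFor(String[] fields) {
            return fields.length > 0 ? fields[0] : "unknown";
        }
    }

    /**
     * Reads a class name from the configuration, checks that the class
     * implements the expected interface, and instantiates it through its
     * no-argument constructor.
     */
    public static <T> T load(Properties conf, String key, Class<T> iface)
            throws ReflectiveOperationException {
        String className = conf.getProperty(key);
        if (className == null) {
            throw new IllegalArgumentException("Missing config key: " + key);
        }
        // asSubclass throws ClassCastException if the class is the wrong type.
        Class<? extends T> cls = Class.forName(className).asSubclass(iface);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // "Job declaration" side: only class names go into the config.
        Properties conf = new Properties();
        conf.setProperty("flow.parser.class", TabParser.class.getName());
        conf.setProperty("flow.partitioner.class",
                FirstFieldPartitioner.class.getName());

        // "Mapper setup" side: rebuild the components from those names.
        Parser parser = load(conf, "flow.parser.class", Parser.class);
        Partitioner partitioner =
                load(conf, "flow.partitioner.class", Partitioner.class);

        String[] fields = parser.parse("web\t2011-02-16\tGET /index.html");
        System.out.println(partitioner.partitionFor(fields)); // prints "web"
    }
}
```

Loading once in setup and reusing the instances for every record keeps the reflection cost out of the per-line path. Each pluggable class needs a no-argument constructor for this to work.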