Hadoop >> mail # user >> questions regarding data storage and inputformat


Re: questions regarding data storage and inputformat
You could either use a custom RecordReader or you could override the
run() method on your Mapper class to do the merging before calling the
map() method.
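Below is a minimal sketch of the run() approach. It assumes that the records to be merged are stored next to each other and share a key; the thread does not say how records are grouped. To keep it compilable without Hadoop on the classpath, RecordSource is a hypothetical stand-in for the three Mapper.Context methods that Hadoop's default run() loop calls (nextKeyValue(), getCurrentKey(), getCurrentValue()). MergingMapper and ListSource are illustrative names. In a real job, the same loop would go inside an override of run(Context context) in your Mapper subclass, between the setup(context) and cleanup(context) calls the default implementation makes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the parts of Hadoop's Mapper.Context
// that the default Mapper.run() loop uses.
interface RecordSource<K, V> {
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
}

// In-memory RecordSource, only for trying the loop out.
class ListSource<K, V> implements RecordSource<K, V> {
    private final Iterator<Map.Entry<K, V>> it;
    private Map.Entry<K, V> current;

    ListSource(List<Map.Entry<K, V>> records) { this.it = records.iterator(); }

    public boolean nextKeyValue() {
        if (!it.hasNext()) return false;
        current = it.next();
        return true;
    }
    public K getCurrentKey() { return current.getKey(); }
    public V getCurrentValue() { return current.getValue(); }
}

class MergingMapper<K, V> {
    // Records every map() call so the grouping is observable.
    final List<Map.Entry<K, List<V>>> mapped = new ArrayList<>();

    // Stand-in for Mapper.map(): receives one key and all of its merged values.
    protected void map(K key, List<V> values) {
        mapped.add(Map.entry(key, values));
    }

    // Same shape as Mapper.run(): pull records one at a time, but buffer
    // consecutive records with the same key and call map() once per group.
    // With real Hadoop Writables, copy key and value before buffering them
    // (e.g. new Text(key)), because the framework reuses the same objects.
    public void run(RecordSource<K, V> context) {
        K groupKey = null;
        List<V> buffer = new ArrayList<>();
        while (context.nextKeyValue()) {
            K key = context.getCurrentKey();
            if (!buffer.isEmpty() && !Objects.equals(key, groupKey)) {
                map(groupKey, buffer);      // key changed: flush the previous group
                buffer = new ArrayList<>();
            }
            groupKey = key;
            buffer.add(context.getCurrentValue());
        }
        if (!buffer.isEmpty()) {
            map(groupKey, buffer);          // flush the final group
        }
    }
}
```

Given the records (a,1), (a,2), (b,3), (a,4), map() is called three times: with a:[1, 2], b:[3] and a:[4]. Only adjacent records are merged. This relies on each group being stored contiguously and not being cut in two by a split boundary, for example by keeping each group in its own non-splittable file.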

-Joey

On Wed, Jul 27, 2011 at 11:09 AM, Tom Melendez <[EMAIL PROTECTED]> wrote:
>>
>>> 3. Another idea might be to create a separate seq file for each chunk
>>> of records and make the files non-splittable, ensuring that each chunk
>>> goes to a single mapper.  Assuming I can get away with this, do you see
>>> any pros/cons with that approach?
>>
>> Separate sequence files would require the least amount of custom code.
>>
>
> Thanks for the response, Joey.
>
> So, if I were to do the above, I would still need a custom record
> reader to put all the keys and values together, right?
>
> Thanks,
>
> Tom
>
> --
> ==================
> Skybox is hiring.
> http://www.skyboximaging.com/careers/jobs
>
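For the non-splittable option in the newer org.apache.hadoop.mapreduce API, the usual approach is to subclass SequenceFileInputFormat and override isSplitable(JobContext context, Path filename) to return false. The sketch below does not use Hadoop. It is a simplified model of how FileInputFormat.getSplits() cuts a single file into splits, and it ignores block locations and the min/max split-size settings. Its purpose is to show the trade-off: a non-splittable file always becomes one split, and therefore one map task, however large it is. SplitModel and splitLengths are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of FileInputFormat.getSplits() for one file
// (no block locations, compression, or min/max split-size settings).
final class SplitModel {
    // FileInputFormat lets the last split run up to 10% past splitSize
    // before cutting again (its SPLIT_SLOP constant).
    static final double SPLIT_SLOP = 1.1;

    // Returns the lengths, in bytes, of the splits the file would be cut into.
    static List<Long> splitLengths(long fileLength, long splitSize, boolean splittable) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Long> splits = new ArrayList<>();
        if (!splittable || fileLength == 0) {
            splits.add(fileLength);         // whole file -> one split -> one map task
            return splits;
        }
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining);          // tail split, at most 1.1 * splitSize
        }
        return splits;
    }
}
```

With a 128 MB split size, a splittable 300 MB file is cut into splits of 128, 128 and 44 MB. The same file marked non-splittable is read in full by a single mapper. That guarantees a chunk is never divided, but it costs parallelism and data locality once chunk files grow well beyond a block. This suggests keeping each chunk file at or below the block size where possible.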

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434