MapReduce >> mail # user >> How to Create an effective chained MapReduce program.


Re: How to Create an effective chained MapReduce program.
* open a SequenceFile.Reader on the sequence file
* in a loop, call next(key,val) on the reader to read the next key/val
pair in the file (see:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Reader.html#next(org.apache.hadoop.io.Writable,%20org.apache.hadoop.io.Writable)
)
* write code to format the key & val into whatever appropriate format
you want, and write them to the console
* when next(key,val) returns false, exit the loop
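
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the tool referred to elsewhere in this thread: the class name SequenceFileDump and the helper formatRecord are made up for the example, the output is tab-separated rather than JSON, and it assumes a Hadoop 2.x or later client library on the classpath (older releases construct the reader with SequenceFile.Reader(FileSystem, Path, Configuration) instead).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical command-line dumper; the name is an assumption for this sketch.
public class SequenceFileDump {

    // Hypothetical helper: render one record as "key<TAB>value" using toString().
    static String formatRecord(Object key, Object val) {
        return key + "\t" + val;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SequenceFileDump <path-to-sequence-file>");
            System.exit(1);
        }

        Configuration conf = new Configuration();
        Path path = new Path(args[0]);

        // Open a SequenceFile.Reader on the sequence file.
        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {

            // Create reusable key/value objects of the types recorded in the file header.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

            // next(key, val) fills both objects and returns false at end of file.
            while (reader.next(key, val)) {
                System.out.println(formatRecord(key, val));
            }
        }
    }
}
```

How readable the output is depends on the toString() of your key and value classes; for custom Writables you would swap formatRecord for whatever formatting suits your data.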

HTH,

DR

On 09/07/2011 06:10 PM, ilyal levin wrote:
> Can you be more specific about how to do this? In general, is there a way
> to convert the binary files I have to text files?
>
>
>
> On Tue, Sep 6, 2011 at 11:26 PM, David Rosenstrauch <[EMAIL PROTECTED]> wrote:
>
>> On 09/06/2011 01:57 AM, Niels Basjes wrote:
>>
>>> Hi,
>>>
>>> In the past I've had the same situation where I needed the data for
>>> debugging. Back then I chose to create a second job with simply
>>> SequenceFileInputFormat, IdentityMapper, IdentityReducer and finally
>>> TextOutputFormat.
>>>
>>> In my situation that worked great for my purpose.
>>>
>>
>> I did something similar at my last job, but rather than writing a 2nd map/reduce job
>> for this, we just wrote a simple command line app that used the Hadoop Java
>> API to dump the contents of the binary file as text (JSON) to the console.
>>
>> HTH,
>>
>> DR