Re: how to read binary data from hdfs
Amritanshu,

Implement your own custom InputFormat with a RecordReader and you can
read your files directly.

To learn how to implement custom readers/formats, see the example under
the sub-title "Processing a whole file as a record", Page 206 | Chapter 7:
MapReduce Types and Formats in Tom White's Hadoop: The Definitive Guide,
or read up on the details at
http://developer.yahoo.com/hadoop/tutorial/module5.html#inputformat.
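
As a rough sketch of the parsing step such a RecordReader might perform,
the snippet below decodes fixed-size binary records from a plain
java.io.InputStream. In a Hadoop job the stream would typically come from
FileSystem.open(path), which returns an FSDataInputStream (itself an
InputStream). The record layout (a packed, little-endian int32 id followed
by a double value) and the names BinaryRecordReader, Sample and readAll
are assumptions made up for illustration; adjust them to whatever the C
program actually writes.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record layout written by the C program (packed, little-endian):
//   int32_t id; double value;   => 12 bytes per record
class BinaryRecordReader {
    static final int RECORD_SIZE = 12;

    // Plain Java object that a map function could consume.
    static final class Sample {
        final int id;
        final double value;

        Sample(int id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    // Reads records until the stream ends cleanly on a record boundary.
    // A stream that ends in the middle of a record raises EOFException.
    static List<Sample> readAll(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        List<Sample> out = new ArrayList<>();
        byte[] buf = new byte[RECORD_SIZE];
        while (true) {
            int first = in.read();
            if (first < 0) {
                break; // clean end of stream
            }
            buf[0] = (byte) first;
            in.readFully(buf, 1, RECORD_SIZE - 1);
            // DataInputStream's own readInt/readDouble are big-endian, so
            // decode through a ByteBuffer set to the C program's byte order.
            ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
            out.add(new Sample(bb.getInt(), bb.getDouble()));
        }
        return out;
    }
}
```

The byte order matters here: a C program on x86 writes little-endian
values, while Java's stream readers assume big-endian, so reading the
fields with readInt()/readDouble() directly would give wrong numbers.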

On Tue, May 1, 2012 at 3:42 PM, Amritanshu Shekhar
<[EMAIL PROTECTED]> wrote:
> Hi Guys,
> I want to read binary data (produced by a C program) that is copied to HDFS using a Java program. The idea is that I would eventually write a map-reduce job that uses the aforementioned program's output (the Java program would read the binary data and create a Java object which the map function would use). I read about the sequence file format that Hadoop supports, but converting the binary data into sequence file format using Java serialization would add another layer of complexity. Is there a simple, no-frills API that I can use to read binary data directly from HDFS? Any help/resources would be deeply appreciated.
> Thanks and Regards,
> Amritanshu

--
Harsh J