Pig >> mail # user >> How to read a file generated by Pig+BinStorage using the HDFS API ?


Re: How to read a file generated by Pig+BinStorage using the HDFS API ?
I haven't done it myself, so I can't give you a detailed answer. But every
storage function is associated with an InputFormat/OutputFormat as well as a
RecordReader/RecordWriter.

For BinStorage, you can take a look at BinStorageRecordReader:
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/io/BinStorageRecordReader.java#L40
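
Since a record reader works on one file (or split) at a time, a client that bypasses Pig also has to decide which files in the output directory to read. The sketch below covers only that step, in plain Java with no Pig or Hadoop classes. PartFiles and select are hypothetical names, and accepting part-m-* (map-only output) alongside the part-r-XXXXX pattern mentioned in the thread is an assumption. Handing each selected file to a reader such as the one linked above is left out on purpose, because that part is untested here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Pig or Hadoop. */
final class PartFiles {
    // Matches part-r-00000 (reduce output) and, by assumption, part-m-00000 (map-only output).
    private static final Pattern PART = Pattern.compile("part-[mr]-\\d+");

    private PartFiles() {}

    /** Keeps only the part files from a directory listing and returns them sorted by name. */
    static List<String> select(List<String> names) {
        List<String> parts = new ArrayList<>();
        for (String name : names) {
            // Markers such as _SUCCESS, the _logs directory and .crc files do not match.
            if (PART.matcher(name).matches()) {
                parts.add(name);
            }
        }
        Collections.sort(parts);
        return parts;
    }

    public static void main(String[] args) {
        // Example listing of an output directory (made-up names for illustration).
        List<String> listing = Arrays.asList(
                "_SUCCESS", "part-r-00001", ".part-r-00000.crc", "part-r-00000", "_logs");
        System.out.println(select(listing)); // prints [part-r-00000, part-r-00001]
    }
}
```

With a real HDFS directory, the names would come from a directory listing made through the Hadoop FileSystem API instead of a hard-coded list. The filtering and ordering stay the same.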

On Thu, Dec 26, 2013 at 3:35 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:

> Hi all and merry Christmas!
>
> I generate a file using a Pig script embedded in a Java process and store
> it using BinStorage.
>
> Then, I would like to read this file directly from another Java client,
> but without starting a Pig script (i.e., using only the Hadoop API and Pig's
> BinStorage class).
> The goal is to do some real-time computation by scanning the file in
> real time, so I cannot afford to start a Pig script to do the
> computation: the time overhead to start the script and get the result is
> too long for my real-time objectives (I need a result in a few seconds).
>
> Of course, I could use JsonStorage and read my file using a JSON
> deserializer, but my guess is that it would be much slower, and also painful to
> handle the various parts generated for the output file (part-r-XXXXX).
>
> Best regards,
>