Pig user mailing list


Re: How to read a file generated by Pig+BinStorage using the HDFS API?
I haven't done it myself, so I can't give you a detailed answer. But every
Pig storage function is associated with a Hadoop InputFormat/OutputFormat as
well as a RecordReader/RecordWriter.

As for BinStorage, take a look at BinStorageRecordReader:
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/io/BinStorageRecordReader.java#L40
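
Reading the data outside of Pig then comes down to the usual RecordReader loop. Below is a minimal sketch that compiles without the Hadoop jars: `SimpleRecordReader` is a hypothetical stand-in that mirrors the `nextKeyValue()` / `getCurrentValue()` / `close()` methods of Hadoop's `org.apache.hadoop.mapreduce.RecordReader`, and `LineRecordReaderStub` is a toy line-per-record reader, not the BinStorage format. With the real classes you would also call `initialize(split, context)` on the reader (e.g. a `BinStorageRecordReader`) before the loop, once per input split.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class RecordReaderLoop {

    /** Hypothetical stand-in mirroring the read-side methods of Hadoop's RecordReader. */
    public interface SimpleRecordReader<V> {
        boolean nextKeyValue() throws IOException; // advance; false once the input is exhausted
        V getCurrentValue();                       // record at the current position
        void close() throws IOException;
    }

    /** Toy reader yielding one line per record (NOT the BinStorage on-disk format). */
    public static class LineRecordReaderStub implements SimpleRecordReader<String> {
        private final BufferedReader in;
        private String current;

        public LineRecordReaderStub(String data) {
            this.in = new BufferedReader(new StringReader(data));
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            current = in.readLine();
            return current != null;
        }

        @Override
        public String getCurrentValue() {
            return current;
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    /** Drains a reader with the nextKeyValue()/getCurrentValue() loop, always closing it. */
    public static <V> List<V> readAll(SimpleRecordReader<V> reader) throws IOException {
        List<V> out = new ArrayList<>();
        try {
            while (reader.nextKeyValue()) {
                out.add(reader.getCurrentValue());
            }
        } finally {
            reader.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<String> records = readAll(new LineRecordReaderStub("a\tb\nc\td\n"));
        System.out.println(records.size()); // prints 2
    }
}
```

The loop is the part that should carry over: swap the stub for the reader returned by the storage's InputFormat, and `getCurrentValue()` would then hand you Pig `Tuple`s instead of strings.
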

On Thu, Dec 26, 2013 at 3:35 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:

> Hi all, and merry Christmas!
>
> I generate a file using a Pig script embedded in a Java process and store
> it using BinStorage.
>
> Then, I would like to read this file directly from another Java client,
> but without starting a Pig script (i.e., only by using the Hadoop API and
> Pig's BinStorage class).
> The goal is to perform real-time computation by scanning the file, so I
> cannot afford to start a Pig script to do the computation: the overhead of
> starting the script and getting the result is too long for my real-time
> objectives (I need a result within a few seconds).
>
> Of course, I could use JsonStorage and read my file with a JSON
> deserializer, but my guess is that it would be much slower, and it would
> also be painful to handle the various part files generated for the output
> (part-r-XXXXX).
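Handling the part-r-XXXXX files is mostly a matter of listing the output directory and reading each part in turn. A minimal sketch, using `java.nio` on a local directory as a stand-in for HDFS (with Hadoop you would list the directory via `FileSystem.listStatus(Path, PathFilter)` and open each part with `FileSystem.open(Path)`). The helper name `listPartFiles` and the filename pattern, which also accepts map-only `part-m-` output and assumes five-digit part numbers, are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class PartFiles {

    // Assumed naming scheme: part-r-00000 (reduce output) or part-m-00000 (map-only output).
    // Marker files such as _SUCCESS do not match and are skipped.
    private static final Pattern PART = Pattern.compile("part-[mr]-\\d{5}");

    /** Hypothetical helper: returns the part files of an output directory, sorted by name. */
    public static List<Path> listPartFiles(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && PART.matcher(p.getFileName().toString()).matches()) {
                    parts.add(p);
                }
            }
        }
        // Zero-padded part numbers make lexical order match numeric order.
        Collections.sort(parts);
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pig-out");
        Files.createFile(dir.resolve("part-r-00001"));
        Files.createFile(dir.resolve("part-r-00000"));
        Files.createFile(dir.resolve("_SUCCESS"));
        for (Path p : listPartFiles(dir)) {
            System.out.println(p.getFileName()); // part-r-00000, then part-r-00001
        }
    }
}
```

Each returned path would then be opened and drained with a record reader, one part after another.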
>
> Best regards,
>