Pig >> mail # user >> Is it possible to use Pig streaming (StreamToPig) in a way that handles multiple lines as a single input tuple?


Re: Is it possible to use Pig streaming (StreamToPig) in a way that handles multiple lines as a single input tuple?
Why not pipe the multi-line XML from the executable through another script
that understands it and emits each fragment as a single line?
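
A minimal sketch of such a script, in case it helps. It assumes every
fragment ends with a line that closes a known root element; the tag name
"record" and the function name join_fragments are placeholders, not anything
taken from your executable. It reads the executable's output on stdin and
writes one line per fragment, so the default line-based streaming
deserializer sees each fragment as a single tuple.

```python
#!/usr/bin/env python
import sys

# Hypothetical closing tag of each fragment's root element (an assumption;
# replace it with whatever the real executable emits).
END_TAG = "</record>"


def join_fragments(lines, end_tag=END_TAG):
    """Yield one single-line string per multi-line XML fragment."""
    buffer = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between fragments
        # Replace tabs so a tab-delimited reader keeps the fragment in one field.
        buffer.append(stripped.replace("\t", " "))
        if stripped.endswith(end_tag):
            yield " ".join(buffer)
            buffer = []
    if buffer:
        # Emit a trailing, unterminated fragment instead of dropping it.
        yield " ".join(buffer)


if __name__ == "__main__":
    for fragment in join_fragments(sys.stdin):
        sys.stdout.write(fragment + "\n")
```

One way to wire it in would be a small shell wrapper that runs the executable
and pipes its output into this script, with Pig streaming through the
wrapper instead of the executable directly. Matching on the closing tag is
the weak point: it breaks if the root element can nest inside itself or if
the closing tag shares a line with other output.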

On Wed, Mar 28, 2012 at 8:24 AM, Ahmed Sobhi <[EMAIL PROTECTED]> wrote:

> I'm streaming data in a Pig script through an executable that returns an
> XML fragment for each line of input I stream to it. That XML fragment
> spans multiple lines, and I have no control whatsoever over the
> output of the executable I stream to.
>
> In the related question "Use Hadoop Pig to load data from text file w/ each
> record on multiple lines?"
> <http://stackoverflow.com/questions/6726407/use-hadoop-pig-to-load-data-from-text-file-w-each-record-on-multiple-lines>,
> the answer suggested writing a custom record reader. That works fine if you
> want to implement a LoadFunc that reads from a file, but to be used with
> streaming, the class has to implement StreamToPig. As far as I understand,
> StreamToPig only lets you read one line at a time.
>
> Does anyone know how to handle such a situation?
>
>
> http://stackoverflow.com/questions/9910138/is-it-possible-to-use-pig-streaming-streamtopig-in-a-way-that-handles-multiple
>
> --
> Best Regards,
> Ahmed Sobhi
> http://about.me/humanzz/bio
>