Pig user mailing list: Is it possible to use Pig streaming (StreamToPig) in a way that handles multiple lines as a single input tuple?


Re: Is it possible to use Pig streaming (StreamToPig) in a way that handles multiple lines as a single input tuple?
Why not pipe the multi-line XML from the executable through another script
that understands it and writes each fragment out as a single line?
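Here is a minimal sketch of such a filter in Python. It assumes each fragment the executable prints is a single well-formed XML element, possibly spread over several lines. The script buffers lines until the buffer parses as a complete element, then writes that fragment on one line. A line-oriented StreamToPig deserializer therefore sees exactly one record per line. The function name `collapse_fragments` is hypothetical. Wiring it in is also an assumption: the streaming command would run the executable and pipe its output into this script, for example through a shell.

```python
#!/usr/bin/env python3
"""Collapse multi-line XML fragments into one line each (hypothetical helper).

Assumption: every fragment written by the upstream executable is a single
well-formed XML element, possibly spread over several lines.
"""
import sys
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def collapse_fragments(lines: Iterable[str]) -> Iterator[str]:
    """Yield each complete XML fragment as a single line of text."""
    buffer = []
    for raw in lines:
        line = raw.strip()
        if not line and not buffer:
            continue  # skip blank lines between fragments
        buffer.append(line)
        candidate = " ".join(buffer)
        try:
            ET.fromstring(candidate)  # parses only once the root element is closed
        except ET.ParseError:
            continue  # fragment still incomplete; keep reading lines
        # Pig's default streaming format splits fields on tabs, so replace them.
        yield candidate.replace("\t", " ")
        buffer = []
    if buffer:
        raise ValueError("input ended in the middle of an XML fragment")


def main() -> None:
    for fragment in collapse_fragments(sys.stdin):
        sys.stdout.write(fragment + "\n")
        sys.stdout.flush()  # hand each record downstream as soon as it is complete


if __name__ == "__main__":
    main()
```

Attempting a full parse after every line costs time quadratic in the fragment length. That is fine for small fragments. If every fragment is known to end with a fixed closing tag, checking for that tag is cheaper. Joining lines with a space also normalizes whitespace inside text nodes, which may or may not matter downstream.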

On Wed, Mar 28, 2012 at 8:24 AM, Ahmed Sobhi <[EMAIL PROTECTED]> wrote:

> I'm streaming data in a Pig script through an executable that returns an
> XML fragment for each line of input I stream to it. That XML fragment
> happens to span multiple lines, and I have no control whatsoever over the
> output of the executable I stream to.
>
> In relation to "Use Hadoop Pig to load data from text file w/ each record on
> multiple lines?"
> (http://stackoverflow.com/questions/6726407/use-hadoop-pig-to-load-data-from-text-file-w-each-record-on-multiple-lines),
> the answer suggested writing a custom record reader. The problem is that
> this works fine if you want to implement a LoadFunc that reads from a file,
> but to be used with streaming, the class has to implement StreamToPig, and
> as far as I understand, StreamToPig only lets you read one line at a time.
>
> Does anyone know how to handle such a situation?
>
>
> http://stackoverflow.com/questions/9910138/is-it-possible-to-use-pig-streaming-streamtopig-in-a-way-that-handles-multiple
>
> --
> Best Regards,
> Ahmed Sobhi
> http://about.me/humanzz/bio
>