Pig user mailing list: best way for pig and mapreduce jobs to be used interchangeably


Re: best way for pig and mapreduce jobs to be used interchangeably
Mridul -

What file format do you use to exchange data between Pig and Java? Text or something else?

On Jul 25, 2010, at 1:52 PM, Mridul Muralidharan wrote:

>
>
> In some of our pipelines, Pig jobs are one stage among others - the pipeline also consists of other Hadoop jobs, shell executions, etc.
> We currently do this by using intermediate file dumps.
>
>
> Regards,
> Mridul
>
>
>
> On Friday 23 July 2010 10:45 PM, Corbin Hoenes wrote:
>> What are some strategies to have Pig and Java MapReduce jobs exchange data? E.g., if we find that a particular Pig script in a chain is too slow and could be optimized with a custom MapReduce job, we'd want Pig to write the data out in a format that MapReduce could read, and vice versa.
>>
>
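
For illustration, a minimal sketch of the intermediate-file handoff discussed above, assuming the Pig side stores its relation with the default PigStorage loader/storer (tab-delimited text). A downstream Java MapReduce job can then read those files with TextInputFormat and split each line on tabs. The relation name, paths, and field positions below are hypothetical, not taken from the thread.

// Sketch: Java MapReduce job reading tab-delimited files written by Pig's
// default PigStorage. Paths, field positions, and the job name are hypothetical.
//
// Pig side (illustrative):
//   STORE clicks INTO '/data/intermediate/clicks' USING PigStorage();

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class PigTextReaderJob {

  // Each input line is one Pig tuple; PigStorage's default delimiter is a tab.
  public static class TupleMapper
      extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t", -1);
      if (fields.length >= 2) {
        // Assume field 0 is a key and field 1 is the value of interest.
        context.write(new Text(fields[0]), new Text(fields[1]));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "read-pigstorage-output");
    job.setJarByClass(PigTextReaderJob.class);
    job.setMapperClass(TupleMapper.class);
    job.setNumReduceTasks(0);                       // map-only pass-through for the sketch
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path("/data/intermediate/clicks"));
    FileOutputFormat.setOutputPath(job, new Path("/data/intermediate/clicks-mr"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because TextOutputFormat also separates key and value with a tab by default, the output of such a job can be loaded back into Pig with the default PigStorage loader, which covers the "vice versa" direction in the original question.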