Pig >> mail # user >> storing intermediate results ?


Re: storing intermediate results ?
Hello,

Thanks for your answer.

Actually, I run Pig from Java (through a series of registerQuery()
calls). As far as I know, the exec command you mention cannot be
used in that context.

Ashutosh Chauhan wrote:
> Hi Vincent,
>
> Pig has a multi-query optimization which, when it fires, automatically
> figures out that the join needs to be done only once, so no work is
> repeated. If Pig determines that it is not safe to apply that optimization,
> then it is possible that your join is getting computed more than once. If
> that's the case, it will be better to do the join and store it. You can do
> that within the same script using "exec":
> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#exec
>
> You can read more about multi-query optimization here:
> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#Multi-Query+Execution
>
> Hope it helps,
> Ashutosh
>
> On Wed, Oct 7, 2009 at 10:54, Vincent BARAT <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I'm new to Pig, and I have a bunch of statements that process the same
>> input, which is actually the result of a JOIN between two very big data
>> sets (millions of entries).
>>
>> I wonder whether it is better (faster) to save the result of this JOIN into
>> a Hadoop file and then LOAD it, instead of just relying on Pig's
>> optimizations?
>>
>> Thanks a lot for your help.
>>
>
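
For reference, the store-then-load approach discussed in this thread might look roughly like the Pig Latin sketch below. The paths, relation names, and schemas are hypothetical and chosen only for illustration; they do not come from the original poster's script. Whether this is actually faster than relying on the multi-query optimization depends on the script and the data.

```pig
-- Hypothetical inputs: paths and field names are assumptions.
a = LOAD 'input/a' AS (id:chararray, x:int);
b = LOAD 'input/b' AS (id:chararray, y:int);

-- Compute the expensive join once and materialize it.
j = JOIN a BY id, b BY id;
STORE j INTO 'tmp/joined';

-- Later statements read the stored result instead of recomputing the join.
-- The schema is re-declared here because the stored file carries no field names.
j2 = LOAD 'tmp/joined' AS (a_id:chararray, x:int, b_id:chararray, y:int);
g = GROUP j2 BY a_id;
counts = FOREACH g GENERATE group, COUNT(j2);
STORE counts INTO 'out/counts';
```

Within a single script, the first STORE may need to finish before the second LOAD runs, which is what the "exec" suggestion quoted above is for. From Java, the same statements could presumably be passed one at a time to registerQuery(), though how and when the store is executed in that setting is an assumption to verify against the Pig version in use.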