
Pig user mailing list >> storing intermediate results?

Vincent BARAT 2009-10-07, 14:54
Ashutosh Chauhan 2009-10-07, 16:33
zaki rahaman 2009-10-07, 20:08
Thejas Nair 2009-10-07, 20:16
Vincent BARAT 2009-10-08, 13:33
Alan Gates 2009-10-12, 18:50
Re: storing intermediate results ?

Thanks for your answer.

Actually, I run Pig from Java (through a series of registerQuery()
calls). As far as I know, the exec you mention cannot be used in
that context.
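Since exec is not reachable through registerQuery(), one way to get a similar effect from Java is to force the JOIN to run once, write it out, and LOAD the stored copy for everything that follows. This is a minimal sketch assuming PigServer's registerQuery(String) and store(String alias, String path) methods (check both against your Pig version). The input paths, field names and temporary path are hypothetical, and the two small interfaces are stand-ins so the logic can be exercised without a cluster; they are not Pig types.

```java
import java.io.IOException;

public class MaterializeJoin {

    // Stand-in for PigServer.registerQuery(String); pass pigServer::registerQuery.
    @FunctionalInterface
    interface QueryRunner {
        void registerQuery(String query) throws IOException;
    }

    // Stand-in for PigServer.store(String alias, String path), which is assumed to
    // run the plan up to the alias and write it out; pass pigServer::store.
    @FunctionalInterface
    interface Storer {
        void store(String alias, String path) throws IOException;
    }

    // Computes the JOIN once, stores it at tmpPath, then registers a LOAD of that
    // file as "j" so later statements read the stored copy instead of re-joining.
    // 'input_a', 'input_b' and the field names are hypothetical.
    static void joinOnceAndReload(QueryRunner pig, Storer storer, String tmpPath)
            throws IOException {
        pig.registerQuery("a = LOAD 'input_a' AS (id:int, x:int);");
        pig.registerQuery("b = LOAD 'input_b' AS (id:int, y:int);");
        pig.registerQuery("joined = JOIN a BY id, b BY id;");
        // Materialize the join now, which is roughly what "exec" gives you in a script.
        storer.store("joined", tmpPath);
        // Re-read the stored result; the JOIN's output fields are renamed here.
        pig.registerQuery("j = LOAD '" + tmpPath + "' AS (a_id:int, x:int, b_id:int, y:int);");
    }
}
```

With a real server the call would look something like joinOnceAndReload(pigServer::registerQuery, pigServer::store, "/tmp/joined"), after which further statements are registered against j. The stored copy costs one extra write and read, so it only pays off when the JOIN would otherwise be recomputed for each consumer.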

Ashutosh Chauhan wrote:
> Hi Vincent,
> Pig has a multi-query optimization which, when it applies, automatically
> figures out that the join needs to be done only once, so no work is
> repeated. If Pig determines that it's not safe to do that optimization,
> then your join may be computed more than once. If that's the case, it is
> better to do the join and store it.
> You can do that within the same script using "exec"
> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#exec
> You can read more about multi-query optimization here:
> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#Multi-Query+Execution
> Hope it helps,
> Ashutosh
> On Wed, Oct 7, 2009 at 10:54, Vincent BARAT <[EMAIL PROTECTED]> wrote:
>> Hello,
>> I'm new to Pig, and I have a bunch of statements that process the same
>> input, which is actually the result of a JOIN between two very big data sets
>> (millions of entries).
>> I wonder if it is better (faster) to save the result of this JOIN into a
>> Hadoop file and then LOAD it, instead of just relying on Pig's
>> optimizations?
>> Thanks a lot for your help.
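To give the multi-query optimization a chance to share the JOIN when driving Pig from Java, the dependent STOREs can be registered together in batch mode and executed in one go, rather than triggering each one separately. This is a sketch assuming PigServer's setBatchOn(), registerQuery(String) and executeBatch() methods, which are assumed here to be the embedded counterpart of multi-query execution; verify they exist in your Pig version. BatchSession is a hypothetical stand-in interface, not a Pig type, and all paths and fields are hypothetical.

```java
import java.io.IOException;

public class BatchedStores {

    // Hypothetical stand-in for the three PigServer batch-mode calls used below.
    interface BatchSession {
        void setBatchOn();
        void registerQuery(String query) throws IOException;
        void executeBatch() throws IOException;
    }

    // Registers the shared JOIN and two STOREs that both depend on it, then
    // executes them together so the optimizer sees a single plan in which
    // "joined" could be computed once. Paths and fields are hypothetical.
    static void runTogether(BatchSession pig) throws IOException {
        pig.setBatchOn();
        pig.registerQuery("a = LOAD 'input_a' AS (id:int, x:int);");
        pig.registerQuery("b = LOAD 'input_b' AS (id:int, y:int);");
        pig.registerQuery("joined = JOIN a BY id, b BY id;");

        // First consumer of the join.
        pig.registerQuery("f = FILTER joined BY x > 0;");
        pig.registerQuery("STORE f INTO 'out_filtered';");

        // Second consumer of the same join.
        pig.registerQuery("g = GROUP joined BY a::id;");
        pig.registerQuery("c = FOREACH g GENERATE group, COUNT(joined);");
        pig.registerQuery("STORE c INTO 'out_counts';");

        // Nothing in the batch is expected to run until this call.
        pig.executeBatch();
    }
}
```

To drive a real PigServer, wrap it in an anonymous BatchSession whose methods forward to the corresponding PigServer calls. Whether the JOIN is actually computed only once is still decided by the optimizer, as described above; if it is not, storing the join explicitly remains the fallback.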