Pig user mailing list: storing intermediate results?


Re: storing intermediate results?
Vincent BARAT 2009-10-08, 09:43
Hello,

Thanks for your answer.

Actually, I run PIG from Java (through a series of registerQuery()
calls). As far as I know, the exec command you mention cannot be used
in that context.

Ashutosh Chauhan wrote:
> Hi Vincent,
>
> Pig has a multi-query optimization which, when it fires, automatically figures
> out that the join needs to be done only once, so no work is repeated.
> If Pig determines that it is not safe to apply that optimization, then your
> join may be computed more than once. If that's the case, it is better to do
> the join once and store it.
> You can do that within the same script using "exec":
> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#exec
>
> You can read more about multi-query optimization here:
> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#Multi-Query+Execution
>
> Hope it helps,
> Ashutosh
>
> On Wed, Oct 7, 2009 at 10:54, Vincent BARAT <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I'm new to PIG, and I have a bunch of statements that process the same
>> input, which is actually the result of a JOIN between two very big data sets
>> (millions of entries).
>>
>> Is it better (faster) to save the result of this JOIN into a Hadoop file
>> and then LOAD it, instead of just relying on PIG optimizations?
>>
>> Thanks a lot for your help.
>>
>
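The question in this thread comes down to whether a shared JOIN should be recomputed for every statement that uses it, or computed once and reused. The toy Java model below is a hypothetical sketch, not Pig code: the class `MaterializedJoinSketch`, its `hashJoin` helper and the two downstream "statements" (`countRows`, `sumRight`) are invented for illustration, and a counter records how often the join actually runs. The materialized strategy plays the role of storing the joined relation and loading it again (or of the multi-query optimizer sharing the join on its own); whether that is actually faster on a real cluster depends on the data and on the plan Pig chooses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaterializedJoinSketch {
    // A single input record: a join key and an integer payload (hypothetical schema).
    record Row(String key, int value) {}

    // One output record of the join: the key plus the payloads from both sides.
    record Joined(String key, int left, int right) {}

    // Counts how many times the (expensive) join has actually been executed.
    static int joinExecutions = 0;

    // In-memory equi-join on key: index the right input, then probe it with each left row.
    static List<Joined> hashJoin(List<Row> left, List<Row> right) {
        joinExecutions++;
        Map<String, List<Integer>> index = new HashMap<>();
        for (Row r : right) {
            index.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r.value());
        }
        List<Joined> out = new ArrayList<>();
        for (Row l : left) {
            for (int rv : index.getOrDefault(l.key(), List.of())) {
                out.add(new Joined(l.key(), l.value(), rv));
            }
        }
        return out;
    }

    // Two downstream "statements" that both consume the joined data.
    static int countRows(List<Joined> rows) {
        return rows.size();
    }

    static int sumRight(List<Joined> rows) {
        int total = 0;
        for (Joined j : rows) {
            total += j.right();
        }
        return total;
    }

    // Strategy 1: each statement re-derives its input, so the join runs once per statement.
    static int[] runRecomputing(List<Row> a, List<Row> b) {
        joinExecutions = 0;
        int count = countRows(hashJoin(a, b));
        int sum = sumRight(hashJoin(a, b));
        return new int[] {joinExecutions, count, sum};
    }

    // Strategy 2: materialize the join once (the analogue of STORE then LOAD) and reuse it.
    static int[] runMaterialized(List<Row> a, List<Row> b) {
        joinExecutions = 0;
        List<Joined> joined = hashJoin(a, b);
        int count = countRows(joined);
        int sum = sumRight(joined);
        return new int[] {joinExecutions, count, sum};
    }

    public static void main(String[] args) {
        List<Row> a = List.of(new Row("x", 1), new Row("y", 2), new Row("z", 3));
        List<Row> b = List.of(new Row("x", 10), new Row("x", 20), new Row("y", 30));

        int[] r1 = runRecomputing(a, b);
        int[] r2 = runMaterialized(a, b);
        System.out.println("recomputing:  joins=" + r1[0] + " rows=" + r1[1] + " sum=" + r1[2]);
        System.out.println("materialized: joins=" + r2[0] + " rows=" + r2[1] + " sum=" + r2[2]);
    }
}
```

Running `main` reports `joins=2` for the recomputing strategy and `joins=1` for the materialized one, with identical row counts and sums. From Java, the corresponding step would presumably be to store the joined alias through `PigServer` (which has a `store` method taking an alias and an output path) and then register a new LOAD of that path; the PigServer documentation for the Pig version in use is the authority on the exact signatures.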