Pig, mail # user - number of M/R jobs for a Pig Script


ey-chih chow 2013-10-15, 07:57
Shahab Yunus 2013-10-15, 12:43
Geert Van Landeghem 2013-10-15, 12:51
Bertrand Dechoux 2013-10-15, 13:14
Shahab Yunus 2013-10-15, 13:14
ey-chih chow 2013-10-15, 17:07
Pradeep Gollakota 2013-10-15, 17:16
ey-chih chow 2013-10-15, 19:12
Pradeep Gollakota 2013-10-15, 20:12

Re: number of M/R jobs for a Pig Script
Alan Gates 2013-10-15, 20:50
Pig handles doing multiple group bys on the same input, often in a single MR job.  So:

A = load 'file';
B = group A by $0;
C = foreach B generate group, COUNT(A);
store C into 'output1';
D = group A by $1;
E = foreach D generate group, COUNT(A);
store E into 'output2';

This can be done in a single MR job.  Is that what you're looking for?
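
One way to check the job count without running anything is to point EXPLAIN at the whole script, so that both store statements are planned together. A minimal sketch, assuming the statements above are saved in a hypothetical file named twogroups.pig:

```pig
-- Run from the grunt shell. twogroups.pig is a hypothetical file name.
-- The MapReduce plan section of the output shows the MR jobs Pig intends to run.
explain -script twogroups.pig;
```

Starting Pig with -no_multiquery (or -M) turns multi-query execution off, which should make the same script plan one job per store and gives a baseline to compare against.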

Alan.
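
For the question quoted below (read the input once, emit several keys per record, then group), one possible sketch uses the built-in TOBAG and FLATTEN to fan each record out. It assumes the two key fields are $0 and $1; the relation names and the 'output3' path are placeholders, not anything from the original script:

```pig
A = load 'file';
-- Fan out: a record with fields (x, y, ...) becomes two rows, (x) and (y).
K = foreach A generate FLATTEN(TOBAG($0, $1)) as key;
-- A single group over the fanned-out keys.
G = group K by key;
-- Count how often each key value occurs across both fields.
N = foreach G generate group, COUNT(K);
store N into 'output3';
```

With one load and one group, this would normally be expected to compile to a single MR job. Note that values from $0 and $1 land in the same key space here; if they must stay separate, each value would need a tag alongside it before grouping.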

On Oct 15, 2013, at 12:12 PM, ey-chih chow wrote:

> What I really want to know is, in Pig, how can I read an input data set only
> once and generate multiple instances with distinct keys for each data point
> and do a group-by?
>
> Best regards,
>
> Ey-Chih Chow
>
>
> On Tue, Oct 15, 2013 at 10:16 AM, Pradeep Gollakota <[EMAIL PROTECTED]> wrote:
>
>> I'm not aware of any way to do that. I think you're also missing the spirit
>> of Pig. Pig is meant to be a data workflow language. Describe a workflow
>> for your data using PigLatin and Pig will then compile your script to
>> MapReduce jobs. The number of MapReduce jobs that it generates is the
>> smallest number of jobs (based on the optimizers) that Pig thinks it needs
>> to complete the workflow.
>>
>> Why do you want to control the number of MR jobs?
>>
>>
>> On Tue, Oct 15, 2013 at 10:07 AM, ey-chih chow <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks everybody.  Is there any way we can programmatically control the
>>> number of M-R jobs that a Pig script will generate, similar to writing M-R
>>> jobs in Java?
>>>
>>> Best regards,
>>>
>>> Ey-Chih Chow
>>>
>>>
>>> On Tue, Oct 15, 2013 at 6:14 AM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
>>>
>>>> And Geert's comment about using an external-to-Pig approach reminds me
>>>> that you have Netflix's PigLipstick too. It is a nice visual tool for
>>>> actual execution, and it stores job history as well.
>>>>
>>>> Regards,
>>>> Shahab
>>>>
>>>>
>>>> On Tue, Oct 15, 2013 at 8:51 AM, Geert Van Landeghem <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> You can also use ambrose to monitor execution of your pig script at
>>>>> runtime. Remark: from pig-0.11 on.
>>>>>
>>>>> It shows you the DAG of MR jobs and which are currently being executed.
>>>>> As long as pig-ambrose is connected to the execution of your script
>>>>> (workflow), you can replay the workflow.
>>>>>
>>>>> --
>>>>> kind regards,
>>>>> Geert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 15 Oct 2013, at 14:43, Shahab Yunus <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Have you tried using the ILLUSTRATE and EXPLAIN commands? As far as I
>>>>>> know, they don't give you the exact number, as it depends on the
>>>>>> actual data, but I believe you can interpret or extrapolate it from
>>>>>> the information provided by these commands.
>>>>>>
>>>>>> Regards,
>>>>>> Shahab
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 15, 2013 at 3:57 AM, ey-chih chow <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a Pig script that has two group-by statements on the input
>>>>>>> data set.  Does anybody know how many M-R jobs the script will
>>>>>>> generate? Thanks.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Ey-Chih Chow
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
ey-chih chow 2013-10-15, 22:40
ey-chih chow 2013-12-03, 23:03