Pig, mail # user - Limit number of Streaming Programs


Thomas Bach 2012-12-18, 20:00
Cheolsoo Park 2012-12-24, 22:15
Re: Limit number of Streaming Programs
Kshiva Kps 2012-12-25, 05:39
Hi,

Are there any Pig editors in which we can write 100 to 150 Pig scripts?
I believe that is not feasible in CLI mode.
Something like an IDE for Java or TOAD for SQL. Please advise. Many thanks.

Thanks
On Tue, Dec 25, 2012 at 3:45 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Thomas,
>
> If I understand your question correctly, you want to reduce the number
> of mappers that spawn streaming processes. The default_parallel setting
> controls the number of reducers, so it has no effect on the number of
> mappers. Although the number of mappers is determined automatically by the
> size of the input data, you can try setting "pig.maxCombinedSplitSize" to
> combine input files into bigger splits. For more details, please refer to:
> http://pig.apache.org/docs/r0.10.0/perf.html#combine-files
>
> You can also read a discussion on a similar topic here:
>
> http://search-hadoop.com/m/J5hCw1UdxTa/How+can+I+set+the+mapper+number&subj=How+can+I+set+the+mapper+number+for+pig+script+
>
> Thanks,
> Cheolsoo
>
>
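A rough back-of-the-envelope sketch of why a larger "pig.maxCombinedSplitSize" (a size in bytes) should lower the number of mappers, and with it the number of streaming processes. It assumes the mapper count is roughly the total input size divided by the combined split size. Real split combination also depends on file boundaries and data locality, so treat the result as an estimate only. The function name and the 20 GiB input size are hypothetical and not taken from the thread.

```python
def estimate_mappers(total_input_bytes, max_combined_split_bytes):
    """Rough estimate: one mapper per combined split.

    Assumption: ignores file boundaries and block locality, which the
    real split-combination logic takes into account.
    """
    if max_combined_split_bytes <= 0:
        raise ValueError("split size must be positive")
    # Integer ceiling division, so a partial split still counts as a mapper.
    return max(1, -(-total_input_bytes // max_combined_split_bytes))


# Hypothetical input size: 20 GiB.
total = 20 * 1024**3
print(estimate_mappers(total, 128 * 1024**2))  # 128 MiB splits -> 160
print(estimate_mappers(total, 2 * 1024**3))    # 2 GiB splits   -> 10
```

Under these assumptions, going from 128 MiB to 2 GiB per split takes the estimate from 160 mappers down to 10.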
> On Tue, Dec 18, 2012 at 12:00 PM, Thomas Bach
> <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I have around 4 million time series. ~1000 of them had a special
> > occurrence at some point. Now, I want to draw 10 samples for each
> > special time-series based on a similarity comparison.
> >
> > What I have currently implemented is a Python script which consumes
> > the time series one by one and compares each with all 1000 special
> > time series. If the similarity with one of them is sufficient, I pass
> > it back to Pig and strike out the corresponding special time series, so
> > subsequent time series will not be compared against it.
> >
> > This routine works, but it takes around 6 hours.
> >
> > One of the problems I'm facing is that Pig starts >160 scripts
> > although 10 would be sufficient. Is there some way to define the
> > number of scripts Pig starts in a `STREAM THROUGH` step? I tried to
> > set default_parallel to 10, but it doesn't seem to have any effect.
> >
> > I'm also open to any other ideas on how to accomplish the task.
> >
> > Regards,
> >         Thomas Bach.
> >
>
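For reference, a minimal sketch of the matching routine Thomas describes, written as plain Python functions. Several things here are assumptions and not from the thread: the similarity measure (the thread does not name one), the threshold, the per-special quota of 10 before a special series is struck out, and the in-memory record layout. In a real `STREAM THROUGH` script the records would be parsed from standard input instead of a list.

```python
def similarity(a, b):
    """Hypothetical measure: 1 / (1 + Euclidean distance)."""
    dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + dist)


def draw_samples(records, specials, threshold=0.5, quota=10):
    """Yield (special_id, series_id) pairs.

    records:  iterable of (series_id, list of floats)
    specials: dict mapping special_id -> list of floats
    A special series is struck out once it has collected `quota` samples,
    so later series are no longer compared against it.
    """
    remaining = {sid: quota for sid in specials}
    for key, series in records:
        # Iterate over a copy of the keys because entries may be deleted.
        for sid in list(remaining):
            if similarity(series, specials[sid]) >= threshold:
                yield sid, key
                remaining[sid] -= 1
                if remaining[sid] == 0:
                    del remaining[sid]  # strike out this special series
                break  # each series is used for at most one special


# Tiny in-memory demo with made-up data.
specials = {"s1": [1.0, 2.0, 3.0]}
records = [
    ("a", [1.0, 2.0, 3.0]),
    ("b", [1.0, 2.0, 3.1]),
    ("c", [9.0, 9.0, 9.0]),
]
print(list(draw_samples(records, specials, quota=1)))  # [('s1', 'a')]
```

Each streaming process keeps its own copy of `remaining`, so with many mappers the same special series can be matched independently in every process. That is one reason the number of processes Pig starts matters for this job.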
Prasanth J 2012-12-25, 12:46
Mohammad Tariq 2012-12-25, 05:49