Pig >> mail # user >> Limit number of Streaming Programs


Re: Limit number of Streaming Programs
Folks on the list need some time, mate. I have shared a couple of links
on the other thread of yours. Check them out and see if they help.

Best Regards,
Tariq
+91-9741563634
https://mtariq.jux.com/
On Tue, Dec 25, 2012 at 11:09 AM, Kshiva Kps <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Are there any Pig editors in which we can write 100 to 150 Pig scripts?
> I believe this is not practical to do in CLI mode.
> Something like an IDE for Java or TOAD for SQL. Please advise, many thanks.
>
> Thanks
>
>
> On Tue, Dec 25, 2012 at 3:45 AM, Cheolsoo Park <[EMAIL PROTECTED]
> >wrote:
>
> > Hi Thomas,
> >
> > If I understand your question correctly, what you want is to reduce
> > the number of mappers that spawn streaming processes. The
> > default_parallel setting controls the number of reducers, so it has no
> > effect on the number of mappers. Although the number of mappers is
> > determined automatically by the size of the input data, you can try
> > setting "pig.maxCombinedSplitSize" to combine input files into bigger
> > splits. For more details, please refer to:
> > http://pig.apache.org/docs/r0.10.0/perf.html#combine-files
> >
> > You can also read a discussion on a similar topic here:
> >
> >
> http://search-hadoop.com/m/J5hCw1UdxTa/How+can+I+set+the+mapper+number&subj=How+can+I+set+the+mapper+number+for+pig+script+
> >
> > Thanks,
> > Cheolsoo
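
[Editor's sketch] As a concrete illustration of Cheolsoo's suggestion, the property can be set at the top of a Pig script. The 256 MB value, the aliases, the paths, and the streaming command name are all made up for this example, not taken from the thread:

```pig
-- Combine small input files into larger splits so that fewer mappers
-- (and therefore fewer streaming processes) are started.
-- 268435456 bytes = 256 MB; an illustrative value, not a recommendation.
SET pig.maxCombinedSplitSize 268435456;

series = LOAD 'input/timeseries' AS (id:chararray, values:chararray);
scored = STREAM series THROUGH `compare.py`;  -- compare.py: hypothetical name
STORE scored INTO 'output/matches';
```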
> >
> >
> > On Tue, Dec 18, 2012 at 12:00 PM, Thomas Bach
> > <[EMAIL PROTECTED]>wrote:
> >
> > > Hi,
> > >
> > > I have around 4 million time series. ~1000 of them had a special
> > > occurrence at some point. Now, I want to draw 10 samples for each
> > > special time-series based on a similarity comparison.
> > >
> > > What I have currently implemented is a script in Python which consumes
> > > time-series one-by-one and does a comparison with all 1000 special
> > > time-series. If the similarity is sufficient with one of them I pass
> > > it back to Pig and strike out the according special time-series,
> > > subsequent time-series will not be compared against this one.
> > >
> > > This routine runs, but it lasts around 6 hours.
> > >
> > > One of the problems I'm facing is that Pig starts >160 scripts
> > > although 10 would be sufficient. Is there some way to define the
> > > number of scripts Pig starts in a `STREAM THROUGH` step? I tried to
> > > set default_parallel to 10, but it doesn't seem to have any effect.
> > >
> > > I'm also open to any other ideas on how to accomplish the task.
> > >
> > > Regards,
> > >         Thomas Bach.
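
[Editor's sketch] One possible workaround, sketched here as an assumption rather than advice given in the thread: since default_parallel (and the PARALLEL clause) only affect reduce-side operators, forcing the STREAM step into the reduce phase via a GROUP with PARALLEL 10 would cap the number of concurrent streaming instances at roughly 10. All aliases, paths, the bucketing key, and the script name are hypothetical:

```pig
-- Hypothetical sketch: push STREAM into the reduce phase so PARALLEL applies.
series   = LOAD 'input/timeseries' AS (id:chararray, values:chararray);
-- Bucket rows by an arbitrary key and force exactly 10 reducers.
bucketed = GROUP series BY SUBSTRING(id, 0, 1) PARALLEL 10;
flat     = FOREACH bucketed GENERATE FLATTEN(series);
-- Each of the 10 reducers now runs one instance of the streaming script.
scored   = STREAM flat THROUGH `compare.py`;  -- compare.py: hypothetical name
STORE scored INTO 'output/matches';
```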
> > >
> >
>