

Limit number of Streaming Programs
Hi,
I have around 4 million time series, ~1000 of which had a special occurrence at some point. Now I want to draw 10 samples for each of these special time series based on a similarity comparison.

What I have currently implemented is a Python script that consumes the time series one by one and compares each of them with all 1000 special time series. If the similarity with one of them is sufficient, I pass the series back to Pig and strike out the corresponding special time series, so subsequent time series are no longer compared against it. This routine works, but it takes around 6 hours.

One of the problems I'm facing is that Pig starts more than 160 instances of the script, although 10 would be sufficient. Is there a way to set the number of script instances Pig starts in a `STREAM THROUGH` step? I tried setting `default_parallel` to 10, but it doesn't seem to have any effect. I'm also open to other ideas on how to accomplish this task.

Regards,
Thomas Bach
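The original script is not shown in the thread, so the following is only a minimal sketch of the strike-out logic described above. Several details are assumptions, not taken from the post: each input line is `id<TAB>v1,v2,...`, the special series come from a side file passed as the first argument, and "sufficient similarity" means a Euclidean distance at or below a hypothetical `MAX_DISTANCE` threshold.

```python
import math
import sys
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical threshold: a candidate matches a special series when their
# Euclidean distance is at most this value (assumption, not from the post).
MAX_DISTANCE = 1.0


def parse_line(line: str) -> Tuple[str, List[float]]:
    """Parse 'id<TAB>v1,v2,...' into (id, values); the format is assumed."""
    series_id, raw = line.rstrip("\n").split("\t", 1)
    return series_id, [float(x) for x in raw.split(",")]


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance; series of unequal length are treated as incomparable."""
    if len(a) != len(b):
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_stream(
    lines: Iterable[str], specials: Dict[str, List[float]]
) -> Iterator[Tuple[str, str]]:
    """Yield (candidate_id, special_id); each special series is matched at most once."""
    remaining = dict(specials)  # special series not yet struck out (copy; input untouched)
    for line in lines:
        if not remaining:
            continue  # nothing left to compare, but keep consuming the input
        cand_id, values = parse_line(line)
        # First remaining special series that is similar enough, if any.
        match = next(
            (sid for sid, sv in remaining.items() if distance(values, sv) <= MAX_DISTANCE),
            None,
        )
        if match is not None:
            del remaining[match]  # strike it out for subsequent series
            yield cand_id, match


def main(specials_path: str) -> None:
    # Hypothetical side file holding the special series in the same line format.
    specials: Dict[str, List[float]] = {}
    with open(specials_path) as f:
        for line in f:
            sid, values = parse_line(line)
            specials[sid] = values
    for cand_id, special_id in match_stream(sys.stdin, specials):
        print(f"{cand_id}\t{special_id}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Note that the state is per process: every script instance Pig launches keeps its own `remaining` set, so each instance can emit one match per special series. With Pig, a script like this would typically be declared with `DEFINE` (optionally with a `SHIP` clause) and invoked via `STREAM ... THROUGH`; the file names above are placeholders.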
Cheolsoo Park 2012-12-24, 22:15
Kshiva Kps 2012-12-25, 05:39
Prasanth J 2012-12-25, 12:46
Mohammad Tariq 2012-12-25, 05:49


