Norbert Burger 2012-03-21, 13:49
Folks -- how are you handling the "productionalization" of your Pig submit nodes?
For our PROD environment, I originally thought we'd just have a few VMs from which Pig jobs would be submitted onto our cluster. But on our 8GB VMs, I found that we were often hitting heap OOM errors on a relatively small set of approx. 50 analytics jobs. As a short-term solution, we ended up scaling these VMs horizontally, which seemed a bit messy to me, since we have to manage which jobs are executed where.
Is this heap footprint (300-400 MB per Pig process) consistent with your environment?
Norbert
Prashant Kommireddi 2012-03-21, 14:18
Norbert,
You mean 8 GB of memory on the client side to launch Pig, right? That seems like a lot for simply spawning jobs. We use Azkaban to schedule jobs and there are tens of jobs spawned at once. Pig by itself should not be so memory intensive.
Thanks, Prashant
Norbert Burger 2012-03-21, 14:55
Hi Prashant -- yes, 8 GB total RAM, but we're seeing 300-400 MB heap consumption per Pig invocation client-side.
We're also migrating soon to Azkaban, but it doesn't seem like it'd resolve this issue, since from what I understand it simply wraps Grunt.
Norbert
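A quick back-of-the-envelope check helps put the numbers in this exchange in perspective. The sketch below is illustrative only and rests on assumptions the thread does not state: that all ~50 jobs are launched at the same time, that each Pig client JVM uses the observed 300-400 MB of heap, and that JVM non-heap memory and OS overhead are either ignored or covered by an optional reserve. The function names are hypothetical.

```python
# Back-of-the-envelope capacity check for a Pig submit node.
# Assumptions (hypothetical, not from the thread): every client JVM uses a
# fixed heap size, all jobs run concurrently, and an optional reserve is set
# aside for the OS and JVM non-heap memory.

def max_concurrent_clients(total_ram_mb: int, heap_per_client_mb: int, reserve_mb: int = 0) -> int:
    """Return how many Pig client JVMs fit in the RAM left after the reserve."""
    usable = total_ram_mb - reserve_mb
    if usable <= 0:
        return 0
    return usable // heap_per_client_mb


def ram_needed_mb(jobs: int, heap_per_client_mb: int, reserve_mb: int = 0) -> int:
    """Return the RAM (MB) needed to run `jobs` Pig clients at once."""
    return jobs * heap_per_client_mb + reserve_mb


if __name__ == "__main__":
    total = 8 * 1024  # an 8 GB VM, expressed in MB
    for heap in (300, 400):
        print(f"heap={heap} MB: fits {max_concurrent_clients(total, heap)} clients; "
              f"50 clients need {ram_needed_mb(50, heap)} MB")
```

Under these assumptions, an 8 GB VM holds roughly 20-27 concurrent clients, while 50 simultaneous clients would need about 15,000-20,000 MB of heap alone, well above the 8,192 MB available. Real per-process memory is higher than heap, so a non-zero `reserve_mb` gives a more conservative estimate.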