Malcolm Tye, 2013-01-04, 17:35
Jonathan Coveney, 2013-01-04, 19:07
Cheolsoo Park, 2013-01-04, 19:07
Russell Jurney, 2013-01-04, 22:04
Malcolm Tye, 2013-01-07, 11:16
Cheolsoo Park, 2013-01-07, 19:55
Cheolsoo Park, 2013-01-07, 19:56
Dmitriy Ryaboy, 2013-01-08, 07:36
Malcolm Tye, 2013-01-21, 14:01
-
Making Pig run faster in local mode (Malcolm Tye, 2013-01-04, 17:35)
Hi,
Any ideas on how to make Pig run faster in local mode? I'm processing 3 files of about 13 MB each, and the 3 GROUP BY statements in my script seem to take up most of the time. There are no joins.

Increasing the heap size has made no difference, and Pig doesn't use all of it anyway. Apart from that, I'm on default settings.

Thanks,
Malc
-
Re: Making Pig run faster in local mode (Jonathan Coveney, 2013-01-04, 19:07)
How long is it taking?
-
Re: Making Pig run faster in local mode (Cheolsoo Park, 2013-01-04, 19:07)
Hi Malc,
Unless I am mistaken, all operations run serially in local mode, so a GROUP BY is always performed by a single reducer. You can either use MapReduce (MR) mode to take advantage of parallelism, or reduce the size of the data to be grouped, if possible.

Hope this is helpful.

Thanks,
Cheolsoo
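To make the single-reducer point concrete, here is a toy Python model of how a GROUP BY is shuffled: each record is routed to a reducer by hashing its key, and each reducer groups the keys it receives. It is only an illustration, not Pig's or Hadoop's code; `partition` and `group_by` are hypothetical helpers, and `zlib.crc32` merely stands in for the role Hadoop's default hash partitioner plays. With one reducer, as in local mode, every key lands on the same reducer; with more reducers (e.g. via PARALLEL in MR mode), keys are spread out and can be grouped concurrently.

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    # Deterministic stand-in for a hash partitioner: the same key always
    # maps to the same reducer index in [0, num_reducers).
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def group_by(records, key_fn, num_reducers=1):
    """Toy GROUP BY: route each record to a reducer by key, then group per reducer."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for rec in records:
        key = key_fn(rec)
        reducers[partition(key, num_reducers)][key].append(rec)
    return reducers

records = [("alice", 3), ("bob", 5), ("alice", 7), ("carol", 1)]

# Local mode: one reducer receives every key and does all the grouping.
local = group_by(records, key_fn=lambda r: r[0], num_reducers=1)
print(len(local), sorted(local[0]))  # 1 ['alice', 'bob', 'carol']

# More reducers: keys are spread across reducers that could run concurrently.
cluster = group_by(records, key_fn=lambda r: r[0], num_reducers=3)
print([sorted(r) for r in cluster])
```

Whichever reducer a key lands on, all records for that key end up together, which is why adding reducers changes throughput but not the result.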
-
Re: Making Pig run faster in local mode (Russell Jurney, 2013-01-04, 22:04)
+1. Wasn't there a slowdown bug a little while ago?

What version of Pig?
-
RE: Making Pig run faster in local mode (Malcolm Tye, 2013-01-07, 11:16)
Hi,
It's Pig 0.10.0. Here are some timings I took. I have more than 3 files to process, but I started with 3 files to get some numbers.

    Files   Time (s)
    1       28
    2       48
    3       73

Cheolsoo, the documentation does seem to indicate that you only get 1 reducer when running in local mode, and I've confirmed this by adding PARALLEL to the GROUP BY statements. When you say to use MR mode, do you mean installing Hadoop on the node?

Thanks,
Malc
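A quick least-squares fit on the three timings above (a rough read, since there are only three points) suggests run time grows roughly linearly with the number of files: about 22.5 s per extra file on top of about 4.7 s of fixed overhead. That pattern is consistent with the files being processed one after another rather than concurrently. The sketch only does the arithmetic; `least_squares` is a hypothetical helper, not part of Pig.

```python
def least_squares(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

files = [1, 2, 3]          # number of input files
seconds = [28, 48, 73]     # measured wall-clock time from the table
slope, intercept = least_squares(files, seconds)
print(f"~{slope:.1f} s per file, ~{intercept:.1f} s fixed overhead")
# ~22.5 s per file, ~4.7 s fixed overhead
```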
-
Re: Making Pig run faster in local mode (Cheolsoo Park, 2013-01-07, 19:55)
Hi Malc,
> When you say to use MR mode, do you mean install hadoop onto the node?

I meant cluster mode, but given the size of your input files, it makes much sense to run them in cluster. Instead, you might consider executing jobs in parallel in local mode, if your input files can be processed in parallel. I uploaded example scripts here: <http://people.apache.org/~cheolsoo/pig/>.

Please note that you must use Hadoop 0.23.x or 2.0.x for this, because the LocalJobRunner in earlier Hadoop versions is not thread-safe. Also note that you might have to use an installed Hadoop with pig-withouthadoop.jar instead of the standalone pig.jar. When I was testing this with the trunk version, I ran into a problem with pig.jar on Hadoop 2.0.x. (This is a separate issue that I should fix.)

Thanks,
Cheolsoo
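Cheolsoo's uploaded scripts are not reproduced here. As a different, simpler variant of "process the input files in parallel in local mode", the sketch below launches one independent `pig -x local` process per input file, a few at a time. Assumptions, all hypothetical: a script (e.g. `group_files.pig`) that reads `$INPUT` and writes `$OUTPUT` via Pig parameter substitution, the `.out` naming scheme, and the worker count. Because each process is its own JVM, this sidesteps sharing one LocalJobRunner across threads, but it is untested against Pig 0.10; concurrent local runs may still contend for shared temp directories, and each JVM needs its own heap.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_command(script, input_path, output_path):
    # "-x local" selects Pig's local mode; each "-param NAME=VALUE" defines a
    # parameter the script references as $NAME. INPUT/OUTPUT are hypothetical
    # names: the script must use $INPUT and $OUTPUT in its LOAD/STORE.
    return ["pig", "-x", "local",
            "-param", f"INPUT={input_path}",
            "-param", f"OUTPUT={output_path}",
            script]

def run_all(script, inputs, max_workers=3, runner=subprocess.run):
    """Run one local-mode Pig process per input file, up to max_workers at once.

    Returns {input_path: exit_code}. `runner` is injectable so the logic can be
    exercised without Pig installed.
    """
    def run_one(path):
        output_path = path + ".out"  # hypothetical output naming scheme
        result = runner(build_command(script, path, output_path))
        return path, result.returncode

    # Threads only wait on child processes; the real work happens in the
    # separate Pig JVMs, so Python's GIL is not a bottleneck here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, inputs))
```

Usage would look like `run_all("group_files.pig", ["a.txt", "b.txt", "c.txt"])`. Whether this helps depends on the machine having spare cores, since each local-mode run is itself serial.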
-
Re: Making Pig run faster in local mode (Cheolsoo Park, 2013-01-07, 19:56)
Typo: "it makes much sense to run them in cluster" should read "it doesn't make much sense to run them in cluster".
-
Re: Making Pig run faster in local mode (Dmitriy Ryaboy, 2013-01-08, 07:36)
Try running jstack on it a few times while it's running. Is it just sitting idle in a sleep()?
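Dmitriy's suggestion is a form of sampling: take several thread dumps and see where threads keep showing up. The sketch below automates that, assuming the HotSpot `jstack <pid>` text format (a thread header line starting with the quoted thread name, a `java.lang.Thread.State:` line, then `at ...` frames); other JVMs may format dumps differently. `take_dumps`, `top_frames`, and `profile` are hypothetical helpers. Many background threads sleep legitimately, so look at the thread doing the Pig work: if it is repeatedly `TIMED_WAITING` in `java.lang.Thread.sleep`, it is idle; if it is `RUNNABLE` in Pig or Hadoop frames, it is busy.

```python
import subprocess
import time
from collections import Counter

def take_dumps(pid, samples=5, interval_s=2.0, runner=subprocess.run):
    """Capture several `jstack <pid>` thread dumps, interval_s seconds apart."""
    dumps = []
    for i in range(samples):
        result = runner(["jstack", str(pid)], capture_output=True, text=True)
        dumps.append(result.stdout)
        if i < samples - 1:
            time.sleep(interval_s)
    return dumps

def top_frames(dump):
    """Yield (thread_name, state, top_frame) for each thread in one dump."""
    in_thread, name, state = False, None, None
    for line in dump.splitlines():
        s = line.strip()
        if line.startswith('"'):  # thread header, e.g. "main" #1 prio=5 ...
            in_thread, name, state = True, line.split('"')[1], None
        elif in_thread and s.startswith("java.lang.Thread.State:"):
            state = s.split(":", 1)[1].split()[0]  # e.g. RUNNABLE, TIMED_WAITING
        elif in_thread and s.startswith("at "):
            yield name, state, s[3:]
            in_thread = False  # keep only the innermost (top) frame

def profile(dumps):
    """Count (thread, state, top frame) triples across all dumps."""
    return Counter(t for dump in dumps for t in top_frames(dump))
```

With a real process ID, `profile(take_dumps(pid)).most_common(10)` lists the most frequent triples; an unchanging result across samples, as Malc later observed, points to steady work rather than a hang.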
-
RE: Making Pig run faster in local mode (Malcolm Tye, 2013-01-21, 14:01)
Hi Dmitriy,
It's not that it hangs at any point, I think; it just seems to be slow in general. I tried jstacking the process, but the output didn't seem to change, so I think it's just general slowness.

Thanks for all the input. We've managed to work around it by rescheduling, and if we need to scale any further, we'll install a 1-node Hadoop cluster.

Thanks,
Malc