Pig >> mail # user >> Making Pig run faster in local mode


Re: Making Pig run faster in local mode
Hi Malc,

>> When you say to use MR mode, do you mean install hadoop onto the node ?

I meant cluster mode, but given the size of your input files, it doesn't
make much sense to run them on a cluster.

Instead, you might consider running jobs in parallel in local mode, if
your input files can be processed independently. I uploaded example
scripts here <http://people.apache.org/~cheolsoo/pig/>. Please note that
you must use Hadoop 0.23.x or 2.0.x for this because the LocalJobRunner of
previous Hadoop versions is not thread-safe. Also note that you might have
to use an installed Hadoop with pig-withouthadoop.jar instead of the
standalone pig.jar. When I was testing this with the trunk version, I ran
into a problem with pig.jar on Hadoop 2.0.x. (This is a separate issue that
I should fix.)
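
For illustration only, here is a minimal sketch of a simpler variant of the
same idea: start one separate `pig -x local` process per input file and run a
few of them at once. This is not the content of the example scripts linked
above. Because each file gets its own JVM, it does not depend on a thread-safe
LocalJobRunner, but it pays the JVM startup cost once per file. The script
name `process.pig`, the parameter names `input` and `output`, and the `.out`
suffix are all hypothetical; the Pig script is assumed to read `$input` and
store to `$output`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(script, input_path, output_path):
    """Build the command line for one local-mode Pig run.

    `input` and `output` are hypothetical parameter names that the
    Pig script is assumed to reference as $input and $output.
    """
    return [
        "pig", "-x", "local",
        "-param", f"input={input_path}",
        "-param", f"output={output_path}",
        script,
    ]


def run_all(script, inputs, max_workers=3, runner=subprocess.call):
    """Run one Pig process per input file, at most max_workers at a time.

    Returns the exit codes in the same order as `inputs`.
    `runner` is injectable so the function can be tried without Pig installed.
    """
    commands = [build_command(script, path, path + ".out") for path in inputs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


# Example call (requires `pig` on the PATH; file names are placeholders):
# exit_codes = run_all("process.pig", ["a.log", "b.log", "c.log"])
```

Threads are enough here because each worker only waits on an external
process; `max_workers` would normally be kept at or below the number of cores.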

Thanks,
Cheolsoo

On Mon, Jan 7, 2013 at 3:16 AM, Malcolm Tye <[EMAIL PROTECTED]> wrote:

> Hi,
> It's Pig 0.10.0. Here are some timings I took. I have more than 3
> files to process, but I just started out with 3 files to get some numbers.
>
> # Files         Time(s)
> 1               28
> 2               48
> 3               73
>
>
> Cheolsoo, the documentation does seem to indicate that you will only get 1
> reducer when running in local mode, and I've tested this out using the
> parallel statement on the group by's to verify that is the case. When you
> say to use MR mode, do you mean install hadoop onto the node ?
>
>
> Thanks
>
> Malc
>
> -----Original Message-----
> From: Russell Jurney [mailto:[EMAIL PROTECTED]]
> Sent: 04 January 2013 22:05
> To: [EMAIL PROTECTED]
> Subject: Re: Making Pig run faster in local mode
>
> +1 wasn't there a slowdown bug a little while ago?
>
> What version of Pig?
> On Jan 4, 2013 11:07 AM, "Jonathan Coveney" <[EMAIL PROTECTED]> wrote:
>
> > How long is it taking?
> >
> >
> > 2013/1/4 Malcolm Tye <[EMAIL PROTECTED]>
> >
> > > Hi,
> > >
> > > Any ideas on how to make Pig run quicker when running it in
> > > local mode?
> > >
> > >
> > >
> > > I'm processing 3 files of about 13MB each with 3 group by statements in
> > > my script, which seem to suck up the time. There are no joins.
> > >
> > >
> > >
> > > Increasing the heap size has made no difference and it doesn't use all
> > > that anyway.
> > >
> > >
> > >
> > > I'm on default settings apart from that.
> > >
> > >
> > >
> > >
> > >
> > > Thanks
> > >
> > >
> > >
> > > Malc
> > >
> > >
> >
>
>