
Pig, mail # user - pig needed?

RE: pig needed?
Olga Natkovich 2010-11-16, 18:36
Functions, or rather inline macros, are coming in Pig 0.9: http://wiki.apache.org/pig/TuringCompletePig


-----Original Message-----
From: Anze [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 16, 2010 9:34 AM
Subject: Re: pig needed?
My 0.02 EUR:
I think it's not the learning curve that makes Pig a better tool for some
applications. In my experience the learning curve is even *steeper* for Pig
than for raw MR. MR can very easily be learned from Tom White's book, while
Pig... well, Pig is in there too, but it gets quite a short chapter that lacks
good examples. Online tutorials, however, are almost nonexistent, or at least
I couldn't find any.

Where Pig excels is the power with which you can manipulate data. You can
write complex queries in just a few lines whereas with MR you end up writing
hundreds of lines of code.

The major drawback of Pig, however (in my limited experience), is its lack of
functions (or objects :), which makes any larger piece of code spaghetti-like.
Also, it is still very much evolving, so if you are dealing with anything other
than raw HDFS files... well, good luck. :)

While we are at it, I am curious how other users use Pig. I write my scripts
in the PigPen Eclipse plugin and then copy and paste them into the Pig shell
(I wasn't able to make PigPen work with the cluster directly), which is pretty
cumbersome. So this is another downside for me.

But I still love Pig, as it lets me control the data much more easily and
makes writing ad-hoc queries much easier. And it will only get better with

But if your code works in MR, why rewrite it? Leave it be, unless the code has
problems and needs to be rewritten anyway.


On Tuesday 16 November 2010, Renato Marroquín Mogrovejo wrote:
> Pig has some clear advantages over raw MapReduce code, but IMHO the most
> important is the learning curve. But if all you are doing is loading the
> data, you probably don't need to translate it into Pig (well, maybe just for
> the fun of it :). If you are planning to do other operations, like joining
> or grouping, it would be a lot simpler to do them in Pig.
> Have a look at this; it will help you understand the bigger picture:
> http://www.slideshare.net/hadoop/practical-problem-solving-with-apache-hadoop-pig
> Renato M.
> If you already have it as a Hadoop job, why would you want to port it to Pig?
> 2010/11/15 Gerrit van Vuuren <[EMAIL PROTECTED]>
> > Is this a bot?
> >
> > And if not: yes, you can use Pig, although I'd advise you to reuse what has
> > already been developed and not to rewrite UDFs that already exist :)
> >
> >
> > ----- Original Message -----
> > From: Cornelio Iñigo <[EMAIL PROTECTED]>
> > Sent: Mon Nov 15 20:48:35 2010
> > Subject: pig needed?
> >
> > Hi
> >
> > My name is Cornelio Iñigo and I'm a developer just getting started with
> > Hadoop and Pig.
> > I have a question about developing an application in Pig. I already have
> > my program in Hadoop; it takes just one column from a dataset (a CSV file)
> > and processes that data with some functions (like language analysis and
> > content analysis).
> >
> > Note that in processing the file I don't use FILTER, COUNT, or any other
> > built-in function of Pig; I think all the functions would have to be
> > User Defined Functions.
> >
> > So is it a good idea (does it make sense) to develop this program in Pig?
> >
> > Thanks in advance
> > --
> > *Cornelio*