RE: Pig questions
Amir Youssefi 2008-05-20, 19:59
Please let us know when the contrib part is ready so I can kick off
pushing a selection of the many UDFs we have at Yahoo to contrib.
From: Chris Olston [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 20, 2008 12:55 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; Rosie Jones
Subject: fwd: Pig questions
James: forwarding your mail to the pig-user mailing list (probably a
good idea for you and/or your students to subscribe).
Regarding Hadoop and "hadoop-on-demand", I do not know the answer but
will forward your question to the hadoop guys.
Regarding a function library for Pig that would include Tokenize and
other useful functions, I know that such a library does exist within
Yahoo, and there is an effort underway to create a public library of Pig
functions that would include string manipulations such as Tokenize, as
well as some basic math functionality and other items.
The contact person for this effort is Olga Natkovich
(olgan@yahoo-inc.com) -- perhaps you can send her a list of functions
you'd like to see, and if all goes well they will go into a public library over the
Begin forwarded message:
> From: James Allan <[EMAIL PROTECTED]>
> Date: May 20, 2008 12:19:59 PM PDT
> To: Chris Olston <[EMAIL PROTECTED]>
> Cc: Rosie Jones <[EMAIL PROTECTED]>
> Subject: Re: PIG requests/suggestions/complaints?
> After months of distractions, we're getting back to using PIG for some
> projects this summer. I have an urgent question about hadoop and then
> a less urgent question about PIG for you.
> The urgent question regards getting hadoop running on a cluster that
> has the grid engine running. Our problem is that "hadoop on
> demand" uses torque rather than grid engine (which we use here).
> We're trying to hack h.o.d. to use grid engine, but are running into
> problems. We're wondering if there's someone we could talk with about
> that problem.
> The less urgent question involves PIG and utility functions. Our
> biggest problem with PIG is that the "obvious" (to us) functionality
> that one would expect is missing. For example, we can't find a way to
> use PIG to count the occurrences of every word token in a text
> file--viz., we can't tokenize in PIG. To deal with it, we're writing
> our own little modules to extend PIG (as we can starting with 1.2).
> My question is: is there a library of such added functionality?
> If not, is there a plan to create such a repository?
> -- james
Christopher Olston, Ph.D.
Sr. Research Scientist
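
For readers following the thread, the word-count task James describes (tokenize each line of a text file, then count occurrences of every token) can be sketched outside Pig as below. This is only a rough Python illustration of the tokenize-then-count dataflow, not Pig Latin and not the Yahoo library discussed above. The names tokenize and count_tokens, and the tokenization rule (lowercase runs of letters, digits, and apostrophes), are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(line):
    """Split one line into lowercase word tokens.

    Assumed rule (not from the thread): a token is a run of
    letters, digits, or apostrophes.
    """
    return re.findall(r"[a-z0-9']+", line.lower())


def count_tokens(lines):
    """Count occurrences of every token across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # Equivalent in spirit to: tokenize, flatten, group by word, count.
        counts.update(tokenize(line))
    return counts


if __name__ == "__main__":
    sample = ["the pig ate the corn", "The corn was good"]
    for word, n in count_tokens(sample).most_common():
        print(word, n)
```

To run it over a real file, pass an open file object to count_tokens, since iterating a file yields its lines one at a time.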