Pig, mail # dev - Determining the group-by column


RE: Determining the group-by column
Santhosh Srinivasan 2009-02-17, 05:03
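Cogroup has inner plans that compute the group-by attributes. The Project
you see in the explain output belongs to one of those inner plans, not to
the outer plan, which is why getPredecessors() does not return it. Instead
of looking at the predecessor(s), you should navigate the inner plans of
the cogroup. Check out the visit(LOCogroup ...) method in
src/org/apache/pig/impl/logicalLayer/validators/TypeCheckingVisitor.java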
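
To make the idea concrete, here is a minimal, self-contained Java sketch of
walking an operator's inner plans to find the projected columns. Everything
in it (Op, Project, InnerPlan, Cogroup, groupByColumns) is a hypothetical
stand-in written for this illustration and is not Pig's actual API; the
real classes and accessors are the ones used in TypeCheckingVisitor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InnerPlanSketch {

    // Hypothetical stand-in for a logical operator; not a Pig class.
    static class Op {
        final String name;
        Op(String name) { this.name = name; }
    }

    // Hypothetical stand-in for a Project operator that selects one column.
    static class Project extends Op {
        final int column;
        Project(int column) { super("Project"); this.column = column; }
    }

    // Hypothetical inner plan: the operators that compute one group-by key.
    static class InnerPlan {
        final List<Op> ops = new ArrayList<>();
        InnerPlan add(Op op) { ops.add(op); return this; }
    }

    // Hypothetical Cogroup: each input operator maps to the inner plans
    // that compute its group-by keys. These plans are separate from the
    // outer plan, so the outer plan's predecessor lookup never sees them.
    static class Cogroup extends Op {
        final Map<Op, List<InnerPlan>> groupByPlans = new LinkedHashMap<>();
        Cogroup() { super("Cogroup"); }
        void addInput(Op input, InnerPlan... plans) {
            groupByPlans.put(input, Arrays.asList(plans));
        }
    }

    // For every input of the cogroup, collect the columns projected
    // by the Project operators found in that input's inner plans.
    static Map<String, List<Integer>> groupByColumns(Cogroup cogroup) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<Op, List<InnerPlan>> entry : cogroup.groupByPlans.entrySet()) {
            List<Integer> columns = new ArrayList<>();
            for (InnerPlan plan : entry.getValue()) {
                for (Op op : plan.ops) {
                    if (op instanceof Project) {
                        columns.add(((Project) op).column);
                    }
                }
            }
            result.put(entry.getKey().name, columns);
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors the explain output in the question: a Cogroup over a
        // ForEach, grouping on a projection of field 0.
        Op forEach = new Op("ForEach");
        Cogroup cogroup = new Cogroup();
        cogroup.addInput(forEach, new InnerPlan().add(new Project(0)));
        System.out.println(groupByColumns(cogroup));
    }
}
```

The point of the sketch is only the shape of the traversal: start from the
cogroup itself, iterate over its per-input inner plans, and look for Project
operators there rather than among the predecessors in the outer plan.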

Santhosh

-----Original Message-----
From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
Sent: Sunday, February 15, 2009 9:07 PM
To: [EMAIL PROTECTED]
Subject: Determining the group-by column

We are working on the Pig Logical Optimizer, and running into some
difficulty navigating the plan.

If we run explain on a query with a CoGroup, we get something like:

Cogroup
|    |
|    |-- Project [0]
|
|------ ForEach
            |  <etc>

What we want to do is determine that this particular Cogroup operates on
a projection of field 0.

If we create a new LogicalTransformer that is applied to Cogroup
operators, and call mPlan.getPredecessors(ourCogroupOperator), we only
get the ForEach. Calling getSuccessors results in a null being returned
(Cogroup is indeed the root).

How do we find the Project operator above? What is its relationship,
plan-wise, with the Cogroup operator?

Thanks a lot,

Dmitriy Ryaboy, Ashutosh Chauhan, Tejal Desai