


Processing hierarchical information in Pig
Hi All, How do I represent hierarchical information in a flat file and process it in Pig?
Let’s say I have objects of type A. I want a tree representation of their parent-child relationships.
In scenario 1: A1 points to A2; A2 points to A3; A3 points to A4; A4 points to Am, and so on until An. Given the above definition, I want to be able to answer the following:
Child(A1) = A2
Parent(A4) = A3
Descendant(A1) = A2, A3, A4, Am, …, An
Ancestor(A4) = A3, A2, A1
Ancestor(An) = Am, …, A4, A3, A2, A1
Can this be represented in a text file and queried in Pig?
Appreciate any pointers/suggestions. Thanks!

Re: Processing hierarchical information in Pig
Prash, you can just model this tree as a simple graph adjacency list:
A1,A2
A2,A3
A3,A4
A4,Am
...
For nodes with more than one child, you simply extend each row horizontally. Child/parent/descendant/ancestor are straightforward applications of a traversal on this graph (BFS would be a good choice).
Norbert
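To illustrate the traversal suggested above, here is a minimal sketch written in Python rather than Pig, purely for illustration. It assumes the edge list is already in memory as (parent, child) pairs; the helper names `build_children` and `descendants` are hypothetical and do not come from the thread.

```python
from collections import defaultdict, deque


def build_children(edges):
    """Map each parent to the list of its direct children."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    return children


def descendants(children, start):
    """Breadth-first walk from start; returns reachable nodes in visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical sample data: the first three edges of the chain in the question.
edges = [("A1", "A2"), ("A2", "A3"), ("A3", "A4")]
children = build_children(edges)
print(descendants(children, "A1"))  # ['A2', 'A3', 'A4']
```

Running the same walk over the reversed pairs (child, parent) would give ancestors instead of descendants.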
Re: Processing hierarchical information in Pig
Thank you, Norbert, for your reply.
I am still not sure how, or whether, I can do this in a Pig script, though. Given the adjacency list below, how would I find the ancestors of R (P, C, A) in Pig?
cat testgraph.txt
A B:C
B D
C P
P Q:R
From the Pig script below I can get an immediate parent, i.e. C (the parent of P).
But wouldn’t I need multiple iterations to get the parent of C?
Script:
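-- Load the adjacency list: a node and its colon-separated children
A = LOAD 'testgraph.txt' USING PigStorage() AS (subject:chararray, link:chararray);
-- Split the child list into at most two parts and flatten the resulting tuple
B = FOREACH A GENERATE subject, FLATTEN(STRSPLIT(link, ':', 2)) AS L;
-- Keep rows whose first child is 'P'; this returns C, the parent of P
C = FILTER B BY $1 == 'P';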
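To make the iteration question concrete, here is a small Python sketch (not Pig) that repeats the parent lookup until it reaches the root. It assumes the same whitespace-and-colon layout as testgraph.txt and that each node has at most one parent; `parse_graph` and `ancestors` are hypothetical helper names.

```python
def parse_graph(lines):
    """Build a child -> parent map from 'parent child1:child2' lines."""
    parent_of = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        subject, links = line.split()
        for child in links.split(":"):
            parent_of[child] = subject
    return parent_of


def ancestors(parent_of, node):
    """Look up the parent repeatedly, one hop per loop iteration."""
    chain = []
    seen = {node}
    while node in parent_of:
        node = parent_of[node]
        if node in seen:  # guard against cycles in malformed input
            break
        seen.add(node)
        chain.append(node)
    return chain


# The four rows of testgraph.txt from the message above.
graph = ["A B:C", "B D", "C P", "P Q:R"]
parent_of = parse_graph(graph)
print(ancestors(parent_of, "R"))  # ['P', 'C', 'A']
```

Each pass of the loop roughly corresponds to one more filter or join step in a Pig script, so the number of passes needed grows with the depth of the tree.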
Re: Processing hierarchical information in Pig
