


Processing hierarchical information in Pig
Hi All,
How do I represent hierarchical information in a flat file and process it in Pig?
Let’s say I have objects of type A. I want to have a tree representation with their parent-child relationships.
In scenario 1: A1 points to A2; A2 points to A3; A3 points to A4; A4 points to Am, and so on till An. Given the above definition, I want to be able to answer the following:
Child(A1) = A2
Parent(A4) = A3
Descendant(A1) = A2,A3,A4, Am… An
Ancestor(A4) = A3,A2,A1
Ancestor(An) = Am,…A4,A3,A2,A1
Can this be represented in a text file and queried in Pig?
I'd appreciate any pointers/suggestions. Thanks!
prash987 prash987 20120229, 14:02

Re: Processing hierarchical information in Pig
Prash, you can just model this tree as a simple graph adjacency list:
A1,A2
A2,A3
A3,A4
A4,Am
...
For nodes with more than one child, you simply extend each row horizontally. Child/parent/descendant/ancestor are straightforward applications of a traversal on this graph (BFS would be a good choice).
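A minimal sketch of the traversal idea, in Python rather than Pig, assuming each row lists a parent followed by its children, comma-separated. The function names (`parse_adjacency`, `invert`, `bfs`) are made up for illustration. Descendants come from a BFS over the child links, and ancestors from the same BFS over the reversed links.

```python
from collections import deque


def parse_adjacency(lines):
    """Parse rows like 'A1,A2' (parent first, then its children) into a dict."""
    children = {}
    for line in lines:
        fields = [f.strip() for f in line.split(",") if f.strip()]
        if not fields:
            continue  # skip blank rows
        parent, kids = fields[0], fields[1:]
        children.setdefault(parent, []).extend(kids)
    return children


def invert(children):
    """Reverse every edge so that each child maps to its parents."""
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    return parents


def bfs(graph, start):
    """Breadth-first traversal; returns reachable nodes in visit order, excluding start."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


rows = ["A1,A2", "A2,A3", "A3,A4"]
children = parse_adjacency(rows)
print(bfs(children, "A1"))          # descendants of A1: ['A2', 'A3', 'A4']
print(bfs(invert(children), "A4"))  # ancestors of A4: ['A3', 'A2', 'A1']
```

Child and Parent are just the first level of each traversal, i.e. a direct lookup in `children` or in `invert(children)`.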
Norbert
Norbert Burger 20120229, 16:46

Re: Processing hierarchical information in Pig
Thank you, Norbert, for your reply.
I am still not sure how, or if, I can do this through a Pig script, though. Given the adjacency list below, how would I find the ancestors of R (P, C, A) in Pig?
cat testgraph.txt
A B:C
B D
C P
P Q:R
From the Pig script below I can get an immediate parent, i.e. C.
But wouldn’t I need multiple iterations to get the parent of C?
Script:
A = LOAD 'testgraph.txt' USING PigStorage() AS (subject:chararray, link:chararray);
-- STRSPLIT returns a tuple, so FLATTEN spreads the children across columns, not rows
B = FOREACH A GENERATE subject, FLATTEN(STRSPLIT(link, ':', 2)) AS L;
C = FILTER B BY $1 == 'P';
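On the question of multiple iterations: a rough sketch in Python (not Pig) of what round-by-round parent lookup looks like on the `testgraph.txt` rows above. It assumes whitespace between the node and its colon-separated children; `load_edges` and `ancestors_by_rounds` are hypothetical names. Each round does the work of one parent lookup over the edge list, which is roughly what one more join would do in a Pig script. Pig Latin has no loop construct of its own, so repeating that step would need some outer driver.

```python
def load_edges(lines):
    """Turn rows like 'A B:C' into (parent, child) pairs, one pair per child."""
    edges = []
    for line in lines:
        parts = line.split(None, 1)  # node, then its colon-separated children
        if len(parts) != 2:
            continue  # skip blank or childless rows
        parent, links = parts
        for child in links.strip().split(":"):
            if child:
                edges.append((parent, child))
    return edges


def ancestors_by_rounds(edges, node):
    """Repeat the parent lookup until a round finds nothing new.

    Returns the ancestors (nearest first) and the number of rounds run,
    counting the final round that comes back empty.
    """
    found = []
    frontier = [node]
    rounds = 0
    while frontier:
        next_frontier = []
        for parent, child in edges:
            # the checks on found/node also guard against cycles
            if child in frontier and parent not in found and parent != node:
                found.append(parent)
                next_frontier.append(parent)
        frontier = next_frontier
        rounds += 1
    return found, rounds


rows = ["A B:C", "B D", "C P", "P Q:R"]
edges = load_edges(rows)
print(ancestors_by_rounds(edges, "R"))  # (['P', 'C', 'A'], 4)
```

So yes, the number of rounds grows with the depth of the tree: three rounds find P, C and A, and a fourth empty round confirms there is nothing left to find.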
prash987 prash987 20120229, 19:33

Re: Processing hierarchical information in Pig
Resurfacing the question. This sounds like a common use case, but I am still stuck :)
Thanks!
shan shan 20120302, 11:47

