Pig, mail # user - Schema questions


Schema questions
Mohit Anchlia 2012-09-15, 01:08
From what I've learned, if I have a schema like
b:bag{t:tuple(name:chararray, address:chararray)}, then I can't
reference a field through the tuple name, i.e. b.t.name. It is b.name, and if I
FLATTEN the bag it is just name. My question is: can I have a bag inside a bag so that
I get two separate names that I can reference?
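A minimal sketch of how a bag can sit inside the tuples of another bag in a Pig schema, and how projection and FLATTEN then address each "name" field separately. It assumes a hypothetical text file 'people.txt' whose lines use Pig's text representation of complex types; the file, its contents, and all aliases are illustrative only.

```pig
-- Hypothetical input 'people.txt': one bag per line in Pig's text
-- representation of complex types, e.g.
--   {(abc,def,{(c1),(c2)})}
A = LOAD 'people.txt'
    AS (b:bag{t:tuple(name:chararray, address:chararray,
                      customvar:bag{c:tuple(name:chararray)})});

-- Projecting a field out of a bag is written b.name, not b.t.name.
-- The result is still a bag, holding one (name) tuple per element of b.
names = FOREACH A GENERATE b.name AS names;

-- FLATTEN b to get one row per element; its fields can then be
-- referenced directly by name while the names are unambiguous.
people = FOREACH A GENERATE FLATTEN(b);

-- customvar is now an ordinary bag field, so it can be projected or
-- flattened again. Renaming keeps the two "name" fields apart.
pairs = FOREACH people GENERATE name AS person,
                                FLATTEN(customvar) AS (cv_name);

DUMP pairs;
```

The renaming in the last FOREACH is a design choice: after the second FLATTEN both the outer and inner fields are called "name", so giving them distinct aliases avoids relying on the b:: / customvar:: prefixes.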

When I read my JSON, it gets broken down into multiple dimensions, and I want
to be able to reference each one by name. This works fine as long as a dimension
generates a single field. But in some cases the same dimension is an array, and in that
case I need some way to reference its elements. For instance, my JSON might
look like this:

name: abc
address: def
customvar: [name:c1, name:c2]

If I could put customvar in a separate bag within this schema,
b:bag{t:tuple(name:chararray, address:chararray)}, that would be helpful. It
would let me access all the tuples of the customvar bag in a FOREACH in Pig.
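A minimal sketch of loading records shaped like the example above with customvar declared as its own bag, assuming Pig 0.10 or later (which ships the built-in JsonLoader) and a hypothetical file 'events.json' with one JSON object per line. As far as I know, the built-in JsonLoader reads fields in schema order rather than matching keys, so the sketch assumes the keys appear in that order; the file name and aliases are illustrative.

```pig
-- Assumes Pig 0.10+ (built-in JsonLoader) and a hypothetical input
-- 'events.json' with one object per line, keys in schema order, e.g.
--   {"name":"abc","address":"def","customvar":[{"name":"c1"},{"name":"c2"}]}
A = LOAD 'events.json'
    USING JsonLoader('name:chararray, address:chararray, customvar:bag{c:tuple(name:chararray)}');

-- customvar is a bag of (name) tuples, separate from the top-level name.
-- FLATTEN gives one row per customvar entry next to the record's name.
B = FOREACH A GENERATE name, FLATTEN(customvar) AS (cv_name);

-- Or keep one row per record and work on the bag as a whole.
C = FOREACH A GENERATE name,
                       COUNT(customvar) AS n_customvars,
                       customvar.name AS cv_names;

DUMP B;
```

Whether to flatten (B) or keep the bag (C) depends on the downstream step: flattening suits joins and grouping on individual values, while keeping the bag preserves one row per JSON record.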