Pig >> mail # user >> newbie just not getting structure

Lauren Blau 2012-08-15, 11:44
Cheolsoo Park 2012-08-15, 23:06
Lauren Blau 2012-08-16, 09:48
Re: newbie just not getting structure
Still not getting it. A similar problem is occurring:
I have a file which I believe contains structures like,
and if I load it as  (messageId:chararray, documentName:chararray,
I can dump it, and I can define:
foo = foreach row generate messageId as messageId:chararray,documentName as
documentName:chararray,annot#'prefix' as apre:chararray, annot#'label' as
alabel:chararray ..), and can dump foo and see my results as expected

if I try
x = filter foo by apre == 'VALUE';
I get 0 rows back and I see a warning about

but if I store foo into a file using
store foo into '/filefoo';
and then define
foo2 = load '/filefoo' as
y = filter foo2 by apre == 'VALUE';
I get back the rows I expect.

Would someone please explain what the difference between the two is? Why should
storing and re-reading the data make a difference? What am I missing?
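
For anyone trying to reproduce this, here is a minimal sketch of the first pipeline with explicit casts on the map lookups. It assumes, without confirmation, that the difference comes from map values staying bytearrays until they are cast, which a store and re-load with a schema would do as a side effect. The input path and the annot:map[] field in the load schema are hypothetical stand-ins, since the real load statement is not shown in full.

```pig
-- Hypothetical path and schema: 'annot:map[]' is assumed for illustration only.
row = load '/path/to/input' as (messageId:chararray, documentName:chararray, annot:map[]);

-- Values looked up in an untyped map are bytearrays, so cast them explicitly
-- rather than relying only on the type named in the 'as' clause.
foo = foreach row generate
    messageId,
    documentName,
    (chararray)annot#'prefix' as apre,
    (chararray)annot#'label'  as alabel;

-- Print the schema Pig has assigned to foo before filtering on it.
describe foo;

x = filter foo by apre == 'VALUE';
dump x;
```

If describe shows apre as chararray and the filter still returns 0 rows, the cast is probably not the issue and the warning text is the next thing to look at.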

On Wed, Aug 15, 2012 at 7:44 AM, Lauren Blau <

> I'm having problems with understanding storage structures. Here's what I
> did:
> on the cluster I loaded some data and created a relation with one row.
> I output the row using store relation into '/file' using PigStorage('|');
> then I copied it to my local workspace, copyToLocal /file ./file
> then I tarred up the local file and scp'd it to my laptop.
> on my laptop I untarred the file into data/file
> then I ran these pig commands:
> b = load 'data/file' using PigStorage('|') as (a:map[]); -- because I'm expecting a map
> dump b;
> return is successful but result is ().
> then I ran
> c = foreach b generate *;
> dump c;
> return is successful but result is ().
> then I tried
> d = load 'data/file' using PigStorage('|');
> dump d;
> return is ([id#ID1,documentDate#1344461328851,source#93931,indexed#false,lastModifiedDate#1344461328851,contexts#{([id#CID1])}])
> since that is a map, I'm not sure why dump b didn't return values. so then
> I tried
> e = foreach d generate $0#'id';
> dump e;
> and the return was ();
> Does anyone see where I'm missing the point? And how do I grab those map
> values?
> Thanks