Re: newbie just not getting structure
Still not getting it. A similar problem is occurring:
I have a file which I believe contains structures like
("string1","string2",{[]})
and if I load it as (messageId:chararray, documentName:chararray, annot:map[])
I can dump it. I can also define
foo = foreach row generate messageId as messageId:chararray, documentName as
documentName:chararray, annot#'prefix' as apre:chararray, annot#'label' as
alabel:chararray ..)
and can dump foo and see my results as expected.
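For concreteness, here is that load and projection written out as standalone
statements (sketch only: '/infile' is a placeholder path, and the fields I
elided above with ".." are simply omitted):

-- sketch only: '/infile' is a placeholder; elided fields are omitted
row = load '/infile' as (messageId:chararray, documentName:chararray, annot:map[]);
foo = foreach row generate messageId as messageId:chararray,
                           documentName as documentName:chararray,
                           annot#'prefix' as apre:chararray,
                           annot#'label' as alabel:chararray;
dump foo;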

If I try
x = filter foo by apre == 'VALUE';
I get 0 rows back and see a warning about
FIELD_DISCARDED_CONVERSION_FAILED.
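In case that warning is about casting the bytearray values that come out of
the map, an explicit cast in the foreach instead of the as clause would be a
variation to try (just a sketch, not a confirmed fix):

-- assumption: annot#'prefix' arrives as a bytearray that later fails to cast
foo = foreach row generate messageId, documentName,
                           (chararray)(annot#'prefix') as apre,
                           (chararray)(annot#'label') as alabel;
x = filter foo by apre == 'VALUE';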

But if I store foo into a file using
store foo into '/filefoo';
and then define
foo2 = load '/filefoo' as
(messageId:chararray,documentName:chararray,apre:chararray,alabel:chararray
..)
then
y = filter foo2 by apre == 'VALUE';
I get back the rows I expect.
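Side by side, the working path is nothing more than a store followed by a
reload with an explicit schema and the same filter (sketch; the elided
trailing fields are again omitted):

store foo into '/filefoo';
foo2 = load '/filefoo' as (messageId:chararray, documentName:chararray,
                           apre:chararray, alabel:chararray);
y = filter foo2 by apre == 'VALUE';
dump y;   -- this one returns the expected rows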

Would someone please explain what the difference between the two is? Why should
storing and re-reading the data make a difference? What am I missing?
Thanks.

On Wed, Aug 15, 2012 at 7:44 AM, Lauren Blau <
[EMAIL PROTECTED]> wrote:

> I'm having problems understanding storage structures. Here's what I
> did:
>
> on the cluster I loaded some data and created a relation with one row.
> I output the row using store relation into '/file' using PigStorage('|');
> then I copied it to my local workspace with copyToLocal /file ./file
> then I tarred up the local file and scp'd it to my laptop.
>
> on my laptop I untarred the file into data/file
> then I ran these pig commands:
>
> b = load 'data/file' using PigStorage('|') as (a:map[]); --because I'm
> expecting a map
> dump b;
>
> return is successful but result is ().
>
> then I ran
> c = foreach b generate *;
> dump c;
>
> return is successful but result is ().
>
> then I tried
>
> d = load 'data/file' using PigStorage('|');
> dump d;
>
> return
> is ([id#ID1,documentDate#1344461328851,source#93931,indexed#false,lastModifiedDate#1344461328851,contexts#{([id#CID1])}])
>
> Since that is a map, I'm not sure why dump b didn't return values. So then
> I tried
> e = foreach d generate $0#'id';
> dump e;
>
> and the return was ();
>
> Does anyone see where I'm missing the point? And how do I grab those map
> values?
>
> Thanks
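
For reference, the whole workflow from the quoted message boils down to
roughly the following (sketch only; '/file' and 'data/file' are the
placeholder paths used above, and copyToLocal is the grunt shell step):

-- sketch of the quoted steps, on the cluster first
store relation into '/file' using PigStorage('|');
-- grunt shell step, followed by the tar/scp to the laptop:
copyToLocal /file ./file

-- back on the laptop
b = load 'data/file' using PigStorage('|') as (a:map[]);
dump b;                              -- came back as ()
d = load 'data/file' using PigStorage('|');
dump d;                              -- shows the map ([id#ID1,documentDate#...,contexts#{...}])
e = foreach d generate $0#'id';
dump e;                              -- also came back as ()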