Pig, mail # user - newbie just not getting structure


Lauren Blau 2012-08-15, 11:44
Re: newbie just not getting structure
Cheolsoo Park 2012-08-15, 23:06
Hi,

What does the content of data/file look like? From your description, I guess
it looks like this:

[id#ID1,documentDate#1344461328851,source#93931,indexed#false,lastModifiedDate#1344461328851,contexts#{([id#CID1])}]

But this is not map literal format. If you change it to:

[id#ID1]
[documentDate#1344461328851]
[source#93931]
[indexed#false]
[lastModifiedDate#1344461328851]
[contexts#{([id#CID1])}]

then you can load it as a map:

>> a = load 'data/file' using PigStorage(',') as (m:map[]);
>> dump a;

([id#ID1])
([documentDate#1344461328851])
([source#93931])
([indexed#false])
([lastModifiedDate#1344461328851])
([contexts#{([id#CID1])}])

Furthermore, you can do:

>> b = foreach a generate $0#'id';
>> dump b;

(ID1)
()
()
()
()
()

This is what you expect, no?
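
The reformatting step above (one map entry per line) could be scripted along these lines. This is only a rough sketch in Python, not part of Pig: the function names split_top_level and one_entry_per_line are made up for illustration, and it assumes the input line is wrapped in [ ] with balanced brackets and no escaped commas inside values.

```python
def split_top_level(line):
    """Split a string like '[k#v,k#v,...]' into its top-level 'k#v' entries.

    Commas nested inside (), {} or [] are left alone, so a value such as
    {([id#CID1])} stays in one piece. Assumes brackets are balanced.
    """
    text = line.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a line wrapped in [ ]")
    body = text[1:-1]
    entries = []
    depth = 0   # current nesting depth of brackets inside the body
    start = 0   # index where the current entry begins
    for i, ch in enumerate(body):
        if ch in "[({":
            depth += 1
        elif ch in "])}":
            depth -= 1
        elif ch == "," and depth == 0:
            entries.append(body[start:i])
            start = i + 1
    entries.append(body[start:])
    return entries


def one_entry_per_line(line):
    """Wrap each top-level entry in its own [ ] so it can sit on its own line."""
    return ["[" + entry + "]" for entry in split_top_level(line)]


if __name__ == "__main__":
    sample = ("[id#ID1,documentDate#1344461328851,source#93931,indexed#false,"
              "lastModifiedDate#1344461328851,contexts#{([id#CID1])}]")
    for row in one_entry_per_line(sample):
        print(row)
```

Writing the printed rows back to data/file would give the six-line layout shown earlier.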

Thanks,
Cheolsoo
On Wed, Aug 15, 2012 at 4:44 AM, Lauren Blau <[EMAIL PROTECTED]> wrote:

> I'm having problems with understanding storage structures. Here's what I
> did:
>
> on the cluster I loaded some data and created a relation with one row.
> I output the row using store relation into '/file' using PigStorage('|');
> then I copied it to my local workspace, copyToLocal /file ./file
> then I tarred up the local file and scp'd it to my laptop.
>
> on my laptop I untarred the file into data/file
> then I ran these pig commands:
>
> b = load 'data/file' using PigStorage('|') as (a:map[]); --because I'm expecting a map
> dump b;
>
> return is successful but result is ().
>
> then I ran
> c = foreach b generate *;
> dump c;
>
> return is successful but result is ().
>
> then I tried
>
> d = load 'data/file' using PigStorage('|');
> dump d;
>
> return is
> ([id#ID1,documentDate#1344461328851,source#93931,indexed#false,lastModifiedDate#1344461328851,contexts#{([id#CID1])}])
>
> Since that is a map, I'm not sure why dump b didn't return values. So then
> I tried
> e = foreach d generate $0#'id';
> dump e;
>
> and the return was ();
>
> Does anyone see where I'm missing the point? And how do I grab those map
> values?
>
> Thanks
>
Lauren Blau 2012-08-16, 09:48
Lauren Blau 2012-08-21, 12:14