Pig >> mail # user >> newbie just not getting structure


Lauren Blau 2012-08-15, 11:44
Re: newbie just not getting structure
Hi,

What's the content of data/file like? Given your description, I guess that
it looks as follows:

[id#ID1,documentDate#1344461328851,source#93931,indexed#false,lastModifiedDate#1344461328851,contexts#{([id#CID1])}]

But this is not map literal format. If you change it to:

[id#ID1]
[documentDate#1344461328851]
[source#93931]
[indexed#false]
[lastModifiedDate#1344461328851]
[contexts#{([id#CID1])}]

then you can load it as a map:

>> a = load 'data/file'  using PigStorage(',') as (m:map[]);
>> dump a;

([id#ID1])
([documentDate#1344461328851])
([source#93931])
([indexed#false])
([lastModifiedDate#1344461328851])
([contexts#{([id#CID1])}])

Furthermore, you can do:

>> b = foreach a generate $0#'id';
>> dump b;

(ID1)
()
()
()
()
()

This is what you expect, no?
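
The empty tuples in the output above come from rows whose map has no 'id' key, so the lookup yields null for them. As a minimal sketch, assuming the one-map-per-line layout of data/file shown above, those rows can be filtered out after the lookup. The alias names c and d and the field name id are illustrative and do not appear in the original thread:

```pig
-- Load each line as a single map field, as in the example above.
a = load 'data/file' using PigStorage(',') as (m:map[]);

-- Look up the 'id' key; rows whose map lacks that key yield null.
c = foreach a generate m#'id' as id;

-- Keep only the rows where the lookup found a value.
d = filter c by id is not null;
dump d;
```

With the six-line sample file, this should leave only the row holding ID1.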

Thanks,
Cheolsoo
On Wed, Aug 15, 2012 at 4:44 AM, Lauren Blau <[EMAIL PROTECTED]> wrote:

> I'm having problems with understanding storage structures. Here's what I
> did:
>
> on the cluster I loaded some data and created a relation with one row.
> I output the row using store relation into '/file' using PigStorage('|');
> then I copied it to my local workspace, copyToLocal /file ./file
> then I tarred up the local file and scp'd it to my laptop.
>
> on my laptop I untarred the file into data/file
> then I ran these pig commands:
>
> b = load 'data/file' using PigStorage('|') as (a:map[]); -- because I'm expecting a map
> dump b;
>
> return is successful but result is ().
>
> then I ran
> c = foreach b generate *;
> dump c;
>
> return is successful but result is ().
>
> then I tried
>
> d = load 'data/file' using PigStorage('|');
> dump d;
>
> return is
> ([id#ID1,documentDate#1344461328851,source#93931,indexed#false,lastModifiedDate#1344461328851,contexts#{([id#CID1])}])
>
> Since that is a map, I'm not sure why dump b didn't return values. So then
> I tried
> e = foreach d generate $0#'id';
> dump e;
>
> and the return was ();
>
> Does anyone see where I'm missing the point? And how do I grab those map
> values?
>
> Thanks
>
+
Lauren Blau 2012-08-16, 09:48
+
Lauren Blau 2012-08-21, 12:14