Re: newbie just not getting structure
Still not getting it; a similar problem is occurring. I have a file which I
believe contains structures like

("string1","string2",{[]})

If I load it as (messageId:chararray, documentName:chararray, annot:map[]),
I can dump it, and I can define

foo = foreach row generate messageId as messageId:chararray, documentName as
documentName:chararray, annot#'prefix' as apre:chararray, annot#'label' as
alabel:chararray ..)

and I can dump foo and see my results as expected.
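Condensed into one sketch for reference (the input path here is made up, any
USING clause from the real script is omitted, and only the two map keys used
below are projected; the '..' above stands for the remaining fields):

row = load '/data/messages' as (messageId:chararray, documentName:chararray, annot:map[]);
foo = foreach row generate messageId, documentName,
    annot#'prefix' as apre:chararray,
    annot#'label' as alabel:chararray;
dump foo;   -- shows the expected tuples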

If I try

x = filter foo by apre == 'VALUE';

I get 0 rows back and see a warning about FIELD_DISCARDED_CONVERSION_FAILED.
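One variant worth trying while debugging, purely on the guess that the map
values are still bytearrays underneath and the typed alias is where the
conversion fails: cast explicitly in the expression instead of typing the
alias, roughly like

foo = foreach row generate messageId, documentName,
    (chararray)(annot#'prefix') as apre,
    (chararray)(annot#'label') as alabel;
x = filter foo by apre == 'VALUE';
dump x;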

But if I store foo into a file using

store foo into '/filefoo';

and then define

foo2 = load '/filefoo' as
(messageId:chararray, documentName:chararray, apre:chararray, alabel:chararray
..)

then

y = filter foo2 by apre == 'VALUE';

I get back the rows I expect.
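For completeness, the store-and-reload path that does work, in one piece
(remaining fields elided as above); a possible reading, offered only as a
guess, is that after the PigStorage round trip apre is plain text parsed by
PigStorage itself, so the conversion that warned before has nothing left to
fail on:

store foo into '/filefoo';   -- default PigStorage, tab-delimited text
foo2 = load '/filefoo' as (messageId:chararray, documentName:chararray,
    apre:chararray, alabel:chararray);
y = filter foo2 by apre == 'VALUE';
dump y;   -- returns the expected rows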

Would someone please explain what the difference between the two is? Why
should storing and re-reading the data make a difference? What am I missing?
Thanks.

On Wed, Aug 15, 2012 at 7:44 AM, Lauren Blau <
[EMAIL PROTECTED]> wrote:

> I'm having problems understanding storage structures. Here's what I
> did:
>
> on the cluster I loaded some data and created a relation with one row.
> I output the row using store relation into '/file' using PigStorage('|');
> then I copied it to my local workspace: copyToLocal /file ./file
> then I tarred up the local file and scp'd it to my laptop.
>
> on my laptop I untarred the file into data/file
> then I ran these pig commands:
>
> b = load 'data/file' using PigStorage('|') as (a:map[]); --because I'm
> expecting a map
> dump b;
>
> return is successful but result is ().
>
> then I ran
> c = foreach b generate *;
> dump c;
>
> return is successful but result is ().
>
> then I tried
>
> d = load 'data/file' using PigStorage('|');
> dump d;
>
> return
> is ([id#ID1,documentDate#1344461328851,source#93931,indexed#false,lastModifiedDate#1344461328851,contexts#{([id#CID1])}])
>
> Since that is a map, I'm not sure why dump b didn't return values. So then
> I tried
> e = foreach d generate $0#'id';
> dump e;
>
> and the return was ();
>
> Does anyone see where I'm missing the point? And how do I grab those map
> values?
>
> Thanks
>
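For anyone who wants to reproduce the quoted scenario end to end, a minimal
sketch; the single line in data/file is taken from the dump of d above, and
everything else is exactly the quoted commands:

-- data/file contains this one line (from the dump of d above):
-- [id#ID1,documentDate#1344461328851,source#93931,indexed#false,lastModifiedDate#1344461328851,contexts#{([id#CID1])}]
b = load 'data/file' using PigStorage('|') as (a:map[]);
dump b;   -- printed (), not the map
d = load 'data/file' using PigStorage('|');
e = foreach d generate $0#'id';
dump e;   -- also printed ()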