Pig >> mail # user >> Having troubles with PigStorage


William Oberman 2012-11-06, 20:20
Re: Having troubles with PigStorage
Hi Will,

>> data = LOAD 'hdfs://ZZZ/tmp/test' USING PigStorage() AS
(key:chararray,columns:bag {column:tuple (name, value)});

Can you please provide some of the data from this file
(hdfs://ZZZ/tmp/test) so that we can reproduce your problem? One or two
rows would be sufficient.
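
Since the columns are said to be binary, pasting raw rows into an email may garble them. A minimal sketch of one way to share a sample safely, assuming you first copy a small piece of the file locally (for example with `hadoop fs -cat`); the helper name `summarize` is hypothetical and not part of Pig:

```python
def summarize(blob, limit=2):
    """Return (field_count, escaped_text) for the first `limit` lines.

    Assumes tab-separated fields and newline-separated records, as the
    original message describes. repr() keeps non-printable bytes visible
    as escapes such as \\t or \\x9f instead of printing them raw.
    """
    lines = [line for line in blob.split(b"\n") if line]
    return [(line.count(b"\t") + 1, repr(line)) for line in lines[:limit]]


if __name__ == "__main__":
    # Made-up sample bytes standing in for the start of the real file.
    sample = b"http://example.com/a\t{(n1,v1)}\nhttp://example.com/b\tx\ty\n"
    for field_count, text in summarize(sample):
        print(field_count, text)
```

Rows that report more than two fields would be worth a closer look, since the schema above expects only a key and a bag.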

Thanks,
Cheolsoo

On Tue, Nov 6, 2012 at 12:20 PM, William Oberman
<[EMAIL PROTECTED]> wrote:

> I'm trying to play around with Amazon EMR, and I currently have self-hosted
> Cassandra as the source of data.  I was going to try: Cassandra -> S3
> -> EMR.  I've traced my problems to PigStorage.  At this point I can
> recreate my problem "locally" without involving S3 or Amazon.
>
> In my local test environment I have this script:
>
> data = LOAD 'cassandra://XXX/YYY' USING CassandraStorage() AS
> (key:chararray, columns:bag {column:tuple (name, value)});
>
> STORE data INTO 'hdfs://ZZZ/tmp/test' USING PigStorage();
>
>
> I can verify that the HDFS file looks roughly correct (\t-separated fields,
> newline-separated lines, my data is in the right spots).
>
>
> Then if I do:
>
> data = LOAD 'hdfs://ZZZ/tmp/test' USING PigStorage() AS (key:chararray,
> columns:bag {column:tuple (name, value)});
>
> keys = FOREACH data GENERATE key;
>
> DUMP keys;
>
>
> I can see that the data is wrong.  In the dump sometimes I see keys, sometimes
> I see columns, and sometimes I see a mix of keys and columns lumped
> together.
>
>
> As far as I can tell, PigStorage is unable to parse the data it just
> persisted.  I've tried Pig 0.8, 0.9 and 0.10 with the same results.
>
>
> In terms of my data:
>
> key = URI (ASCII)
>
> columns = binary UUID -> JSON (ASCII)
>
>
> Any ideas?  Next I guess I'll see what kind of debugging Pig offers in the
> STORE/LOAD processes.
>
>
> Thanks!
>
>
> will
>
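
One hypothesis worth checking, given that the column names are described as binary UUIDs: a 16-byte UUID can contain the bytes 0x09 (tab) or 0x0A (newline), and a text format that separates fields with tabs and records with newlines cannot round-trip such bytes unless they are escaped. The sketch below is a toy model of that failure only. It is not PigStorage's implementation, and the functions `store` and `load` and the sample rows are made up for illustration.

```python
import uuid

FIELD_SEP = b"\t"
RECORD_SEP = b"\n"


def store(rows):
    """Join fields with tabs and records with newlines, with no escaping."""
    return RECORD_SEP.join(FIELD_SEP.join(fields) for fields in rows) + RECORD_SEP


def load(blob):
    """Split the blob back into records and fields using the same separators."""
    return [line.split(FIELD_SEP) for line in blob.split(RECORD_SEP) if line]


# Hypothetical rows: [key, binary column name, JSON value].
clean_name = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff").bytes
tricky_name = uuid.UUID("00112233-4409-660a-8899-aabbccddeeff").bytes  # has 0x09 and 0x0a

clean = [[b"http://example.com/a", clean_name, b'{"v": 1}']]
tricky = [[b"http://example.com/b", tricky_name, b'{"v": 2}']]

if __name__ == "__main__":
    print(load(store(clean)) == clean)    # round-trips intact
    print(load(store(tricky)) == tricky)  # does not round-trip
    for record in load(store(tricky)):
        print(len(record), record[0])
```

In this toy model the second row comes back as two records, and the first field of the second record is a fragment of the column name rather than a key, which resembles the symptom described above. If the real data shows the same pattern, encoding the binary names as text (hex, for instance) before storing, or using a binary-safe storage function, would be the natural things to try.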