Pig user mailing list


Re: Unable to load data using PigStorage that was previously stored using PigStorage
Hi Jerry,

Map values are bytearrays by default. If you need them to be of any other
type, you have to declare that type explicitly in the schema. In your case,
since you want them to be treated as bags:

A = load 'data.txt' as document:map[bag{}];
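To check which types Pig has actually assigned, `describe` prints the schema of a relation. A minimal sketch, assuming the same data.txt (the alias A_raw is just an illustrative name): with the typed load the map values should be reported as bags, whereas the original `document:[]` leaves them untyped (bytearray), which matches the DataByteArray-to-DataBag ClassCastException raised by flatten.

```pig
-- Typed map: every value is declared to be a bag.
A = load 'data.txt' as document:map[bag{}];
describe A;

-- Untyped map, as in the original script: values stay bytearray.
A_raw = load 'data.txt' as document:[];
describe A_raw;
```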

One problem with your dataset is that the value types in the map are not
consistent: the first value is a chararray/bytearray ("hello") and the second
is a bag ("{([c#11,d#22]),([c#33,d#44])}"). This is not permitted, because
when a map's value type is declared, all values have to be of that type.

Instead, all values in your dataset should be bags for your query to work,
for example:
[a#{(hello)},b#{([c#11,d#22]),([c#33,d#44])}]

A = load 'data.txt' as document:map[bag{}];
B = foreach A generate document#'b' as b;
C = foreach B generate flatten(b);
dump C;
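If regenerating the data is not practical, another option (not part of the reply above, and untested here) is to leave the map untyped, so that 'a' and 'b' may hold different types, and cast only the value you need. This sketch assumes your Pig version supports casting a bytearray to a complex type through PigStorage's load caster; the inner schema tuple(map[]) is inferred from the sample line, where each bag element is a tuple holding one map.

```pig
-- Untyped map: each value stays a bytearray, so mixed value types are allowed.
A = load 'data.txt' as document:map[];
-- Explicit cast of just the 'b' value; Pig should hand the bytes to
-- PigStorage's caster to parse them as a bag of single-map tuples.
B = foreach A generate (bag{tuple(map[])})document#'b' as b;
C = foreach B generate flatten(b);
dump C;
```

Values that are never cast (here, 'a') are left as bytearrays and never parsed as bags.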

On Tue, Apr 16, 2013 at 6:28 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:

> Hi pig users,
>
> I tried to load data using PigStorage that was previously stored using
> PigStorage but it failed.
>
> Each line in the data file generated by PigStorage looks like this:
> [a#hello,b#{([c#11,d#22]),([c#33,d#44])}]
>
> I did the following:
> A = load 'data.txt' as document:[];
> B = foreach A generate document#'b' as b;
> C = foreach B generate flatten(b);
> dump C;
>
> I expect to see the following output:
> ([c#11,d#22])
> ([c#33,d#44])
>
> Instead, I got:
> java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be
> cast to org.apache.pig.data.DataBag
>
> Has anyone encountered this problem before? How can I read the data back?
>
> Thanks,
>
> Jerry
>