Pig, mail # user - Unable to load data using PigStorage that was previously stored using PigStorage


Re: Unable to load data using PigStorage that was previously stored using PigStorage
Jerry Lam 2013-04-18, 14:43
Hi Prashant:

I read about the map data type in the book "Programming Pig", it says:
"... By default there is no requirement that all values in a map must be of
the same type. It is legitimate to have a map with two keys name and age,
where the value for name is a chararray and the value for age is an int.
Beginning in Pig 0.9, a map can declare its values to all be of the same
type... "

I agree that all values in a map can be of the same type, but Pig does not
require it.
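To make the book quote concrete, here is a small sketch (file and field names are made up) contrasting an untyped map, whose values default to bytearray and may mix types, with a typed map as allowed from Pig 0.9 on:

```pig
-- Untyped map: values default to bytearray, so 'name' and 'age' may differ in type.
people = LOAD 'people.txt' USING PigStorage('\t') AS (info:map[]);

-- Typed map (Pig 0.9+): every value is declared to be an int.
ages = LOAD 'ages.txt' USING PigStorage('\t') AS (counts:map[int]);
```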

Best Regards,

Jerry
On Thu, Apr 18, 2013 at 10:37 AM, Jerry Lam <[EMAIL PROTECTED]> wrote:

> Hi Ruslan:
>
> I used PigStorage to store data that originally used Pig data types. It is
> strange (or a bug in Pig) that I cannot read data with PigStorage that was
> stored using PigStorage, isn't it?
>
> Best Regards,
>
> Jerry
>
>
>
> On Wed, Apr 17, 2013 at 10:52 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]>wrote:
>
>> The output:
>> ({ ([c#11,d#22]),([c#33,d#44]) })
>> ()
>> looks weird.
>>
>> Jerry, maybe the problem is in using PigStorage. As its javadoc says:
>>
>> A load function that parses a line of input into fields using a character
>> delimiter
>>
>> So I guess this is just for simple CSV-style lines, but you are trying to
>> load a complex map structure as it was formatted by the previous store.
>> You'll probably need to write your own loader for this. Another hint:
>> try the -schema parameter to PigStorage, but I am not sure it can help :(
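The -schema hint above could look like the following sketch (paths are hypothetical): PigStorage writes a .pig_schema side-file next to the data, and a later load from the same path can pick the schema up instead of falling back to bytearray.

```pig
-- Store with a schema side-file so the types survive the round trip.
STORE A INTO 'out' USING PigStorage('\t', '-schema');

-- Load the same path with no AS clause; PigStorage should read .pig_schema.
B = LOAD 'out' USING PigStorage('\t');
DESCRIBE B;
```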
>>
>> Ruslan
>>
>>
>> On Wed, Apr 17, 2013 at 11:48 PM, Jerry Lam <[EMAIL PROTECTED]> wrote:
>>
>> > Hi Ruslan:
>> >
>> > I did a describe B followed by a dump B, the output is:
>> > B: {b: {()}}
>> >
>> > ({ ([c#11,d#22]),([c#33,d#44]) })
>> > ()
>> >
>> > but when I executed
>> >
>> > C = foreach B generate flatten(b);
>> >
>> > dump C;
>> >
>> > I got the exception again...
>> >
>> > 2013-04-17 15:47:39,933 [Thread-26] WARN
>> >  org.apache.hadoop.mapred.LocalJobRunner - job_local_0002
>> > java.lang.Exception: java.lang.ClassCastException:
>> > org.apache.pig.data.DataByteArray cannot be cast to
>> > org.apache.pig.data.DataBag
>> > at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
>> > Caused by: java.lang.ClassCastException:
>> org.apache.pig.data.DataByteArray
>> > cannot be cast to org.apache.pig.data.DataBag
>> > at
>> >
>> >
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:586)
>> > at
>> >
>> >
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:250)
>> > at
>> >
>> >
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
>> > at
>> >
>> >
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
>> > at
>> >
>> >
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
>> > at
>> >
>> >
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>> > at
>> >
>> >
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>> > at
>> >
>> >
>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>> > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
>> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
>> > at
>> >
>> >
>> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
>> > at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> > at
>> >
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
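A plausible reading of the cast error in this thread: when data is loaded without a declared schema, PigStorage hands every field back as bytearray (DataByteArray), so FLATTEN, which expects a bag, fails the cast to DataBag. A hedged sketch of the workaround, with path and field names assumed:

```pig
-- Declare the bag's schema at load time so Pig materializes a real DataBag
-- instead of an untyped bytearray that cannot be flattened.
B = LOAD 'stored_data' USING PigStorage()
    AS (b:bag{t:tuple(m:map[])});

-- FLATTEN now sees a bag and should no longer raise the ClassCastException.
C = FOREACH B GENERATE FLATTEN(b);
```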