
Pig >> mail # user >> Load PigStorage with Schema Issues


Re: Load PigStorage with Schema Issues
That is a problem with using "," as the field delimiter.
PigStorage splits the whole record on the delimiter, so the commas inside
the bag in the second field are also treated as field separators, and the
bag gets broken apart.
If you use some other delimiter for your data (e.g., tab or ^A), it should
work fine.
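The splitting can be illustrated with a small Python simulation. This is a hypothetical model, not PigStorage's real code. It assumes the loader splits the line on every occurrence of the delimiter without treating brackets specially, which is the behaviour described here. The `looks_like_bag` helper is a crude, hypothetical check, and the bag is a shortened version of the record from the original post.

```python
# Simulation only: models "split the whole line on the delimiter".
# It is NOT PigStorage's actual implementation.

def split_fields(line: str, delimiter: str) -> list:
    # Every delimiter occurrence starts a new field; the '{', '(' and ')'
    # inside the bag are not respected, which is the root of the problem.
    return line.split(delimiter)

def looks_like_bag(text: str) -> bool:
    # Hypothetical, crude check: a bag literal starts with '{' and ends with '}'.
    return text.startswith("{") and text.endswith("}")

key = "2ec00769-dc02-47dc-b2a5-35b6fb1d8e90"
bag = "{(customer,27651a7d-0871-49a6-8df4-90305f7e840b),(country,FR)}"

# Comma-delimited record: the bag is chopped into several fields.
comma_fields = split_fields(key + "," + bag, ",")
# Tab-delimited record: the bag survives as one field.
tab_fields = split_fields(key + "\t" + bag, "\t")

print(len(comma_fields), comma_fields[1], looks_like_bag(comma_fields[1]))
# prints: 5 {(customer False
print(len(tab_fields), looks_like_bag(tab_fields[1]))
# prints: 2 True
```

With the comma delimiter, the second field is only the fragment `{(customer`. That fragment cannot be cast to the declared bag type, which is consistent with the FIELD_DISCARDED_TYPE_CONVERSION_FAILED warning and the empty second field in the output.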
Thanks,
Thejas

On 1/26/12 7:31 AM, Sandopolus wrote:
> Hi there
>
> I am trying to load some data using PigStorage with a schema, but I
> can't seem to get the schema right and was hoping someone could point out
> my mistake.
>
> Here is the data being loaded in:
> 2ec00769-dc02-47dc-b2a5-35b6fb1d8e90,{(customer,27651a7d-0871-49a6-8df4-90305f7e840b),(customerClient,b57f9d15-6de7-486b-9761-46246be4abfe),(clientBuild,7376807c-7448-4785-8e2c-49814f6ce2f9),(country,FR)}
>
> Commands used:
> A = LOAD 'testdata.txt' USING PigStorage(',') as (key:chararray,
> columns:bag {column:tuple (name:chararray, value:chararray)});
> DUMP A;
>
> This results in the following warning and output:
> 2012-01-26 15:27:51,860 [main] WARN
>   org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 1 time(s).
>
> (2ec00769-dc02-47dc-b2a5-35b6fb1d8e90,)
>
> From the output it doesn't seem to be picking up the bag structure, but if I
> remove the schema it dumps the data out correctly.
> Any help would be much appreciated.
>
> Ta
>
> Sandy
>