Pig, mail # user - runtime exception when load and store multiple files using avro in pig


Danfeng Li 2012-08-21, 23:38
Re: runtime exception when load and store multiple files using avro in pig
Cheolsoo Park 2012-08-22, 00:06
Hi Danfeng,

The "long" comes from the first AvroStorage store in your script.
AvroStorage has an unusual syntax for multiple stores: to apply
different Avro schemas to multiple stores, you have to give each store
an "index", as follows:

set1 = load 'input1.txt' using PigStorage('|') as ( ... );
store set1 into 'set1' using
org.apache.pig.piggybank.storage.avro.AvroStorage('index', '1');

set2 = load 'input2.txt' using PigStorage('|') as ( ... );
store set2 into 'set2' using
org.apache.pig.piggybank.storage.avro.AvroStorage('index', '2');

Note the added 'index' parameters; each store gets a distinct value.

In the frontend, AvroStorage constructs the following string:

"1#<1st avro schema>,2#<2nd avro schema>"

and passes it to the backend via UDFContext. In the backend, tasks parse
this string to get the output schema for each store.
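To make that frontend/backend handoff concrete, here is a minimal Python model of the "index#schema" string. It is not AvroStorage's actual code (which is Java); the function names encode_schemas and decode_schemas and the two sample schemas are hypothetical, and the rule of splitting only on commas that start a new "<digits>#" entry is an assumption chosen so that commas inside the JSON schemas are left alone.

```python
import json
import re


def encode_schemas(schemas):
    """Frontend side (model): join {index: avro_schema_json} into
    '1#<schema>,2#<schema>' as described above."""
    return ",".join(f"{idx}#{schema}" for idx, schema in sorted(schemas.items()))


def decode_schemas(encoded):
    """Backend side (model): recover {index: avro_schema_json}.

    Assumption of this model: split only on commas that begin a new
    '<digits>#' entry, so commas inside the JSON schemas stay intact.
    """
    result = {}
    for entry in re.split(r",(?=\d+#)", encoded):
        idx, _, schema = entry.partition("#")
        result[int(idx)] = schema
    return result


# Hypothetical schemas loosely matching the two stores in the question.
set1_schema = json.dumps({"type": "record", "name": "set1",
                          "fields": [{"name": "f1", "type": ["null", "long"]}]})
set2_schema = json.dumps({"type": "record", "name": "set2",
                          "fields": [{"name": "date", "type": ["null", "string"]}]})

encoded = encode_schemas({1: set1_schema, 2: set2_schema})
decoded = decode_schemas(encoded)

# The task running the store with 'index' '2' reads back its own schema.
print(json.loads(decoded[2])["fields"][0]["type"])  # ['null', 'string']
```

In this model each store looks up its schema by its own index, so two stores that share the same key cannot receive different schemas.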

Thanks,
Cheolsoo

On Tue, Aug 21, 2012 at 4:38 PM, Danfeng Li <[EMAIL PROTECTED]> wrote:

> I ran into this strange problem when trying to load multiple text-formatted
> files and convert them into Avro format using Pig. However, if I read and
> convert one file at a time in separate runs, everything is fine. The error
> message is the following:
>
> 2012-08-21 19:15:32,964 [main] ERROR org.apache.pig.tools.grunt.GruntParser
> - ERROR 2997: Unable to recreate exception from backed error:
> org.apache.avro.file.DataFileWriter$AppendWriteException:
> java.lang.RuntimeException: Datum 1980-01-01 00:00:00.000 is not in union
> ["null","long"]
>     at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
>     at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:612)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapB
>
> My code is:
> set1 = load '$input_dir/set1.txt' using PigStorage('|') as (
>    id:long,
>    f1:long,
>    f2:chararray,
>    f3:float,
>    f4:float,
>    f5:float,
>    f6:float,
>    f7:float,
>    f8:float,
>    f9:float,
>    f10:float,
>    f11:float,
>    f12:float);
> store set1 into '$output_dir/set1.avro'
> using org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> set2 = load '$input_dir/set2.txt' using PigStorage('|') as (
>    id : int,
>    date : chararray);
> store set2 into '$output_dir/set2.avro'
> using org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> The first file converts fine, but the second one fails. The error comes
> from the second field of the second file, but the strange thing is that
> set2's schema doesn't contain "long" at all, yet the error message shows
> ["null","long"].
>
> I'm using Pig 0.10.0 and avro-1.7.1.jar.
>
> Is this a bug, or am I missing something?
>
> Thanks.
> Dan
>
> Here's set1.txt
>
> 827352|740214|Long|26|0.08731795012183759|1661335.541733333|0|0|0.001057865808239878|0.001059541098077884|0.001059541098077821|0.0514156486228232|0.001043980181757539
>
> 827353|740214|Short|12|-0.05967910581502997|-1135471.22271|0|0|-0.001185620143839061|-0.001187497751909232|-0.001187497751909183|-0.0747641932858414|-0.0001307449002148424
>
> 827354|740214|Total|38|0.02763884430680765|19026277.40819863|0|0|-0.0001277543355991829|-0.0001279566538313473|-0.0001279566538313626|-0.02334854466301821|0.0009132352815426966
Danfeng Li 2012-08-22, 00:47
Cheolsoo Park 2012-08-22, 01:03
Alan Gates 2012-08-22, 01:26
Danfeng Li 2012-08-22, 05:43