Re: Pig duplicate records
You will want to ask this question on the Pig user mailing list.

org.apache.pig.piggybank.storage.avro.AvroStorage is maintained by the Pig
project and you will get more help from there.
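
As an independent check of the record count, the block headers of an Avro object container file can be summed without any Avro library or Pig. The following is a minimal sketch, assuming the container layout described in the Avro specification (4-byte magic, metadata map, 16-byte sync marker, then blocks that each start with a record count and a byte size). The names `read_long`, `skip_metadata`, and `count_records` are hypothetical helpers written for this illustration and are not part of Avro or piggybank.

```python
import io

MAGIC = b"Obj\x01"  # 4-byte magic at the start of an Avro object container file
SYNC_SIZE = 16      # length of the sync marker that follows the header and each block


def read_long(stream):
    """Decode one Avro long (zigzag-encoded variable-length integer)."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of data while reading a long")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear: last byte of this long
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo zigzag encoding


def skip_metadata(stream):
    """Skip the header metadata map (string keys, bytes values)."""
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            return
        if count < 0:
            # A negative count is followed by the byte size of the map block.
            read_long(stream)
            count = -count
        for _ in range(count):
            stream.read(read_long(stream))  # key
            stream.read(read_long(stream))  # value


def count_records(data):
    """Sum the per-block record counts of an Avro container file given as bytes."""
    stream = io.BytesIO(data)
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an Avro object container file")
    skip_metadata(stream)
    sync = stream.read(SYNC_SIZE)
    total = 0
    while stream.tell() < len(data):
        count = read_long(stream)       # number of records in this block
        size = read_long(stream)        # size of the serialized block in bytes
        stream.seek(size, io.SEEK_CUR)  # skip the records without decoding them
        if stream.read(SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch; file may be corrupt")
        total += count
    return total
```

For the file in the question this would be called as `count_records(open("test.v1.avro", "rb").read())`. Because only the block headers are read, the count does not depend on the compression codec or on the schema, so it shows whether the extra records are in the file itself or are introduced when it is loaded.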

On 9/21/11 4:34 AM, "Alex Holmes" <[EMAIL PROTECTED]> wrote:

>Hi all,
>I have a simple schema
>{"name": "Record", "type": "record",
>  "fields": [
>    {"name": "name", "type": "string"},
>    {"name": "id", "type": "int"}
>  ]
>}
>which I use to write 2 records to an Avro file, and my reader code
>(which reads the file and dumps the records) verifies that there are 2
>records in the file:
>When using this file with pig and AvroStorage, pig seems to think
>there are 4 records:
>grunt> REGISTER /app/hadoop/lib/avro-1.5.4.jar;
>grunt> REGISTER /app/pig-0.9.0/contrib/piggybank/java/piggybank.jar;
>grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/json-simple-1.1.jar;
>grunt> REGISTER
>grunt> REGISTER
>grunt> raw = LOAD 'test.v1.avro' USING
>grunt> dump raw;
>Successfully read 4 records (825 bytes) from:
>Successfully stored 4 records (46 bytes) in:
>Total records written : 4
>Total bytes written : 46
>I'm sure I'm doing something wrong (again)!
>Many thanks,