Avro >> mail # user >> Pig duplicate records


Re: Pig duplicate records
You will want to ask this question on the Pig user mailing list.

org.apache.pig.piggybank.storage.avro.AvroStorage is maintained by the Pig
project and you will get more help from there.

On 9/21/11 4:34 AM, "Alex Holmes" <[EMAIL PROTECTED]> wrote:

>Hi all,
>
>I have a simple schema
>
>{"name": "Record", "type": "record",
>  "fields": [
>    {"name": "name", "type": "string"},
>    {"name": "id", "type": "int"}
>  ]
>}
>
>which I use to write 2 records to an Avro file, and my reader code
>(which reads the file and dumps the records) verifies that there are 2
>records in the file:
>
>Record@1e9e5c73[name=r1,id=1]
>Record@ed42d08[name=r2,id=2]
>
>When using this file with Pig and AvroStorage, Pig seems to think
>there are 4 records:
>
>grunt> REGISTER /app/hadoop/lib/avro-1.5.4.jar;
>grunt> REGISTER /app/pig-0.9.0/contrib/piggybank/java/piggybank.jar;
>grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/json-simple-1.1.jar;
>grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/jackson-core-asl-1.6.0.jar;
>grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/jackson-mapper-asl-1.6.0.jar;
>grunt> raw = LOAD 'test.v1.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage;
>grunt> dump raw;
>..
>Input(s):
>Successfully read 4 records (825 bytes) from:
>"hdfs://localhost:9000/user/aholmes/test.v1.avro"
>
>Output(s):
>Successfully stored 4 records (46 bytes) in:
>"hdfs://localhost:9000/tmp/temp2039109003/tmp1924774585"
>
>Counters:
>Total records written : 4
>Total bytes written : 46
>..
>(r1,1)
>(r2,2)
>(r1,1)
>(r2,2)
>
>I'm sure I'm doing something wrong (again)!
>
>Many thanks,
>Alex
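
The thread does not say what caused the doubled count. One way to check the "2 records" claim independently of both the Java reader and AvroStorage is to count the records straight from the file's bytes. Below is a minimal stdlib-only Python sketch (not from the thread) based on the binary encoding and object container file layout in the Avro specification. It encodes the two `Record` datums from the schema above, wraps them in one uncompressed block, and then counts records by summing each data block's object count without decoding any data. All function names (`encode_record`, `build_container`, `count_records`, and so on) are hypothetical. The writer exists only to produce test input; real files should be written with the Avro library. Block counts sit outside the compressed payload, so the counter should also work on deflate-compressed files. It has not been checked against files written by Avro 1.5.4.

```python
import io
import os

MAGIC = b"Obj\x01"  # Avro object container file magic bytes


def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag-map to unsigned, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_bytes(b: bytes) -> bytes:
    """Avro bytes/string: length as a long, then the raw (UTF-8) bytes."""
    return encode_long(len(b)) + b


def encode_record(name: str, rec_id: int) -> bytes:
    """The 'Record' schema from the thread: fields written in declaration order."""
    return encode_bytes(name.encode("utf-8")) + encode_long(rec_id)


def build_container(datums: list, schema_json: str) -> bytes:
    """Hypothetical test helper: one uncompressed ('null' codec) data block."""
    sync = os.urandom(16)
    meta = (encode_long(2)
            + encode_bytes(b"avro.schema") + encode_bytes(schema_json.encode("utf-8"))
            + encode_bytes(b"avro.codec") + encode_bytes(b"null")
            + encode_long(0))  # a zero count ends the metadata map
    body = b"".join(datums)
    block = encode_long(len(datums)) + encode_long(len(body)) + body + sync
    return MAGIC + meta + sync + block


def read_long(buf: io.BytesIO) -> int:
    """Decode one varint, then undo the zigzag mapping."""
    z, shift = 0, 0
    while True:
        b = buf.read(1)
        if not b:
            raise EOFError("truncated varint")
        z |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return (z >> 1) ^ -(z & 1)
        shift += 7


def skip_bytes(buf: io.BytesIO) -> None:
    """Skip one length-prefixed bytes/string value."""
    buf.seek(read_long(buf), io.SEEK_CUR)


def count_records(data: bytes) -> int:
    """Sum the per-block object counts; block data is skipped, never decoded."""
    buf = io.BytesIO(data)
    if buf.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    while True:  # metadata map: blocks of key/value pairs, ended by a zero count
        count = read_long(buf)
        if count == 0:
            break
        if count < 0:  # a negative count is followed by the block's byte size
            read_long(buf)
            count = -count
        for _ in range(2 * count):  # one key and one value per entry
            skip_bytes(buf)
    sync = buf.read(16)
    total = 0
    while buf.tell() < len(data):
        total += read_long(buf)  # number of objects in this block
        buf.seek(read_long(buf), io.SEEK_CUR)  # skip (possibly compressed) data
        if buf.read(16) != sync:
            raise ValueError("sync marker mismatch")
    return total


if __name__ == "__main__":
    schema = ('{"name": "Record", "type": "record", "fields": ['
              '{"name": "name", "type": "string"}, {"name": "id", "type": "int"}]}')
    print(encode_record("r1", 1))  # b'\x04r1\x02'
    sample = build_container([encode_record("r1", 1), encode_record("r2", 2)], schema)
    print(count_records(sample))  # 2
```

To check the real file, pass a local copy (for example one fetched with `hadoop fs -get`) as `count_records(open("test.v1.avro", "rb").read())`. A result of 2 would agree with the Java reader and would suggest that the extra records come from the load side rather than from the file itself.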