Pig >> mail # user >> Pig AvroStorage : storing the data


Milind Vaidya 2013-01-11, 16:12
Cheolsoo Park 2013-01-11, 19:33
Milind Vaidya 2013-01-11, 21:02
Re: Pig AvroStorage : storing the data
You need to register the json-simple jar, as he did in his example. Start
with his script, not your own.

Russell Jurney http://datasyndrome.com

On Jan 11, 2013, at 1:03 PM, Milind Vaidya <[EMAIL PROTECTED]> wrote:

> As you said, I was able to fix the PigStorage() problem by using
> separate input and output directories.
>
> Sorry, my mistake! I did not realise that the fully qualified name was
> missing. That, along with a separate output directory, solved my
> AvroStorage() problem too.
>
> Here are my two scripts:
>
> *AvroStorage() usage:*
>
> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/piggybank.jar
>
> employee = LOAD '/user/immilind/AvroData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> DESCRIBE employee;
> DUMP employee;
>
> STORE employee INTO '/user/immilind/AvroStoredData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> employee_new = LOAD '/user/immilind/AvroStoredData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> DESCRIBE employee_new;
> DUMP employee_new;
>
> *PigStorage() usage:*
>
> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/piggybank.jar
>
> employee = LOAD '/user/immilind/AvroData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> DESCRIBE employee;
> DUMP employee;
>
> NewEmployee = FOREACH employee GENERATE name AS name, age AS age, dept AS
> dept, office AS office, salary AS salary, lastname AS lastname;
> STORE NewEmployee INTO '/user/immilind/PlainData' USING PigStorage(',');
>
> employee_new = LOAD '/user/immilind/PlainData' USING PigStorage(',');
> DESCRIBE employee_new;
> DUMP employee_new;
>
>
> On Fri, Jan 11, 2013 at 11:33 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Here is a working version of your example.
>>
>>
>> 1) AvroStorage Load -> AvroStorage Store -> AvroStorage Load
>>
>> -----
>> REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
>> REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
>> REGISTER contrib/piggybank/java/piggybank.jar
>>
>> DEFINE AVRO_LOAD_1
>> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
>> DEFINE AVRO_LOAD_2 org.apache.pig.piggybank.storage.avro.AvroStorage();
>> DEFINE AVRO_STORE
>> org.apache.pig.piggybank.storage.avro.AvroStorage('same',
>> 'AvroData/employee.avro');
>>
>> employee = LOAD 'AvroData' USING AVRO_LOAD_1;
>> DUMP employee;
>>
>> STORE employee INTO 'StoredAvro' USING AVRO_STORE;
>>
>> employee = LOAD 'StoredAvro' USING AVRO_LOAD_2;
>> DUMP employee;
>> -----
>>
>> Please note that:
>> * The AVRO_STORE definition sets the output schema with the 'same' option.
>> It means the relation 'employee' is stored using the same schema as
>> 'AvroData/employee.avro'. Alternatively, you can specify the schema as a
>> JSON string with the 'schema' option. For example, AvroStorage('schema',
>> '<JSON string>').
>> * I moved StoredAvro out of AvroData. This is because AvroStorage loads
>> directories recursively. If I ran this script multiple times with the
>> output under AvroData, I would load not only the files in AvroData but
>> also the files in AvroData/StoredAvro from a previous run. Therefore, I am
>> using separate directories for input and output.
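>>
>> A minimal sketch of the 'schema' alternative, reusing the jars and input
>> path from the script above. The record name "employee", the output
>> directory 'StoredAvroWithSchema', and the nullable field types are
>> assumptions made for illustration; the field names and base types follow
>> the PigStorage load in section 2 of this message, and the actual schema of
>> AvroData/employee.avro may differ.
>>
>> ```pig
>> REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
>> REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
>> REGISTER contrib/piggybank/java/piggybank.jar
>>
>> DEFINE AVRO_LOAD
>> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
>>
>> -- Hypothetical schema: each field is a union with "null" because Pig
>> -- fields may be null. Adjust the names and types to match your data.
>> DEFINE AVRO_STORE_SCHEMA
>> org.apache.pig.piggybank.storage.avro.AvroStorage('schema',
>> '{"type":"record","name":"employee","fields":[{"name":"name","type":["null","string"]},{"name":"age","type":["null","int"]},{"name":"dept","type":["null","string"]},{"name":"office","type":["null","string"]},{"name":"salary","type":["null","int"]},{"name":"lastname","type":["null","string"]}]}');
>>
>> employee = LOAD 'AvroData' USING AVRO_LOAD;
>>
>> -- The output directory is kept outside AvroData so that a later run
>> -- does not load it as input.
>> STORE employee INTO 'StoredAvroWithSchema' USING AVRO_STORE_SCHEMA;
>> ```
>>
>> The 'same' option is probably simpler when an existing Avro file already
>> has the schema you want; an explicit JSON schema is an option when no such
>> file exists.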
>>
>>
>> 2) AvroStorage Load -> PigStorage Store -> PigStorage Load
>>
>> -----
>> DEFINE AVRO_LOAD
>> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
>>
>> employee = LOAD 'AvroData' USING AVRO_LOAD;
>> DUMP employee;
>>
>> STORE employee INTO 'StoredText' USING PigStorage(',');
>>
>> employee = LOAD 'StoredText' USING PigStorage(',') as (name:chararray,
>> age:int, dept:chararray, office:chararray, salary:int, lastname:chararray);
>> DUMP employee;
>> -----
>>
>>
>> 3) Regarding your errors:
>>
>> * ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve
>> AvroStorage using imports: [, org.apache.pig.builtin.,
>> org.apache.pig.impl.builtin.]
>> This is because you didn't use fully qualified name of AvroStorage in your