

Milind Vaidya 2013-01-11, 16:12
Cheolsoo Park 2013-01-11, 19:33
Milind Vaidya 2013-01-11, 21:02
Re: Pig AvroStorage : storing the data
You need to register the json-simple jar, as he did in his example. Start with
his script, not your own.
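
A minimal sketch of that change applied to your own REGISTER lines. The
json-simple path is an assumption (same directory as your other jars, version
1.1 as in his example), so adjust it to wherever the jar actually lives:

```pig
-- Jars used with piggybank's AvroStorage in the quoted example below
REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
-- Assumed location and file name; not taken from your original script
REGISTER /homes/immilind/HadoopLocal/Jars/json-simple-1.1.jar
REGISTER /homes/immilind/HadoopLocal/Jars/piggybank.jar
```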

Russell Jurney http://datasyndrome.com

On Jan 11, 2013, at 1:03 PM, Milind Vaidya <[EMAIL PROTECTED]> wrote:

> As you said, I was able to fix the PigStorage() related problem by having
> separate input and output directories.
>
> Sorry, my mistake! I did not realise that the fully qualified name was
> missing. That, along with a separate output directory, solved my AvroStorage()
> problem too.
>
> Here are my 2 scripts:
>
> *AvroStorage() usage:*
>
> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/piggybank.jar
>
> employee = LOAD '/user/immilind/AvroData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> DESCRIBE employee;
> DUMP employee;
>
> STORE employee INTO '/user/immilind/AvroStoredData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> employee_new = LOAD '/user/immilind/AvroStoredData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage();
> DESCRIBE employee_new;
> DUMP employee_new;
>
> *PigStorage() usage:*
>
> REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
> REGISTER /homes/immilind/HadoopLocal/Jars/piggybank.jar
>
> employee = LOAD '/user/immilind/AvroData' USING
> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> DESCRIBE employee;
> DUMP employee;
>
> NewEmployee = FOREACH employee GENERATE name AS name, age AS age, dept AS
> dept, office AS office, salary AS salary, lastname AS lastname;
> STORE NewEmployee INTO '/user/immilind/PlainData' USING PigStorage(',');
>
> employee_new = LOAD '/user/immilind/PlainData' USING PigStorage();
> DESCRIBE employee_new;
> DUMP employee_new;
>
>
> On Fri, Jan 11, 2013 at 11:33 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Here is a working version of your example.
>>
>>
>> 1) AvroStorage Load -> AvroStorage Store -> AvroStorage Load
>>
>> -----
>> REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
>> REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
>> REGISTER contrib/piggybank/java/piggybank.jar
>>
>> DEFINE AVRO_LOAD_1
>> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
>> DEFINE AVRO_LOAD_2 org.apache.pig.piggybank.storage.avro.AvroStorage();
>> DEFINE AVRO_STORE
>> org.apache.pig.piggybank.storage.avro.AvroStorage('same',
>> 'AvroData/employee.avro');
>>
>> employee = LOAD 'AvroData' USING AVRO_LOAD_1;
>> DUMP employee;
>>
>> STORE employee INTO 'StoredAvro' USING AVRO_STORE;
>>
>> employee = LOAD 'StoredAvro' USING AVRO_LOAD_2;
>> DUMP employee;
>> -----
>>
>> Please note that:
>> * The AVRO_STORE definition specifies the schema with the 'same' option. It
>> means the relation 'employee' will be stored using the same schema as
>> 'AvroData/employee.avro'. Alternatively, you can specify the schema as a
>> JSON string with the 'schema' option. For example, AvroStorage('schema',
>> '<JSON string>').
>> * I moved StoredAvro out of AvroData. This is because AvroStorage loads
>> directories recursively. If I ran this script multiple times with the output
>> inside AvroData, I would load not only the files in AvroData but also those
>> in AvroData/StoredAvro from a previous run. Therefore, I am using separate
>> directories for input and output.
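
To make the 'schema' alternative above concrete, here is a hedged sketch. The
field names and types come from the PigStorage load in example 2 below; the
alias AVRO_STORE_SCHEMA, the record name "employee", and the nullable unions
are assumptions, so compare them with what DESCRIBE prints for your data
before relying on this:

```pig
-- Assumes the avro, json-simple and piggybank jars are REGISTERed as in example 1.
-- AVRO_STORE_SCHEMA is a hypothetical alias; the JSON schema is illustrative only.
DEFINE AVRO_STORE_SCHEMA org.apache.pig.piggybank.storage.avro.AvroStorage('schema',
'{"type":"record","name":"employee","fields":[{"name":"name","type":["null","string"]},{"name":"age","type":["null","int"]},{"name":"dept","type":["null","string"]},{"name":"office","type":["null","string"]},{"name":"salary","type":["null","int"]},{"name":"lastname","type":["null","string"]}]}');

-- Write to a directory outside the input directory, for the reason given above.
STORE employee INTO 'StoredAvro' USING AVRO_STORE_SCHEMA;
```

The JSON is kept on a single line so the whole schema stays inside one quoted
string.
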
>>
>>
>> 2) AvroStorage Load -> PigStorage Store -> PigStorage Load
>>
>> -----
>> DEFINE AVRO_LOAD
>> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
>>
>> employee = LOAD 'AvroData' USING AVRO_LOAD;
>> DUMP employee;
>>
>> STORE employee INTO 'StoredText' USING PigStorage(',');
>>
>> employee = LOAD 'StoredText' USING PigStorage(',') as (name:chararray,
>> age:int, dept:chararray, office:chararray, salary:int, lastname:chararray);
>> DUMP employee;
>> -----
>>
>>
>> 3) Regarding your errors:
>>
>> * ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve
>> AvroStorage using imports: [, org.apache.pig.builtin.,
>> org.apache.pig.impl.builtin.]
>> This is because you didn't use the fully qualified name of AvroStorage in your