Pig AvroStorage : storing the data


Milind Vaidya 2013-01-11, 16:12
Cheolsoo Park 2013-01-11, 19:33
Re: Pig AvroStorage : storing the data
As you said, I was able to fix the PigStorage() problem by using
separate input and output directories.

Sorry, my mistake! I did not realise that the fully qualified name was
missing. That, along with a separate output directory, solved my
AvroStorage() problem too.

Here are my two scripts:

*AvroStorage() usage:*

REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
REGISTER /homes/immilind/HadoopLocal/Jars/piggybank.jar

employee = LOAD '/user/immilind/AvroData' USING
org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
DESCRIBE employee;
DUMP employee;

STORE employee INTO '/user/immilind/AvroStoredData' USING
org.apache.pig.piggybank.storage.avro.AvroStorage();

employee_new = LOAD '/user/immilind/AvroStoredData' USING
org.apache.pig.piggybank.storage.avro.AvroStorage();
DESCRIBE employee_new;
DUMP employee_new;
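
For reference, the quoted reply below mentions that the output schema can also be given as a JSON string through the 'schema' option. A minimal sketch of that variant follows. The record name, the field types (taken from the quoted PigStorage example) and the output directory are assumptions for illustration, not something verified against the actual data:

```pig
-- Sketch only: the Avro schema below is assumed, not taken from the real data files.
-- The output path is hypothetical; keep it outside the input directory.
STORE employee INTO '/user/immilind/AvroStoredDataWithSchema' USING
org.apache.pig.piggybank.storage.avro.AvroStorage('schema',
'{"type":"record","name":"employee","fields":[{"name":"name","type":"string"},{"name":"age","type":"int"},{"name":"dept","type":"string"},{"name":"office","type":"string"},{"name":"salary","type":"int"},{"name":"lastname","type":"string"}]}');
```

The JSON string is kept on one line because a Pig string literal cannot span lines.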

*PigStorage() usage:*

REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar
REGISTER /homes/immilind/HadoopLocal/Jars/piggybank.jar

employee = LOAD '/user/immilind/AvroData' USING
org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
DESCRIBE employee;
DUMP employee;

NewEmployee = FOREACH employee GENERATE name AS name, age AS age, dept AS
dept, office AS office, salary AS salary, lastname AS lastname;
STORE NewEmployee INTO '/user/immilind/PlainData' USING PigStorage(',');

employee_new = LOAD '/user/immilind/PlainData' USING PigStorage(',');
DESCRIBE employee_new;
DUMP employee_new;
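
For reference, a variant of the reload step that passes the ',' delimiter used in the STORE and declares a schema, so that DESCRIBE reports named, typed fields instead of an unknown schema. The field types are copied from the example in the quoted reply below and are an assumption about the actual data:

```pig
-- Sketch only: field types are assumed from the quoted example, not checked against the data.
employee_new = LOAD '/user/immilind/PlainData' USING PigStorage(',')
    AS (name:chararray, age:int, dept:chararray, office:chararray,
        salary:int, lastname:chararray);
DESCRIBE employee_new;
DUMP employee_new;
```

PigStorage splits on tab by default, so the delimiter has to match the one used when the data was stored.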

On Fri, Jan 11, 2013 at 11:33 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Here is a working version of your example.
>
>
> 1) AvroStorage Load -> AvroStorage Store -> AvroStorage Load
>
> -----
> REGISTER build/ivy/lib/Pig/avro-1.7.1.jar
> REGISTER build/ivy/lib/Pig/json-simple-1.1.jar
> REGISTER contrib/piggybank/java/piggybank.jar
>
> DEFINE AVRO_LOAD_1
> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
> DEFINE AVRO_LOAD_2 org.apache.pig.piggybank.storage.avro.AvroStorage();
> DEFINE AVRO_STORE org.apache.pig.piggybank.storage.avro.AvroStorage('same',
> 'AvroData/employee.avro');
>
> employee = LOAD 'AvroData' USING AVRO_LOAD_1;
> DUMP employee;
>
> STORE employee INTO 'StoredAvro' USING AVRO_STORE;
>
> employee = LOAD 'StoredAvro' USING AVRO_LOAD_2;
> DUMP employee;
> -----
>
> Please note that:
> * The Avro store command (AVRO_STORE) defines the schema with the 'same'
> option. This means it stores the relation 'employee' using the same schema
> as 'AvroData/employee.avro'. Alternatively, you can specify the schema as a
> JSON string with the 'schema' option. For example, AvroStorage('schema',
> '<JSON string>').
> * I moved StoredAvro out of AvroData. This is because AvroStorage loads
> directories recursively. Otherwise, if I ran this script multiple times, I
> would load not only the files in AvroData but also those in
> AvroData/StoredAvro from a previous run. Therefore, I am using separate
> directories for input and output.
>
>
> 2) AvroStorage Load -> PigStorage Store -> PigStorage Load
>
> -----
> DEFINE AVRO_LOAD
> org.apache.pig.piggybank.storage.avro.AvroStorage('multiple_schemas');
>
> employee = LOAD 'AvroData' USING AVRO_LOAD;
> DUMP employee;
>
> STORE employee INTO 'StoredText' USING PigStorage(',');
>
> employee = LOAD 'StoredText' USING PigStorage(',') as (name:chararray,
> age:int, dept:chararray, office:chararray, salary:int, lastname:chararray);
> DUMP employee;
> -----
>
>
> 3) Regarding your errors:
>
> * ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve
> AvroStorage using imports: [, org.apache.pig.builtin.,
> org.apache.pig.impl.builtin.]
> This is because you didn't use the fully qualified name of AvroStorage in
> your script. Pig assumes default qualifiers if no qualifier is given.
>
> * ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: Cannot get schema
> from loadFunc org.apache.pig.piggybank.storage.avro.AvroStorage
> This can happen if you load non-Avro files (e.g. text files) using
> AvroStorage. For example, if you store data using AvroStorage() without a
+
Russell Jurney 2013-01-11, 21:04