Pig, mail # user - Re: AvroStorage schema_uri pointing to local file doesn't work


Re: AvroStorage schema_uri pointing to local file doesn't work
Ruslan Al-Fakikh 2013-12-25, 21:19
Thank you, Cheolsoo!

Ok, I'll have Pig 0.12 when my team upgrades to a newer CDH.
For now I am using this workaround:
%declare WORK_DIR `pwd`
%declare SCHEMA_LITERAL `cat $WORK_DIR/schema.avsc`
...
STORE inputs INTO 'output'
    USING com.magnetic.org.apache.pig.piggybank.storage.avro.AvroStorage('{
    "index" : 1,
    "schema": $SCHEMA_LITERAL}');

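Another option, assuming the schema file can be uploaded to HDFS ahead of time, might be to point schema_uri at the HDFS copy so that every back-end node can read it. A rough sketch (the "namenode" host and the /tmp/avro path are placeholders, not real locations):

```pig
-- Assumption: schema.avsc was uploaded beforehand, for example with
--   hadoop fs -put schema.avsc /tmp/avro/schema.avsc
-- "namenode" and "/tmp/avro" are hypothetical placeholders.
STORE inputs INTO 'output'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
    "index" : 1,
    "schema_uri": "hdfs://namenode/tmp/avro/schema.avsc"}');
```
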
Best Regards,
Ruslan Al-Fakikh
On Wed, Dec 25, 2013 at 11:48 AM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> avro to bcc:
>
> >> Why can't it use the schema file from front-end invocation?
>
> You're right. It should load the schema file in the front-end and pass it
> to the back-end via properties. Unfortunately, Piggybank AvroStorage
> doesn't do this.
>
> However, the new built-in AvroStorage in Pig 0.12 does exactly what you
> want. Can you use it instead?
>
> https://github.com/apache/pig/blob/trunk/src/org/apache/pig/builtin/AvroStorage.java#L120
>
>
> On Tue, Dec 24, 2013 at 10:15 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
> > Hey guys,
> >
> > I am using AvroStorage like this:
> >
> > STORE alias INTO '$OUTPUT'
> >     USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
> >     "index" : 1,
> >     "schema_uri": "file://path/schema.avsc"}');
> >
> > so it explicitly takes schema.avsc from the local file system, not
> > HDFS.
> > It works in a pseudo-distributed cluster, but fails on a normal cluster
> > with java.io.FileNotFoundException for the schema file.
> > It looks like this is happening in the backend.
> > I assume this is because the backend invocation of AvroStorage on a node,
> > different from the node I am running the pig script from, cannot find the
> > file in the local file system.
> > Why can't it use the schema file from front-end invocation?
> > Does it mean that I am limited to either HDFS locations for
> > schema_uri or embedding the schema string in the AvroStorage
> > parameters?
> >
> > Thanks in advance
> >
> > Ruslan Al-Fakikh
> >
>