Pig user mailing list


Re: STORE USING AvroStorage - ignores Pig field names, only using their position
Unfortunately, AvroStorage is not as flexible as it could be.
I keep Avro schemas separately on HDFS and point AvroStorage to the
schema file when storing.
I always explicitly project the relation's fields before storing the
relation.
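
Here is a minimal sketch of that explicit-projection workaround. It reuses the `inputs` relation and the inline schema from the test case quoted further down, and it assumes the same REGISTER statements and piggybank AvroStorage build. The relation name `to_store` is hypothetical. The sketch relies on values being assigned by tuple position, as this thread describes, not on any documented name-matching behavior.

```pig
-- Sketch only: assumes the REGISTER lines and piggybank AvroStorage
-- build from the test case in this thread; to_store is a hypothetical name.
inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);

-- Values are written by tuple position, so project the fields into the
-- same order as the Avro schema's "fields" array: b goes to Avro field
-- "b", a goes to "nonsense_name".
to_store = FOREACH inputs GENERATE b, a AS nonsense_name;

STORE to_store INTO 'output'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
"schema":
{
  "type" : "record",
  "name" : "my_schema",
  "namespace" : "com.my_namespace",
  "fields" : [
    { "name" : "b", "type" : "string" },
    { "name" : "nonsense_name", "type" : "string" }
  ]
}
}');
```

According to this thread, the `AS nonsense_name` rename does not change where the value ends up. It only makes the Pig schema read the same as the Avro schema, which makes a field-order mismatch easier to spot when reviewing the script.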

The same problem exists when reading data: selecting fields by name does
not work either.
2013/11/17 Russell Jurney <[EMAIL PROTECTED]>

> I think the expected behavior of AvroStorage is to use the fields in the
> order they appear in the tuple. So to fix your problem, swap the order of
> b/nonsense_name.
>
> Otherwise I can't see a way to map from b to nonsense_name at all. Pig
> can't know how to do that without referencing tuple field order.
>
> On Sat, Nov 16, 2013 at 7:42 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
>
> > Including this last message on the Pig user list.
> >
> >
> > On Sun, Nov 17, 2013 at 7:40 AM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
> >
> >> Russell,
> >>
> >> Actually, this problem came up when I had the same names in the Pig
> >> relation schema and the Avro schema, and it turned out that AvroStorage
> >> swaps the fields if their order is different.
> >> So my impression is that it should work this way:
> >> 1) If the names correspond, AvroStorage uses them.
> >> 2) If the names do not correspond, AvroStorage either fails to store or
> >> performs schema resolution as described here:
> >> http://avro.apache.org/docs/1.7.5/spec.html#Schema+Resolution
> >>
> >> Thanks
> >>
> >>
> >> On Sun, Nov 17, 2013 at 7:17 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> >>
> >>> How can Pig map from a to nonsense_name?
> >>>
> >>>
> >>> On Saturday, November 16, 2013, Ruslan Al-Fakikh wrote:
> >>>
> >>>> Thanks, Russell!
> >>>>
> >>>> Do you mean that this is the expected behavior? Shouldn't AvroStorage
> >>>> map the Pig fields by their names (not their field order), matching
> >>>> them to the names in the Avro schema?
> >>>>
> >>>> Thanks,
> >>>> Ruslan Al-Fakikh
> >>>>
> >>>>
> >>>> On Sun, Nov 17, 2013 at 6:53 AM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>> Pig tuples have field order. Swap the order of the fields in your
> >>>>> Avro schema and try again.
> >>>>>
> >>>>> On Nov 16, 2013, at 6:19 PM, Ruslan Al-Fakikh <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>> Hey guys,
> >>>>>
> >>>>> When I store with AvroStorage, the names of the Pig tuple fields are
> >>>>> completely ignored. The field values are written to the output file
> >>>>> based only on their position.
> >>>>> Here is a simplified test case:
> >>>>>
> >>>>> %declare WORKDIR `pwd`
> >>>>> REGISTER ../../../../lib/external/avro-1.7.4.jar
> >>>>> REGISTER ../../../../lib/external/json-simple-1.1.jar
> >>>>> -- This was built (manually with Maven) from the latest source:
> >>>>> -- http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/
> >>>>> REGISTER ../piggybankBuiltFromSource.jar
> >>>>> REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
> >>>>> REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar
> >>>>>
> >>>>> --$ cat input.txt
> >>>>> --data_a data_b
> >>>>> --data_a data_b
> >>>>> inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);
> >>>>>
> >>>>> DESCRIBE inputs;
> >>>>> DUMP inputs;
> >>>>>
> >>>>> --output:
> >>>>> --inputs: {a: chararray,b: chararray}
> >>>>> --(data_a,data_b)
> >>>>> --(data_a,data_b)
> >>>>>
> >>>>> STORE inputs INTO 'output'
> >>>>>     USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
> >>>>> "schema":
> >>>>> {
> >>>>>   "type" : "record",
> >>>>>   "name" : "my_schema",
> >>>>>   "namespace" : "com.my_namespace",
> >>>>>   "fields" : [
> >>>>>   {
> >>>>>     "name" : "b",
> >>>>>     "type" : "string"
> >>>>>   },
> >>>>>   {
> >>>>>     "name" : "nonsense_name",
> >>>>>     "type" : "string"
> >>>>>   }
> >>>>>   ]
> >>>>> }
> >>>>> }');
> >>>>>
> >>>>> --output
> >>>>> --$ java -jar ../../../../lib/external/avro-tools-1.7.4.jar tojson