Avro, mail # user - Order of the schema in Union


Re: Order of the schema in Union
Scott Carey 2012-02-23, 01:14
This may be related to
https://issues.apache.org/jira/browse/AVRO-1023

If not, open a new ticket.  If so, please comment there.

What you describe below seems to have something to do with how Schema.Parser
works with tracking a namespace across multiple parse runs.
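The namespace behaviour discussed in this thread can be modelled in a few lines. The sketch below is a hypothetical, self-contained Java model, not Avro's actual Schema.Parser code. Branch, resolveSpec, resolveSticky, findBranch and allBranchesMatch are illustrative names only. It assumes two things: first, that the failure comes from a namespace that "sticks" after one union branch is parsed and leaks into later sibling branches that omit their own namespace; second, that union branch lookup compares full names (namespace plus name). Under those assumptions it reproduces the pattern reported further down the thread: only the case where the first record has a namespace and the second does not fails to match.

```java
import java.util.ArrayList;
import java.util.List;

public class UnionNamespaceDemo {
    // Hypothetical stand-in for a named record schema: just a name and an optional namespace.
    record Branch(String name, String namespace) {}

    // Full name = namespace + "." + name, or just the name when there is no namespace.
    static String fullName(String name, String namespace) {
        return (namespace == null || namespace.isEmpty()) ? name : namespace + "." + name;
    }

    // Spec-style resolution: each top-level union branch is resolved on its own,
    // so a branch that omits "namespace" stays in the null (default) namespace.
    static List<String> resolveSpec(List<Branch> branches) {
        List<String> out = new ArrayList<>();
        for (Branch b : branches) {
            out.add(fullName(b.name(), b.namespace()));
        }
        return out;
    }

    // Hypothetical "sticky" resolution: the last namespace seen is carried forward,
    // so an earlier sibling's namespace leaks into later siblings that omit one.
    static List<String> resolveSticky(List<Branch> branches) {
        List<String> out = new ArrayList<>();
        String current = null;
        for (Branch b : branches) {
            if (b.namespace() != null) {
                current = b.namespace();
            }
            out.add(fullName(b.name(), current));
        }
        return out;
    }

    // Union branch lookup keyed on the full name: namespace and name must both match.
    static int findBranch(List<String> resolvedUnion, String datumFullName) {
        return resolvedUnion.indexOf(datumFullName);
    }

    // True if data written with each branch's own declared schema finds its branch in the resolved union.
    static boolean allBranchesMatch(List<Branch> branches, List<String> resolvedUnion) {
        for (Branch b : branches) {
            if (findBranch(resolvedUnion, fullName(b.name(), b.namespace())) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Case 4 from the thread: the first record has a namespace, the second does not.
        List<Branch> union = List.of(
                new Branch("FacebookUser", "FacebookUser"),
                new Branch("FacebookUser2", null));
        System.out.println("spec:   " + resolveSpec(union));   // [FacebookUser.FacebookUser, FacebookUser2]
        System.out.println("sticky: " + resolveSticky(union)); // [FacebookUser.FacebookUser, FacebookUser.FacebookUser2]
        System.out.println(allBranchesMatch(union, resolveSpec(union)));   // true
        System.out.println(allBranchesMatch(union, resolveSticky(union))); // false
    }
}
```

Running the other three cases through resolveSticky (no namespaces, both namespaced, only the second namespaced) leaves every branch matchable, which is consistent with the symptoms described; whether the real cause in Avro is this exact leak is an assumption, not something the thread confirms.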

On 2/21/12 2:49 PM, "Serge Blazhievsky" <[EMAIL PROTECTED]> wrote:

> Hi Scott,
>
> Thanks for looking into this.
>
> I created a small schema and did some experiments.
>
> Here are my findings:
>
> 1. If neither schema has a namespace, the MapReduce job works.
> 2. If both schemas have namespaces, the MapReduce job works.
> 3. If the first schema in the union does not have a namespace but the second
> one does, the MapReduce job works.
> 4. If the first schema in the union has a namespace but the second one does
> not, the MapReduce job fails.
>
> For some reason, it assigns the namespace of the first schema to the second
> while running MapReduce.
>
>
> This feels like a bug somewhere.
>
> This is the schema I am setting:
>
> Union schema:
> [ {
>   "type" : "record",
>   "name" : "FacebookUser",
>   "namespace" : "FacebookUser",
>   "fields" : [ {
>     "name" : "name",
>     "type" : "string"
>   }, {
>     "name" : "num_likes",
>     "type" : "int"
>   }, {
>     "name" : "num_photos",
>     "type" : "int"
>   }, {
>     "name" : "num_groups",
>     "type" : "int"
>   } ]
> }, {
>   "type" : "record",
>   "name" : "FacebookUser2",
>   "fields" : [ {
>     "name" : "name",
>     "type" : "string"
>   }, {
>     "name" : "num_likes",
>     "type" : "int"
>   }, {
>     "name" : "num_photos",
>     "type" : "int"
>   }, {
>     "name" : "num_groups",
>     "type" : "int"
>   } ]
> } ]
>
>
> and this is the schema that MapReduce gets:
>
>  [ {
>   "type" : "record",
>   "name" : "FacebookUser",
>   "namespace" : "FacebookUser",
>   "fields" : [ {
>     "name" : "name",
>     "type" : "string"
>   }, {
>     "name" : "num_likes",
>     "type" : "int"
>   }, {
>     "name" : "num_photos",
>     "type" : "int"
>   }, {
>     "name" : "num_groups",
>     "type" : "int"
>   } ]
> }, {
>   "type" : "record",
>   "name" : "FacebookUser2",
>   "namespace" : "FacebookUser",
>   "fields" : [ {
>     "name" : "name",
>     "type" : "string"
>   }, {
>     "name" : "num_likes",
>     "type" : "int"
>   }, {
>     "name" : "num_photos",
>     "type" : "int"
>   }, {
>     "name" : "num_groups",
>     "type" : "int"
>   } ]
> } ]
>  
>
> The difference is the namespace on the second record.
>
> I would be more than happy to fix it in the code if you could point me to
> where to look.
>
> Regards,
> Serge
>
>
>
> On Tue, Feb 21, 2012 at 9:39 AM, Scott Carey <[EMAIL PROTECTED]> wrote:
>> As for why the union does not seem to match:
>> The union schemas are not the same as the one in the error: the one in the
>> error does not have a namespace. It finds "AVRO_NCP_ICM", but the union has
>> only "merced.AVRO_NCP_ICM" and "merced.AVRO_IVR_BY_CALLID".
>> The namespace and name must both match.
>>
>> Is your output schema correct? It looks like you are setting both your
>> MapOutputSchema and OutputSchema to be a Pair schema. I suspect you only
>> want the Pair schema as the map output and reducer input, but I cannot be
>> sure from the code below.
>>
>> Judging from the code below, your reducer must create Pair objects and
>> output them, and maybe that is related to the error below. It may also be
>> related to the combiner; does it happen without it?
>>
>>
>>
>> On 2/12/12 11:01 PM, "Serge Blazhievsky" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> I am running into an interesting problem with Union. It seems that the
>>> schemas in the union must be listed in the same order as the input paths
>>> for the different files.
>>>
>>> This does not look like the right behavior. The code and exception are below.
>>>
>>> The moment I change the order in the union, it works.
>>>
>>>
>>> Thanks
>>> Serge
>>>
>>>
>>>    public int run(String[] strings) throws Exception {
>>>
>>>         JobConf job = new JobConf();
>>>