Avro >> mail # user >> Order of the schema in Union


Re: Order of the schema in Union
This may be related to
https://issues.apache.org/jira/browse/AVRO-1023

If not, open a new ticket.  If so, please comment there.

What you describe below seems to have something to do with how Schema.Parser
works with tracking a namespace across multiple parse runs.
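
A minimal sketch of why the extra namespace matters, assuming union branches are resolved by full name (namespace plus name), as noted further down in this thread. This is a plain-Java model, not Avro's actual code; the class `UnionNameMatch` and the helpers `fullName` and `resolveBranch` are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class UnionNameMatch {
    // Hypothetical helper: a full name is "namespace.name",
    // or just "name" when no namespace is set.
    static String fullName(String namespace, String name) {
        return (namespace == null || namespace.isEmpty()) ? name : namespace + "." + name;
    }

    // Hypothetical helper: index of the union branch whose full name
    // equals the datum's full name, or -1 if no branch matches.
    static int resolveBranch(List<String> unionFullNames, String datumFullName) {
        return unionFullNames.indexOf(datumFullName);
    }

    public static void main(String[] args) {
        // Union as set on the job: the second record has no namespace.
        List<String> asSet = Arrays.asList(
                fullName("FacebookUser", "FacebookUser"),
                fullName(null, "FacebookUser2"));

        // Union as seen by MapReduce: the second record has picked up
        // the namespace of the first.
        List<String> asSeen = Arrays.asList(
                fullName("FacebookUser", "FacebookUser"),
                fullName("FacebookUser", "FacebookUser2"));

        // A datum written with the original, namespace-free schema.
        String datum = fullName(null, "FacebookUser2");

        System.out.println(resolveBranch(asSet, datum));   // prints 1
        System.out.println(resolveBranch(asSeen, datum));  // prints -1
    }
}
```

In this model the datum still matches the union as it was set, but not the union with the inherited namespace, which is consistent with case 4 in the findings quoted below.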

On 2/21/12 2:49 PM, "Serge Blazhievsky" <[EMAIL PROTECTED]> wrote:

> Hi Scott,
>
> Thanks for looking into this.
>
> I created a small schema and did some experiments.
>
> Here are my findings:
>
> 1. If neither schema has a namespace, the MapReduce job works.
> 2. If both schemas have namespaces, the MapReduce job works.
> 3. If the first schema in the Union does not have a namespace, but the second
> one does, the MapReduce job works.
> 4. If the first schema in the Union has a namespace, but the second one does
> not, the MapReduce job fails.
>
> For some reason, it assigns the namespace from the first schema to the second
> while running MapReduce.
>
>
> This feels like a bug somewhere.
>
> This is the schema I am setting:
>
> Union schema:
> [ {
>   "type" : "record",
>   "name" : "FacebookUser",
>   "namespace" : "FacebookUser",
>   "fields" : [ {
>     "name" : "name",
>     "type" : "string"
>   }, {
>     "name" : "num_likes",
>     "type" : "int"
>   }, {
>     "name" : "num_photos",
>     "type" : "int"
>   }, {
>     "name" : "num_groups",
>     "type" : "int"
>   } ]
> }, {
>   "type" : "record",
>   "name" : "FacebookUser2",
>   "fields" : [ {
>     "name" : "name",
>     "type" : "string"
>   }, {
>     "name" : "num_likes",
>     "type" : "int"
>   }, {
>     "name" : "num_photos",
>     "type" : "int"
>   }, {
>     "name" : "num_groups",
>     "type" : "int"
>   } ]
> } ]
>
>
> and this is the schema that MapReduce gets:
>
>  [ {
>   "type" : "record",
>   "name" : "FacebookUser",
>   "namespace" : "FacebookUser",
>   "fields" : [ {
>     "name" : "name",
>     "type" : "string"
>   }, {
>     "name" : "num_likes",
>     "type" : "int"
>   }, {
>     "name" : "num_photos",
>     "type" : "int"
>   }, {
>     "name" : "num_groups",
>     "type" : "int"
>   } ]
> }, {
>   "type" : "record",
>   "name" : "FacebookUser2",
>   "namespace" : "FacebookUser",
>   "fields" : [ {
>     "name" : "name",
>     "type" : "string"
>   }, {
>     "name" : "num_likes",
>     "type" : "int"
>   }, {
>     "name" : "num_photos",
>     "type" : "int"
>   }, {
>     "name" : "num_groups",
>     "type" : "int"
>   } ]
> } ]
>  
>
> The difference is the second namespace.
>
> I would be more than happy to fix this in the code, if you could point me to
> where to look.
>
> Regards,
> Serge
>
>
>
> On Tue, Feb 21, 2012 at 9:39 AM, Scott Carey <[EMAIL PROTECTED]> wrote:
>> As for why the union does not seem to match:
>> The Union schemas are not the same as the one in the error: the one in the
>> error does not have a namespace.  It finds "AVRO_NCP_ICM", but the union has
>> only "merced.AVRO_NCP_ICM" and "merced.AVRO_IVR_BY_CALLID".
>> The namespace and name must both match.
>>
>> Is your output schema correct?  It looks like you are setting both your
>> MapOutputSchema and OutputSchema to be a Pair schema.  I suspect you only
>> want the Pair schema as a map output and reducer input, but cannot be sure
>> from the below.
>>
>> From the below, your reducer must create Pair objects and output them, and
>> maybe that is related to the error below.  It may also be related to the
>> combiner; does it happen without it?
>>
>>
>>
>> On 2/12/12 11:01 PM, "Serge Blazhievsky" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi all,
>>>
>>> I am running into an interesting problem with Union. It seems that the
>>> schemas in the union must be in the same order as the input paths for the
>>> different files.
>>>
>>> This does not look like the right behavior. The code and exception are below.
>>>
>>> The moment I change the order in union it works.
>>>
>>>
>>> Thanks
>>> Serge
>>>
>>>
>>>    public int run(String[] strings) throws Exception {
>>>
>>>         JobConf job = new JobConf();
>>>