Re: using multiple Avro schemas in globbed files (schema merging)
Hi Cheolsoo, hi Philipp,
we've already applied Stan's original patch to pig-0.9.2-cdh4.0.1 and adjusted it to our needs.

Specifically, we removed the schema name validation from the method "org.apache.pig.piggybank.storage.avro.AvroStorageUtils.union(Schema, Schema)", since in our case the schemas always have the same name.

Code:
// if (x.getName().equals(y.getName())) {
// throw new RuntimeException("Union of two schemas of the same name is not supported");
// }
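
To illustrate why the check gets in the way for same-named schemas, here is a minimal sketch of merging two versions of one record by taking the union of their fields. It is only an illustration under stated assumptions, not the piggybank code: `SchemaMergeSketch`, `RecordSchema` and `mergeSameName` are hypothetical names, and a plain map of field name to type name stands in for a real Avro Schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaMergeSketch {

    /** Hypothetical stand-in for an Avro record schema: a name plus field name -> type name. */
    public static final class RecordSchema {
        public final String name;
        public final Map<String, String> fields;

        public RecordSchema(String name, Map<String, String> fields) {
            this.name = name;
            this.fields = new LinkedHashMap<>(fields);
        }
    }

    /**
     * Merges two record schemas that share a name by taking the union of their fields.
     * Assumption for this sketch: a field present in both must have the same type.
     */
    public static RecordSchema mergeSameName(RecordSchema x, RecordSchema y) {
        if (!x.name.equals(y.name)) {
            throw new IllegalArgumentException(
                "expected same-named records: " + x.name + " vs " + y.name);
        }
        Map<String, String> merged = new LinkedHashMap<>(x.fields);
        for (Map.Entry<String, String> e : y.fields.entrySet()) {
            // putIfAbsent returns the existing type if the field was already present
            String existing = merged.putIfAbsent(e.getKey(), e.getValue());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalArgumentException(
                    "conflicting types for field " + e.getKey());
            }
        }
        return new RecordSchema(x.name, merged);
    }

    public static void main(String[] args) {
        // Two versions of the same record, as they might appear in globbed files
        Map<String, String> v1 = new LinkedHashMap<>();
        v1.put("id", "long");
        v1.put("url", "string");

        Map<String, String> v2 = new LinkedHashMap<>();
        v2.put("id", "long");
        v2.put("referrer", "string");

        RecordSchema merged = mergeSameName(
            new RecordSchema("Event", v1), new RecordSchema("Event", v2));
        System.out.println(merged.name + " " + merged.fields);
    }
}
```

With a name check like the one commented out above, both inputs would be rejected before any field-level merge could happen, which is the situation we ran into.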

@Cheolsoo: When applying the "merge code" to the piggybank codebase, please consider whether this check makes sense in general.

By the way, the patch works pretty well for us. Thanks to Stan!

Regards,
Nebo

-----Original Message-----
From: Cheolsoo Park [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 25 July 2012 23:18
To: [EMAIL PROTECTED]
Subject: Re: using multiple Avro schemas in globbed files (schema merging)

Hi Philipp,

Sure, I put PIG-2579 into my queue. I will start working on it shortly.

Thanks,
Cheolsoo

On Wed, Jul 25, 2012 at 7:35 AM, Philipp Pahl <[EMAIL PROTECTED]> wrote:

> Hi Cheolsoo,
>
> I saw that you integrated the "globs and commas" support into the pig
> code. I was wondering if you are also planning to integrate the
> multiple Avro schema support, which I would greatly appreciate.
>
> Thanks and regards
> Philipp
>
>
> On 07/17/2012 07:03 PM, Cheolsoo Park wrote:
>
>> Hi Markus,
>>
>> Thank you for sharing your problem.
>>
>> Looking at the PIG-2579
>> <https://issues.apache.org/jira/browse/PIG-2579> patch, it seems to try
>> to address two issues at the same time:
>> 1) Globs support
>> 2) Multiple Avro schemas support
>>
>> I think that it's better to solve one issue at a time. In fact, there
>> is another jira, PIG-2492
>> <https://issues.apache.org/jira/browse/PIG-2492>, that tries to address
>> #1 specifically. Once PIG-2492 is resolved, I think we can rebase/fix
>> the PIG-2579 patch on top of that.
>>
>> I am happy to work on both jiras. Please let me know what you think.
>>
>> Thanks,
>> Cheolsoo
>>
>> On Tue, Jul 17, 2012 at 4:26 AM, Markus Resch <[EMAIL PROTECTED]> wrote:
>>
>>> Hey everyone,
>>>
>>> in the thread "Downgrade CDH4 to CDH3" on the Cloudera mailing list
>>> I talked about issues we had with Pig while testing CDH4 and that we
>>> had trouble switching back to CDH3. After I figured out the
>>> reason for our Pig issue, I tried to apply the patch
>>> (https://issues.apache.org/jira/browse/PIG-2579) to the CDH4 version
>>> of Pig. Sadly, this was much harder than applying the same patch to
>>> the CDH3 version of Pig earlier. Does anyone have this or a similar
>>> patch in a form that is suitable for the CDH4 version of Pig? I'm just
>>> asking because doing the work twice doesn't help anyone. If this work
>>> has already been done, could the patch be attached to the PIG-2579
>>> ticket as well?
>>>
>>> Thanks
>>>
>>> Markus