Re: using multiple Avro schemas in globbed files (schema merging)
Hi Cheolsoo, hi Philipp,
we have already applied Stan's original patch to pig-0.9.2-cdh4.0.1 and adjusted it to our needs.

In detail, we removed the schema name validation from the method "org.apache.pig.piggybank.storage.avro.AvroStorageUtils.union(Schema, Schema)", since in our case the schemas always have the same name.

// if (x.getName().equals(y.getName())) {
// throw new RuntimeException("Union of two schemas of the same name is not supported");
// }
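
As a rough illustration of what merging two record schemas of the same name can mean, here is a minimal, dependency-free sketch. It is an assumption for illustration only: it is not the code from the PIG-2579 patch or from AvroStorageUtils, `RecordSchema` and `merge` are invented names, and a real Avro schema carries far more than a field-to-type map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaMergeSketch {

    /** Hypothetical stand-in for an Avro record schema: a name plus field name -> type name. */
    public static final class RecordSchema {
        final String name;
        final Map<String, String> fields;

        public RecordSchema(String name, Map<String, String> fields) {
            this.name = name;
            this.fields = new LinkedHashMap<>(fields); // defensive copy, keeps field order
        }
    }

    /**
     * Merges two records that share a name by taking the union of their fields.
     * A field present in both records must have the same type in both.
     */
    public static RecordSchema merge(RecordSchema x, RecordSchema y) {
        if (!x.name.equals(y.name)) {
            throw new IllegalArgumentException(
                "This sketch only merges records of the same name: " + x.name + " vs " + y.name);
        }
        Map<String, String> merged = new LinkedHashMap<>(x.fields);
        for (Map.Entry<String, String> e : y.fields.entrySet()) {
            // putIfAbsent returns the existing type if the field is already known
            String previous = merged.putIfAbsent(e.getKey(), e.getValue());
            if (previous != null && !previous.equals(e.getValue())) {
                throw new IllegalArgumentException("Conflicting types for field " + e.getKey());
            }
        }
        return new RecordSchema(x.name, merged);
    }

    public static void main(String[] args) {
        Map<String, String> v1 = new LinkedHashMap<>();
        v1.put("id", "long");
        v1.put("url", "string");

        Map<String, String> v2 = new LinkedHashMap<>();
        v2.put("id", "long");
        v2.put("referrer", "string");

        RecordSchema merged = merge(new RecordSchema("Event", v1), new RecordSchema("Event", v2));
        System.out.println(merged.name + " " + merged.fields);
    }
}
```

In this sketch a same-name pair is the normal case rather than an error, which is the situation described above where every file carries a schema of the same name with slightly different fields.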

@Cheolsoo: When applying the "merge code" to the piggybank codebase, please consider whether this check makes sense in general.

By the way, the patch works pretty well for us. Thanks to Stan!


-----Original Message-----
From: Cheolsoo Park [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 25 July 2012 23:18
Subject: Re: using multiple Avro schemas in globbed files (schema merging)

Hi Philipp,

Sure, I put PIG-2579 into my queue. I will start working on it shortly.


On Wed, Jul 25, 2012 at 7:35 AM, Philipp Pahl <[EMAIL PROTECTED]> wrote:

> Hi Cheolsoo,
> I saw that you integrated the "globs and commas" support into the pig
> code. I was wondering if you are also planning to integrate the
> multiple Avro schema support, which I would greatly appreciate.
> Thanks and regards
> Philipp
> On 07/17/2012 07:03 PM, Cheolsoo Park wrote:
>> Hi Markus,
>> Thank you for sharing your problem.
>> Looking at the PIG-2579
>> <https://issues.apache.org/jira/browse/PIG-2579> patch, it seems to try
>> to address two issues at the same time:
>> 1) Globs support
>> 2) Multiple Avro schemas support
>> I think that it's better to solve one issue at a time. In fact, there
>> is another jira, PIG-2492
>> <https://issues.apache.org/jira/browse/PIG-2492>, that tries to address
>> #1 in particular. Once PIG-2492 is resolved, I think we can rebase/fix
>> the PIG-2579 patch on top of that.
>> I am happy to work on both jiras. Please let me know what you think.
>> Thanks,
>> Cheolsoo
>> On Tue, Jul 17, 2012 at 4:26 AM, Markus Resch <[EMAIL PROTECTED]> wrote:
>>> Hey everyone,
>>> in the thread "Downgrade CDH4 to CDH3" on the Cloudera mailing list
>>> I talked about issues we had with Pig while testing CDH4 and that we
>>> had trouble switching back to CDH3. After I figured out the
>>> reason for our Pig issue I tried to apply the patch
>>> (https://issues.apache.org/jira/browse/PIG-2579) to the CDH4 version
>>> of Pig. Sadly this was much harder than applying this particular patch
>>> to the CDH3 version of Pig before. Does anyone have this or a similar
>>> patch in a form that is suitable for the CDH4 version of Pig? I'm just
>>> asking because doing the work twice doesn't help anyone. If this work
>>> is already done: could this patch be attached to the PIG-2579 ticket
>>> as well?
>>> Thanks
>>> Markus
>>> --