Pig, mail # dev - Including wonderdog in Pig contrib


Re: Including wonderdog in Pig contrib
Jeremy Hanna 2012-07-11, 05:45

On Jul 10, 2012, at 12:58 PM, Daniel Dai wrote:

> Who's the author for Wonderdog?
Jacob Perkins wrote Wonderdog while he was at Infochimps.  I'll CC him on the thread because I don't know that he subscribes to the dev list.  Jacob - did you want to chime in here?

> Can Russell or the author talk about
> it at our next hackathon? We also need to discuss it with the
> author.
>
> On Tue, Jul 10, 2012 at 9:23 AM, Alan Gates <[EMAIL PROTECTED]> wrote:
>> From https://issues.apache.org/jira/browse/PIG-2803 posted yesterday by Russell.  I'm copying it here because I think we need to discuss this and decide what we want to do:
>>
>> I propose to add Wonderdog to Pig contrib/
>> Wonderdog is an Apache 2.0 licensed project that adds Hadoop and Pig integration for ElasticSearch. This lets you index any Pig relation with a single UDF call, which is very powerful. Both writing searchable indexes and loading based on search queries are supported.
>> More information on Wonderdog is available at https://github.com/infochimps-labs/wonderdog and a great introduction to ElasticSearch is available at http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html
>> Wonderdog broke in Pig 0.10.0, and was patched to work here: https://github.com/infochimps-labs/wonderdog/pull/9. Even so, Pig still creates schema files when storing and loading JSON, and these must be removed manually before Wonderdog will work.
>> Moving forward, I would like the Pig project to maintain Wonderdog in contrib/ and verify that it works with each version increment. Wonderdog is an incredibly useful library that is license compatible with Pig itself. Along with ElasticSearch, it adds the ability for any user to index his Pig relations and to load subsets of data by pushing search queries down to ElasticSearch.
>> I use Wonderdog in production and in my book, so I volunteer to do the maintenance on contrib/wonderdog.