Pig, mail # dev - A major addition to Pig. Working with spatial data


Re: A major addition to Pig. Working with spatial data
Ahmed Eldawy 2013-05-02, 01:08
Thanks for your response. I was never good at differentiating all those
open source licenses. I mean, what is the point of making open source
licenses if they block me from using a library in an open source project?
Anyway, I'm not going to debate that here. Just one question: if we use JTS
as a library (jar file) without adding the code to Pig, is it still a
violation? We'll use Ivy, for example, to download the jar file when compiling.
 On May 1, 2013 7:50 PM, "Alan Gates" <[EMAIL PROTECTED]> wrote:

> Passing on the technical details for a moment, I see a licensing issue.
>  JTS is licensed under LGPL.  Apache projects cannot contain or ship
> [L]GPL.  Apache does not meet the requirements of GPL and thus we cannot
> repackage their code. If you wanted to go forward using that class, this
> would have to be packaged as an add-on that was downloaded separately and
> not from Apache.  Another option is to work with the JTS community and see
> if they are willing to dual license their code under BSD or Apache license
> so that Pig could include it.  If neither of those is an option, you would
> need to come up with a new class to contain your spatial data.
>
> Alan.
>
> On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
>
> > Hi all,
> >  First, sorry for the long email. I wanted to put all my thoughts here
> > and get your feedback.
> >  I'm proposing a major addition to Pig that will greatly increase its
> > functionality and user base. It is simply to add spatial support to the
> > language and the framework. I've already started working on that but I
> > don't want it to be just another branch. I want it, eventually, to be
> > merged with the trunk of Apache Pig. So, I'm sending this email mainly to
> > reach out to the main contributors of Pig to see the feasibility of this.
> > This addition is part of a big project we have been working on at the
> > University of Minnesota; the project is called Spatial Hadoop
> > (http://spatialhadoop.cs.umn.edu). It's about building a MapReduce
> > framework (Hadoop) that is capable of maintaining and analyzing spatial
> > data efficiently. I'm the main guy behind that project and since we
> > released its first version, we have received very encouraging responses
> > from different groups in the research and industrial community. I'm sure
> > the addition we want to make to Pig Latin will be widely accepted by the
> > people in the spatial community.
> > I'm proposing a plan here while we're still in the early phases of this
> > task to be able to discuss it with the main contributors and see its
> > feasibility. First of all, I think that we need to change the core of Pig
> > to be able to support spatial data. Providing a set of UDFs only is not
> > enough. The main reason is that Pig Latin does not provide a way to
> > create a new data type which is needed for spatial data. Once we have the
> > spatial data types we need, the functionality can be expanded using more
> > UDFs.
> >
> > Here's the plan as I see it.
> > 1- Introduce a new primitive data type Geometry which represents all
> > spatial data types. In the underlying system, this will map to
> > com.vividsolutions.jts.geom.Geometry. This is a class from Java Topology
> > Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable
> > and efficient open source Java library for spatial data types and
> > algorithms. It is very popular in the spatial community and a C++ port of
> > it is used in PostGIS [http://postgis.net/] (a spatial library for
> > Postgres). JTS also conforms to the Simple Features specification of the
> > Open Geospatial Consortium (OGC) [http://www.opengeospatial.org/], which
> > is an open standard for spatial data types. The Geometry data type is
> > read from and written to text files using the Well Known Text (WKT)
> > format. There is also a way to convert it to/from binary so that it can
> > work with binary files and streams.
> > 2- Add functions that manipulate spatial data types. These will be added
> > as UDFs and we will not need to mess with the internals of Pig. Most