This seems to be a nice and useful contrib project for Pig. Is anyone
actively working on it?
PS: Since I am not on the jena-dev list, this message might not make it there. Paolo,
can you post it there in case it doesn't appear?
On Fri, Mar 26, 2010 at 03:47, Paolo Castagna
<[EMAIL PROTECTED]> wrote:
> Hi Alexander
> Alexander Schätzle wrote:
>> I'm working on a translation of SPARQL queries into Pig Latin scripts
>> (the language used by the Pig system for Hadoop, developed by Yahoo).
> Translating the SPARQL algebra into Pig Latin scripts is IMHO a nice
> idea. With a small(?) amount of glue code, you join two big communities,
> delivering value to both. On the one hand, there is the need for
> scalable/parallel processing systems. On the other hand, there is
> the aim to support as many data formats as possible.
> I give full credit for this idea to Peter Mika and Ben Reed, who kindly
> shared an unpublished paper on this topic. You should write to them and ask
> for a copy of the paper.
> Here follows a summary of their mapping between the SPARQL algebra
> operators and solution modifiers and the Pig Latin syntax:
> SPARQL algebra                    Pig Latin syntax
> --------------------------------  ----------------------------------
> BGP operator                      A set of FILTER operations,
>                                   followed by a number of JOINs
>                                   equal to the number of triple
>                                   patterns and a single FOREACH.
> Filter operator                   FILTER (with the limitation
>                                   that not all expressions are
>                                   directly supported by Pig Latin).
>                                   However, Pig can be extended via
>                                   user-defined functions (UDFs) to
>                                   have a semantically equivalent
>                                   filter behaviour.
> Join operator                     A series of JOINs (which is an
>                                   inner join in Pig) followed by a
>                                   FOREACH (a projection) to remove
>                                   the duplicated columns.
> LeftJoin operator                 A series of outer JOINs plus a
>                                   custom filter operator.
> Union operator                    UNION
> Graph operator                    ?
> OrderBy modifier                  ORDER
> Project modifier                  Achieved by FOREACH.
> Distinct modifier                 DISTINCT
> Reduced modifier                  Implemented using DISTINCT.
> Slice modifier                    Implemented using a custom filter.
> ToList modifier                   ?
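As a rough illustration of the BGP and Join rows, here is a minimal Python sketch that emits a Pig Latin script for a basic graph pattern. It is hypothetical and not the Mika and Reed interpreter: it assumes the triples are stored as tab-separated `(s, p, o)` chararray columns in a file called `triples.tsv`, that constants contain no single quotes, and that variable names are not Pig keywords; `bgp_to_pig` and the generated aliases (`f0`, `t0`, `j1`, `r1`, ...) are made-up names. Each triple pattern becomes a FILTER on its constant positions plus a FOREACH that renames columns to variable names. Patterns are then chained with pairwise JOINs on shared variables (CROSS when nothing is shared), each followed by a FOREACH that drops the duplicated join columns via Pig's `alias::field` disambiguation.

```python
from typing import Dict, List, Tuple

# A triple pattern; terms starting with '?' are variables, anything else is a constant.
Pattern = Tuple[str, str, str]
POSITIONS = ("s", "p", "o")  # column names of the (hypothetical) triples relation


def _var(term: str) -> str:
    """Map a SPARQL variable such as '?name' to a Pig field name ('name')."""
    return term[1:]


def bgp_to_pig(patterns: List[Pattern], project: List[str]) -> str:
    """Emit a Pig Latin script for a basic graph pattern (simplified sketch)."""
    lines = ["triples = LOAD 'triples.tsv' USING PigStorage('\\t') "
             "AS (s:chararray, p:chararray, o:chararray);"]
    acc, acc_vars = None, []  # alias of the joined relation so far and its variables
    for i, pattern in enumerate(patterns):
        conds: List[str] = []
        gen: List[str] = []
        seen: Dict[str, str] = {}  # variable -> first column it was bound to
        for col, term in zip(POSITIONS, pattern):
            if not term.startswith("?"):
                conds.append(f"{col} == '{term}'")      # constant: filter on it
            elif term in seen:
                conds.append(f"{col} == {seen[term]}")  # repeated variable: equality
            else:
                seen[term] = col
                gen.append(f"{col} AS {_var(term)}")    # first binding: keep column
        if not gen:
            raise ValueError(f"pattern {i} has no variables")
        src = "triples"
        if conds:
            lines.append(f"f{i} = FILTER triples BY {' AND '.join(conds)};")
            src = f"f{i}"
        lines.append(f"t{i} = FOREACH {src} GENERATE {', '.join(gen)};")
        pvars = [_var(t) for t in seen]
        if acc is None:
            acc, acc_vars = f"t{i}", pvars
            continue
        shared = [v for v in pvars if v in acc_vars]
        if shared:
            keys = shared[0] if len(shared) == 1 else f"({', '.join(shared)})"
            lines.append(f"j{i} = JOIN {acc} BY {keys}, t{i} BY {keys};")
        else:
            lines.append(f"j{i} = CROSS {acc}, t{i};")  # no shared variable
        # Projection that removes the duplicated join columns.
        new_vars = acc_vars + [v for v in pvars if v not in acc_vars]
        cols = [f"{acc}::{v} AS {v}" if v in acc_vars else f"t{i}::{v} AS {v}"
                for v in new_vars]
        lines.append(f"r{i} = FOREACH j{i} GENERATE {', '.join(cols)};")
        acc, acc_vars = f"r{i}", new_vars
    fields = ", ".join(_var(v) for v in project)
    lines.append(f"result = FOREACH {acc} GENERATE {fields};")
    return "\n".join(lines)
```

For two patterns that share `?x` (say a `foaf:name` pattern and a `foaf:mbox` pattern), the generated script joins with `j1 = JOIN t0 BY x, t1 BY x;` and then projects with `r1 = FOREACH j1 GENERATE t0::x AS x, t0::name AS name, t1::mbox AS mbox;`. Chaining joins pairwise uses one JOIN fewer than there are patterns, so this is a simplification rather than the exact plan summarised in the table, and it leaves any optimisation to Pig.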
> They conclude:
> "In summary, we have shown a complete translation procedure from
> SPARQL to Pig Latin scripts, which provides the basis of our SPARQL
> interpreter. (This interpreter is not complete yet, but sufficient to
> cover the most commonly used queries, including the ones discussed in
> this paper.) Note that the query plans generated by our interpreter
> are unlikely to be optimal: optimization is left as a task
> for Pig." 
> I suggest you use N-Triples and/or N-Quads (parsers are available in
> TDB) as input/output with Pig.
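One low-effort way to get N-Triples into Pig is to pre-convert each line into three tab-separated columns that Pig's built-in `PigStorage` loader can read. The Python sketch below is a simplified, regex-based stand-in, not TDB's parser and not the full W3C N-Triples grammar: it assumes one triple per line, no raw tab characters inside terms, and leaves escape sequences inside literals untouched. It does not handle N-Quads. `ntriple_to_tsv` is a hypothetical helper name.

```python
import re
from typing import Optional

# Simplified N-Triples term patterns (an assumption, not the full W3C grammar).
IRI = r"<[^>]*>"
BNODE = r"_:\S+"
# Quoted string with backslash escapes, optionally followed by @lang or ^^<datatype>.
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'
TRIPLE = re.compile(
    rf"^({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.$"
)


def ntriple_to_tsv(line: str) -> Optional[str]:
    """Convert one N-Triples line to 's<TAB>p<TAB>o'; None for blank or comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = TRIPLE.match(stripped)
    if match is None:
        raise ValueError(f"not a (simplified) N-Triples line: {line!r}")
    # Keep the terms exactly as written (angle brackets, quotes, tags).
    return "\t".join(match.groups())
```

The converted file could then be loaded with something like `triples = LOAD 'triples.tsv' USING PigStorage('\t') AS (s:chararray, p:chararray, o:chararray);`. Keeping the terms in their N-Triples surface form means constants in a generated FILTER can be compared as plain strings.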
> Which version of Pig are you planning to use?
> My 2 cents,
> Peter Mika and Ben Reed, "Pearls before Swine: A large-scale triple
> store using MapReduce", unpublished, August 29, 2008.
> For the benefit of Pig users/developers (in CC to this email):
> SPARQL stands for SPARQL Protocol and RDF Query Language, and it is the
> recommended RDF query language defined by the W3C.
> More specifically, the SPARQL Algebra defines six operators and six
> solution modifiers:
> Operators                         Solution Modifiers