Pig, mail # user - Re: [jena-dev] SPARQL: Transformation of SPARQL


Re: [jena-dev] SPARQL: Transformation of SPARQL
Ashutosh Chauhan 2010-03-26, 15:38
This seems to be a nice and useful contrib project for Pig. Is anyone
actively working on it?

Ashutosh

PS: Since I am not on the jena-dev list, this might not make it there. Paolo,
can you post it there in case it doesn't appear?

On Fri, Mar 26, 2010 at 03:47, Paolo Castagna
<[EMAIL PROTECTED]> wrote:
> Hi Alexander
>
> Alexander Schätzle wrote:
>>
>> I'm working on a translation of SPARQL queries into PigLatin Scripts
>> (language used by the Pig System for Hadoop developed by Yahoo).
>
> Translating the SPARQL algebra into PigLatin scripts is IMHO a nice
> idea. With a small(?) amount of glue code you join two big communities
> delivering value to both. On one hand, there is the need for
> scalable/parallel processing systems. On the other hand, there is
> the aim to support as many data formats as possible.
>
> I give full credit for this idea to Peter Mika and Ben Reed, who kindly
> shared an unpublished paper [1] on this topic. You should write to them and
> ask for a copy of the paper.
>
> Here follows a summary of their mapping between the SPARQL algebra
> operators and solution modifiers and the Pig Latin syntax:
>
>
>  SPARQL algebra                     Pig Latin syntax
>  --------------------------------   ----------------------------------
>
>  BGP operator                       A set of FILTER operations,
>                                     followed by a number of JOINs
>                                     equal to the number of triple
>                                     patterns and a single FOREACH
>                                     statement.
>
>  Filter operator                    FILTER (with the limitation
>                                     that not all expressions are
>                                     directly supported by Pig Latin).
>                                     However, Pig can be extended via
>                                     user-defined functions (UDFs) to
>                                     have a semantically equivalent
>                                     filter behaviour.
>
>  Join operator                      A series of JOINs (which is an
>                                     inner join in Pig) followed by a
>                                     FOREACH (a projection) to remove
>                                     the duplicated columns.
>
>  LeftJoin operator                  A series of outer JOINs plus a
>                                     custom filter operator.
>
>  Union operator                     UNION
>
>  Graph operator                     ?
>
>  OrderBy modifier                   ORDER
>
>  Project modifier                   Achieved by FOREACH.
>
>  Distinct modifier                  DISTINCT
>
>  Reduced modifier                   Implemented using DISTINCT.
>
>  Slice modifier                     Implemented using a custom filter.
>
>  ToList modifier                    ?
>
>
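
To make the BGP row of the mapping above concrete, here is a minimal in-memory sketch in Python (not Pig Latin, and not taken from the paper). It evaluates a basic graph pattern in the same three stages: filter per triple pattern, inner join on shared variables, then a projection. The function names and the sample data are hypothetical, and the sketch uses one join per additional triple pattern, which is an assumption of this example rather than the paper's exact plan.

```python
# Sketch only: emulates the FILTER -> JOIN -> FOREACH plan for a BGP in memory.
# All names and data below are illustrative assumptions, not part of Pig or Jena.

def is_var(term):
    """SPARQL-style variables are written with a leading '?'."""
    return term.startswith("?")


def match_pattern(triples, pattern):
    """FILTER stage: keep triples that agree with the pattern's constants.

    Returns one variable binding (a dict) per matching triple.
    """
    rows = []
    for triple in triples:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable repeated inside one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            rows.append(binding)
    return rows


def join(left, right):
    """JOIN stage: inner join on the variables both sides share.

    With no shared variables this degenerates to a cross product.
    """
    out = []
    for l in left:
        for r in right:
            if all(l[v] == r[v] for v in set(l) & set(r)):
                out.append({**l, **r})
    return out


def evaluate_bgp(triples, patterns, project):
    """Run every pattern, join the results, then project (FOREACH stage)."""
    rows = [{}]  # a single empty binding is the identity for join
    for pattern in patterns:
        rows = join(rows, match_pattern(triples, pattern))
    return [tuple(row[v] for v in project) for row in rows]


triples = [
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("alice", "name", "Alice"),
    ("carol", "knows", "dave"),
]
patterns = [("?x", "knows", "?y"), ("?y", "name", "?n")]
print(evaluate_bgp(triples, patterns, ["?x", "?n"]))  # [('alice', 'Bob')]
```

The nested-loop join is only for readability; in Pig the equivalent step would be a distributed JOIN keyed on the shared variables.
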
> They conclude:
>
>  "In summary, we have shown a complete translation procedure from
>   SPARQL to Pig Latin scripts, which provides the basis of our SPARQL
>   interpreter. (This interpreter is not complete yet, but sufficient to
>   cover the most commonly used queries, including the ones discussed in
>   this paper.) Note that the query plans generated by our interpreter
>   are unlikely to be optimal: optimization is left as a task
>   for Pig." [1]
>
> I suggest you use N-Triples and/or N-Quads (parsers are available in
> TDB) as input/output with Pig.
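
As a rough illustration of why N-Triples suits this kind of pipeline, the Python sketch below turns simple N-Triples lines into tab-separated subject/predicate/object rows that a delimiter-based loader could read. This is not the TDB parser mentioned above: the regular expression covers only a simplified subset of the N-Triples grammar, the function name is hypothetical, and it assumes literals contain no raw tab characters.

```python
import re

# Simplified N-Triples line: subject, predicate, object, then a closing ".".
# Assumption: this covers common cases only, not the full N-Triples grammar.
TERM_IRI = r"<[^>]*>"
TERM_BNODE = r"_:\S+"
TERM_LITERAL = r'"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
LINE = re.compile(
    rf"^\s*({TERM_IRI}|{TERM_BNODE})\s+({TERM_IRI})\s+"
    rf"({TERM_IRI}|{TERM_BNODE}|{TERM_LITERAL})\s*\.\s*$"
)


def ntriples_to_tsv(lines):
    """Yield one tab-separated 'subject<TAB>predicate<TAB>object' row per triple.

    Blank lines and '#' comment lines are skipped; anything else that does not
    match the simplified pattern raises ValueError.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = LINE.match(stripped)
        if match is None:
            raise ValueError(f"not a simple N-Triples line: {line!r}")
        yield "\t".join(match.groups())


sample = [
    '<http://example.org/a> <http://example.org/name> "Alice"@en .',
    "# a comment line",
    "",
    "_:b0 <http://example.org/knows> <http://example.org/a> .",
]
for row in ntriples_to_tsv(sample):
    print(row)
```

One triple per line with no nesting is what makes the format easy to split across parallel workers; for real data a full parser is the safer choice.
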
>
> Which version of Pig are you planning to use?
>
> My 2 cents,
> Paolo
>
>  [1] Peter Mika and Ben Reed, "Pearls before Swine: A large-scale triple
>     store using MapReduce", August 29, 2008 - unpublished
>
>
>
> PS:
> For the benefit of Pig users/developers (in CC to this email):
>
> SPARQL stands for SPARQL Protocol and RDF Query Language and it is the
> recommended RDF query language defined by the W3C.
>
> More specifically, the SPARQL Algebra defines six operators and six
> solution modifiers:
>
>  Operators                Solution Modifiers