Pig user mailing list


Re: [jena-dev] SPARQL: Transformation of SPARQL
This seems to be a nice and useful contrib project for Pig. Is anyone
actively working on it?

Ashutosh

PS: Since I am not on the jena-dev list, this message might not make it
there. Paolo, can you post it there in case it doesn't appear?

On Fri, Mar 26, 2010 at 03:47, Paolo Castagna
<[EMAIL PROTECTED]> wrote:
> Hi Alexander
>
> Alexander Schätzle wrote:
>>
>> I'm working on a translation of SPARQL queries into Pig Latin scripts
>> (the language used by the Pig system for Hadoop, developed by Yahoo).
>
> Translating the SPARQL algebra into Pig Latin scripts is IMHO a nice
> idea. With a small(?) amount of glue code you join two big communities,
> delivering value to both. On the one hand, there is the need for
> scalable/parallel processing systems. On the other hand, there is
> the aim to support as many data formats as possible.
>
> I give full credit for this idea to Peter Mika and Ben Reed, who kindly
> shared an unpublished paper [1] on this topic. You should write to them
> and ask for a copy of the paper.
>
> Here follows a summary of their mapping between the SPARQL algebra
> operators and solution modifiers and the Pig Latin syntax:
>
>
>  SPARQL algebra                     Pig Latin syntax
>  --------------------------------   ----------------------------------
>
>  BGP operator                       A set of FILTER operations,
>                                     followed by a number of JOINs
>                                     equal to the number of triple
>                                     patterns and a single FOREACH
>                                     statement.
>
>  Filter operator                    FILTER (with the limitation
>                                     that not all expressions are
>                                     directly supported by Pig Latin).
>                                     However, Pig can be extended via
>                                     user-defined functions (UDFs) to
>                                     have a semantically equivalent
>                                     filter behaviour.
>
>  Join operator                      A series of JOINs (JOIN is an
>                                     inner join in Pig) followed by a
>                                     FOREACH (a projection) to remove
>                                     the duplicated columns.
>
>  LeftJoin operator                  A series of outer JOINs plus a
>                                     custom filter operator.
>
>  Union operator                     UNION
>
>  Graph operator                     ?
>
>  OrderBy modifier                   ORDER
>
>  Project modifier                   Achieved by FOREACH.
>
>  Distinct modifier                  DISTINCT
>
>  Reduced modifier                   Implemented using DISTINCT.
>
>  Slice modifier                     Implemented using a custom filter.
>
>  ToList modifier                    ?
>
>
> They conclude:
>
>  "In summary, we have shown a complete translation procedure from
>   SPARQL to Pig Latin scripts, which provides the basis of our SPARQL
>   interpreter. (This interpreter is not complete yet, but sufficient to
>   cover the most commonly used queries, including the ones discussed in
>   this paper.) Note that the query plans generated by our interpreter
>   are unlikely to be optimal: optimization is left as a task
>   for Pig." [1]
>
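A minimal sketch of the BGP and Join rows of the mapping above, in Python, for illustration only; it is not the interpreter described in [1]. It assumes a hypothetical input file `triples.tsv` holding one triple per line as three tab-separated columns (s, p, o), which is the format Pig's default loader, PigStorage, reads. It also assumes that variables start with `?`, that constants are plain strings matching the file contents, and that every pattern has at least one variable with no variable repeated inside a pattern. Each triple pattern becomes a FILTER on its constant positions plus a FOREACH that renames columns after the variables; the patterns are then chained with binary JOINs on their shared variables (CROSS when none are shared), each followed by a FOREACH that drops the duplicated columns. The helper name `bgp_to_pig` is hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
COLUMNS = ("s", "p", "o")  # column names used in the hypothetical LOAD schema


def is_var(term: str) -> bool:
    """SPARQL variables are written with a leading '?'."""
    return term.startswith("?")


def bgp_to_pig(patterns: List[Triple], input_path: str = "triples.tsv") -> str:
    """Emit a Pig Latin script that evaluates a BGP (hypothetical helper)."""
    lines = [
        f"triples = LOAD '{input_path}' AS (s:chararray, p:chararray, o:chararray);"
    ]
    current: Optional[str] = None  # alias of the result built so far
    current_vars: List[str] = []   # its columns, named after the variables

    for i, pattern in enumerate(patterns):
        # FILTER: keep only triples matching the constant positions.
        conds = [f"{col} == '{term}'"
                 for col, term in zip(COLUMNS, pattern) if not is_var(term)]
        source = "triples"
        if conds:
            lines.append(f"t{i} = FILTER triples BY {' AND '.join(conds)};")
            source = f"t{i}"

        # FOREACH: project the variable positions and rename them.
        names = [(col, term[1:])
                 for col, term in zip(COLUMNS, pattern) if is_var(term)]
        gen = ", ".join(f"{col} AS {var}" for col, var in names)
        lines.append(f"p{i} = FOREACH {source} GENERATE {gen};")
        pattern_vars = [var for _, var in names]

        if current is None:  # first pattern: nothing to join with yet
            current, current_vars = f"p{i}", pattern_vars
            continue

        # JOIN (an inner join) on the shared variables; CROSS if none are shared.
        right = f"p{i}"
        shared = [v for v in current_vars if v in pattern_vars]
        if shared:
            key = shared[0] if len(shared) == 1 else "(" + ", ".join(shared) + ")"
            lines.append(f"j{i} = JOIN {current} BY {key}, {right} BY {key};")
        else:
            lines.append(f"j{i} = CROSS {current}, {right};")

        # FOREACH: drop the duplicated join columns, disambiguating with '::'.
        merged = current_vars + [v for v in pattern_vars if v not in current_vars]
        gen = ", ".join(
            f"{current if v in current_vars else right}::{v} AS {v}" for v in merged
        )
        lines.append(f"r{i} = FOREACH j{i} GENERATE {gen};")
        current, current_vars = f"r{i}", merged

    return "\n".join(lines)


if __name__ == "__main__":
    print(bgp_to_pig([("?x", "rdf:type", "foaf:Person"),
                      ("?x", "foaf:name", "?name")]))
```

For the patterns `?x rdf:type foaf:Person` and `?x foaf:name ?name`, the generated script has two FILTERs, joins on `x` with `j1 = JOIN p0 BY x, p1 BY x;`, and ends with `r1 = FOREACH j1 GENERATE p0::x AS x, p1::name AS name;`.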
> I suggest you use N-Triples and/or N-Quads (parsers are available in
> TDB) as input/output with Pig.
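Pig's default loader, PigStorage, splits fields on tab characters, so one way to feed N-Triples to a script is to rewrite each statement as three tab-separated columns first. A minimal, hypothetical Python sketch of that preprocessing step (not one of the TDB parsers), assuming one statement per line, no comments after a statement, and no raw tab characters inside literals. The function name `ntriple_to_tsv` is made up for illustration.

```python
from typing import Optional


def ntriple_to_tsv(line: str) -> Optional[str]:
    """Hypothetical helper: turn one N-Triples line into 's<TAB>p<TAB>o'.

    Returns None for blank lines and comment lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if not line.endswith("."):
        raise ValueError(f"not an N-Triples statement: {line!r}")
    body = line[:-1].rstrip()  # drop the terminating '.'
    # Subjects and predicates (IRIs or blank nodes) contain no whitespace,
    # so two splits are safe; the remainder is the object, which may be a
    # literal containing spaces.
    parts = body.split(None, 2)
    if len(parts) != 3:
        raise ValueError(f"expected subject, predicate and object: {line!r}")
    s, p, o = parts
    if "\t" in o:
        raise ValueError("raw tab inside a literal; escape it before loading")
    return "\t".join((s, p, o))
```

Applying it to every line of a `.nt` file and keeping the non-`None` results yields a file that `LOAD ... AS (s:chararray, p:chararray, o:chararray)` can read. N-Quads adds a fourth (graph) term, which this sketch does not handle.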
>
> Which version of Pig are you planning to use?
>
> My 2 cents,
> Paolo
>
>  [1] Peter Mika and Ben Reed, "Pearls before Swine: A large-scale triple
>     store using MapReduce", August 29, 2008 - unpublished
>
>
>
> PS:
> For the benefit of Pig users/developers (in CC to this email):
>
> SPARQL stands for SPARQL Protocol and RDF Query Language and it is the
> recommended RDF query language defined by the W3C.
>
> More specifically, the SPARQL Algebra defines six operators and six
> solution modifiers:
>
>  Operators                Solution Modifiers