Re: [jena-dev] SPARQL: Transformation of SPARQL
Ashutosh Chauhan 2010-03-26, 15:38
This seems to be a nice and useful contrib project for Pig. Is anyone
actively working on it?

Ashutosh

PS: Since I am not on the jena-dev list, this might not make it there.
Paolo, can you post it there in case it doesn't appear?

On Fri, Mar 26, 2010 at 03:47, Paolo Castagna <[EMAIL PROTECTED]> wrote:
> Hi Alexander
>
> Alexander Schätzle wrote:
>>
>> I'm working on a translation of SPARQL queries into PigLatin Scripts
>> (language used by the Pig System for Hadoop developed by Yahoo).
>
> Translating the SPARQL algebra into PigLatin scripts is IMHO a nice
> idea. With a small(?) amount of glue code you join two big communities,
> delivering value to both. On one hand, there is the need for
> scalable/parallel processing systems. On the other hand, there is
> the aim to support as many data formats as possible.
>
> I give full credit for this idea to Peter Mika and Ben Reed, who kindly
> shared an unpublished paper [1] on this topic. You should write to them
> and ask for a copy of the paper.
>
> Here follows a summary of their mapping between the SPARQL algebra
> operators and solution modifiers and the Pig Latin syntax:
>
>
>   SPARQL algebra                    Pig Latin syntax
>   --------------------------------  ----------------------------------
>
>   BGP operator                      A set of FILTER operations,
>                                     followed by a number of JOINs
>                                     equal to the number of triple
>                                     patterns and a single FOREACH
>                                     statement.
>
>   Filter operator                   FILTER (with the limitation
>                                     that not all expressions are
>                                     directly supported by Pig Latin).
>                                     However, Pig can be extended via
>                                     user-defined functions (UDFs) to
>                                     have a semantically equivalent
>                                     filter behaviour.
>
>   Join operator                     A series of JOINs (which is an
>                                     inner join in Pig) followed by a
>                                     FOREACH (a projection) to remove
>                                     the duplicated columns.
>
>   LeftJoin operator                 A series of outer JOINs plus a
>                                     custom filter operator.
>
>   Union operator                    UNION
>
>   Graph operator                    ?
>
>   OrderBy modifier                  ORDER
>
>   Project modifier                  Achieved by FOREACH.
>
>   Distinct modifier                 DISTINCT
>
>   Reduced modifier                  Implemented using DISTINCT.
>
>   Slice modifier                    Implemented using a custom filter.
>
>   ToList modifier                   ?
>
>
> They conclude:
>
>   "In summary, we have shown a complete translation procedure from
>   SPARQL to Pig Latin scripts, which provides the basis of our SPARQL
>   interpreter. (This interpreter is not complete yet, but sufficient to
>   cover the most commonly used queries, including the ones discussed in
>   this paper.) Note that the query plans generated by our interpreter
>   are unlikely to be optimal: optimization is left as a task
>   for Pig." [1]
>
> I suggest you use N-Triples and/or N-Quads (parsers are available in
> TDB) as input/output with Pig.
>
> Which version of Pig are you planning to use?
>
> My 2 cents,
> Paolo
>
>  [1] Peter Mika and Ben Reed, "Pearls before Swine: A large-scale triple
>      store using MapReduce", August 29, 2008 - unpublished
>
>
>
> PS:
> For the benefit of Pig users/developers (in CC to this email):
>
> SPARQL stands for SPARQL Protocol and RDF Query Language and it is the
> recommended RDF query language defined by W3C.
>
> More specifically, the SPARQL Algebra defines six operators and six
> solution modifiers:
>
>   Operators                         Solution Modifiers