Pig, mail # user - GSoC 2013


Re: GSoC 2013
burakkk 2013-04-02, 16:48
I know that, but Giraph uses BSP. What I'm describing is a shared-nothing
model, except for the reducers. Besides, I don't want to split the iteration
across phases: one phase is still responsible for the whole iteration. Each
origin vertex will be processed in parallel.
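
A minimal sketch of this idea, in Python rather than Pig, using the example edges from the quoted message below. Everything named here (`build_graph`, `random_walk`, `walk_all`, the step and walk counts) is hypothetical and chosen only for illustration. The layout of the vertex list and node list is one possible reading of the description in the quoted message (a flat array of sorted neighbor ids plus per-vertex offsets into it), not a confirmed design.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Example edges (user_id, follower) taken from the quoted message.
EDGES = [(1, 2), (1, 3), (1, 10), (2, 3), (3, 4), (3, 5)]


def build_graph(edges):
    """Return (vertex_list, node_list).

    vertex_list: flat array of neighbor ids, sorted within each vertex.
    node_list:   vertex id -> (start, end) offsets into vertex_list.
    This layout is an assumption, not the thread author's exact design.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])  # vertices with no outgoing edges
    vertex_list, node_list = [], {}
    for vertex in sorted(adjacency):
        start = len(vertex_list)
        vertex_list.extend(sorted(adjacency[vertex]))
        node_list[vertex] = (start, len(vertex_list))
    return vertex_list, node_list


def random_walk(origin, vertex_list, node_list, steps=10, walks=100, seed=0):
    """Run the whole walk for one origin in a single pass; return visit counts."""
    rng = random.Random(seed * 1_000_003 + origin)  # reproducible per origin
    visits = Counter()
    for _ in range(walks):
        current = origin
        for _ in range(steps):
            start, end = node_list[current]
            if start == end:  # dead end: no outgoing edges
                break
            current = vertex_list[rng.randrange(start, end)]
            visits[current] += 1
    return origin, visits


def walk_all(edges, **kwargs):
    """Walk from every origin independently; origins share only read-only data."""
    vertex_list, node_list = build_graph(edges)
    with ThreadPoolExecutor() as pool:
        results = pool.map(
            lambda v: random_walk(v, vertex_list, node_list, **kwargs),
            sorted(node_list),
        )
        return dict(results)


if __name__ == "__main__":
    for origin, visits in walk_all(EDGES, steps=3, walks=50).items():
        print(origin, dict(visits))
```

The thread pool only illustrates that the origins do not depend on each other. In a MapReduce setting each origin would presumably be handled by a separate task, with the graph arrays loaded read-only into each one.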

Thanks
Best regards...
On Tue, Apr 2, 2013 at 7:20 PM, Gianmarco De Francisci Morales <[EMAIL PROTECTED]> wrote:

> FYI, Giraph has a Random Walk implementation.
>
> Pig does not support iteration natively, so any iterative algorithm is not
> a very good fit for it. Just my 2c.
>
> Cheers,
>
> --
> Gianmarco
>
>
> On Tue, Apr 2, 2013 at 10:04 AM, burakkk <[EMAIL PROTECTED]> wrote:
>
> > So what do you suggest? Is it clear?
> >
> >
> > On Mon, Apr 1, 2013 at 9:35 PM, burakkk <[EMAIL PROTECTED]> wrote:
> >
> > > I'm using only the WTF graph representation, so that the graph fits in
> > > memory. By the way, I haven't seen any explanation of WTF or graph
> > > models on the Pig 0.11 release page.
> > > I don't want to use Cassovary. I believe it can be done with Pig. I'll
> > > implement the graph representation from the WTF paper in Pig and then
> > > use it to implement a random walk algorithm. To do that, I may need to
> > > improve some features, such as joins (fuzzy join), or implement a new
> > > operator. I can implement it using either existing operators or new
> > > operators. That's up to us and it doesn't really matter. If there is
> > > already an implementation of the random walk algorithm, please feel
> > > free to tell me, because I haven't found one.
> > > Are you proposing to create an open-source implementation of those
> > > algorithms?
> > > Yes, I'm proposing to implement a random walk algorithm and a new data
> > > model that represents a graph. After that, people can use them when
> > > coding in Pig.
> > >
> > > Do you suggest they should be Pig scripts added to the Pig project, or
> > > do you want to create some new operators?
> > > Maybe; it can be a UDF or a new operator.
> > >
> > > I made a quick example. It may not be completely accurate; I've just
> > > tried to explain the idea.
> > > Suppose you have a graph file like this:
> > > user_id follower
> > > 1 2
> > > 1 3
> > > 1 10
> > > 2 3
> > > 3 4
> > > 3 5
> > > ...
> > >
> > > The vertex list is an array of sorted vertex ids.
> > > The node list is a matrix of vertex ids and their starting positions.
> > >
> > >
> > > -- generateVertex, generateNode and RandomWalk are proposed UDFs,
> > > -- not existing Pig functions
> > > -- load the graph file
> > > graph = LOAD 'graph' USING PigStorage() AS (vertex:int, follower:int);
> > > vertex = COGROUP graph BY (vertex);
> > > -- load all the vertices from HDFS into memory
> > > list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex)
> > >     AS vertexList;
> > > -- build the node list (vertex id and its starting position)
> > > list = FOREACH graph GENERATE org.apache.pig.generateNode(list)
> > >     AS nodeList;
> > > -- generate a score: using the node list, you can traverse the graph
> > > -- to your finishing position
> > > randomWalk = FOREACH vertex GENERATE
> > >     flatten(org.apache.pig.RandomWalk(list, endVertex)) AS score;
> > > store...
> > >
> > >
> > > Thanks
> > > Best Regards...
> > >
> > >
> > > On Mon, Apr 1, 2013 at 7:20 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> > >
> > >> I'm somewhat familiar with WTF code (my day job is managing the
> > >> analytics infrastructure team at Twitter). WTF is implemented using
> > >> Pig 0.11 (in fact some of the Pig 11 features/improvements are
> > >> directly due to this project...), and mostly has to do with clever
> > >> algorithms implemented in Pig (an earlier version of WTF loaded the
> > >> graph into main memory on large-mem machines -- that system is open
> > >> sourced, too, under github.com/twitter/cassovary). Are you proposing
> > >> to create an open-source implementation of those algorithms? Do you
> > >> suggest they should be Pig scripts added to the Pig project, or do you
> > >> want to create some new operators? I'm not totally sure where you are
> > >> going here.
> > >>
> > >> GSoC proposals for Pig are usually made by students who want to work
BURAK ISIKLI | http://burakisikli.wordpress.com