Gianmarco De Francisci Mo... 2013-04-02, 16:20
Re: GSoC 2013
I know that, but Giraph uses the BSP model. What I'm proposing is a
shared-nothing model, except for the reducers. Besides, I don't want to
split an iteration across phases: a single phase is still responsible for
the whole iteration. Each distinct origin vertex will be processed in
parallel.
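
As a rough illustration of the idea (a sketch under stated assumptions, not the proposed Pig implementation), the Python below builds a flat adjacency array from the sample edge list quoted further down and runs one independent walk per origin vertex. The function names (build_adjacency, random_walk, walk_all) are hypothetical, and restarting at the origin when a walk reaches a vertex with no outgoing edges is an assumption made here so that every walk uses its full step budget.

```python
import random
from collections import defaultdict

# Sample edges (user_id, follower) from the quoted example; the "..." rows are omitted.
EDGES = [(1, 2), (1, 3), (1, 10), (2, 3), (3, 4), (3, 5)]


def build_adjacency(edges):
    """Return (neighbors, offsets).

    neighbors is one flat list of destination ids; offsets maps each vertex
    that has outgoing edges to the (start, end) slice of its neighbors.
    """
    grouped = defaultdict(list)
    for src, dst in edges:
        grouped[src].append(dst)
    neighbors, offsets = [], {}
    for src in sorted(grouped):
        start = len(neighbors)
        neighbors.extend(sorted(grouped[src]))
        offsets[src] = (start, len(neighbors))
    return neighbors, offsets


def random_walk(origin, neighbors, offsets, steps, rng):
    """Run `steps` iterations from `origin` and count visits per vertex.

    Assumption: a vertex without outgoing edges sends the walk back to the
    origin; that reset uses up an iteration but is not counted as a visit.
    """
    visits = defaultdict(int)
    current = origin
    for _ in range(steps):
        if current not in offsets:
            current = origin  # dead end: restart at the origin
            continue
        start, end = offsets[current]
        current = neighbors[rng.randrange(start, end)]
        visits[current] += 1
    return dict(visits)


def walk_all(edges, steps=100, seed=0):
    """Run one walk per origin vertex; walks share only the read-only graph."""
    neighbors, offsets = build_adjacency(edges)
    return {
        origin: random_walk(origin, neighbors, offsets, steps,
                            random.Random(seed * 1000003 + origin))
        for origin in offsets
    }


if __name__ == "__main__":
    for origin, visits in walk_all(EDGES).items():
        print(origin, visits)
```

Because each walk reads the graph but never writes to it, the loop in walk_all could in principle be replaced by a parallel map with one task per origin vertex, which is the shared-nothing property described above.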

Thanks
Best regards...
On Tue, Apr 2, 2013 at 7:20 PM, Gianmarco De Francisci Morales <[EMAIL PROTECTED]> wrote:

> FYI, Giraph has a Random Walk implementation.
>
> Pig does not support iteration natively, so any iterative algorithm is not
> a very good fit for it. Just my 2c.
>
> Cheers,
>
> --
> Gianmarco
>
>
> On Tue, Apr 2, 2013 at 10:04 AM, burakkk <[EMAIL PROTECTED]> wrote:
>
> > So what do you suggest? Is it clear?
> >
> >
> > On Mon, Apr 1, 2013 at 9:35 PM, burakkk <[EMAIL PROTECTED]> wrote:
> >
> > > I'm using only the WTF graph representation so that the graph fits in
> > > memory. By the way, I haven't seen any mention of WTF or graph models on
> > > the Pig 0.11 release page.
> > > I don't want to use Cassovary. I believe it can be done with Pig. I will
> > > implement a graph representation in Pig based on the WTF paper, and then
> > > use it to implement a random walk algorithm. To do that I may need to
> > > improve some features, such as joins (fuzzy join), or implement a new
> > > operator. I can implement it using either existing operators or new
> > > ones; that's up to us and it doesn't really matter. If there is already
> > > an implementation of a random walk algorithm, please let me know,
> > > because I haven't found one.
> > > Are you proposing to create an open-source implementation of those
> > > algorithms?
> > > Yes, I'm proposing to implement a random walk algorithm and a new data
> > > model that represents a graph. After that, people can use them when
> > > writing Pig scripts.
> > >
> > > Do you suggest they should be Pig scripts added to the Pig project, or do
> > > you want to create some new operators?
> > > Maybe; it could be a UDF or a new operator.
> > >
> > > I made a quick example. It may not be completely accurate; I've just
> > > tried to illustrate the idea.
> > > Suppose you have a graph file like this:
> > > user_id follower
> > > 1 2
> > > 1 3
> > > 1 10
> > > 2 3
> > > 3 4
> > > 3 5
> > > ...
> > >
> > > Vertex list: an array of the sorted vertex ids.
> > > Node list: a matrix of each vertex id and its starting position.
> > >
> > >
> > > -- generateVertex, generateNode and RandomWalk stand for the proposed UDFs
> > > -- load the graph file
> > > graph = LOAD 'graph' USING PigStorage() AS (vertex:int, follower:int);
> > > vertex = COGROUP graph BY (vertex);
> > > -- load all the vertices from HDFS into memory
> > > list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) AS vertexList;
> > > -- load all the vertices from HDFS into memory
> > > list = FOREACH graph GENERATE org.apache.pig.generateNode(list) AS nodeList;
> > > -- generate a score: using the node list, you can traverse the graph to
> > > -- your finishing position
> > > randomWalk = FOREACH vertex GENERATE
> > >     flatten(org.apache.pig.RandomWalk(list, endVertex)) AS score;
> > > store...
> > >
> > >
> > > Thanks
> > > Best Regards...
> > >
> > >
> > > On Mon, Apr 1, 2013 at 7:20 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> > >
> > >> I'm somewhat familiar with WTF code (my day job is managing the analytics
> > >> infrastructure team at Twitter). WTF is implemented using Pig 0.11 (in fact
> > >> some of the Pig 11 features/improvements are directly due to this
> > >> project...), and mostly has to do with clever algorithms implemented in Pig
> > >> (an earlier version of WTF loaded the graph into main memory on large-mem
> > >> machines -- that system is open sourced, too, under
> > >> github.com/twitter/cassovary). Are you proposing to create an open-source
> > >> implementation of those algorithms? Do you suggest they should be Pig
> > >> scripts added to the Pig project, or do you want to create some new
> > >> operators? I'm not totally sure where you are going here.
> > >>
> > >> GSoC proposals for Pig are usually made by students who want to work
BURAK ISIKLI | http://burakisikli.wordpress.com