Pig, mail # user - GSoC 2013


burakkk 2013-03-28, 20:28
Dmitriy Ryaboy 2013-03-29, 16:10
burakkk 2013-03-30, 17:12
Dmitriy Ryaboy 2013-04-01, 16:20
burakkk 2013-04-01, 18:35
burakkk 2013-04-02, 08:04
Gianmarco De Francisci Mo... 2013-04-02, 16:20
burakkk 2013-04-02, 16:48
Dmitriy Ryaboy 2013-04-08, 18:57
Steve Bernstein 2013-04-08, 19:22

Re: GSoC 2013
Gianmarco De Francisci Mo... 2013-04-09, 07:10
+1 to what Dmitriy says.

Cheers,

--
Gianmarco
On Mon, Apr 8, 2013 at 8:57 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:

> Hi,
> I think this is an interesting project but is not core to "Pig" itself --
> it may be more interesting / viable as a standalone project on github that
> uses Pig to implement graph algorithms.
> At this point in its development, I feel that Pig needs to concentrate on
> doing the things it already does, and do them better (operator efficiency,
> storage efficiency, better MR plan generation, etc) rather than expand to
> specific verticals; we should allow our users to create their own solution
> suites that use Pig for specific purposes. A successful example of such a
> standalone project is PacketPig (https://github.com/packetloop/packetpig),
> a PCAP network capture analysis tool.
>
> D
>
>
> On Tue, Apr 2, 2013 at 9:48 AM, burakkk <[EMAIL PROTECTED]> wrote:
>
> > I know that, but Giraph uses BSP. What I'm proposing is a shared-nothing
> > model, except for the reducers. Besides, I don't want to split the
> > iteration: one phase is still responsible for the whole iteration, and
> > each distinct origin vertex is processed in parallel.
> >
> > Thanks
> > Best regards...
> >
> >
> > On Tue, Apr 2, 2013 at 7:20 PM, Gianmarco De Francisci Morales <[EMAIL PROTECTED]> wrote:
> >
> > > FYI, Giraph has a Random Walk implementation.
> > >
> > > Pig does not support iteration natively, so any iterative algorithm is
> > > not a very good fit for it. Just my 2c.
> > >
> > > Cheers,
> > >
> > > --
> > > Gianmarco
> > >
> > >
> > > On Tue, Apr 2, 2013 at 10:04 AM, burakkk <[EMAIL PROTECTED]> wrote:
> > >
> > > > So what do you suggest? Is it clear?
> > > >
> > > >
> > > > On Mon, Apr 1, 2013 at 9:35 PM, burakkk <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > > I'm using only the WTF graph representation so that the graph fits in
> > > > > > memory. By the way, I haven't seen anything about WTF or graph models on
> > > > > > the Pig 0.11 release page.
> > > > > > I don't want to use Cassovary; I believe this can be done with Pig. I'll
> > > > > > implement the graph representation from the WTF paper in Pig and then use
> > > > > > it to implement a random walk algorithm. To do that I may need to improve
> > > > > > some features, such as joins (fuzzy join), or implement a new operator. I
> > > > > > can implement it with either existing or new operators; that's up to us
> > > > > > and it doesn't really matter. If there is already an implementation of a
> > > > > > random walk algorithm, please tell me, because I haven't found one.
> > > > > > Are you proposing to create an open-source implementation of those
> > > > > > algorithms?
> > > > > > Yes, I'm proposing to implement a random walk algorithm and a new data
> > > > > > model that represents a graph. After that, people can use it when coding
> > > > > > in Pig.
> > > > >
> > > > > > Do you suggest they should be Pig scripts added to the Pig project, or do
> > > > > > you want to create some new operators?
> > > > > > Maybe; it can be a UDF or a new operator.
> > > > >
> > > > > > I made a quick example. It may not be completely accurate; I've just
> > > > > > tried to explain the idea.
> > > > > > Suppose you have a graph file like this:
> > > > > user_id follower
> > > > > 1 2
> > > > > 1 3
> > > > > 1 10
> > > > > 2 3
> > > > > 3 4
> > > > > 3 5
> > > > > ...
> > > > >
> > > > > > Vertex list: an array of the sorted vertex ids
> > > > > > Node list: a matrix of each vertex id and its starting position
> > > > >
> > > > >
> > > > > > -- load the graph file
> > > > > > graph = load 'graph' using PigStorage() as (vertex:int, follower:int);
> > > > > > vertex = COGROUP graph BY (vertex);
> > > > > > -- load all the vertices from HDFS into memory
> > > > > > list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) as vertexList;
> > > > > list = FOREACH graph GENERATE org.apache.pig.generateNode(list) as