Pig >> mail # user >> GSoC 2013


burakkk 2013-03-28, 20:28
Dmitriy Ryaboy 2013-03-29, 16:10
burakkk 2013-03-30, 17:12
Dmitriy Ryaboy 2013-04-01, 16:20
burakkk 2013-04-01, 18:35
burakkk 2013-04-02, 08:04
Gianmarco De Francisci Mo... 2013-04-02, 16:20
burakkk 2013-04-02, 16:48
Dmitriy Ryaboy 2013-04-08, 18:57
As a longtime follower of, and infrequent poster to, this list, I agree with this wisdom.

Much as I'm attracted to graph analysis, a continued focus on a rock-solid foundation is a good call.

-----Original Message-----
From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 08, 2013 11:58 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: GSoC 2013

Hi,
I think this is an interesting project but is not core to "Pig" itself -- it may be more interesting / viable as a standalone project on GitHub that uses Pig to implement graph algorithms.
At this point in its development, I feel that Pig needs to concentrate on doing the things it already does, and on doing them better (operator efficiency, storage efficiency, better MR plan generation, etc.), rather than expand to specific verticals; we should allow our users to create their own solution suites that use Pig for specific purposes. A successful example of such a standalone project is PacketPig (https://github.com/packetloop/packetpig), a PCAP network capture analysis tool.

D
On Tue, Apr 2, 2013 at 9:48 AM, burakkk <[EMAIL PROTECTED]> wrote:

> I know, but Giraph uses BSP. What I'm proposing is a shared-nothing
> model, except for the reducers. Besides, I don't want to split the
> iteration: one phase is still responsible for the whole iteration, and
> each origin vertex will be processed in parallel.
>
> Thanks
> Best regards...
>
>
> On Tue, Apr 2, 2013 at 7:20 PM, Gianmarco De Francisci Morales <[EMAIL PROTECTED]> wrote:
>
> > FYI, Giraph has a Random Walk implementation.
> >
> > Pig does not support iteration natively, so any iterative algorithm is
> > not a very good fit for it. Just my 2c.
> >
> > Cheers,
> >
> > --
> > Gianmarco
> >
> >
> > On Tue, Apr 2, 2013 at 10:04 AM, burakkk <[EMAIL PROTECTED]> wrote:
> >
> > > So what do you suggest? Is it clear?
> > >
> > >
> > > > On Mon, Apr 1, 2013 at 9:35 PM, burakkk <[EMAIL PROTECTED]> wrote:
> > >
> > > > > I'm using only the WTF graph representation so that the graph fits
> > > > > in memory. By the way, I haven't seen any explanation of WTF or
> > > > > graph models on the Pig 0.11 release page.
> > > > > I don't want to use Cassovary. I believe it can be done with Pig.
> > > > > I'll implement a graph representation in Pig based on the WTF paper
> > > > > and then use it to implement a random walk algorithm. To do that, I
> > > > > may need to improve some features such as joins (fuzzy join), or
> > > > > implement a new operator. I can implement it using either existing
> > > > > operators or new operators. That's up to us and it doesn't really
> > > > > matter. If there is already an implementation of the random walk
> > > > > algorithm, please let me know, because I haven't found one.
> > > > > Are you proposing to create an open-source implementation of
> > > > > those algorithms?
> > > > > Yes, I'm proposing to implement a random walk algorithm and a new
> > > > > data model that represents a graph. After that, people can use it
> > > > > when coding in Pig.
> > > >
> > > > > Do you suggest they should be Pig scripts added to the Pig project,
> > > > > or do you want to create some new operators?
> > > > > Maybe. It can be a UDF or a new operator.
> > > >
> > > > > I made a quick example. It may not be completely accurate; I've
> > > > > just tried to explain the idea.
> > > > > Suppose you have a graph file like this:
> > > > > user_id follower
> > > > 1 2
> > > > 1 3
> > > > 1 10
> > > > 2 3
> > > > 3 4
> > > > 3 5
> > > > ...
> > > >
> > > > > Vertex List is an array of sorted vertex ids.
> > > > > Node List is a matrix holding each vertex id and its starting
> > > > > position.
> > > >
> > > >
> > > > > -- load the graph file
> > > > > graph = LOAD 'graph' USING PigStorage() AS (vertex:int, follower:int);
> > > > > vertex = COGROUP graph BY (vertex);
> > > > > -- load all the vertices from HDFS into memory
> > > > > list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) AS vertexList;
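
The Vertex List / Node List layout described in the quoted message can be read as a flat adjacency array plus an offset index. The sketch below is only one possible interpretation of that description, written in plain Python rather than as a Pig UDF; the names build_graph and followers_of are hypothetical and do not come from the thread or from Pig.

```python
from collections import defaultdict


def build_graph(edges):
    """Pack (vertex, follower) pairs into two flat structures.

    vertex_list: follower ids, grouped by source vertex and sorted.
    node_list:   rows of (vertex id, start offset into vertex_list).
    Assumption: this is one reading of the layout in the quoted message.
    """
    adjacency = defaultdict(list)
    for vertex, follower in edges:
        adjacency[vertex].append(follower)

    vertex_list = []
    node_list = []
    for vertex in sorted(adjacency):
        # Record where this vertex's followers begin in the flat array.
        node_list.append((vertex, len(vertex_list)))
        vertex_list.extend(sorted(adjacency[vertex]))
    return vertex_list, node_list


def followers_of(vertex_list, node_list, vertex):
    """Return the followers of one vertex by slicing the flat array."""
    for i, (vertex_id, start) in enumerate(node_list):
        if vertex_id == vertex:
            # The slice ends where the next vertex's followers begin.
            if i + 1 < len(node_list):
                end = node_list[i + 1][1]
            else:
                end = len(vertex_list)
            return vertex_list[start:end]
    return []


# The sample rows from the quoted graph file (the elided "..." rows are omitted).
edges = [(1, 2), (1, 3), (1, 10), (2, 3), (3, 4), (3, 5)]
vertex_list, node_list = build_graph(edges)
```

A random walk step under this layout would pick one element uniformly from followers_of(...) for the current vertex; vertices with no outgoing rows return an empty list and would need a restart rule.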
Gianmarco De Francisci Mo... 2013-04-09, 07:10