Re: GSoC 2013
So what do you suggest? Is it clear?
On Mon, Apr 1, 2013 at 9:35 PM, burakkk <[EMAIL PROTECTED]> wrote:

> I'm only using the WTF graph representation so that the graph fits in
> memory. By the way, I haven't seen any explanation of WTF or graph models
> on the Pig 0.11 release page.
> I don't want to use Cassovary. I believe it can be done with Pig. I would
> implement a graph representation in Pig based on the WTF paper and then
> use it to implement a random walk algorithm. To do that, I may need to
> improve some features such as joins (fuzzy join), or implement a new
> operator. I can implement it using either existing operators or new ones.
> That's up to us and it doesn't really matter. If there is already an
> implementation of a random walk algorithm, please let me know, because I
> haven't found one.
> Are you proposing to create an open-source implementation of those
> algorithms?
> Yes, I'm proposing to implement a random walk algorithm and a new data
> model that represents a graph. After that, people can use them when
> coding in Pig.
>
> Do you suggest they should be Pig scripts added to the Pig project, or do
> you want to create some new operators?
> It could be either a UDF or a new operator.
>
> I made a quick example. It may not be completely accurate; I've just
> tried to explain the idea.
> Suppose you have a graph file like this:
> user_id follower
> 1 2
> 1 3
> 1 10
> 2 3
> 3 4
> 3 5
> ...
>
> The vertex list is an array of sorted vertex IDs.
> The node list is a matrix of each vertex ID and its starting position.
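
To make the two structures concrete, here is a minimal Python sketch of one way to read them. It assumes (this is not stated above) that the "starting position" is an offset into a flat array of followers sorted by vertex ID. The names `build_graph`, `neighbors`, and `followers` are hypothetical and are not part of Pig or of the WTF code.

```python
from bisect import bisect_left


def build_graph(edges):
    """Build a compact adjacency layout from (vertex, follower) pairs.

    Assumption: a vertex's "starting position" is the index of its first
    follower in one flat, vertex-sorted follower array.
    """
    edges = sorted(edges)
    followers = [f for _, f in edges]  # flat array of all followers
    vertex_list = []                   # sorted IDs of vertices that have followers
    node_list = []                     # (vertex ID, starting position) rows
    for pos, (v, _) in enumerate(edges):
        if not vertex_list or vertex_list[-1] != v:
            vertex_list.append(v)
            node_list.append((v, pos))
    return vertex_list, node_list, followers


def neighbors(v, vertex_list, node_list, followers):
    """Return the followers of v by binary search on the sorted vertex list."""
    i = bisect_left(vertex_list, v)
    if i == len(vertex_list) or vertex_list[i] != v:
        return []  # v has no outgoing edges
    start = node_list[i][1]
    # The slice ends where the next vertex starts, or at the end of the array.
    end = node_list[i + 1][1] if i + 1 < len(node_list) else len(followers)
    return followers[start:end]


# The sample graph file from the example above.
edges = [(1, 2), (1, 3), (1, 10), (2, 3), (3, 4), (3, 5)]
vertex_list, node_list, followers = build_graph(edges)
```

With this layout, each vertex costs one row in the node list plus one slot per follower, which is the kind of compactness that would matter when trying to fit a graph in memory.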
>
>
> -- load the graph file
> graph = LOAD 'graph' USING PigStorage() AS (vertex:int, follower:int);
> vertex = COGROUP graph BY (vertex);
> -- load all the vertices from HDFS into memory
> list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) AS vertexList;
> -- load all the vertices from HDFS into memory
> list = FOREACH graph GENERATE org.apache.pig.generateNode(list) AS nodeList;
> -- generate a score: using the node list, you can traverse the graph to
> -- your finishing position
> randomWalk = FOREACH vertex GENERATE FLATTEN(org.apache.pig.RandomWalk(list, endVertex)) AS score;
> store...
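
The scoring step is left open above, so the following Python sketch is only one possible interpretation: it assumes the score is the fraction of random walks from a start vertex that reach the end vertex within a step limit. The function name `random_walk_score` and the parameters `walks`, `max_steps`, and `seed` are hypothetical. This is neither the WTF algorithm nor an existing Pig UDF.

```python
import random


def random_walk_score(edges, start, end, walks=2000, max_steps=10, seed=0):
    """Estimate how often a random walk from start reaches end.

    Assumption: each step moves from a vertex to one of its followers,
    chosen uniformly at random; a walk stops at a vertex with no followers.
    """
    adj = {}
    for v, f in edges:
        adj.setdefault(v, []).append(f)

    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    hits = 0
    for _ in range(walks):
        current, steps = start, 0
        while current != end and steps < max_steps and adj.get(current):
            current = rng.choice(adj[current])
            steps += 1
        if current == end:
            hits += 1
    return hits / walks


# The sample graph file from the example above.
edges = [(1, 2), (1, 3), (1, 10), (2, 3), (3, 4), (3, 5)]
score = random_walk_score(edges, start=1, end=4)
```

On the sample graph, a walk from vertex 1 reaches vertex 4 with probability 1/3 (via 1→3→4 or 1→2→3→4), so the estimate should land close to that value.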
>
>
> Thanks
> Best Regards...
>
>
> On Mon, Apr 1, 2013 at 7:20 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>
>> I'm somewhat familiar with WTF code (my day job is managing the analytics
>> infrastructure team at Twitter). WTF is implemented using Pig 0.11 (in fact
>> some of the Pig 11 features/improvements are directly due to this
>> project...), and mostly has to do with clever algorithms implemented in Pig
>> (an earlier version of WTF loaded the graph into main memory on large-mem
>> machines -- that system is open sourced, too, under
>> github.com/twitter/cassovary). Are you proposing to create an open-source
>> implementation of those algorithms? Do you suggest they should be Pig
>> scripts added to the Pig project, or do you want to create some new
>> operators? I'm not totally sure where you are going here.
>>
>> GSoC proposals for Pig are usually made by students who want to work on
>> issues labeled as GSoC candidates on the apache jira. The students spend
>> some time to understand the problem stated in the jira, familiarize
>> themselves with the existing codebase, and put a basic technical
>> implementation plan and schedule into their proposal. Since in this case
>> you are proposing something we haven't scoped or defined well for
>> ourselves, we need you to be very clear and specific about what you are
>> trying to do, and how you plan to go about it. I think that Graph
>> processing in Pig (or other Hadoop-based systems) is a really interesting
>> topic and there is a lot of work to be done, but we really need you to be
>> far more detailed to be able to give you good guidance with regards to
>> GSoC.
>>
>> Best,
>> Dmitriy
>>
>>
>> On Sat, Mar 30, 2013 at 10:12 AM, burakkk <[EMAIL PROTECTED]> wrote:
>>
>> > Sure. We can implement a graph model using  "WTF: The Who to Follow
BURAK ISIKLI | http://burakisikli.wordpress.com