Re: Streaming Hadoop using C
The documentation on Starfish (http://www.cs.duke.edu/starfish/index.html)
looks promising, though I have not used it. I wonder if others on the list have found it more useful than setting mapred.task.profile.
On Feb 29, 2012, at 3:53 PM, Mark question wrote:

> I've used Hadoop profiling (.prof) to show the stack trace, but it was hard
> to follow. I used jConsole locally, since I couldn't find a way to set a port
> number for child processes when running them remotely. Linux commands
> (top, /proc) showed me that the virtual memory is almost twice my
> physical memory, which means swapping is happening, which is what I'm trying to
> avoid.
> So basically, is there a way to assign a port to child processes to monitor
> them remotely (asked before by Xun) or would you recommend another
> monitoring tool?
> Thank you,
> Mark
> On Wed, Feb 29, 2012 at 11:35 AM, Charles Earl <[EMAIL PROTECTED]> wrote:
>> Mark,
>> So if I understand, it is more the memory management that you are
>> interested in, rather than a need to run an existing C or C++ application
>> on the MapReduce platform?
>> Have you done profiling of the application?
>> C
>> On Feb 29, 2012, at 2:19 PM, Mark question wrote:
>>> Thanks Charles. I'm running Hadoop for research on duplicate
>>> detection methods. To go deeper, I need to understand what's slowing my
>>> program, which usually starts with analyzing memory to predict the best input
>>> size for a map task. So you're saying piping can help me control memory even
>>> though it's eventually running on a VM?
>>> Thanks,
>>> Mark
>>> On Wed, Feb 29, 2012 at 11:03 AM, Charles Earl <[EMAIL PROTECTED]> wrote:
>>>> Mark,
>>>> Both streaming and pipes allow this, perhaps more so pipes at the level of
>>>> the mapreduce task. Can you provide more details on the application?
>>>> On Feb 29, 2012, at 1:56 PM, Mark question wrote:
>>>>> Hi guys, thought I should ask this before I use it ... will using C over
>>>>> Hadoop give me the usual C memory management? For example, malloc(),
>>>>> sizeof()? My guess is no, since this will all eventually be turned into
>>>>> bytecode, but I need more control over memory, which is obviously hard for me
>>>>> to do with Java.
>>>>> Let me know of any advantages you know about streaming in C over Hadoop.
>>>>> Thank you,
>>>>> Mark
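
For readers following the original question: with Hadoop Streaming the mapper is an ordinary native executable that reads input lines on stdin and writes tab-separated key/value lines to stdout, so a C mapper is not turned into bytecode and malloc()/free() behave as they do in any C program. Below is a minimal sketch of that idea, not code from this thread. The names read_line and map_stream are hypothetical, and emitting each input line as the key with a value of 1 is an assumption chosen only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from `in` into a heap buffer that
 * grows on demand. Returns the line length, -1 at end of input, or -2 if
 * an allocation fails. The caller owns *buf and must free() it. */
static long read_line(FILE *in, char **buf, size_t *cap)
{
    size_t len = 0;
    int c;

    if (*buf == NULL) {
        *cap = 128;
        *buf = malloc(*cap);
        if (*buf == NULL)
            return -2;
    }
    while ((c = fgetc(in)) != EOF && c != '\n') {
        /* Keep room for this character plus the terminating NUL. */
        if (len + 1 >= *cap) {
            size_t new_cap = *cap * 2;
            char *grown = realloc(*buf, new_cap);
            if (grown == NULL)
                return -2;
            *buf = grown;
            *cap = new_cap;
        }
        (*buf)[len++] = (char)c;
    }
    if (c == EOF && len == 0)
        return -1;
    (*buf)[len] = '\0';
    return (long)len;
}

/* Hypothetical mapper body: emits "<line>\t1" for every input line.
 * Returns 0 on success and 1 if memory ran out. */
int map_stream(FILE *in, FILE *out)
{
    char *buf = NULL;
    size_t cap = 0;
    long len;

    while ((len = read_line(in, &buf, &cap)) >= 0)
        fprintf(out, "%s\t1\n", buf);

    free(buf); /* the single buffer is reused for every line */
    return len == -2 ? 1 : 0;
}
```

A main() that simply returns map_stream(stdin, stdout) would turn this into an executable that can be named in the streaming job's -mapper option. The process's memory footprint is then whatever the C code allocates, separate from the JVM heap of the task that launches it.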