Re: Streaming Hadoop using C
How was your experience with Starfish?
C
On Mar 1, 2012, at 12:35 AM, Mark question wrote:

> Thank you for your time and suggestions. I've already tried Starfish, but
> not jmap; I'll check it out.
> Thanks again,
> Mark
>
> On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl <[EMAIL PROTECTED]> wrote:
>
>> I assume you have also just tried running locally and using the JDK
>> performance tools (e.g. jmap) to gain insight, by configuring Hadoop to
>> run the absolute minimum number of tasks?
>> Perhaps the discussion
>> http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-task
>> might be relevant?
>> On Feb 29, 2012, at 3:53 PM, Mark question wrote:
>>
>>> I've used Hadoop profiling (.prof) to show the stack trace, but it was
>>> hard to follow. I used jConsole locally, since I couldn't find a way to
>>> set a port number for the child processes when running them remotely.
>>> Linux commands (top, /proc) showed me that virtual memory is almost
>>> twice my physical memory, which means swapping is happening, which is
>>> what I'm trying to avoid.
>>>
>>> So basically, is there a way to assign a port to the child processes so
>>> I can monitor them remotely (asked before by Xun), or would you
>>> recommend another monitoring tool?
>>>
>>> Thank you,
>>> Mark
>>>
>>>
>>> On Wed, Feb 29, 2012 at 11:35 AM, Charles Earl <[EMAIL PROTECTED]> wrote:
>>>
>>>> Mark,
>>>> So if I understand, it is more the memory management that you are
>>>> interested in, rather than a need to run an existing C or C++
>>>> application on the MapReduce platform?
>>>> Have you profiled the application?
>>>> C
>>>> On Feb 29, 2012, at 2:19 PM, Mark question wrote:
>>>>
>>>>> Thanks, Charles. I'm running Hadoop for research, to run
>>>>> duplicate-detection methods. To go deeper, I need to understand what's
>>>>> slowing my program down, which usually starts with analyzing memory to
>>>>> predict the best input size for a map task. So you're saying piping can
>>>>> help me control memory, even though it's eventually running on the VM?
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>> On Wed, Feb 29, 2012 at 11:03 AM, Charles Earl <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Mark,
>>>>>> Both streaming and pipes allow this, perhaps more so pipes, at the
>>>>>> level of the MapReduce task. Can you provide more details on the
>>>>>> application?
>>>>>> On Feb 29, 2012, at 1:56 PM, Mark question wrote:
>>>>>>
>>>>>>> Hi guys, thought I should ask this before I use it ... will using C
>>>>>>> over Hadoop give me the usual C memory management? For example,
>>>>>>> malloc(), sizeof()? My guess is no, since this all will eventually be
>>>>>>> turned into bytecode, but I need more control over memory, which is
>>>>>>> obviously hard for me to do in Java.
>>>>>>>
>>>>>>> Let me know of any advantages you know of to streaming in C over
>>>>>>> Hadoop.
>>>>>>> Thank you,
>>>>>>> Mark
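
The premise behind the question above is worth a concrete illustration. With
Hadoop Streaming, the mapper is an ordinary native executable that reads
records on stdin and writes tab-separated key/value pairs to stdout; the
binary itself is not turned into bytecode, it runs as a child process of the
task JVM, so malloc(), free(), and sizeof behave as in any C program. Below is
a minimal sketch of such a mapper, assuming newline-delimited text input and
whitespace tokenization (the word-count framing is illustrative, not something
from the thread):

    /* Minimal Hadoop Streaming mapper in C: reads lines from stdin and
     * emits "word<TAB>1" pairs on stdout. Because the mapper runs as a
     * native child process of the task JVM (not as bytecode), ordinary
     * C memory management (malloc/realloc/free, sizeof) applies. */
    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *line = NULL;   /* buffer is malloc'd/realloc'd by getline() */
        size_t cap = 0;

        while (getline(&line, &cap, stdin) != -1) {
            char *save = NULL;
            for (char *tok = strtok_r(line, " \t\r\n", &save);
                 tok != NULL;
                 tok = strtok_r(NULL, " \t\r\n", &save)) {
                /* Streaming expects one key<TAB>value pair per line. */
                printf("%s\t1\n", tok);
            }
        }
        free(line);
        return 0;
    }

Such a binary would typically be shipped and launched with the streaming jar,
along the lines of: hadoop jar hadoop-streaming-*.jar -input <in> -output <out>
-mapper mapper -file mapper (the exact jar path depends on the Hadoop version).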
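
Charles's suggestion to run locally with a minimum number of tasks can also be
taken a step further by measuring the mapper logic entirely outside Hadoop:
feed it a sample input file and report its peak resident memory, which is one
way to estimate a workable input size per map task before swapping sets in. A
rough sketch along those lines, assuming a POSIX system; the process_line()
stub and the use of getrusage() are illustrative choices, not something
proposed in the thread:

    /* Rough local harness: run the per-record logic over a sample file
     * and report peak resident memory, as a stand-in for profiling a
     * single map task in isolation. Assumes a POSIX system (getrusage). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    /* Hypothetical hook for the real per-line work (e.g. the
     * duplicate-detection logic mentioned in the thread). */
    static void process_line(const char *line) {
        (void)line;
    }

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <sample-input>\n", argv[0]);
            return 1;
        }
        FILE *in = fopen(argv[1], "r");
        if (in == NULL) {
            perror("fopen");
            return 1;
        }

        char buf[1 << 16];
        while (fgets(buf, sizeof buf, in) != NULL)
            process_line(buf);
        fclose(in);

        struct rusage ru;
        if (getrusage(RUSAGE_SELF, &ru) == 0) {
            /* ru_maxrss is reported in kilobytes on Linux. */
            fprintf(stderr, "peak RSS: %ld kB\n", ru.ru_maxrss);
        }
        return 0;
    }

Running the same harness under a heap profiler (for example
valgrind --tool=massif) then shows where the memory actually goes, which is
the kind of picture that is hard to get from inside a remote task JVM.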