Hadoop >> mail # user >> Streaming Hadoop using C


Re: Streaming Hadoop using C
How was your experience with Starfish?
C
On Mar 1, 2012, at 12:35 AM, Mark question wrote:

> Thank you for your time and suggestions, I've already tried starfish, but
> not jmap. I'll check it out.
> Thanks again,
> Mark
>
> On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl <[EMAIL PROTECTED]> wrote:
>
>> I assume you have also tried running locally and using the JDK
>> performance tools (e.g. jmap) to gain insight, with Hadoop configured to
>> run the absolute minimum number of tasks?
>> Perhaps the discussion
>>
>> http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-task
>> might be relevant?
>> On Feb 29, 2012, at 3:53 PM, Mark question wrote:
>>
>>> I've used Hadoop profiling (.prof) to show the stack trace, but it was hard
>>> to follow. I used jConsole only locally, since I couldn't find a way to
>>> assign a port number to child processes when running them remotely. Linux
>>> tools (top, /proc) showed me that the virtual memory is almost twice my
>>> physical memory, which means swapping is happening, which is what I'm
>>> trying to avoid.
>>>
>>> So basically, is there a way to assign a port to child processes to monitor
>>> them remotely (asked before by Xun), or would you recommend another
>>> monitoring tool?
>>>
>>> Thank you,
>>> Mark
>>>
>>>
>>> On Wed, Feb 29, 2012 at 11:35 AM, Charles Earl <[EMAIL PROTECTED]> wrote:
>>>
>>>> Mark,
>>>> So if I understand, it is more the memory management that you are
>>>> interested in, rather than a need to run an existing C or C++ application
>>>> on the MapReduce platform?
>>>> Have you done profiling of the application?
>>>> C
>>>> On Feb 29, 2012, at 2:19 PM, Mark question wrote:
>>>>
>>>>> Thanks Charles. I'm running Hadoop for research, to run duplicate
>>>>> detection methods. To go deeper, I need to understand what's slowing my
>>>>> program, which usually starts with analyzing memory to predict the best
>>>>> input size for a map task. So you're saying piping can help me control
>>>>> memory even though it's eventually running on a VM?
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>> On Wed, Feb 29, 2012 at 11:03 AM, Charles Earl <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Mark,
>>>>>> Both streaming and pipes allow this, perhaps more so pipes at the level
>>>>>> of the MapReduce task. Can you provide more details on the application?
>>>>>> On Feb 29, 2012, at 1:56 PM, Mark question wrote:
>>>>>>
>>>>>>> Hi guys, thought I should ask this before I use it ... will using C over
>>>>>>> Hadoop give me the usual C memory management? For example, malloc(),
>>>>>>> sizeof()? My guess is no, since this will all eventually be turned into
>>>>>>> bytecode, but I need more control over memory, which obviously is hard
>>>>>>> for me to do with Java.
>>>>>>>
>>>>>>> Let me know of any advantages you know about streaming in C over Hadoop.
>>>>>>> Thank you,
>>>>>>> Mark
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>