Re: Hadoop setup doubts
Adam Kawa 2013-12-15, 16:53
Hi,

> 2.       How does log aggregation work?
>
http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
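In short: when log aggregation is enabled, the NodeManagers upload the logs of
finished containers to HDFS, and you fetch them with the yarn CLI. A minimal
sketch of the relevant yarn-site.xml settings (the /app-logs path is only an
example, not something from your cluster):

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>  <!-- example HDFS directory for aggregated logs -->
</property>

# after an application finishes, its aggregated logs can be read with:
yarn logs -applicationId <application_id>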

> 4.       What is the purpose of the webproxy? Is it really required?
>
http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html
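The proxy sits between users and the ApplicationMasters' web UIs so that users
do not talk to the AMs directly. By default it runs inside the ResourceManager;
it only becomes a separate daemon if you configure it, roughly like this (the
hostname is made up):

<property>
  <name>yarn.web-proxy.address</name>
  <value>proxyhost.example.com:9046</value>  <!-- if unset, the proxy runs inside the RM -->
</property>
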
> 5.       Is there any documentation on how to decide which scheduler type
> based on certain parameters?
>
I am not sure if I fully understand the question.
Only one scheduler can be active in a cluster at a time. At run time you can
decide which pool or queue your job is submitted to, if you use the Fair or
Capacity Scheduler.
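For example, the scheduler itself is chosen once in yarn-site.xml, and the
queue is chosen per job at submission time. A rough sketch (jar, class and
queue names are made up, and the -D option assumes the job uses ToolRunner):

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

hadoop jar my-job.jar MyDriver -Dmapreduce.job.queuename=analytics /input /output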

> 6.       What is the recommended way of pushing  data into Hadoop cluster
> & submitting  mapred jobs, i.e should we use another client  node, if so is
> there any client daemon to run on it ?
>
> ---- Do you have experience with UNIX? If so, Hadoop commands are similar
> to UNIX commands. E.g. the command below works fine for me.
>
> hdfs dfs -copyFromLocal <localfiledir> <hdfs file directory>
>
Usually we push data to the cluster and submit MapReduce jobs from machines
called "edge nodes". In Hadoop, an edge node is a machine on which the Hadoop
client libraries are installed (plus Pig, Hive, Sqoop etc., if you want to use
them), but no Hadoop daemon is running.
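From such an edge node the workflow is just the ordinary client commands,
e.g. (paths and jar name below are only placeholders):

hdfs dfs -mkdir -p /user/someuser/input
hdfs dfs -copyFromLocal localdata.txt /user/someuser/input/
hadoop jar my-analysis.jar MyDriver /user/someuser/input /user/someuser/output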

Hope this helps a bit!
> On Sat, Dec 14, 2013 at 4:03 PM, Indranil Majumder (imajumde) <
> [EMAIL PROTECTED]> wrote:
>
>>  I started with Hadoop a few days ago, and I have a few doubts about the setup,
>>
>>
>>
>> 1.       For name node I do format the name directory, is it recommended
>> to do the same for the data node directories too.
>>
>> 2.       How does log aggregation work?
>>
>> 3.       Does resource manager run on every node (both Name and Data) or
>> it can run as a separate node?
>>
>> 4.       What is the purpose of the webproxy? Is it really required?
>>
>> 5.       Is there any documentation on how to decide which scheduler
>> type based on certain parameters?
>>
>> 6.       What is the recommended way of pushing  data into Hadoop
>> cluster & submitting  mapred jobs, i.e should we use another client  node,
>> if so is there any client daemon to run on it ?
>>
>> 7.       For the following nodes in clustered mode
>>
>> A.      NameNode
>>
>> B.      Secondary NameNode
>>
>> C.      DataNode (2)
>>
>> D.      Resource Manager
>>
>> E.       WebProxy
>>
>> F.       History Server( Map Reduce )
>>
>> I want to write a PID monitor. Does anybody have the list of processes
>> that would run on this cluster when fully operational [maybe the output of
>> ps -ef | grep "somekeyword" will do]
>>
>>
>>
>> Thanks & Regards,
>>
>> Indranil
>>
>
>