Re: Is it possible to run multiple MapReduce against the same HDFS?
I am not positive how all of this works, but I believe the MapReduce user has special privileges with respect to HDFS that allow it to act as another user and read data on that user's behalf. I think those privileges are granted by the user when it connects to the JobTracker (JT). I am not an expert on Hadoop security and may have gotten some of this wrong, so if anyone on the list wants to correct or confirm what I have said, that would be great.

--
Bobby Evans
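
[Editor's note: the mechanism described above sounds like Hadoop's proxy-user (impersonation) facility, though the JobTracker's actual submit path may rely on HDFS delegation tokens instead; the sketch below only illustrates the general "act on another user's behalf" idea. It assumes a hypothetical service account "mapred" that has been granted impersonation rights via the hadoop.proxyuser.mapred.hosts / hadoop.proxyuser.mapred.groups settings in core-site.xml, and a hypothetical submitter "gerald".]

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ReadAsSubmitter {
    public static void main(String[] args) throws Exception {
        final Configuration conf = new Configuration();

        // The service logs in as itself (e.g. the "mapred" account) ...
        UserGroupInformation realUser = UserGroupInformation.getLoginUser();

        // ... and then impersonates the job submitter. This only succeeds if
        // core-site.xml grants the real user proxy rights
        // (hadoop.proxyuser.<user>.hosts / hadoop.proxyuser.<user>.groups).
        UserGroupInformation proxy =
            UserGroupInformation.createProxyUser("gerald", realUser);

        // HDFS sees the access as coming from "gerald", so that user's
        // file permissions apply.
        FileStatus[] listing = proxy.doAs(
            new PrivilegedExceptionAction<FileStatus[]>() {
                public FileStatus[] run() throws Exception {
                    return FileSystem.get(conf)
                                     .listStatus(new Path("/user/gerald"));
                }
            });

        for (FileStatus status : listing) {
            System.out.println(status.getPath());
        }
    }
}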

On 10/10/11 9:56 PM, "Zhenhua (Gerald) Guo" <[EMAIL PROTECTED]> wrote:

Thanks, Robert.  I will look into hod.

When the MapReduce framework accesses data stored in HDFS, which account
is used: the account that the MapReduce daemons (e.g., the JobTracker) run
as, or the account of the user who submits the job?  If the HDFS and
MapReduce clusters are run under different accounts, will the MapReduce
cluster still be able to access HDFS directories and files (assuming
authentication is enabled in HDFS)?

Thanks!

Gerald

On Mon, Oct 10, 2011 at 12:36 PM, Robert Evans <[EMAIL PROTECTED]> wrote:
> It should be possible to run multiple map/reduce clusters sharing the same HDFS; you can look at HOD (Hadoop On Demand), which launches a JT on demand.  The only chance of collision that I can think of would be if, by some odd chance, both JobTrackers were started at exactly the same millisecond.   The JT uses the time it was started as part of the job ID for all jobs.  Those job IDs are assumed to be unique and are used to create files/directories in HDFS to store data for each job.
>
> --Bobby Evans
>
> On 10/7/11 12:09 PM, "Zhenhua (Gerald) Guo" <[EMAIL PROTECTED]> wrote:
>
> I plan to deploy an HDFS cluster which will be shared by multiple
> MapReduce clusters.
> I wonder whether this is possible.  Will it incur any conflicts among
> the MapReduce clusters (e.g., different MapReduce clusters trying to use
> the same temp directory in HDFS)?
> If it is possible, how should the security parameters be set up (e.g.,
> user identity, file permissions)?
>
> Thanks,
>
> Gerald
>
>
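
[Editor's note: a minimal configuration sketch of the setup discussed above, i.e. two MapReduce clusters pointed at one shared NameNode, using MR1-era property names. The hostnames and paths are made up, and giving each cluster its own mapred.system.dir is a precaution against the temp-directory collisions Gerald asks about, not something the thread prescribes.]

import org.apache.hadoop.mapred.JobConf;

public class SharedHdfsClusters {
    public static void main(String[] args) {
        // Both MapReduce clusters point at the same shared NameNode.
        JobConf clusterA = new JobConf();
        clusterA.set("fs.default.name", "hdfs://shared-nn.example.com:9000");
        clusterA.set("mapred.job.tracker", "jt-a.example.com:9001");
        // Keep each cluster's bookkeeping area separate in HDFS.
        clusterA.set("mapred.system.dir", "/mapred/cluster-a/system");

        JobConf clusterB = new JobConf();
        clusterB.set("fs.default.name", "hdfs://shared-nn.example.com:9000");
        clusterB.set("mapred.job.tracker", "jt-b.example.com:9001");
        clusterB.set("mapred.system.dir", "/mapred/cluster-b/system");

        // Per Robert's reply above, job IDs embed an identifier derived from
        // the JT start time (job_<jtIdentifier>_<sequence>), so jobs submitted
        // to the two JobTrackers should not collide in HDFS.
        System.out.println(clusterA.get("mapred.system.dir"));
        System.out.println(clusterB.get("mapred.system.dir"));
    }
}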
