Hadoop, mail # general - Re: MiniMRCluster usage in dependent projects

Re: MiniMRCluster usage in dependent projects
Todd Lipcon 2012-05-10, 21:40
[changing thread name to not hijack the vote thread]

On Thu, May 10, 2012 at 11:23 AM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> Hi Todd,
>> Have you seen the new MiniMRClientCluster class? It's meant to be what
>> you describe - a minicluster which only exposes "external" APIs --
>> most importantly a way of getting at a JobClient to submit jobs. We
>> have it implemented in both 1.x and 2.x at this point, though I don't
>> recall if it's in the 1.0.x releases or if it's only slated for 1.1+
> Do you mean the below?
>    /*
>     * A simple interface for a client MR cluster used for testing. This
>     * interface provides basic methods which are independent of the
>     * underlying Mini Cluster (either through MR1 or MR2).
>     */
>    public interface MiniMRClientCluster {
>      public void start() throws IOException;
>      public void stop() throws IOException;
>      public Configuration getConfig() throws IOException;
>    }
> This doesn't sufficiently encapsulate the mini MR cluster for the
> purposes of a test rig. The issues we've seen are variations in what
> configuration variables are required: their names, and their
> semantics, for finding information about how the cluster is set up.
> Let's take one basic case, how does one find the address of the job
> tracker in a version agnostic way? For example, perhaps:
>    public InetSocketAddress getJobTrackerAddress();

The issue is that MR2 doesn't have a JobTracker address. Neither does
it have TaskTrackers. So there is no real way to expose this.

I don't see any reason that HBase should need to get these things --
so long as it can get a Configuration, it should be able to submit jobs.
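
To make that concrete, here is a minimal sketch of a dependent project's
test that only ever touches the Configuration. It is not the real Hadoop
code: Conf stands in for org.apache.hadoop.conf.Configuration,
ClientCluster mirrors the quoted MiniMRClientCluster interface, and
Mr1Cluster, Mr2Cluster, submit() and the localhost ports are hypothetical
names and values invented for illustration. Only the three property keys
are ones I'd expect to see in a real MR1 or MR2 configuration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class MiniClusterSketch {

    // Stand-in for Hadoop's Configuration so the sketch compiles on its own.
    static final class Conf {
        private final Map<String, String> props = new HashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    // Same shape as the quoted MiniMRClientCluster interface.
    interface ClientCluster {
        void start() throws IOException;
        void stop() throws IOException;
        Conf getConfig() throws IOException;
    }

    // Hypothetical MR1-style mini cluster: its config names a JobTracker.
    static final class Mr1Cluster implements ClientCluster {
        private final Conf conf = new Conf();
        public void start() { conf.set("mapred.job.tracker", "localhost:12345"); }
        public void stop() { }
        public Conf getConfig() { return conf; }
    }

    // Hypothetical MR2-style mini cluster: no JobTracker, only a ResourceManager.
    static final class Mr2Cluster implements ClientCluster {
        private final Conf conf = new Conf();
        public void start() {
            conf.set("mapreduce.framework.name", "yarn");
            conf.set("yarn.resourcemanager.address", "localhost:23456");
        }
        public void stop() { }
        public Conf getConfig() { return conf; }
    }

    // Stand-in for the job client: the framework side, not the test,
    // decides which endpoint the configuration points at.
    static String submit(Conf conf, String jobName) {
        boolean yarn = "yarn".equals(conf.get("mapreduce.framework.name"));
        String endpoint = yarn
                ? conf.get("yarn.resourcemanager.address")
                : conf.get("mapred.job.tracker");
        return jobName + " -> " + endpoint;
    }

    // What the dependent project's test does: start the cluster, pass the
    // config through untouched, stop the cluster. It never reads an address.
    static String runJob(ClientCluster cluster, String jobName) {
        try {
            cluster.start();
            try {
                return submit(cluster.getConfig(), jobName);
            } finally {
                cluster.stop();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runJob(new Mr1Cluster(), "wordcount"));
        System.out.println(runJob(new Mr2Cluster(), "wordcount"));
    }
}
```

runJob() is identical for both clusters, which is the whole argument: the
version-specific keys stay inside the cluster and the client library.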

> or at a higher level of abstraction:
>    public JobTrackerInfo getJobTracker();
>    public TaskTrackerInfo[] getTaskTrackers();
> and, since this is a test rig, we'd like to terminate, perhaps abruptly,
> a task tracker, or launch replacements, or launch new ones.
>    public boolean stopTaskTracker(TaskTrackerInfo tracker, boolean force);
>    public TaskTrackerInfo startTaskTracker(
>        ... /* some universal public parameters TBD */);

The above should only be useful for system-testing MR itself. But for
dependent projects (e.g., HBase, Hive, etc.), what's the use case?

Todd Lipcon
Software Engineer, Cloudera