Hadoop, mail # general - [VOTE] Release hadoop-2.0.0-alpha


Re: [VOTE] Release hadoop-2.0.0-alpha
Andrew Purtell 2012-05-10, 18:23
Hi Todd,

> Have you seen the new MiniMRClientCluster class? It's meant to be what
> you describe - a minicluster which only exposes "external" APIs --
> most importantly a way of getting at a JobClient to submit jobs. We
> have it implemented in both 1.x and 2.x at this point, though I don't
> recall if it's in the 1.0.x releases or if it's only slated for 1.1+

Do you mean the below?

    /*
     * A simple interface for a client MR cluster used for testing. This
     * interface provides basic methods which are independent of the
     * underlying Mini Cluster (either through MR1 or MR2).
     */
    public interface MiniMRClientCluster {
      public void start() throws IOException;
      public void stop() throws IOException;
      public Configuration getConfig() throws IOException;
    }

This doesn't sufficiently encapsulate the mini MR cluster for the
purposes of a test rig. The issues we've seen are variations in which
configuration variables are required, what they are named, and what
they mean when you need to find out how the cluster is set up.
Let's take one basic case: how does one find the address of the job
tracker in a version-agnostic way? For example, perhaps:

    public InetSocketAddress getJobTrackerAddress();

or at a higher level of abstraction:

    public JobTrackerInfo getJobTracker();

    public TaskTrackerInfo[] getTaskTrackers();

and, since this is a test rig, we'd like to terminate, perhaps abruptly,
a task tracker, or launch replacements, or launch new ones.

    public boolean stopTaskTracker(TaskTrackerInfo tracker, boolean force);

    public TaskTrackerInfo startTaskTracker(
        ... /* some universal public parameters TBD */);
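
For the job tracker address case above, here is a rough sketch of the kind of
lookup a test currently has to carry around itself. It assumes the address is
stored under mapred.job.tracker in 1.x and mapreduce.jobtracker.address in
2.x, and that "local" means no job tracker is running. JobTrackerLocator is a
hypothetical name, and java.util.Properties stands in for Configuration so the
sketch compiles without Hadoop on the classpath.

```java
import java.net.InetSocketAddress;
import java.util.Properties;

// Hypothetical helper, not part of Hadoop: resolves the job tracker address
// from whichever configuration key the running version happens to use.
final class JobTrackerLocator {
    // Assumed key names, 1.x first, then 2.x.
    private static final String[] KEYS = {
        "mapred.job.tracker",
        "mapreduce.jobtracker.address"
    };

    private JobTrackerLocator() {}

    static InetSocketAddress find(Properties conf) {
        for (String key : KEYS) {
            String value = conf.getProperty(key);
            // Skip unset keys and the assumed "local" (no job tracker) marker.
            if (value == null || value.equals("local")) {
                continue;
            }
            int colon = value.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException(
                    "expected host:port in " + key + ", got: " + value);
            }
            String host = value.substring(0, colon);
            int port = Integer.parseInt(value.substring(colon + 1));
            // Unresolved on purpose: no DNS lookup is needed in a test rig.
            return InetSocketAddress.createUnresolved(host, port);
        }
        throw new IllegalStateException("no job tracker address configured");
    }
}
```

A getJobTrackerAddress() method on the mini cluster interface would let the
cluster own this logic, so each test would not have to know the key names.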

And, likewise for HDFS,

    public interface MiniHDFSClientCluster {
      public void start() throws IOException;
      public void stop() throws IOException;
      public Configuration getConfig() throws IOException;
      public NameNodeInfo[] getNameNodes();
      public DataNodeInfo[] getDataNodes();
      public DataNodeInfo startDataNode(...);
      public boolean stopDataNode(DataNodeInfo dn, boolean force);
      // Convenience method for getting the filesystem for the cluster.
      // This needs some thought, because we have FileSystem in 1.x and
      // FileContext in 2.x.
      // Here we will use a hypothetical wrapper that uses reflection as needed.
      public FileContext getFileContext();
    }

and perhaps, additionally, a convenience method for corrupting blocks:

    public void writeBlock(Block block, byte[] data, long offset,
        boolean updateChecksum) throws IOException;

and so on.
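
To illustrate what the updateChecksum flag in writeBlock is for, here is a
toy, in-memory sketch. FakeBlock is a hypothetical stand-in, and it keeps a
single CRC32 over the whole block, which is a simplification and not how HDFS
lays out its checksums. It also takes an int offset where the proposed
signature has a long, because it writes into a Java array.

```java
import java.util.zip.CRC32;

// Hypothetical in-memory stand-in for one stored block plus its checksum.
final class FakeBlock {
    private final byte[] data;
    private long checksum;

    FakeBlock(byte[] initial) {
        this.data = initial.clone();
        this.checksum = crc(this.data);
    }

    // Overwrites bytes starting at offset. With updateChecksum == false the
    // stored checksum goes stale, which is how a test would corrupt a block.
    void write(byte[] bytes, int offset, boolean updateChecksum) {
        System.arraycopy(bytes, 0, data, offset, bytes.length);
        if (updateChecksum) {
            checksum = crc(data);
        }
    }

    // True when the stored checksum still matches the block contents.
    boolean verify() {
        return crc(data) == checksum;
    }

    private static long crc(byte[] bytes) {
        CRC32 c = new CRC32();
        c.update(bytes, 0, bytes.length);
        return c.getValue();
    }
}
```

Writing with updateChecksum set to false leaves the block failing verify(),
which is the condition a corruption test wants to provoke. Writing with it set
to true gives a block with new contents that still verifies.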

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)