Re: MiniMRCluster usage in dependent projects
Andrew Purtell 2012-05-11, 00:38
> The issue is that MR2 doesn't have a JobTracker address. Neither does
> it have TaskTrackers. So there is no real way to expose this.
Yes, I know that. Then pick other names that represent the same
functional components. For a MapReduce job, even in V2, there is a job
tracker, and there are task trackers. This would not be an interface
for system testing exactly; it would be for the client to simulate
failures that might happen from its perspective.
> I don't see any reason that HBase should need to get these things --
This isn't just about HBase. What about Pig or Hive or Giraph or ...
any other project layered above MR?
[ Similar discussion about HDFS skipped. ]
> The above should only be useful for system-testing MR itself. But for
> dependent projects (eg HBase/Hive/etc) what's the use case?
Full stack tests, unit tests.
Regarding MR miniclusters, do you think we could transition all of the
MR-based HBase tests to MiniMRClientCluster (or even MRUnit)? If so,
then I would agree I'm raising something without a clear use case and
we could do that migration.
Regarding HDFS miniclusters, the interface is already limited-private
and there is no pressing need, but we do have test cases where we need
to simulate DataNode failures. Also, I can conceive of an application
unit test where I would want to set replication to 1 on some file,
then corrupt blocks, then check that repair (at the application level)
was successful. Would some limited public interface for that be
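
A minimal sketch of the replication-1 / corrupt / repair test described
above. Everything here is an assumption made for illustration:
FaultInjectingStore is a hypothetical limited-public interface (not an
existing Hadoop API), InMemoryStore is a stand-in for a minicluster, and
readWithRepair is one possible application-level repair that rewrites
the file from a known-good copy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CorruptionRepairSketch {

    /** Hypothetical limited-public test interface; NOT an existing Hadoop API. */
    interface FaultInjectingStore {
        void write(String path, byte[] data, int replication);
        byte[] read(String path) throws CorruptBlockException;
        /** Corrupts one replica of the file's blocks. */
        void corruptBlocks(String path);
    }

    static class CorruptBlockException extends Exception {
        CorruptBlockException(String path) {
            super("no healthy replica for " + path);
        }
    }

    /** In-memory stand-in for a minicluster; tracks healthy replicas per file. */
    static class InMemoryStore implements FaultInjectingStore {
        private final Map<String, byte[]> contents = new HashMap<>();
        private final Map<String, Integer> healthyReplicas = new HashMap<>();

        @Override
        public void write(String path, byte[] data, int replication) {
            contents.put(path, data.clone());
            healthyReplicas.put(path, replication);
        }

        @Override
        public byte[] read(String path) throws CorruptBlockException {
            // A read fails only when every replica has been corrupted.
            if (healthyReplicas.getOrDefault(path, 0) == 0) {
                throw new CorruptBlockException(path);
            }
            return contents.get(path).clone();
        }

        @Override
        public void corruptBlocks(String path) {
            healthyReplicas.computeIfPresent(path, (p, n) -> Math.max(0, n - 1));
        }
    }

    /** Application-level repair: on corruption, rewrite from a known-good copy. */
    static byte[] readWithRepair(FaultInjectingStore store, String path, byte[] knownGood) {
        try {
            return store.read(path);
        } catch (CorruptBlockException e) {
            store.write(path, knownGood, 1);
            try {
                return store.read(path);
            } catch (CorruptBlockException again) {
                throw new IllegalStateException("repair failed for " + path, again);
            }
        }
    }

    public static void main(String[] args) {
        FaultInjectingStore store = new InMemoryStore();
        byte[] payload = "row-data".getBytes(StandardCharsets.UTF_8);

        store.write("/test/file", payload, 1);   // replication set to 1
        store.corruptBlocks("/test/file");       // the only replica is now bad

        byte[] recovered = readWithRepair(store, "/test/file", payload);
        System.out.println("repaired: " + Arrays.equals(payload, recovered));
    }
}
```

With replication 1 a single corruptBlocks call leaves no healthy
replica, so the first read fails and the repair path is exercised. A
test against a real minicluster would need equivalent hooks from the
minicluster itself, which is the open question here.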
We have Bigtop for end-to-end testing on real clusters, but it's still
in incubation and spinning up a real cluster to run unit tests is not
Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)