Aaron Kimball 2010-03-03, 20:50
Consider writing your own Partitioner to control which records are sent to
which reduce shards.
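
As a rough sketch of the idea (not a tested Hadoop job): the class below shows
only the partitioning logic, written as plain Java so it runs without the
Hadoop jars. The class name, the HOT_PERCENT knob, and the use of String keys
are all assumptions made for illustration. In a real job this logic would sit
in a subclass of org.apache.hadoop.mapreduce.Partitioner that overrides
getPartition(key, value, numPartitions), registered with
job.setPartitionerClass(...).

```java
// Sketch only: routes a fixed share of the key hash space to reducer 0 and
// spreads the remainder over the other reducers, producing reducer-side skew.
public class SkewedPartitionSketch {

    // Hypothetical knob: percentage of the hash space sent to reducer 0.
    static final int HOT_PERCENT = 80;

    public static int getPartition(String key, int numPartitions) {
        if (numPartitions <= 1) {
            return 0; // only one reducer, nothing to skew
        }
        // Clear the sign bit so the modulo results are never negative.
        int h = key.hashCode() & Integer.MAX_VALUE;
        int bucket = h % 100;
        if (bucket < HOT_PERCENT) {
            return 0; // the "hot" reducer
        }
        // Spread the remaining keys over reducers 1..numPartitions-1.
        return 1 + (h / 100) % (numPartitions - 1);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        int[] counts = new int[numPartitions];
        for (int i = 0; i < 100000; i++) {
            counts[getPartition("key-" + i, numPartitions)]++;
        }
        for (int p = 0; p < numPartitions; p++) {
            System.out.println("reducer " + p + ": " + counts[p] + " records");
        }
    }
}
```

Raising or lowering HOT_PERCENT changes how lopsided the reduce load is. Note
that this skews the distribution across reducers, not the input data itself.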
On Wed, Mar 3, 2010 at 12:15 PM, Gang Luo <[EMAIL PROTECTED]> wrote:
> Hi all,
> I want to generate some datasets with data skew to test my mapreduce jobs.
> I am using TPC-DS but it seems I cannot control the data skew level. There
> is a suite from Microsoft that can generate skewed datasets based on
> TPC-D, but it only works on Windows. I haven't succeeded in compiling it on
> Linux yet. Please tell me how I can get some skewed datasets.