Gang Luo 2010-03-03, 20:15
Hi all, I want to generate some datasets with data skew to test my MapReduce jobs. I am using TPC-DS, but it seems I cannot control the level of data skew. There is a suite from Microsoft that can generate skewed datasets based on TPC-D, but it only works on Windows, and I haven't managed to make it compile on Linux yet. How can I get some skewed datasets?
Thanks. -Gang
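One way to get a tunable level of skew without the Windows-only suite is to draw keys from a Zipf distribution yourself. The Java sketch below illustrates that approach under stated assumptions. It is not part of TPC-DS or Hadoop. `ZipfKeyGenerator` is a hypothetical class, and its exponent `s` is the assumed skew knob: `s = 0` yields uniform keys, and larger values pile more records onto the first few keys.

```java
import java.util.Random;

/**
 * Hypothetical helper (not part of TPC-DS or Hadoop): draws integer keys in
 * [1, n] from a Zipf distribution with exponent s. The exponent is the skew
 * knob: s = 0 gives uniform keys, larger s puts more records on the
 * lowest-ranked keys.
 */
final class ZipfKeyGenerator {
    private final double[] cdf;   // cdf[k - 1] = P(key <= k)
    private final Random random;

    ZipfKeyGenerator(int n, double s, long seed) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        if (s < 0) throw new IllegalArgumentException("s must be >= 0");
        cdf = new double[n];
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, s);   // unnormalized weight of key k
            cdf[k - 1] = sum;
        }
        for (int i = 0; i < n; i++) {
            cdf[i] /= sum;                 // normalize; the last entry becomes 1.0
        }
        random = new Random(seed);         // fixed seed => reproducible datasets
    }

    /** Samples one key by inverting the CDF with a binary search. */
    int nextKey() {
        double u = random.nextDouble();    // uniform in [0, 1)
        int lo = 0;
        int hi = cdf.length - 1;
        while (lo < hi) {                  // find the first index with cdf > u
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] > u) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo + 1;
    }
}
```

To turn this into a dataset, one could write one line per record (for example the key, a tab, and some payload) to a text file and copy it into HDFS. Rerunning with a different `s` would then change only the skew level while the key space and record count stay fixed.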
Aaron Kimball 2010-03-03, 20:50
Look at writing your own Partitioner implementation to control which records are sent to which reduce shards.
- Aaron
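A partitioner along the lines Aaron suggests could deliberately overload one reducer. The sketch below is a hypothetical, Hadoop-free stand-in so it compiles on its own. Its method mirrors the shape of `getPartition(key, value, numPartitions)` from Hadoop's `Partitioner`, but `SkewingPartitioner` and its `hotPercent` parameter are assumptions made for illustration. In a real job the same logic would live in a subclass of `org.apache.hadoop.mapreduce.Partitioner` registered with `Job.setPartitionerClass`.

```java
/**
 * Hypothetical sketch: forces roughly hotPercent% of keys onto partition 0
 * and hash-partitions the rest. Plain Java, so no Hadoop classpath needed.
 */
final class SkewingPartitioner {
    private final int hotPercent;   // assumed knob, 0..100

    SkewingPartitioner(int hotPercent) {
        if (hotPercent < 0 || hotPercent > 100) {
            throw new IllegalArgumentException("hotPercent must be in [0, 100]");
        }
        this.hotPercent = hotPercent;
    }

    int getPartition(String key, String value, int numPartitions) {
        // Non-negative hash, the same masking Hadoop's HashPartitioner uses.
        int hash = key.hashCode() & Integer.MAX_VALUE;
        if (hash % 100 < hotPercent) {
            return 0;                    // the "hot" share all lands on reducer 0
        }
        return hash % numPartitions;     // everything else: ordinary hash partitioning
    }
}
```

This injects skew at the shuffle rather than into the input data itself. With `hotPercent = 0` it reduces to plain hash partitioning.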
Gang Luo 2010-03-03, 21:06
That is a good idea, but it doesn't work in my case. What I want to test is how well my partitioner divides the workload. It is supposed to counteract skew, not generate it, so I still need a skewed data source. Any ideas?
Thanks, -Gang
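Once a skewed input exists, one simple way to check how well a partitioner divides the workload is to count the records per partition and compare the largest partition to the mean. The sketch below assumes that a max/mean ratio is an acceptable balance metric; `PartitionLoadReport` is a hypothetical test helper, and the partitioner under test is passed in as a plain function.

```java
import java.util.List;
import java.util.function.ToIntBiFunction;

/**
 * Hypothetical test helper: feeds keys through a partition function and
 * reports how evenly the records were spread across partitions.
 */
final class PartitionLoadReport {
    final long[] counts;   // records assigned to each partition

    PartitionLoadReport(List<String> keys, int numPartitions,
                        ToIntBiFunction<String, Integer> partitioner) {
        counts = new long[numPartitions];
        for (String key : keys) {
            int p = partitioner.applyAsInt(key, numPartitions);
            if (p < 0 || p >= numPartitions) {
                throw new IllegalStateException("partition out of range: " + p);
            }
            counts[p]++;
        }
    }

    /** Largest partition divided by the mean partition size; 1.0 is perfectly balanced. */
    double imbalance() {
        long max = 0;
        long total = 0;
        for (long c : counts) {
            max = Math.max(max, c);
            total += c;
        }
        if (total == 0) return 1.0;      // no records: treat as balanced
        double mean = (double) total / counts.length;
        return max / mean;
    }
}
```

Running the same skewed key list through plain hash partitioning and through the skew-aware partitioner, then comparing the two `imbalance()` values, would show whether the new partitioner actually evens out the load.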
----- Original Message ---- From: Aaron Kimball <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Sent: Wednesday, 2010/3/3, 3:50:59 PM Subject: Re: dataset