-
Extremely slow when loading small amount of data from HBase
某因幡 2012-08-28, 06:49
When I load a range of data from HBase using just a row key range in HBaseStorageHandler, the speed is acceptable when I load tens of millions of rows or more, but when the range covers only a few thousand rows, the single map task ends in a timeout. What is going wrong here? I tried both Pig-0.9.2 and Pig-0.10.0.
-- language: Chinese, Japanese, English
-
Re: Extremely slow when loading small amount of data from HBase
Dmitriy Ryaboy 2012-08-29, 06:41
Can you try the same scans with a regular HBase MapReduce job? If you see the same problem, it's an HBase issue. Otherwise, we need to see the script and some facts about your table (how many regions, how many rows, how big the cluster is, whether the small range is all on one region server, etc.).
On Aug 27, 2012, at 11:49 PM, 某因幡 <[EMAIL PROTECTED]> wrote:
> When I load a range of data from HBase simply using row key range in
> HBaseStorageHandler, I find that the speed is acceptable when I'm
> trying to load some tens of millions rows or more, while the only map
> ends up in a timeout when it's some thousands of rows.
> What is going wrong here? Tried both Pig-0.9.2 and Pig-0.10.0.
>
> --
> language: Chinese, Japanese, English
-
Re: Extremely slow when loading small amount of data from HBase
某因幡 2012-09-04, 10:39
After merging ~8000 regions down to ~4000 on an 8-node cluster, things are getting better. Should I continue merging?

2012/8/29 Dmitriy Ryaboy <[EMAIL PROTECTED]>:
> Can you try the same scans with a regular hbase mapreduce job? If you see the same problem, it's an hbase issue. Otherwise, we need to see the script and some facts about your table (how many regions, how many rows, how big a cluster, is the small range all on one region server, etc)
>
> On Aug 27, 2012, at 11:49 PM, 某因幡 <[EMAIL PROTECTED]> wrote:
>
>> When I load a range of data from HBase simply using row key range in
>> HBaseStorageHandler, I find that the speed is acceptable when I'm
>> trying to load some tens of millions rows or more, while the only map
>> ends up in a timeout when it's some thousands of rows.
>> What is going wrong here? Tried both Pig-0.9.2 and Pig-0.10.0.
>>
>> --
>> language: Chinese, Japanese, English
-- language: Chinese, Japanese, English
-
Re: Extremely slow when loading small amount of data from HBase
Dmitriy Ryaboy 2012-09-04, 11:54
I think the hbase folks recommend something like 40 regions per node per table, but I might be misremembering something. Have you tried emailing the hbase users list?
On Sep 4, 2012, at 3:39 AM, 某因幡 <[EMAIL PROTECTED]> wrote:
> After merging ~8000 regions to ~4000 on an 8-node cluster the things
> is getting better.
> Should I continue merging?
>
> 2012/8/29 Dmitriy Ryaboy <[EMAIL PROTECTED]>:
>> Can you try the same scans with a regular hbase mapreduce job? If you see the same problem, it's an hbase issue. Otherwise, we need to see the script and some facts about your table (how many regions, how many rows, how big a cluster, is the small range all on one region server, etc)
>>
>> On Aug 27, 2012, at 11:49 PM, 某因幡 <[EMAIL PROTECTED]> wrote:
>>
>>> When I load a range of data from HBase simply using row key range in
>>> HBaseStorageHandler, I find that the speed is acceptable when I'm
>>> trying to load some tens of millions rows or more, while the only map
>>> ends up in a timeout when it's some thousands of rows.
>>> What is going wrong here? Tried both Pig-0.9.2 and Pig-0.10.0.
>>>
>>> --
>>> language: Chinese, Japanese, English
>
> --
> language: Chinese, Japanese, English
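For a rough sense of how far the cluster is from the figure mentioned in this message, the per-node region counts can be worked out from the numbers given in the thread. This is only a minimal sketch: it assumes regions are spread evenly across the 8 nodes, it treats "about 40 regions per node" as the from-memory figure it was offered as (not a verified recommendation), and the function name `regions_per_node` is hypothetical.

```python
def regions_per_node(total_regions: int, nodes: int) -> float:
    """Average number of regions each node hosts, assuming an even spread."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total_regions / nodes


NODES = 8                # cluster size reported in the thread
SUGGESTED_PER_NODE = 40  # figure recalled from memory in the thread, unverified

# Region totals before and after the merge, as reported in the thread.
for label, total in (("before merge", 8000), ("after merge", 4000)):
    per_node = regions_per_node(total, NODES)
    ratio = per_node / SUGGESTED_PER_NODE
    print(f"{label}: ~{per_node:.0f} regions per node, "
          f"{ratio:.1f}x the suggested figure")
```

Note that the suggested figure was stated per node per table, and the thread does not say how many tables the regions belong to, so the ratio is an upper bound on how far any single table is over that figure.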