HBase >> mail # user >> Insert into tall table 50% faster than wide table


Re: Insert into tall table 50% faster than wide table
Perhaps slow wide table insert performance is related to row versioning? If I have a customer row and keep adding order columns one by one, I'm thinking that there might be a version kept of the row for every order I add? If I am simply inserting a new row for every order, there is no versioning going on. Could this be causing performance problems?
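For reference, HBase tracks versions per cell (row, column family, column qualifier, timestamp), not per row, so writing a new order column to an existing customer row adds a new cell rather than a new version of the row. The toy model below is only a sketch of that bookkeeping. The row, family, and qualifier names are made up, and it ignores the configured maximum number of versions per column family. It also says nothing about what actually causes the slowdown.

```python
from collections import defaultdict


def apply_puts(puts):
    """Group timestamps by cell coordinates (row, family, qualifier)."""
    versions = defaultdict(list)
    for row, family, qualifier, ts in puts:
        versions[(row, family, qualifier)].append(ts)
    return versions


# Wide layout: each order is written under a new qualifier in the same row.
wide = apply_puts([("cust1", "o", f"order{i}", 100 + i) for i in range(3)])

# Contrast: the same cell written three times accumulates three versions.
rewrite = apply_puts([("cust1", "o", "order0", 100 + i) for i in range(3)])

max_versions_wide = max(len(v) for v in wide.values())
max_versions_rewrite = max(len(v) for v in rewrite.values())
```

In this model the wide layout ends up with three cells holding one version each, while rewriting a single cell is what produces multiple versions.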

On Dec 22, 2010, at 4:16 PM, Bryan Keller wrote:

> It appears to be the same or better, not to derail my original question. The much slower write performance will cause problems for me unless I can resolve that.
>
> On Dec 22, 2010, at 3:52 PM, Peter Haidinyak wrote:
>
>> Interesting, do you know what the time difference would be on the other side, doing a lookup/scan?
>>
>> Thanks
>>
>> -Pete
>>
>> -----Original Message-----
>> From: Bryan Keller [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, December 22, 2010 3:41 PM
>> To: [EMAIL PROTECTED]
>> Subject: Insert into tall table 50% faster than wide table
>>
>> I have been testing a couple of different approaches to storing customer orders. One is a tall table, where each order is a row. The other is a wide table where each customer is a row, and orders are columns in the row. I am finding that inserts into the tall table, i.e. adding a row for every order, are roughly 50% faster than inserts into the wide table, i.e. adding a row for a customer and then adding columns for orders.
>>
>> In my test, there are 10,000 customers, each customer has 600 orders and each order has 10 columns. The tall table approach results in 6 million rows of 10 columns. The wide table approach results in 10,000 rows of 6,000 columns. I'm using HBase 0.89-20100924 and Hadoop 0.20.2. I am adding the orders using a Put for each order, submitted in batches of 1000 as a list of Puts.
>>
>> Are there techniques to speed up inserts with the wide table approach that I am perhaps overlooking?
>>
>
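To make the two layouts in the original message concrete, here is a minimal sketch of how the row keys and columns might be laid out in each case, with puts grouped in batches of 1000. The key and qualifier formats (`c00007-o042`, `o042:f3`) and all function names are assumptions for illustration; the thread does not say how the keys were actually built. The sketch only models the data and does not call the HBase client.

```python
CUSTOMERS = 10_000
ORDERS_PER_CUSTOMER = 600
FIELDS_PER_ORDER = 10
BATCH_SIZE = 1000


def tall_put(customer, order):
    """Tall layout: one row per order, 10 columns per row."""
    row = f"c{customer:05d}-o{order:03d}"
    columns = [f"f{i}" for i in range(FIELDS_PER_ORDER)]
    return row, columns


def wide_put(customer, order):
    """Wide layout: one row per customer, order id moved into the qualifier."""
    row = f"c{customer:05d}"
    columns = [f"o{order:03d}:f{i}" for i in range(FIELDS_PER_ORDER)]
    return row, columns


def batches(make_put, customers, orders, size=BATCH_SIZE):
    """Yield lists of puts, one put per order, at most `size` per list."""
    batch = []
    for c in range(customers):
        for o in range(orders):
            batch.append(make_put(c, o))
            if len(batch) == size:
                yield batch
                batch = []
    if batch:
        yield batch


# Totals implied by the numbers in the message.
tall_rows = CUSTOMERS * ORDERS_PER_CUSTOMER                 # rows in the tall table
wide_columns_per_row = ORDERS_PER_CUSTOMER * FIELDS_PER_ORDER
total_cells = tall_rows * FIELDS_PER_ORDER                  # same in both layouts
```

Under these assumptions both layouts issue the same number of puts and write the same number of cells. The difference is that in the wide layout every put in a run of 600 targets the same row, whereas in the tall layout each put targets a distinct row.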