Re: HBase Writes With Large Number of Columns
For a total of 1.5 KB with 4 columns = 384 bytes/column:
bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:384:100
-num_keys 1000000
13/03/25 14:54:45 INFO util.MultiThreadedAction: [W:100] Keys=991664,
cols=3,8m, time=00:03:55 Overall: [keys/s= 4218, latency=23 ms]
Current: [keys/s=4097, latency=24 ms], insertedUpTo=-1
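
(For reference: LoadTestTool's -write argument takes
<average columns per key>:<average bytes per column>[:<writer threads>],
so 4:384:100 above writes 4 columns of ~384 bytes each from 100 writer
threads, matching the 1.5 KB/row target.)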

For a total of 1.5 KB with 100 columns = 15 bytes/column:
bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:15:100
-num_keys 1000000
13/03/25 16:27:44 INFO util.MultiThreadedAction: [W:100] Keys=999721,
cols=95,3m, time=01:27:46 Overall: [keys/s= 189, latency=525 ms]
Current: [keys/s=162, latency=616 ms], insertedUpTo=-1

So overall, the per-column write speed is the same: 4,218 keys/s x 4
columns is about 16,900 columns/s, versus 189 keys/s x 100 columns,
about 18,900 columns/s. A bit faster with 100 columns than with 4, in
fact. I don't think there is any negative impact on the HBase side
because of all those columns. Might be interesting to test the same
thing over Thrift...

JM

2013/3/25 Pankaj Misra <[EMAIL PROTECTED]>:
> Yes Ted, we have been observing the Thrift API clearly outperform the native Java HBase API at higher loads, due to its binary communication protocol.
>
> Tariq, the specs of the machine on which we are performing these tests are as given below.
>
> Processor: Intel Core i7-3770K, 8 logical cores (4 physical, 2 logical per physical core), 3.5 GHz clock speed
> RAM: 32 GB DDR3
> HDD: one 2 TB SATA disk plus two 250 GB SATA disks (3 disks in total)
> HDFS and HBase deployed in pseudo-distributed mode.
> We have 4 parallel streams writing to HBase.
>
> We used the same setup for the previous tests as well, and to be very frank, we did expect a bit of a drop in performance when we moved to testing with 40 columns, but we did not expect to get half the performance. When we tested with 20 columns, we were consistently getting 200 Mbps of write throughput. But with 40 columns we are getting only 90 Mbps of throughput on the same setup.
>
> Thanks and Regards
> Pankaj Misra
>
>
> ________________________________________
> From: Ted Yu [[EMAIL PROTECTED]]
> Sent: Tuesday, March 26, 2013 1:09 AM
> To: [EMAIL PROTECTED]
> Subject: Re: HBase Writes With Large Number of Columns
>
> bq. These records are being written using batch mutation with the Thrift API
> This is important information, I think.
>
> Batch mutation through the Java API would incur lower overhead.
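
For reference, a minimal sketch of what such a batched write looks like
through the 0.94-era native Java client. The table name "testtable",
column family "f", and value sizes are hypothetical, chosen only to
mirror the 40-column, ~1.5 KB rows discussed in this thread:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NativeBatchWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "testtable"); // hypothetical table name
    table.setAutoFlush(false);                    // buffer puts client-side
    table.setWriteBufferSize(2 * 1024 * 1024);    // flush roughly every 2 MB

    List<Put> puts = new ArrayList<Put>(100);     // 100 rows per batch, as in the thread
    for (int i = 0; i < 100; i++) {
      Put put = new Put(Bytes.toBytes("row" + i));
      for (int c = 0; c < 40; c++) {              // 40 columns, ~38 bytes each ~= 1.5 KB/row
        put.add(Bytes.toBytes("f"), Bytes.toBytes("col" + c), new byte[38]);
      }
      puts.add(put);
    }
    table.put(puts);       // queued in the client-side write buffer
    table.flushCommits();  // push the buffered mutations to the region servers
    table.close();
  }
}

Buffering client-side and flushing once per batch keeps the RPC count
low, which is one plausible way to realize the lower overhead mentioned
above.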
>
> On Mon, Mar 25, 2013 at 11:40 AM, Pankaj Misra
> <[EMAIL PROTECTED]>wrote:
>
>> Firstly, thanks a lot, Jean and Ted, for your extended help; it is
>> very much appreciated.
>>
>> Yes Ted, I am writing to all 40 columns, and the 1.5 KB of record data
>> is distributed across these columns.
>>
>> Jean, some columns store as little as a single byte, while a few of the
>> columns store as much as 80-125 bytes of data. The overall record size
>> is 1.5 KB. These records are being written using batch mutation with
>> the Thrift API, wherein we write 100 records per batch mutation.
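
For reference, a minimal sketch of what such a 100-record batch mutation
might look like against the HBase thrift1 gateway, using the 0.94-era
generated Java classes. The host/port, table name "testtable", family
"f", and per-column sizes are hypothetical; on versions whose Thrift IDL
predates the attributes argument, drop the trailing null:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.thrift.generated.BatchMutation;
import org.apache.hadoop.hbase.thrift.generated.Hbase;
import org.apache.hadoop.hbase.thrift.generated.Mutation;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ThriftBatchWrite {
  public static void main(String[] args) throws Exception {
    // Default thrift1 port; wrap the socket in TFramedTransport if the
    // server was started with the framed option.
    TTransport transport = new TSocket("localhost", 9090);
    transport.open();
    Hbase.Client client = new Hbase.Client(new TBinaryProtocol(transport));

    List<BatchMutation> rowBatches = new ArrayList<BatchMutation>(100);
    for (int i = 0; i < 100; i++) {          // 100 rows per batch, as in the thread
      List<Mutation> mutations = new ArrayList<Mutation>(40);
      for (int c = 0; c < 40; c++) {         // 40 columns, ~38 bytes each ~= 1.5 KB/row
        Mutation m = new Mutation();
        m.setColumn(ByteBuffer.wrap(("f:col" + c).getBytes("UTF-8")));
        m.setValue(ByteBuffer.wrap(new byte[38]));
        mutations.add(m);
      }
      rowBatches.add(new BatchMutation(
          ByteBuffer.wrap(("row" + i).getBytes("UTF-8")), mutations));
    }
    // One RPC carries the whole batch of rows.
    client.mutateRows(ByteBuffer.wrap("testtable".getBytes("UTF-8")),
        rowBatches, null);

    transport.close();
  }
}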
>>
>> Thanks and Regards
>> Pankaj Misra
>>
>>
>> ________________________________________
>> From: Jean-Marc Spaggiari [[EMAIL PROTECTED]]
>> Sent: Monday, March 25, 2013 11:57 PM
>> To: [EMAIL PROTECTED]
>> Subject: Re: HBase Writes With Large Number of Columns
>>
>> I just ran some LoadTestTool runs to see if I could reproduce that.
>>
>> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
>> -num_keys 1000000
>> 13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
>> cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
>> Current: [keys/s=4413, latency=22 ms], insertedUpTo=-1
>>
>> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:512:100
>> -num_keys 1000000
>>
>> This one crashed because I don't have enough disk space, so I'm
>> re-running it, but just before it crashed it was showing about 24.5x
>> slower, which is consistent, since it's writing 25x more columns.
>>
>> What size of data do you have? Big cells? Small cells? I will retry
>> the test above with more lines and keep you posted.
>>
>> 2013/3/25 Pankaj Misra <[EMAIL PROTECTED]>: