HBase user mailing list: HBase Writes With Large Number of Columns


Pankaj Misra 2013-03-25, 16:55
Ted Yu 2013-03-25, 16:59
Pankaj Misra 2013-03-25, 17:18
Ted Yu 2013-03-25, 17:45
Pankaj Misra 2013-03-25, 18:03
Ted Yu 2013-03-25, 18:24
Jean-Marc Spaggiari 2013-03-25, 18:27
Pankaj Misra 2013-03-25, 18:40
Ted Yu 2013-03-25, 19:39
RE: HBase Writes With Large Number of Columns
Yes Ted, we have observed the Thrift API clearly outperforming the native Java HBase API at higher loads, due to its binary communication protocol.

Tariq, the specs of the machine on which we are performing these tests are given below.

Processor: Intel Core i7-3770K, 8 logical cores (4 physical cores with 2 hardware threads each), 3.5 GHz clock speed
RAM: 32 GB DDR3
HDD: one 2 TB SATA disk and two 250 GB SATA disks (3 disks in total)
HDFS and HBase deployed in pseudo-distributed mode.
We have 4 parallel streams writing to HBase.

We used the same setup for the previous tests as well. To be frank, we did expect some drop in performance when testing with 40 columns, but we did not expect throughput to fall by half. With 20 columns we consistently got about 200 mbps of writes, but with 40 columns we get only 90 mbps of throughput on the same setup.

Thanks and Regards
Pankaj Misra
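A quick back-of-the-envelope check of the 200 and 90 mbps figures above (units as reported), written as a small Java program. It assumes, although the message does not say so, that the 20-column and 40-column runs wrote records of the same total size, so that throughput is proportional to records per second. Under that assumption the record rate fell to 0.45 of its former value, but because each record now carries twice as many cells, the rate of individual cells written fell by only about 10%. The class and method names are illustrative.

```java
// Back-of-the-envelope comparison of the two test runs described above.
// Assumption (not stated in the thread): both runs wrote records of the same
// total size, so throughput is proportional to records written per second.
public class ThroughputRatio {

    /** Ratio of the record rate in run B to the record rate in run A. */
    public static double recordRateRatio(double throughputA, double throughputB) {
        return throughputB / throughputA;
    }

    /** Ratio of cell (column value) rates: record-rate ratio scaled by columns per record. */
    public static double cellRateRatio(int columnsA, double throughputA,
                                       int columnsB, double throughputB) {
        return recordRateRatio(throughputA, throughputB) * columnsB / columnsA;
    }

    public static void main(String[] args) {
        // Figures from the message: 20 columns -> 200, 40 columns -> 90 (same units).
        System.out.printf("record rate ratio: %.2f%n", recordRateRatio(200.0, 90.0));
        System.out.printf("cell rate ratio:   %.2f%n", cellRateRatio(20, 200.0, 40, 90.0));
    }
}
```

If the assumption holds, a roughly constant cell rate would suggest that write cost scales with the number of cells rather than with the number of bytes.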
________________________________________
From: Ted Yu [[EMAIL PROTECTED]]
Sent: Tuesday, March 26, 2013 1:09 AM
To: [EMAIL PROTECTED]
Subject: Re: HBase Writes With Large Number of Columns

bq. These records are being written using batch mutation with thrift API
This is important information, I think.

Batch mutation through Java API would incur lower overhead.
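For comparison with the Thrift batch mutations: the HBase 0.94 Java client accepts a batch of puts through `HTable.put(List<Put>)`. The sketch below shows only the client-side grouping step in plain Java, splitting records into fixed-size batches of 100 (the batch size Pankaj reports in the quoted message). The `partition` helper and the generated row keys are hypothetical, and the call that would actually send each batch is left as a comment so the example runs without HBase on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of client-side batching: split records into fixed-size groups before
// handing each group to a batch write call. partition() is a hypothetical
// helper; no HBase classes are used here.
public class BatchSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical row keys standing in for 250 records.
        List<String> rowKeys = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            rowKeys.add("row-" + i);
        }
        for (List<String> batch : partition(rowKeys, 100)) {
            // With the 0.94 Java client, each batch would be turned into a
            // List<Put> and sent with table.put(puts); omitted here.
            System.out.println("batch of " + batch.size() + " rows");
        }
    }
}
```

With 250 records and a batch size of 100, this produces two full batches and a final batch of 50.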

On Mon, Mar 25, 2013 at 11:40 AM, Pankaj Misra
<[EMAIL PROTECTED]> wrote:

> Firstly, thanks a lot Jean and Ted for your extended help; I very much
> appreciate it.
>
> Yes Ted, I am writing to all 40 columns, and 1.5 KB of record data is
> distributed across them.
>
> Jean, some columns store a value as small as a single byte, while a few
> of the columns store as much as 80-125 bytes of data. The overall
> record size is 1.5 KB. These records are written using batch mutations
> through the Thrift API, with 100 records per batch mutation.
>
> Thanks and Regards
> Pankaj Misra
>
>
> ________________________________________
> From: Jean-Marc Spaggiari [[EMAIL PROTECTED]]
> Sent: Monday, March 25, 2013 11:57 PM
> To: [EMAIL PROTECTED]
> Subject: Re: HBase Writes With Large Number of Columns
>
> I just ran LoadTestTool to see if I can reproduce that.
>
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100 -num_keys 1000000
> 13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
> cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
> Current: [keys/s=4413, latency=22 ms], insertedUpTo=-1
>
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:512:100 -num_keys 1000000
>
> This one crashed because I don't have enough disk space, so I'm
> re-running it, but just before it crashed it was showing about 24.5
> times slower, which is coherent since it's writing 25 times more columns.
>
> What size of data do you have? Big cells? Small cells? I will retry
> the test above with more lines and keep you posted.
>
> 2013/3/25 Pankaj Misra <[EMAIL PROTECTED]>:
> > Yes Ted, you are right: we have pre-split the table regions, and we
> see that both regions are filled almost evenly in both tests.
> >
> > This does not seem to be a regression, though, since we were getting good
> write rates when we had fewer columns.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> >
> > ________________________________________
> > From: Ted Yu [[EMAIL PROTECTED]]
> > Sent: Monday, March 25, 2013 11:15 PM
> > To: [EMAIL PROTECTED]
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: HBase Writes With Large Number of Columns
> >
> > Copying Ankit, who raised the same question soon after Pankaj's initial
> > question.
> >
> > On one hand I wonder if this was a regression in 0.94.5 (though
> unlikely).
> >
> > Did the region servers receive (relatively) the same write load for the
> > second test case? I assume you have pre-split your tables in both cases.
> >
> > Cheers
> >
> > On Mon, Mar 25, 2013 at 10:18 AM, Pankaj Misra
> > <[EMAIL PROTECTED]> wrote:

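Both Jean-Marc's LoadTestTool runs and Pankaj's numbers point at a per-cell cost. In LoadTestTool's `-write` argument the first field is the average number of columns per key, so going from `4:512:100` to `100:512:100` multiplies the cells per row by 25, in line with the roughly 24.5 times slowdown reported. One contributor is that HBase stores every column value as a separate KeyValue that repeats the row key, family and qualifier. The sketch below estimates the serialized size of one record using the KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type, value). The 16-byte row key, 1-byte family name and 8-byte qualifiers are hypothetical values chosen for illustration; only the 1.5 KB payload comes from the thread.

```java
// Rough estimate of serialized KeyValue bytes for one record, using the
// KeyValue layout described above. Row, family and qualifier lengths are
// hypothetical; only the 1.5 KB (1536-byte) payload comes from the thread.
public class KeyValueOverhead {

    // Fixed bytes per cell: key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1).
    public static final int FIXED_BYTES_PER_CELL = 4 + 4 + 2 + 1 + 8 + 1;

    /** Total serialized bytes for one row whose values (payloadBytes in total) span `columns` cells. */
    public static long recordBytes(int columns, int payloadBytes,
                                   int rowKeyLen, int familyLen, int qualifierLen) {
        // Every cell repeats the row key, family and qualifier.
        long perCellOverhead = FIXED_BYTES_PER_CELL + rowKeyLen + familyLen + qualifierLen;
        return columns * perCellOverhead + payloadBytes;
    }

    public static void main(String[] args) {
        int payload = 1536;    // 1.5 KB record, from the thread
        int rowKeyLen = 16;    // hypothetical
        int familyLen = 1;     // hypothetical single-character family name
        int qualifierLen = 8;  // hypothetical
        for (int columns : new int[] {20, 40}) {
            long bytes = recordBytes(columns, payload, rowKeyLen, familyLen, qualifierLen);
            System.out.printf("%d columns: %d bytes per record%n", columns, bytes);
        }
    }
}
```

With these assumed lengths, a record grows from 2436 to 3336 bytes (about 37% more) when the same payload is spread over 40 columns instead of 20, while the number of cells doubles. Under these assumptions, bytes alone would not explain a halving of throughput, so per-cell work, such as inserting each KeyValue into the memstore individually, is likely part of the picture as well.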
Jean-Marc Spaggiari 2013-03-25, 23:49
ramkrishna vasudevan 2013-03-26, 06:19
Asaf Mesika 2013-03-27, 21:52
Ted Yu 2013-03-27, 22:06
Asaf Mesika 2013-03-27, 22:28
Ted Yu 2013-03-27, 22:33
Mohammad Tariq 2013-03-25, 19:30