HBase user mailing list: HBase Writes With Large Number of Columns


Pankaj Misra 2013-03-25, 16:55
Ted Yu 2013-03-25, 16:59
Pankaj Misra 2013-03-25, 17:18
Ted Yu 2013-03-25, 17:45
Pankaj Misra 2013-03-25, 18:03
Ted Yu 2013-03-25, 18:24
Jean-Marc Spaggiari 2013-03-25, 18:27
Pankaj Misra 2013-03-25, 18:40
Ted Yu 2013-03-25, 19:39
Pankaj Misra 2013-03-25, 20:54
Jean-Marc Spaggiari 2013-03-25, 23:49
Re: HBase Writes With Large Number of Columns
Hi Pankaj

Is it possible for you to profile the RS when this happens? Either Thrift is
adding some overhead, or the code is spending more time somewhere else.

As you said, there may be a slight decrease in put performance because more
values now have to go in, but it should not be this significant. We can work
from the profile output and check what we are doing.

Regards
Ram
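
One likely contributor to the per-column cost: HBase stores every column as a
separate KeyValue, so the row key, family, qualifier, timestamp and length
fields are repeated for each cell. A rough back-of-the-envelope sketch,
assuming a 16-byte row key, a 1-byte family name and 8-byte qualifiers
(Pankaj's actual key and qualifier sizes are not given in the thread):

  per-cell overhead = key length (4) + value length (4) + row length (2)
                      + row key (16) + family length (1) + family (1)
                      + qualifier (8) + timestamp (8) + key type (1)
                      = ~45 bytes before the value itself

  20 cells/row -> ~0.9 KB of per-cell overhead on a 1.5 KB payload
  40 cells/row -> ~1.8 KB of per-cell overhead on the same payload

So doubling the column count increases the bytes actually handled per row well
before any Thrift or RPC cost, though on its own that does not explain a 2x
drop; the profile should show where the rest of the time goes.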

On Tue, Mar 26, 2013 at 5:19 AM, Jean-Marc Spaggiari <
[EMAIL PROTECTED]> wrote:

> For a total of 1.5 KB with 4 columns = 384 bytes/column
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:384:100
> -num_keys 1000000
> 13/03/25 14:54:45 INFO util.MultiThreadedAction: [W:100] Keys=991664,
> cols=3,8m, time=00:03:55 Overall: [keys/s= 4218, latency=23 ms]
> Current: [keys/s=4097, latency=24 ms], insertedUpTo=-1
>
> For a total of 1.5 KB with 100 columns = 15 bytes/column
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 100:15:100
> -num_keys 1000000
> 13/03/25 16:27:44 INFO util.MultiThreadedAction: [W:100] Keys=999721,
> cols=95,3m, time=01:27:46 Overall: [keys/s= 189, latency=525 ms]
> Current: [keys/s=162, latency=616 ms], insertedUpTo=-1
>
> So overall, the per-column write speed is about the same; if anything it is
> a bit faster with 100 columns than with 4 (roughly 18,900 vs 16,900
> columns/s). I don't think there is any negative impact on the HBase side
> because of all those columns. Might be interesting to test the same
> thing over Thrift...
>
> JM
>
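A note on the LoadTestTool arguments used above (based on the tool's usage
string; worth double-checking against the HBase version in use):

  -write <columns per key>:<average bytes per column>[:<writer threads>]

so 4:384:100 and 100:15:100 both target roughly 1.5 KB per row, written by
100 concurrent threads, which is what makes the two runs comparable.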
> 2013/3/25 Pankaj Misra <[EMAIL PROTECTED]>:
> > Yes Ted, we have been observing the Thrift API clearly outperform the
> > native Java HBase API at higher loads, due to its binary communication
> > protocol.
> >
> > Tariq, the specs of the machine on which we are performing these tests
> > are given below.
> >
> > Processor: i7-3770K, 8 logical cores (4 physical, with 2 logical per
> > physical core), 3.5 GHz clock speed
> > RAM: 32 GB DDR3
> > HDD: one 2 TB SATA disk plus two 250 GB SATA disks - 3 disks in total
> > HDFS and HBase deployed in pseudo-distributed mode.
> > We have 4 parallel streams writing to HBase.
> >
> > We used the same setup for the previous tests as well, and to be very
> > frank, we did expect a bit of a drop in performance when we had to test
> > with 40 columns, but did not expect to get half the performance. When we
> > tested with 20 columns, we were consistently getting 200 mbps of write
> > throughput. But with 40 columns we are getting only 90 mbps of throughput
> > on the same setup.
> >
> > Thanks and Regards
> > Pankaj Misra
> >
> >
> > ________________________________________
> > From: Ted Yu [[EMAIL PROTECTED]]
> > Sent: Tuesday, March 26, 2013 1:09 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: HBase Writes With Large Number of Columns
> >
> > bq. These records are being written using batch mutation with thrift API
> > This is an important piece of information, I think.
> >
> > Batch mutation through the Java API would incur lower overhead.
> >
> > On Mon, Mar 25, 2013 at 11:40 AM, Pankaj Misra
> > <[EMAIL PROTECTED]> wrote:
> >
> >> Firstly, thanks a lot Jean and Ted for your extended help, very much
> >> appreciated.
> >>
> >> Yes Ted, I am writing to all 40 columns, and the 1.5 KB of record data is
> >> distributed across these columns.
> >>
> >> Jean, some columns are storing as little as a single byte, while a few of
> >> the columns are storing as much as 80-125 bytes of data. The overall
> >> record size is 1.5 KB. These records are being written using batch
> >> mutation with the Thrift API, wherein we are writing 100 records per
> >> batch mutation.
> >>
> >> Thanks and Regards
> >> Pankaj Misra
> >>
> >>
> >> ________________________________________
> >> From: Jean-Marc Spaggiari [[EMAIL PROTECTED]]
> >> Sent: Monday, March 25, 2013 11:57 PM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: HBase Writes With Large Number of Columns
> >>
> >> I just ran some LoadTestTool runs to see if I can reproduce that.
> >>
> >> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
> >> -num_keys 1000000
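
Ted's point above, that batch mutation through the Java API incurs lower
overhead than Thrift, suggests the comparison worth making here. Below is a
minimal sketch of the equivalent batched write through the native Java client
(0.94-era API); the table name, column family, qualifiers and value sizes are
made up for illustration, roughly matching the 40-column, 1.5 KB records
described in the thread:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchPutExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test_table");    // hypothetical table name
    table.setAutoFlush(false);                        // buffer puts on the client side
    table.setWriteBufferSize(2 * 1024 * 1024);        // 2 MB client write buffer

    byte[] family = Bytes.toBytes("f");               // hypothetical column family
    List<Put> batch = new ArrayList<Put>(100);
    for (int row = 0; row < 100; row++) {             // 100 records per batch, as in the thread
      Put put = new Put(Bytes.toBytes("row-" + row));
      for (int col = 0; col < 40; col++) {            // 40 columns of ~37 bytes ~= 1.5 KB/row
        put.add(family, Bytes.toBytes("c" + col), new byte[37]);
      }
      batch.add(put);
    }
    table.put(batch);       // one batched round of puts
    table.flushCommits();   // flush the client-side write buffer
    table.close();
  }
}

Timing this side by side with the Thrift batch mutation under the same
40-column load would help separate client/Thrift overhead from time spent in
the RegionServer, which is what the profiling suggestion at the top of this
message is meant to pin down.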
Asaf Mesika 2013-03-27, 21:52
Ted Yu 2013-03-27, 22:06
Asaf Mesika 2013-03-27, 22:28
Ted Yu 2013-03-27, 22:33
Mohammad Tariq 2013-03-25, 19:30