HBase, mail # user - HBase Is So Slow To Save Data?


Re: HBase Is So Slow To Save Data?
Mohammad Tariq 2012-08-29, 15:45
Pseudo-distributed setup could be a cause.

On Wednesday, August 29, 2012, N Keywal <[EMAIL PROTECTED]> wrote:
> Hi Bing,
>
> You should expect HBase to be slower in the generic case:
> 1) it writes much more data (see the HBase data model), with extra column
> qualifiers, timestamps & so on.
> 2) the data is written multiple times: once in the write-ahead-log, once
> per replica on datanode & so on again.
> 3) there are inter-process calls & inter-machine calls on the critical
> path.
>
> This is the cost of the atomicity, reliability and scalability features.
> With these features in mind, HBase is reasonably fast to save data on a
> cluster.
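
To put a rough number on point 1, here is a minimal sketch that estimates how many bytes one cell occupies in the classic HBase KeyValue layout (key length, value length, row, family, qualifier, timestamp, type). The class name `CellSizeSketch` and the sample strings are made up for illustration; newer HBase versions add tags, and on-disk encoding or compression changes the real footprint, so treat the result as an approximation only.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: approximate serialized size of one cell in the classic
// KeyValue layout (no tags, no block encoding, no compression).
final class CellSizeSketch {

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Total bytes for one cell: fixed-width fields plus row, family, qualifier and value. */
    static int cellSize(String row, String family, String qualifier, String value) {
        int keyLength = 2 + utf8Length(row)      // row length (short) + row bytes
                      + 1 + utf8Length(family)   // family length (byte) + family bytes
                      + utf8Length(qualifier)    // qualifier bytes
                      + 8                        // timestamp (long)
                      + 1;                       // key type (byte)
        return 4 + 4                             // key length and value length (two ints)
             + keyLength
             + utf8Length(value);
    }

    /** Bytes written in addition to the value itself. */
    static int overhead(String row, String family, String qualifier, String value) {
        return cellSize(row, family, qualifier, value) - utf8Length(value);
    }

    public static void main(String[] args) {
        // Hypothetical names, chosen only to show the effect of long identifiers.
        String row = "hub_hub_neighbor_row_0123456789abcdef";
        String family = "neighbor_family";
        String qualifier = "hub_key_column";
        String value = "someHubKey";
        System.out.println("value bytes: " + utf8Length(value));
        System.out.println("cell bytes:  " + cellSize(row, family, qualifier, value));
        System.out.println("overhead:    " + overhead(row, family, qualifier, value));
    }
}
```

Because the row key, family and qualifier are repeated in every cell, long names multiply the volume written compared with dumping the bare values to a text file.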
>
> In your specific case (without points 2 & 3 above), the performance
> seems to be very bad.
>
> You should first look at:
> - how much time is spent in the put vs. preparing the list
> - do you have garbage collection going on? even swap?
> - what's the size of your final Array vs. the available memory?
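
One way to get at the first and third checks, and part of the second, is to time the two phases separately and read the JVM's own GC and heap counters. The sketch below uses only the JDK. `PutTimingSketch`, `Report` and `measure` are hypothetical names, and the `write` callback stands in for whatever call actually sends the list to HBase (the posted code is cut off before that point, so that part is an assumption).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Sketch only: separates "building the batch" from "writing the batch"
// and reports GC time and heap usage observed around both phases.
final class PutTimingSketch {

    static final class Report {
        int items;          // number of elements in the prepared list
        long prepareNanos;  // time spent building the list
        long writeNanos;    // time spent in the write callback
        long gcMillis;      // GC time accumulated during both phases
        long usedBytes;     // heap in use after the write
        long maxBytes;      // maximum heap the JVM may use
    }

    /** Sum of accumulated collection time over all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    static <T> Report measure(Supplier<List<T>> prepare, Consumer<List<T>> write) {
        Report r = new Report();
        long gcBefore = totalGcMillis();

        long t0 = System.nanoTime();
        List<T> batch = prepare.get();
        long t1 = System.nanoTime();
        write.accept(batch);
        long t2 = System.nanoTime();

        Runtime rt = Runtime.getRuntime();
        r.items = batch.size();
        r.prepareNanos = t1 - t0;
        r.writeNanos = t2 - t1;
        r.gcMillis = totalGcMillis() - gcBefore;
        r.usedBytes = rt.totalMemory() - rt.freeMemory();
        r.maxBytes = rt.maxMemory();
        return r;
    }

    public static void main(String[] args) {
        Report r = measure(
            () -> {
                List<String> items = new ArrayList<>();
                for (int i = 0; i < 100_000; i++) {
                    items.add("row-" + i); // stand-in for building Put objects
                }
                return items;
            },
            items -> { /* stand-in for the real write to HBase */ });
        System.out.printf("items=%d prepare=%d ms write=%d ms gc=%d ms heap=%d/%d MB%n",
            r.items, r.prepareNanos / 1_000_000, r.writeNanos / 1_000_000, r.gcMillis,
            r.usedBytes >> 20, r.maxBytes >> 20);
    }
}
```

If the GC time is a large share of the total, or the used heap sits close to the maximum, the list is probably too big for the JVM. Swapping is not visible from inside the JVM and has to be checked with the operating system's tools.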
>
> Cheers,
>
> N.
>
>
> On Wed, Aug 29, 2012 at 4:08 PM, Bing Li <[EMAIL PROTECTED]> wrote:
>
>> Dear all,
>>
>> By the way, my HBase is in the pseudo-distributed mode. Thanks!
>>
>> Best regards,
>> Bing
>>
>> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li <[EMAIL PROTECTED]> wrote:
>>
>> > Dear all,
>> >
>> > In my experience, HBase is very slow at saving data. Am I right?
>> >
>> > For example, today I needed to save the data in a HashMap to HBase. It
>> > took more than three hours. However, saving the same HashMap to a file
>> > in text format with a redirected System.out took only 4.5 seconds!
>> >
>> > Why is HBase so slow? Is it indexing?
>> >
>> > My code to save data in HBase is as follows. I think the code must be
>> > correct.
>> >
>> >         ......
>> >         public synchronized void AddVirtualOutgoingHHNeighbors(
>> >                 ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap,
>> >                 int timingScale)
>> >         {
>> >                 List<Put> puts = new ArrayList<Put>();
>> >
>> >                 String hhNeighborRowKey;
>> >                 Put hubKeyPut;
>> >                 Put groupKeyPut;
>> >                 Put topGroupKeyPut;
>> >                 Put timingScalePut;
>> >                 Put nodeKeyPut;
>> >                 Put hubNeighborTypePut;
>> >
>> >                 for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry : hhOutNeighborMap.entrySet())
>> >                 {
>> >                         for (Map.Entry<String, Set<String>> groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet())
>> >                         {
>> >                                 for (String neighborKey : groupNeighborEntry.getValue())
>> >                                 {
>> >                                         hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW
>> >                                                 + Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() + groupNeighborEntry.getKey() + timingScale + neighborKey);
>> >
>> >                                         hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
>> >                                         hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
>> >                                                 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
>> >                                                 Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
>> >                                         puts.add(hubKeyPut);
>> >
>> >                                         groupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
>> >                                         groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
>> >                                                 Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
>> >                                                 Bytes.toBytes(groupNeighborEntry.getKey()));
>> >

--
Regards,
    Mohammad Tariq