HBase >> mail # user >> HBase Is So Slow To Save Data?


Re: HBase Is So Slow To Save Data?
I see. Thanks so much!

Bing
On Wed, Aug 29, 2012 at 11:59 PM, N Keywal <[EMAIL PROTECTED]> wrote:

> It's not useful here: if you have a memory issue, it occurs while you're
> using the list, not after you have finished with it and set it to null.
> You need to monitor the memory consumption of the JVM, on both the client
> and the server.
> Search the web for these keywords; there are many examples.
> Also look up ArrayList initialization.
>
> Note also that what matters is not the size of the structure on disk but
> the size of the list ("List<Put> puts = new ArrayList<Put>();") just
> before the table put.
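
One way to act on this advice is to keep the list bounded and to sample heap usage while it is being filled. The following is a minimal sketch using only the JDK, not HBase client code: the `Consumer` passed as `sink` is a hypothetical stand-in for whatever call performs the table put, `BoundedBatchWriter` is an illustrative name, and the batch size of 1000 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: buffer items in bounded batches and hand each full batch to a sink. */
class BoundedBatchWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // hypothetical stand-in for the real table put call
    private final List<T> buffer;
    private int flushCount = 0;

    BoundedBatchWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        // Pre-size the list so it does not have to grow while it is filled.
        this.buffer = new ArrayList<T>(batchSize);
    }

    /** Adds one item and flushes as soon as the buffer reaches batchSize. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Hands a copy of the buffered items to the sink, then empties the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<T>(buffer));
        buffer.clear();
        flushCount++;
    }

    int getFlushCount() {
        return flushCount;
    }

    /** Heap currently in use by this JVM, in bytes (a coarse snapshot). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        final long[] written = {0};
        BoundedBatchWriter<String> writer =
                new BoundedBatchWriter<String>(1000, batch -> written[0] += batch.size());
        for (int i = 0; i < 2500; i++) {
            writer.add("row-" + i);
        }
        writer.flush(); // push the final partial batch
        System.out.println("items written: " + written[0]
                + ", flushes: " + writer.getFlushCount());
        System.out.println("used heap (bytes): " + usedHeapBytes());
    }
}
```

With this shape the client never holds more than one batch of pending items, so the peak size of the list no longer depends on the size of the whole data set.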
>
> On Wed, Aug 29, 2012 at 5:42 PM, Bing Li <[EMAIL PROTECTED]> wrote:
>
> > Dear N Keywal,
> >
> > Thanks so much for your reply!
> >
> > The total amount of data is about 110M. The available memory, 2G, is
> > enough.
> >
> > In Java, I just set a collection to null so that it can be garbage
> > collected. Do you think that is fine?
> >
> > Best regards,
> > Bing
> >
> >
> > On Wed, Aug 29, 2012 at 11:22 PM, N Keywal <[EMAIL PROTECTED]> wrote:
> >
> >> Hi Bing,
> >>
> >> You should expect HBase to be slower in the generic case:
> >> 1) it writes much more data (see the HBase data model), with extra
> >> column qualifiers, timestamps, and so on.
> >> 2) the data is written multiple times: once in the write-ahead log, once
> >> per replica on the datanodes, and so on.
> >> 3) there are inter-process and inter-machine calls on the critical
> >> path.
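
To put rough numbers on points 1 and 2, the sketch below estimates how many bytes one cell costs compared with its raw value. It assumes the classic KeyValue layout, in which every cell carries its row key, column family, qualifier, an 8-byte timestamp and a 1-byte type next to the value, plus length fields. The real overhead depends on the HBase version, block encoding and compression, so treat the constants as approximations. The sample names (`hub1`, `f`, `neighbor`, `hub2`), the replication factor and the "WAL copy plus store file copy" model are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: rough per-cell size estimate under an assumed KeyValue-style layout. */
class CellSizeEstimate {
    // Assumed fixed overhead per cell: key length (4) + value length (4)
    // + row length (2) + family length (1) + timestamp (8) + type (1).
    static final int FIXED_OVERHEAD_BYTES = 4 + 4 + 2 + 1 + 8 + 1;

    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Approximate serialized size of one cell, in bytes. */
    static long cellBytes(String row, String family, String qualifier, String value) {
        return FIXED_OVERHEAD_BYTES
                + utf8Length(row)
                + utf8Length(family)
                + utf8Length(qualifier)
                + utf8Length(value);
    }

    /**
     * Very rough total: one copy in the write-ahead log and one in a store
     * file, each multiplied by the file-system replication factor.
     * Compactions and other rewrites are ignored (an assumption).
     */
    static long totalBytesWritten(long cellBytes, int replicas) {
        return cellBytes * 2L * replicas;
    }

    public static void main(String[] args) {
        long perCell = cellBytes("hub1", "f", "neighbor", "hub2");
        System.out.println("raw value bytes: " + "hub2".length());
        System.out.println("approx. bytes per cell: " + perCell);
        System.out.println("approx. bytes written with 3 replicas: "
                + totalBytesWritten(perCell, 3));
    }
}
```

For short values the fixed fields and the repeated row key dominate, which is one reason a plain text dump of the same map is so much smaller and faster to write.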
> >>
> >> This is the cost of the atomicity, reliability and scalability features.
> >> With these features in mind, HBase is reasonably fast to save data on a
> >> cluster.
> >>
> >> In your specific case (without points 2 & 3 above), the performance
> >> does seem to be very bad.
> >>
> >> You should first look at:
> >> - how much time is spent in the put vs. preparing the list?
> >> - is garbage collection going on? or even swapping?
> >> - what's the size of your final list vs. the available memory?
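
The first two checks can be approximated from inside the client JVM. The following is a minimal sketch using only the JDK: `prepare` and `write` are hypothetical stand-ins for building the list and for the table put, and `PhaseTimer` is an illustrative name. It reports the time spent in each phase and how much garbage collection happened in between. Swapping is not visible from the JVM and has to be checked with operating-system tools.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch: time the "prepare" and "write" phases separately and count GC activity. */
class PhaseTimer {
    static final class Report {
        final long prepareNanos;
        final long writeNanos;
        final long gcCount;  // collections that ran during both phases
        final long gcMillis; // approximate time the collectors reported

        Report(long prepareNanos, long writeNanos, long gcCount, long gcMillis) {
            this.prepareNanos = prepareNanos;
            this.writeNanos = writeNanos;
            this.gcCount = gcCount;
            this.gcMillis = gcMillis;
        }
    }

    /** Sums collection count and time over all collectors; -1 ("undefined") counts as 0. */
    static long[] gcTotals() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, millis};
    }

    static <T> Report measure(Supplier<List<T>> prepare, Consumer<List<T>> write) {
        long[] gcBefore = gcTotals();
        long t0 = System.nanoTime();
        List<T> items = prepare.get(); // phase 1: build the list
        long t1 = System.nanoTime();
        write.accept(items);           // phase 2: hand the list to the writer
        long t2 = System.nanoTime();
        long[] gcAfter = gcTotals();
        return new Report(t1 - t0, t2 - t1,
                gcAfter[0] - gcBefore[0], gcAfter[1] - gcBefore[1]);
    }

    public static void main(String[] args) {
        Report r = PhaseTimer.<String>measure(
                () -> {
                    List<String> rows = new ArrayList<String>();
                    for (int i = 0; i < 100000; i++) {
                        rows.add("row-" + i);
                    }
                    return rows;
                },
                rows -> { /* the real put would go here */ });
        System.out.println("prepare ms: " + r.prepareNanos / 1000000);
        System.out.println("write ms:   " + r.writeNanos / 1000000);
        System.out.println("GC runs: " + r.gcCount + ", GC ms: " + r.gcMillis);
    }
}
```

If most of the time falls in the prepare phase, or the GC figures are large relative to the total, the problem is on the client side rather than in the put itself.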
> >>
> >> Cheers,
> >>
> >> N.
> >>
> >>
> >>
> >> On Wed, Aug 29, 2012 at 4:08 PM, Bing Li <[EMAIL PROTECTED]> wrote:
> >>
> >>> Dear all,
> >>>
> >>> By the way, my HBase runs in pseudo-distributed mode. Thanks!
> >>>
> >>> Best regards,
> >>> Bing
> >>>
> >>> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li <[EMAIL PROTECTED]> wrote:
> >>>
> >>> > Dear all,
> >>> >
> >>> > In my experience, HBase is very slow at saving data. Am I right?
> >>> >
> >>> > For example, today I needed to save the data in a HashMap to HBase. It
> >>> > took more than three hours. However, saving the same HashMap to a file
> >>> > in text format with redirected System.out took only 4.5 seconds!
> >>> >
> >>> > Why is HBase so slow? Is it indexing?
> >>> >
> >>> > My code to save data in HBase is as follows. I think the code must be
> >>> > correct.
> >>> >
> >>> >         ......
> >>> >         public synchronized void AddVirtualOutgoingHHNeighbors(
> >>> >                 ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap,
> >>> >                 int timingScale)
> >>> >         {
> >>> >                 List<Put> puts = new ArrayList<Put>();
> >>> >
> >>> >                 String hhNeighborRowKey;
> >>> >                 Put hubKeyPut;
> >>> >                 Put groupKeyPut;
> >>> >                 Put topGroupKeyPut;
> >>> >                 Put timingScalePut;
> >>> >                 Put nodeKeyPut;
> >>> >                 Put hubNeighborTypePut;
> >>> >
> >>> >                 for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry :
> >>> >                                 hhOutNeighborMap.entrySet())
> >>> >                 {
> >>> >                         for (Map.Entry<String, Set<String>> groupNeighborEntry :
> >>> >                                         sourceHubGroupNeighborEntry.getValue().entrySet())
> >>> >                         {
> >>> >                                 for (String neighborKey : groupNeighborEntry.getValue())
> >>> >                                 {
> >>> >                                         hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW +