HBase >> mail # user >> HBase Is So Slow To Save Data?


Re: HBase Is So Slow To Save Data?
Dear Cristofer,

Thanks so much for the reminder!

Best regards,
Bing

On Thu, Aug 30, 2012 at 12:32 AM, Cristofer Weber <[EMAIL PROTECTED]> wrote:

> There are also a lot of conversions of the same values to their byte array
> representation, e.g., your NeighborStructure constants. You should do this
> conversion only once to save time, since you are currently doing it inside
> three nested loops. I'm not sure how much this will improve things, but it
> is worth trying.
>
> Best regards,
> Cristofer
>
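
As a rough sketch of the suggestion above (convert constant values to bytes once rather than inside the loops), the plain-Java example below caches the byte arrays in static fields. The class name NeighborColumns and the constant values are hypothetical stand-ins for the NeighborStructure constants, which are not shown in this thread. With the HBase client, Bytes.toBytes(String) would normally take the place of String.getBytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the NeighborStructure constants (names are assumed).
class NeighborColumns {
    static final String FAMILY = "neighbor_family";
    static final String QUALIFIER = "neighbor_key";

    // Converted once, when the class is loaded, instead of on every loop pass.
    static final byte[] FAMILY_BYTES = FAMILY.getBytes(StandardCharsets.UTF_8);
    static final byte[] QUALIFIER_BYTES = QUALIFIER.getBytes(StandardCharsets.UTF_8);

    // Builds one {family, qualifier, value} triple per value, reusing the cached arrays.
    static List<byte[][]> buildCells(List<String> values) {
        List<byte[][]> cells = new ArrayList<>(values.size());
        for (String value : values) {
            // Only the per-row value is converted inside the loop.
            cells.add(new byte[][] {
                FAMILY_BYTES, QUALIFIER_BYTES, value.getBytes(StandardCharsets.UTF_8)
            });
        }
        return cells;
    }
}
```

Every triple returned by buildCells shares the same family and qualifier arrays, so the number of conversions no longer grows with the loop depth. Whether this makes a measurable difference here is something only a measurement can tell.
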
> -----Original Message-----
> From: Bing Li [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, August 29, 2012 13:07
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: HBase Is So Slow To Save Data?
>
> I see. Thanks so much!
>
> Bing
>
>
> On Wed, Aug 29, 2012 at 11:59 PM, N Keywal <[EMAIL PROTECTED]> wrote:
>
> > It's not useful here: if you have a memory issue, it occurs while you're
> > using the list, not after you have finished with it and set it to null.
> > You need to monitor the memory consumption of the JVM, on both the client
> > and the server.
> > Search for these keywords; there are many examples on the web.
> > Look up ArrayList initialization as well.
> >
> > Note as well that what matters is not the size of the structure on
> > disk but the in-memory size of the list created by
> > "List<Put> puts = new ArrayList<Put>();" before the table put.
> >
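
One way to act on the two points above (ArrayList initialization and the size of the list before the table put) is to presize the list and hand it off in bounded batches, so it never holds the whole data set at once. This is only a sketch and not something proposed in the thread: BatchingBuffer and its sink callback are hypothetical names, and with the HBase client the sink would typically be a call such as table.put(batch).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects items and passes them on in fixed-size batches.
class BatchingBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer;

    BatchingBuffer(int batchSize, Consumer<List<T>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
        // Presized so the backing array does not have to grow and be copied.
        this.buffer = new ArrayList<>(batchSize);
    }

    // Adds one item and flushes as soon as the batch is full.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Hands the current batch to the sink and starts a fresh list.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(buffer);
        buffer = new ArrayList<>(batchSize);
    }
}
```

The caller adds items inside its loops and calls flush() once at the end to send the last partial batch. The batch size is a tuning knob, and a suitable value depends on the row size and the heap available to the client.
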
> > On Wed, Aug 29, 2012 at 5:42 PM, Bing Li <[EMAIL PROTECTED]> wrote:
> >
> > > Dear N Keywal,
> > >
> > > Thanks so much for your reply!
> > >
> > > The total amount of data is about 110M. The available memory is
> > > enough, 2G.
> > >
> > > In Java, I just set the collection to null so that it can be garbage
> > > collected. Do you think that is fine?
> > >
> > > Best regards,
> > > Bing
> > >
> > >
> > > On Wed, Aug 29, 2012 at 11:22 PM, N Keywal <[EMAIL PROTECTED]> wrote:
> > >
> > >> Hi Bing,
> > >>
> > >> You should expect HBase to be slower in the generic case:
> > >> 1) it writes much more data (see the HBase data model), with extra
> > >> column qualifiers, timestamps & so on.
> > >> 2) the data is written multiple times: once in the write-ahead log,
> > >> once per replica on the datanodes & so on again.
> > >> 3) there are inter-process calls & inter-machine calls on the
> > >> critical path.
> > >>
> > >> This is the cost of the atomicity, reliability and scalability features.
> > >> With these features in mind, HBase is reasonably fast to save data
> > >> on a cluster.
> > >>
> > >> In your specific case (without points 2 & 3 above), the
> > >> performance seems to be very bad.
> > >>
> > >> You should first look at:
> > >> - how much time is spent in the put vs. preparing the list?
> > >> - do you have garbage collection going on? or even swapping?
> > >> - what's the size of your final Array vs. the available memory?
> > >>
> > >> Cheers,
> > >>
> > >> N.
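
The three checks listed above can be approximated from inside the client JVM with standard library calls only. The sketch below is one possible way to do it: PhaseTimer and its method names are hypothetical, and it only observes the client process, so the server side and any swapping still have to be checked with JVM and operating-system tools.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for comparing the "prepare the list" and "put" phases.
class PhaseTimer {
    // Runs one phase and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable phase) {
        long start = System.nanoTime();
        phase.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Heap currently in use by this JVM, in bytes.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Approximate total time this JVM has spent in garbage collection so far.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }
}
```

Wrapping the list-building code and the table put in separate timeMillis calls, and reading usedHeapBytes() and totalGcMillis() before and after each, shows which phase dominates and whether garbage collection accounts for a large share of it.
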
> > >>
> > >>
> > >>
> > >> On Wed, Aug 29, 2012 at 4:08 PM, Bing Li <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> Dear all,
> > >>>
> > >>> By the way, my HBase is running in pseudo-distributed mode. Thanks!
> > >>>
> > >>> Best regards,
> > >>> Bing
> > >>>
> > >>> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>> > Dear all,
> > >>> >
> > >>> > In my experience, HBase is very slow to save data. Am I
> > >>> > right?
> > >>> >
> > >>> > For example, today I needed to save data in a HashMap to HBase. It
> > >>> > took more than three hours. However, when saving the same
> > >>> > HashMap to a file in text format with redirected System.out,
> > >>> > it took only 4.5 seconds!
> > >>> >
> > >>> > Why is HBase so slow? Is it indexing?
> > >>> >
> > >>> > My code to save data in HBase is as follows. I think the code
> > >>> > must be correct.
> > >>> >
> > >>> >         ......
> > >>> >         public synchronized void
> > >>> > AddVirtualOutgoingHHNeighbors(ConcurrentHashMap<String,
> > >>> > ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap, int