HBase, mail # user - larger HFile block size for very wide row?


Re: larger HFile block size for very wide row?
Wei Tan 2014-01-29, 22:41
Hi Ted and Vladimir, thanks!

I was wondering whether using an index is a good idea. My scan/get criterion is
something like "get all rows I inserted since the end of yesterday". I may
have to use MapReduce + a timeRange filter.

Lars and all, I will try to report back some performance data later.
Thanks for the help from you all.

Best regards,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan

From:   Ted Yu <[EMAIL PROTECTED]>
To:     "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>,
Date:   01/29/2014 04:37 PM
Subject:        Re: larger HFile block size for very wide row?

bq. table:family2 holds only row keys (no data) from  table:family1.

Wei:
You can designate family2 as essential column family so that family1 is
brought into heap when needed.
On Wed, Jan 29, 2014 at 1:33 PM, Vladimir Rodionov
<[EMAIL PROTECTED]>wrote:

> Yes, your row will be split by KV boundaries - no need to increase the
> default block size, except, probably, for performance.
> You will need to try different sizes to find optimal performance in your
> use case.
> I would not use a combination of scan & get on the same table:family with
> very large rows.
> Either some kind of secondary indexing is needed, or do the scan on a
> different family (which has the same row keys):
>
> table:family1 holds original data
> table:family2 holds only row keys (no data) from  table:family1.
> Your scan will be MUCH faster in this case.
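To make the two-family pattern above concrete, here is a toy Python sketch. Plain dicts stand in for the HBase tables (this is not the HBase client API, and the table/row names are made up): a range scan over the lightweight keys-only family finds the matching row keys, and only then are the wide rows fetched with point gets from the data family.

```python
# Toy model of Vladimir's two-family pattern: family2 is a lightweight
# "index" family holding only row keys; family1 holds the wide rows.
# Plain dicts stand in for HBase tables -- not the HBase client API.

# family1: row key -> wide row (a 2 KB payload stands in for 1000 columns)
family1 = {f"row{i:04d}": {"data": "x" * 2048} for i in range(10)}

# family2: the same row keys with no payload -- cheap to scan
family2 = set(family1)

def scan_keys(start, stop):
    """Range-scan the keys-only family (no wide rows are read here)."""
    return sorted(k for k in family2 if start <= k < stop)

def get_rows(keys):
    """Point gets against the data family, only for the matching keys."""
    return {k: family1[k] for k in keys}

matches = scan_keys("row0002", "row0005")
rows = get_rows(matches)
```

The scan touches only row keys; the expensive multi-megabyte rows are read just for the handful of matches, which is why the scan over family2 is so much cheaper.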
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: [EMAIL PROTECTED]
>
> ________________________________________
> From: Wei Tan [[EMAIL PROTECTED]]
> Sent: Wednesday, January 29, 2014 12:52 PM
> To: [EMAIL PROTECTED]
> Subject: Re: larger HFile block size for very wide row?
>
> Sorry, 1000 columns, each 2K, so each row is 2M. I guess HBase will keep a
> single KV (i.e., a column rather than a row) in a block, so a row will
> span multiple blocks?
>
> My scan pattern is: I will do range scan, find the matching row keys, and
> fetch the whole row for each row that matches my criteria.
>
> Best regards,
> Wei
>
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan
>
>
>
> From:   lars hofhansl <[EMAIL PROTECTED]>
> To:     "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>,
> Date:   01/29/2014 03:49 PM
> Subject:        Re: larger HFile block size for very wide row?
>
>
>
> You have 1000 columns? Not 1000k = 1M columns, I assume.
> So you'll have 2MB KVs. That's a bit on the large side.
>
> HBase will "grow" the block to fit the KV into it. It means you have
> basically one block per KV.
> I guess you address these rows via point gets (GET), and do not typically
> scan through them, right?
>
> Do you see any performance issues?
>
> -- Lars
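A quick back-of-the-envelope check of the numbers in this thread (1000 columns of 2 KB each against the 64 KB default block size, plus the 2 MB-per-KV reading Lars describes):

```python
# Back-of-the-envelope numbers from the thread.
KB = 1024
value_size = 2 * KB        # each cell (KV) is ~2 KB
num_columns = 1000         # columns per row
block_size = 64 * KB       # default HFile block size

row_size = num_columns * value_size        # ~2 MB per row
kvs_per_block = block_size // value_size   # 32: a 2 KB KV fits easily
blocks_per_row = -(-row_size // block_size)  # ceil: one row spans ~32 blocks

# Only in Lars's reading (a single 2 MB KV) would HBase "grow" the
# block to fit one KV, giving roughly one block per KV:
big_kv = 2048 * KB
effective_block = max(block_size, big_kv)
```

So with 2 KB cells the default block size holds about 32 KVs per block, and the row is simply split across ~32 blocks; only a KV larger than the block size forces an oversized block.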
>
>
>
> ________________________________
>  From: Wei Tan <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Wednesday, January 29, 2014 12:35 PM
> Subject: larger HFile block size for very wide row?
>
>
> Hi, I have a HBase table where each row has ~1000k columns, ~2K each. My
> table scan pattern is to use a row key filter but I need to fetch the
> whole row (~1000k columns) back.
>
> Shall I set HFile block size to be larger than the default 64K?
> Thanks,
> Wei
>
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan
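For reference, the per-family block size can be changed from the HBase shell. The table and family names below are placeholders, and 256 KB (262144 bytes) is just an example value to benchmark against the 64 KB default:

```
alter 'mytable', {NAME => 'family1', BLOCKSIZE => '262144'}
```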
>