Re: Hbase scans taking a lot of time


Try using caching for the query.

Regards,
Shashwat Shriparv
Sent from Samsung Galaxy

Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

You're better off laying out the data based on the way you will access it.

If you always read data from columns A, B, C and D together, then
bundle them in a single column. And all of that in a single CF...

JM
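
As a rough illustration of the prefixing idea, the sketch below folds several column families into a single one by moving the old family name into the column qualifier. It is plain Python with no HBase client involved; the family name `d`, the `:` separator, and the helper names are assumptions made for illustration, not anything stated in the thread.

```python
# Hypothetical helpers: fold many column families into one by prefixing
# the qualifier with the old family name. No HBase client is used here.

SINGLE_FAMILY = "d"   # assumed name for the one remaining column family
SEPARATOR = ":"       # assumed separator between old family and qualifier


def bundle(old_family, qualifier):
    """Map (old_family, qualifier) to (single family, prefixed qualifier)."""
    return SINGLE_FAMILY, f"{old_family}{SEPARATOR}{qualifier}"


def unbundle(prefixed_qualifier):
    """Recover (old_family, qualifier) from a prefixed qualifier."""
    old_family, _, qualifier = prefixed_qualifier.partition(SEPARATOR)
    return old_family, qualifier


def bundle_row(row):
    """Rewrite {(family, qualifier): value} so every cell sits in one family."""
    return {bundle(fam, qual): value for (fam, qual), value in row.items()}


if __name__ == "__main__":
    row = {("A", "price"): b"10", ("B", "qty"): b"3"}
    print(bundle_row(row))
```

With a layout like this, a read that only wants the old group A can still select it by qualifier prefix (HBase ships a ColumnPrefixFilter for that kind of selection), while all cells live in one set of store files.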

2013/1/25, Vibhav Mundra <[EMAIL PROTECTED]>:
> This is what I was thinking; sorry for my ignorance.
>
> I want to use the property of HBase (i.e. a columnar DB) so that only the
> required columns are accessed. For this I kept a large number of column
> families.
>
> But I still do not understand what is happening, as there is no disk
> I/O, only high CPU and some network activity.
> Why is the scan taking more time than it took to populate HBase?
>
> -Vibhav
>
>
> On Fri, Jan 25, 2013 at 11:36 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
>
>> Hi Vibhav,
>>
>> Do you really need 13 different column families? Can't you find a way
>> to bundle them into 1 or 2 CFs at most? Maybe by prefixing the column
>> name?
>>
>> That might help...
>>
>> JM
>>
>> 2013/1/25, Vibhav Mundra <[EMAIL PROTECTED]>:
>> > The number of column families I have is 13, which I guess is okay?
>> >
>> > -Vibhav
>> >
>> >
>> > On Fri, Jan 25, 2013 at 11:01 PM, Luke Lu <[EMAIL PROTECTED]> wrote:
>> >
>> >> You'll have this problem if you have a large number of column families
>> >> being scanned/populated at the same time. Make sure the data you
>> >> scan/populate frequently are in the same column family (you can have
>> >> many columns in a column family). Unlike BigTable/Hypertable, which have
>> >> the concept of locality/access groups, HBase always stores column
>> >> families in separate files, essentially making a column family not only
>> >> a logical grouping mechanism but also a physical locality group.
>> >>
>> >>
>> >> On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[EMAIL PROTECTED]> wrote:
>> >>
>> >> > I am facing a very strange problem with HBase.
>> >> >
>> >> > This is what I did:
>> >> > a) Created a table using pre-partitioned splits.
>> >> > b) The column families are compressed with LZO.
>> >> > c) With the above configuration I am able to populate 2 million rows
>> >> > per minute in HBase.
>> >> > d) I have created a table with 300 million odd rows, which took me
>> >> > roughly 3 hours to populate; the data size is 25 GB.
>> >> >
>> >> > e) But when I query the data, the performance I am getting is very bad.
>> >> >    Basically this is what I am seeing: high CPU, no disk I/O, and
>> >> > network I/O at a rate of 6~7 MB/s.
>> >> >
>> >> >
>> >> > Because of this, scanning the entries of the table using Hive is
>> >> > taking ages.
>> >> > Basically it takes around 24 hours to scan the table. Any idea how
>> >> > to debug this?
>> >> >
>> >> >
>> >> > -Vibhav
>> >> >
>> >>
>> >
>>
>
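
To put the figures from the original post side by side, here is a small back-of-the-envelope sketch. The row count and the load and scan times are taken from the thread; the caching values and the assumption that each scanner round trip returns exactly `caching` rows are illustrative only, and the actual default depends on the HBase version and client configuration.

```python
# Back-of-the-envelope numbers from the thread; caching values are assumed.

ROWS = 300_000_000   # "300 million odd rows" from the original post
LOAD_HOURS = 3       # reported time to populate the table
SCAN_HOURS = 24      # reported time to scan the table through Hive


def rows_per_second(rows, hours):
    """Average row rate over the given number of hours."""
    return rows / (hours * 3600)


def scanner_round_trips(rows, caching):
    """Approximate scanner RPCs if each call returns `caching` rows."""
    return -(-rows // caching)  # integer ceiling division


if __name__ == "__main__":
    print("load rows/s:", round(rows_per_second(ROWS, LOAD_HOURS)))
    print("scan rows/s:", round(rows_per_second(ROWS, SCAN_HOURS)))
    for caching in (1, 100, 1000):  # illustrative values, not recommendations
        print("caching", caching, "->", scanner_round_trips(ROWS, caching), "round trips")
```

If the scan is fetching very few rows per round trip, the round-trip count alone could account for much of the gap between load and scan rates, which is one reason the caching suggestion is worth testing alongside the column-family consolidation.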