HBase >> mail # user >> hash function per table


Re: hash function per table
What is the performance penalty when scanning with a row prefix filter instead
of with a start/end key?
Can it still work (in reasonable processing time) when the table contains
billions of records?
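
To make the question concrete, here is a toy model in Python, not HBase code. It assumes the table is just a sorted list of row keys shaped like <date>_<id>, and that a scan with only a prefix filter and no start row has to examine every row from the beginning of the table. The function names and the key layout are made up for illustration; real filter behaviour may differ.

```python
import bisect


def make_rows(days, per_day):
    """Build a sorted list of hypothetical row keys shaped like <date>_<id>."""
    rows = [
        f"2011-03-{day:02d}_{i:05d}"
        for day in range(1, days + 1)
        for i in range(per_day)
    ]
    rows.sort()
    return rows


def scan_with_start_key(rows, prefix):
    """Seek straight to the first key >= prefix, stop at the first non-match."""
    hits, examined = [], 0
    for idx in range(bisect.bisect_left(rows, prefix), len(rows)):
        examined += 1
        if not rows[idx].startswith(prefix):
            break  # keys are sorted, so nothing later can match
        hits.append(rows[idx])
    return hits, examined


def scan_with_filter_only(rows, prefix):
    """Assumed worst case: no start row, so every row is checked by the filter."""
    hits, examined = [], 0
    for key in rows:
        examined += 1
        if key.startswith(prefix):
            hits.append(key)
    return hits, examined


if __name__ == "__main__":
    rows = make_rows(days=20, per_day=100)
    for scan in (scan_with_start_key, scan_with_filter_only):
        hits, examined = scan(rows, "2011-03-10_")
        print(scan.__name__, "hits:", len(hits), "rows examined:", examined)
```

With 20 days of 100 rows each, both scans return the same 100 rows, but the seek-based scan examines 101 keys while the filter-only scan examines all 2000. In this model the cost of the filter-only scan grows with the table size rather than with the result size, which is the concern for a table with billions of rows.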
On Sun, Mar 20, 2011 at 10:03 PM, Pete Haidinyak <[EMAIL PROTECTED]> wrote:

> I went through this discussion a month or so ago and came away with the
> opinion that you can either have an efficient load with random keys but then
> an inefficient scan that cannot use start and end rows, or have an
> inefficient import with sequential keys and then scan using start and end
> rows.
>
> -Pete
>
>
>
> On Sun, 20 Mar 2011 12:52:24 -0700, Oleg Ruchovets <[EMAIL PROTECTED]>
> wrote:
>
>> Actually, the discussion started from this post:
>>
>>
>>
>> http://search-hadoop.com/m/XX3nW68JsY1/hbase+insertion+optimisation&subj=hbase+insertion+optimisation+
>>
>> When simply inserting data whose row key is <date>_<somedata>, I noticed
>> that only one node works (the one hosting the region being written to).
>> With 10-15 nodes, I think it is inefficient to write data to only one
>> region. I want the data to be inserted into as many nodes as possible
>> simultaneously. Correct me, guys, but in this case the write job
>> will take less time, am I right?
>>
>> Oleg.
>>
>> On Sun, Mar 20, 2011 at 8:57 PM, Chris Tarnas <[EMAIL PROTECTED]> wrote:
>>
>>> There is none - HBase uses a total order partitioner. The straight key
>>> value itself determines which region a row is put into. This allows for
>>> very rapid scans of sequential data, among other things, but does mean it
>>> is easier to hotspot regions. Key design is very important.
>>>
>>> -chris
>>>
>>> On Mar 20, 2011, at 11:41 AM, Lior Schachter wrote:
>>>
>>> > the hash function that distributes the rows between the regions.
>>> >
>>> > On Sun, Mar 20, 2011 at 8:36 PM, Stack <[EMAIL PROTECTED]> wrote:
>>> >
>>> >> Hash?  Which hash are you referring to sir?
>>> >> St.Ack
>>> >>
>>> >> On Sun, Mar 20, 2011 at 10:06 AM, Lior Schachter <[EMAIL PROTECTED]> wrote:
>>> >>> Hi,
>>> >>> What is the API or configuration for changing the default hash function
>>> >>> for a specific HTable?
>>> >>>
>>> >>> thanks,
>>> >>> Lior
>>> >>>
>>> >>
>>>
>>>
>
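
The trade-off discussed in this thread (sequential keys hotspot one region, randomized keys spread writes but complicate scans) can be sketched with a small simulation. This is an illustrative model only, not HBase code: regions are represented by sorted split points, and the "salting" scheme, the bucket count, the split points, and all function names below are assumptions that do not come from the thread.

```python
import bisect
import hashlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical number of salt buckets

# Hypothetical split points; N split points define N + 1 regions.
PLAIN_SPLITS = ["2011-01", "2011-02", "2011-03"]
SALTED_SPLITS = ["01_", "02_", "03_"]


def region_for(key, split_points):
    """Total order placement: the sorted position of the key picks the region."""
    return bisect.bisect_right(split_points, key)


def salted(key, buckets=NUM_BUCKETS):
    """Prefix the key with a bucket number derived from a hash of the key."""
    bucket = int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets
    return f"{bucket:02d}_{key}"


def scan_prefixes(date, buckets=NUM_BUCKETS):
    """Reading one date back needs one prefix scan per bucket."""
    return [f"{b:02d}_{date}" for b in range(buckets)]


if __name__ == "__main__":
    keys = [f"2011-03-20_{i:05d}" for i in range(1000)]
    plain = Counter(region_for(k, PLAIN_SPLITS) for k in keys)
    spread = Counter(region_for(salted(k), SALTED_SPLITS) for k in keys)
    print("plain keys per region: ", dict(plain))
    print("salted keys per region:", dict(spread))
    print("scans needed for one date:", scan_prefixes("2011-03-20"))
```

In this model every <date>_<somedata> key for the same day lands in a single region, while the salted keys are divided among the buckets. The price is on the read side: a single start/end key scan for one date becomes one scan per bucket.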