HBase >> mail # user >> Re: split table data into two or more tables


Re: split table data into two or more tables

On 02/08/2013 01:59 PM, [EMAIL PROTECTED] wrote:
> Hi,
>
> The rationale is that I have a mapred job that constantly adds new records to an hbase table.
> The next mapred job selects these new records, but it must iterate over all records and check whether each one is a candidate for selection.
> Since there are too many old records, iterating through them on a cluster of 2 nodes + 1 master takes about 2 days. So I thought splitting them into two tables must reduce this time, and as soon as I figure out that there are no more new records left in one of the new tables, I will not run the mapred job on it.
This use case is very common, and a good practice here is to pre-split
the regions so you control exactly where your data goes and how large
each region gets, keeping the number of regions manageable.
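For illustration (this helper is not from the thread), here is a minimal sketch of computing evenly spaced split keys, assuming row keys have a uniform hex prefix, similar in spirit to what HBase's HexStringSplit does when pre-splitting a table:

```python
# Hypothetical helper (not from the thread): generate evenly spaced split
# keys over a hex key space of key_width digits, for pre-splitting a table
# whose row keys start with a uniformly distributed hex prefix.

def hex_split_keys(num_regions, key_width=8):
    """Return num_regions - 1 boundary keys dividing [0, 16**key_width) evenly."""
    max_key = 16 ** key_width
    step = max_key // num_regions
    return ["%0*x" % (key_width, i * step) for i in range(1, num_regions)]

# 4 regions need 3 boundaries:
print(hex_split_keys(4))
```

The resulting boundaries could then be handed to the hbase shell when creating the table, e.g. `create 'mytable', 'cf', SPLITS => ['40000000', '80000000', 'c0000000']` (table and family names here are placeholders).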
>
> Currently, we have 7 regions including ROOT and META.
Can you share your conf/hbase-site.xml ?
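For reference, the setting most relevant to region splitting in hbase-site.xml looks like this (illustrative value only, not Alex's actual configuration):

```xml
<!-- Illustrative value only; not taken from the thread. -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- Store-file size at which HBase splits a region; raising it
       keeps the region count lower on a small cluster. -->
  <value>10737418240</value>
</property>
```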

>
>
> Thanks.
> Alex.
>
>
>
> -----Original Message-----
> From: Ted Yu <[EMAIL PROTECTED]>
> To: user <[EMAIL PROTECTED]>
> Sent: Fri, Feb 8, 2013 10:40 am
> Subject: Re: split table data into two or more tables
>
>
> May I ask the rationale behind this ?
> Were you aiming for higher write throughput ?
>
> Please also tell us how many regions you have in the current table.
>
> Thanks
>
> BTW please consider upgrading to 0.94.4
>
> On Fri, Feb 8, 2013 at 10:36 AM, <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I wondered if there is a way of splitting data from one table into two or
>> more tables in hbase with identical schemas, i.e. if table A has 100M
>> records, put 50M into table B and 50M into table C, then delete table A.
>> Currently, I use hbase-0.92.1 and hadoop-1.4.0
>>
>> Thanks.
>> Alex.
>>
>

--
Marcos Ortiz Valmaseda,
Product Manager && Data Scientist at UCI
Blog: http://marcosluis2186.posterous.com
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>