HBase user mailing list (archived at search-hadoop.com): split table data into two or more tables


alxsss@... 2013-02-08, 18:36
Kevin Odell 2013-02-08, 18:41
Ted Yu 2013-02-08, 18:40
Ted Yu 2013-02-08, 18:51
alxsss@... 2013-02-08, 18:59
Marcos Ortiz 2013-02-08, 14:52
alxsss@... 2013-02-08, 22:16
Ted Yu 2013-02-08, 19:04
Ted Yu 2013-02-08, 19:22
alxsss@... 2013-02-09, 01:47
Re: split table data into two or more tables
See the following javadoc in Scan.java:

 * To only retrieve columns within a specific range of version timestamps,
 * execute {@link #setTimeRange(long, long) setTimeRange}.
You can search for the above method in unit tests.
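[A minimal sketch of such a time-range scan against the 0.94-era client API; the table name, family, and the 24-hour window are assumptions for illustration, not from the thread.]

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeScanExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");     // hypothetical table name
    try {
      Scan scan = new Scan();
      scan.addFamily(Bytes.toBytes("f"));
      // Only cells whose timestamps fall in [start, end) are returned.
      long end = System.currentTimeMillis();
      long start = end - 24L * 60 * 60 * 1000;      // e.g. the last 24 hours
      scan.setTimeRange(start, end);
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          System.out.println(Bytes.toStringBinary(r.getRow()));
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}
```

If new records are written with current timestamps, this avoids touching old cells at all.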

In your use case, is family f the only family?
If not, take a look at HBASE-5416, which is coming in 0.94.5;
family f would be the essential column.

Cheers
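[The follow-up question below, selecting only rows where f:m equals 5, could be sketched with a SingleColumnValueFilter. The names and the value encoding are assumptions; if m is stored as a binary long rather than the string "5", the comparison bytes must change accordingly.]

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ColumnValueScan {
  public static Scan newRecordsScan() {
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("f"),      // family
        Bytes.toBytes("m"),      // qualifier
        CompareOp.EQUAL,
        Bytes.toBytes("5"));     // assumes m is stored as the string "5"
    // Skip rows that lack column f:m instead of passing them through.
    filter.setFilterIfMissing(true);
    Scan scan = new Scan();
    scan.setFilter(filter);
    return scan;
  }
}
```

With HBASE-5416 (0.94.5), the family named in such a filter is treated as the essential column family, so other families are loaded only for rows the filter accepts.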

On Fri, Feb 8, 2013 at 5:47 PM, <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Thanks for the suggestions. How can a time-range scan be implemented in Java
> code? Is there any sample code or a tutorial?
> Also, is it possible to select by the value of a column? Let's say I know
> that records have family f and column m, and new records have m=5. I need to
> instruct HBase to send only these records to the mapper of mapred jobs.
>
> Thanks.
> Alex.
>
> -----Original Message-----
> From: Ted Yu <[EMAIL PROTECTED]>
> To: user <[EMAIL PROTECTED]>
> Sent: Fri, Feb 8, 2013 11:05 am
> Subject: Re: split table data into two or more tables
>
>
> bq. in a cluster of 2 nodes +1 master
> I assume you're limited by hardware in that regard.
>
> bq. job selects these new records
> Have you used time-range scan ?
>
> Cheers
>
> On Fri, Feb 8, 2013 at 10:59 AM, <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > The rationale is that I have a mapred job that adds new records to an
> > hbase table, constantly.
> > The next mapred job selects these new records, but it must iterate over
> > all records and check whether each is a candidate for selection.
> > Since there are too many old records, iterating through them in a cluster
> > of 2 nodes + 1 master takes about 2 days. So I thought splitting them into
> > two tables must reduce this time, and as soon as I figure out that there
> > are no new records left in one of the new tables I will not run the mapred
> > job on it.
> >
> > Currently, we have 7 regions including ROOT and META.
> >
> >
> > Thanks.
> > Alex.
> >
> >
> > -----Original Message-----
> > From: Ted Yu <[EMAIL PROTECTED]>
> > To: user <[EMAIL PROTECTED]>
> > Sent: Fri, Feb 8, 2013 10:40 am
> > Subject: Re: split table data into two or more tables
> >
> >
> > May I ask the rationale behind this?
> > Were you aiming for higher write throughput?
> >
> > Please also tell us how many regions you have in the current table.
> >
> > Thanks
> >
> > BTW please consider upgrading to 0.94.4
> >
> > On Fri, Feb 8, 2013 at 10:36 AM, <[EMAIL PROTECTED]> wrote:
> >
> > > Hello,
> > >
> > > I wondered if there is a way of splitting data from one table into two
> > > or more tables in hbase with identical schemas, i.e. if table A has 100M
> > > records, put 50M into table B, 50M into table C, and delete table A.
> > > Currently, I use hbase-0.92.1 and hadoop-1.4.0
> > >
> > > Thanks.
> > > Alex.
> > >
> >
> >
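[Putting the suggestions in this thread together: the scan — time range, filter, or both — is handed to the MapReduce job via TableMapReduceUtil, so only matching records ever reach the mapper. A sketch against the 0.94-era API; the table name and job name are hypothetical.]

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class NewRecordsJob {
  static class NewRecordMapper
      extends TableMapper<ImmutableBytesWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      // Only rows that passed the scan's time range / filter arrive here.
      ctx.write(row, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "select-new-records");  // hypothetical job name
    job.setJarByClass(NewRecordsJob.class);

    Scan scan = new Scan();
    scan.setTimeRange(0L, System.currentTimeMillis()); // narrow as needed
    scan.setCaching(500);        // fetch larger batches per RPC for MR scans
    scan.setCacheBlocks(false);  // don't pollute the region server block cache

    TableMapReduceUtil.initTableMapperJob(
        "mytable",               // hypothetical source table
        scan,
        NewRecordMapper.class,
        ImmutableBytesWritable.class,
        NullWritable.class,
        job);
    job.setNumReduceTasks(0);    // map-only
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```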
ramkrishna vasudevan 2013-02-09, 10:42
alxsss@... 2013-02-21, 02:27
ramkrishna vasudevan 2013-02-21, 04:29