HBase >> mail # user >> split table data into two or more tables


alxsss@... 2013-02-08, 18:36
Kevin Odell 2013-02-08, 18:41
Ted Yu 2013-02-08, 18:40
Ted Yu 2013-02-08, 18:51
alxsss@... 2013-02-08, 18:59
Marcos Ortiz 2013-02-08, 14:52
alxsss@... 2013-02-08, 22:16
Ted Yu 2013-02-08, 19:04
Ted Yu 2013-02-08, 19:22
alxsss@... 2013-02-09, 01:47
Ted Yu 2013-02-09, 02:22
ramkrishna vasudevan 2013-02-09, 10:42

Re: split table data into two or more tables
Hello,

I see that 0.94.5 has already been released, so I wondered how I can solve the issue we have. In more detail: we have a table with billions of records. Most of the MapReduce jobs we run select from this table the records whose family mk holds a given value. For example:

get 'mytable' ,'row1', 'mk'
COLUMN                               CELL
 mk:_genmrk_                         timestamp=1360869679003, value=1360869340-1376304115
 mk:_updmrk_                         timestamp=1360869376272, value=1360869340-1376304115
 mk:dist            

The map phase of a MapReduce job goes over all records and checks whether _genmrk_ equals the given value. So my question is: is it possible to select only the records with mk:_genmrk_ = myvalue and feed them to the mappers, instead of iterating over all records?
Thanks in advance.
Alex.
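
A note on the semantics being asked for here. HBase ships a SingleColumnValueFilter that can be attached to the Scan feeding a job, so the equality check runs on the region servers and only matching rows reach the mappers; the servers still have to read the rows to evaluate it. The plain-Java sketch below only models the row predicate such a filter applies, so that it runs without a cluster. The class GenmrkPredicate, the methods matches and utf8, and the in-memory row map are hypothetical names for illustration, not HBase API. The filterIfMissing flag mirrors the filter's setFilterIfMissing option; the assumption here is that rows lacking the column pass the filter unless that option is set to true.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class GenmrkPredicate {

    // Hypothetical model: a row maps "family:qualifier" to the raw cell value.
    public static boolean matches(Map<String, byte[]> row, String column,
                                  byte[] expected, boolean filterIfMissing) {
        byte[] cell = row.get(column);
        if (cell == null) {
            // Column absent: the row is dropped only when filterIfMissing is true.
            return !filterIfMissing;
        }
        // Byte-wise equality on the stored value.
        return Arrays.equals(cell, expected);
    }

    public static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wanted = utf8("1360869340-1376304115");

        Map<String, byte[]> row1 = new HashMap<>();
        row1.put("mk:_genmrk_", utf8("1360869340-1376304115"));

        Map<String, byte[]> row2 = new HashMap<>(); // no mk:_genmrk_ cell at all

        System.out.println(matches(row1, "mk:_genmrk_", wanted, true));
        System.out.println(matches(row2, "mk:_genmrk_", wanted, true));
        System.out.println(matches(row2, "mk:_genmrk_", wanted, false));
    }
}
```

The third call shows why the missing-column case matters: with the flag left off, a row without mk:_genmrk_ would still be handed to the mapper, so the mapper-side check could not be dropped.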

 

 

 

-----Original Message-----
From: Ted Yu <[EMAIL PROTECTED]>
To: user <[EMAIL PROTECTED]>
Sent: Fri, Feb 8, 2013 6:23 pm
Subject: Re: split table data into two or more tables
See the following javadoc in Scan.java:

 * To only retrieve columns within a specific range of version timestamps,

 * execute {@link #setTimeRange(long, long) setTimeRange}.
You can search for the above method in unit tests.

In your use case, is family f the only family?
If not, take a look at HBASE-5416, which is coming in 0.94.5;
family f would be the essential column family.

Cheers
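
On the sample-code question for time-range scans: the setTimeRange(long, long) call quoted above takes two cell timestamps in epoch milliseconds, like the timestamp=1360869679003 values in the shell output. The sketch below is plain Java so that it runs without HBase. ScanWindow, since, and inWindow are hypothetical names, and the half-open [min, max) interpretation is an assumption about HBase's time range semantics rather than something stated in this thread.

```java
public class ScanWindow {

    // Hypothetical helper: bounds for "everything written since the last run".
    public static long[] since(long lastRunMillis, long nowMillis) {
        if (lastRunMillis < 0 || nowMillis < lastRunMillis) {
            throw new IllegalArgumentException("need 0 <= lastRun <= now");
        }
        return new long[] { lastRunMillis, nowMillis };
    }

    // Assumed semantics: lower bound inclusive, upper bound exclusive.
    public static boolean inWindow(long cellTimestamp, long[] window) {
        return cellTimestamp >= window[0] && cellTimestamp < window[1];
    }

    public static void main(String[] args) {
        // Timestamps taken from the shell output in this thread.
        long[] window = since(1360869376272L, 1360869679003L);

        // With HBase on the classpath, the bounds would be handed to the job's Scan:
        //   scan.setTimeRange(window[0], window[1]);
        System.out.println(inWindow(1360869376272L, window));
        System.out.println(inWindow(1360869679003L, window));
    }
}
```

This only helps if the cell timestamps reflect when the records were written, which is the default when no explicit timestamp is supplied on the put.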

On Fri, Feb 8, 2013 at 5:47 PM, <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Thanks for the suggestions. How can a time range scan be implemented in Java
> code? Is there any sample code or a tutorial?
> Also, is it possible to select by the value of a column? Let's say I know that
> records have family f and column m, and new records have m=5. I need to
> instruct hbase to send only these records to the mappers of the mapred jobs.
>
> Thanks.
> Alex.
>
>
>
>
>
>
>
> -----Original Message-----
> From: Ted Yu <[EMAIL PROTECTED]>
> To: user <[EMAIL PROTECTED]>
> Sent: Fri, Feb 8, 2013 11:05 am
> Subject: Re: split table data into two or more tables
>
>
> bq. in a cluster of 2 nodes +1 master
> I assume you're limited by hardware in that regard.
>
> bq. job selects these new records
> Have you tried a time-range scan?
>
> Cheers
>
> On Fri, Feb 8, 2013 at 10:59 AM, <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > The rationale is that I have a mapred job that constantly adds new
> > records to an hbase table.
> > The next mapred job selects these new records, but it must iterate over
> > all records and check whether each one is a candidate for selection.
> > Since there are too many old records, iterating through them in a cluster
> > of 2 nodes +1 master takes about 2 days. So I thought that splitting them
> > into two tables would reduce this time, and as soon as I figure out that
> > there are no new records left in one of the new tables, I will stop
> > running the mapred job on it.
> >
> > Currently, we have 7 regions including ROOT and META.
> >
> >
> > Thanks.
> > Alex.
> >
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Ted Yu <[EMAIL PROTECTED]>
> > To: user <[EMAIL PROTECTED]>
> > Sent: Fri, Feb 8, 2013 10:40 am
> > Subject: Re: split table data into two or more tables
> >
> >
> > May I ask the rationale behind this ?
> > Were you aiming for higher write throughput ?
> >
> > Please also tell us how many regions you have in the current table.
> >
> > Thanks
> >
> > BTW please consider upgrading to 0.94.4
> >
> > On Fri, Feb 8, 2013 at 10:36 AM, <[EMAIL PROTECTED]> wrote:
> >
> > > Hello,
> > >
> > > I wondered if there is a way of splitting data from one table into two
> > > or more tables in hbase with identical schemas, i.e. if table A has 100M
> > > records, put 50M into table B, 50M into table C and delete table A.
> > > Currently, I use hbase-0.92.1 and hadoop-1.4.0
> > >
> > > Thanks.
> > > Alex.
> > >
> >
> >
> >
>
>
>

 
ramkrishna vasudevan 2013-02-21, 04:29