Marcos Ortiz
2013-02-08, 14:52
alxsss@...
2013-02-08, 18:36
Ted Yu
2013-02-08, 18:40
Kevin O'dell
2013-02-08, 18:41
Ted Yu
2013-02-08, 18:51
alxsss@...
2013-02-08, 18:59
Ted Yu
2013-02-08, 19:04
Ted Yu
2013-02-08, 19:22
alxsss@...
2013-02-08, 22:16
alxsss@...
2013-02-09, 01:47
Ted Yu
2013-02-09, 02:22
ramkrishna vasudevan
2013-02-09, 10:42
alxsss@...
2013-02-21, 02:27
ramkrishna vasudevan
2013-02-21, 04:29
-
Re: split table data into two or more tables
Marcos Ortiz 2013-02-08, 14:52
On 02/08/2013 01:59 PM, [EMAIL PROTECTED] wrote:
> The rationale is that I have a mapred job that adds new records to an hbase table, constantly.
> The next mapred job selects these new records, but it must iterate over all records and check whether each is a candidate for selection.
> Since there are too many old records, iterating through them in a cluster of 2 nodes + 1 master takes about 2 days. So I thought splitting them into two tables must reduce this time, and as soon as I figure out that there is no new record left in one of the new tables I will not run the mapred job on it.

This use case is very common. A good practice here is to pre-split the regions, so that you control exactly where your data goes and how large each region gets, and the number of regions stays manageable.

> Currently, we have 7 regions including ROOT and META.

Can you share your conf/hbase-site.xml?

--
Marcos Ortiz Valmaseda,
Product Manager && Data Scientist at UCI
Blog: http://marcosluis2186.posterous.com
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
-
split table data into two or more tables
alxsss@... 2013-02-08, 18:36
Hello,

I wondered if there is a way of splitting data from one table into two or more tables in hbase with identical schemas, i.e. if table A has 100M records, put 50M into table B, 50M into table C, and delete table A.
Currently, I use hbase-0.92.1 and hadoop-1.4.0.

Thanks.
Alex.
-
Re: split table data into two or more tables
Ted Yu 2013-02-08, 18:40
May I ask the rationale behind this?
Were you aiming for higher write throughput?

Please also tell us how many regions you have in the current table.

Thanks

BTW please consider upgrading to 0.94.4
-
Re: split table data into two or more tables
Kevin O'dell 2013-02-08, 18:41
Alex,

Your best bet would be to do this through either MapReduce or HappyBase (Python). There is no built-in way to handle that through the shell.

--
Kevin O'Dell
Customer Operations Engineer, Cloudera
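A job along the lines suggested above has to decide, for every row it reads from table A, which of the new tables receives the row. One possible rule is to compare the row key with a chosen split key. The sketch below shows only that decision, in plain Java with no HBase dependency. The class name TableRouter, the table names "B" and "C", and the idea of a single split key are assumptions made for illustration; they do not come from the thread.

```java
// Hypothetical helper, not part of HBase: picks a destination table for a row.
class TableRouter {
    private final byte[] splitKey;

    TableRouter(byte[] splitKey) {
        this.splitKey = splitKey.clone();
    }

    // Unsigned lexicographic byte comparison, the ordering HBase uses for row keys.
    static int compareRowKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Rows that sort before the split key go to "B", all others to "C".
    String destinationFor(byte[] rowKey) {
        return compareRowKeys(rowKey, splitKey) < 0 ? "B" : "C";
    }
}
```

In a real job the mapper would build a Put for each row and emit it to the table returned by destinationFor. A split key that divides the data roughly in half could be taken from an existing region boundary, if one happens to fall near the middle.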
-
Re: split table data into two or more tables
Ted Yu 2013-02-08, 18:51
BTW I think hadoop-1.4.0 was a typo: it should be 1.0.4
-
Re: split table data into two or more tables
alxsss@... 2013-02-08, 18:59
Hi,

The rationale is that I have a mapred job that adds new records to an hbase table, constantly.
The next mapred job selects these new records, but it must iterate over all records and check whether each is a candidate for selection.
Since there are too many old records, iterating through them in a cluster of 2 nodes + 1 master takes about 2 days. So I thought splitting them into two tables must reduce this time, and as soon as I figure out that there is no new record left in one of the new tables I will not run the mapred job on it.

Currently, we have 7 regions including ROOT and META.

Thanks.
Alex.
-
Re: split table data into two or more tables
Ted Yu 2013-02-08, 19:04
> in a cluster of 2 nodes +1 master

I assume you're limited by hardware in that regard.

> job selects these new records

Have you used a time-range scan?

Cheers
-
Re: split table data into two or more tables
Ted Yu 2013-02-08, 19:22
In 0.94, there is an optimization in StoreFileScanner.requestSeek() where a real seek is only done when seekTimestamp > maxTimestampInFile.
I suggest upgrading to 0.94.4 so that you can utilize this facility.
-
Re: split table data into two or more tables
alxsss@... 2013-02-08, 22:16
Hi,

here is the hbase-site.xml file.

<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
<property>
  <name>hbase.regionserver.codecs</name>
  <value>snappy,gz</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master,slave,serverslave</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>40</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.45</value>
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.4</value>
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.3</value>
</property>
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
<!-- default is 256MB 268435456, this is 1.5GB -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>161061273600</value>
</property>
<!-- default is 2 -->
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>
</property>
<!-- default is 64MB 67108864 -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>
<!-- default is 7, should be at least 2x compactionThreshold -->
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>200</value>
</property>
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>1800000</value> <!-- 30 minutes -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>1800000</value> <!-- 30 minutes -->
</property>

Thanks.
Alex.
-
Re: split table data into two or more tables
alxsss@... 2013-02-09, 01:47
Hi,

Thanks for the suggestions. How can a time-range scan be implemented in Java code? Is there any sample code or a tutorial?
Also, is it possible to select by the value of a column? Let's say I know that records have family f and column m, and new records have m=5. I need to instruct hbase to send only these records to the mapper of the mapred jobs.

Thanks.
Alex.
-
Re: split table data into two or more tables
Ted Yu 2013-02-09, 02:22
See the following javadoc in Scan.java:

 * To only retrieve columns within a specific range of version timestamps,
 * execute {@link #setTimeRange(long, long) setTimeRange}.

You can search for the above method in the unit tests.

In your use case, is family f the only family?
If not, take a look at HBASE-5416, which is coming in 0.94.5; family f would be the essential column family.

Cheers
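For a job that should only see records written since its previous run, the two arguments to setTimeRange could be the previous run's start time and the current time. The range is half-open: the minimum is inclusive and the maximum is exclusive. The sketch below models that window in plain Java so the boundary behavior is easy to check. ScanWindow is a hypothetical helper rather than an HBase class, and it assumes cell timestamps are the default server-assigned write times, not values set explicitly by the writing job.

```java
// Hypothetical helper: the [minStamp, maxStamp) window for an incremental job.
class ScanWindow {
    final long minStamp; // inclusive lower bound, in epoch milliseconds
    final long maxStamp; // exclusive upper bound, in epoch milliseconds

    ScanWindow(long lastRunMillis, long nowMillis) {
        if (lastRunMillis > nowMillis) {
            throw new IllegalArgumentException("lastRunMillis is after nowMillis");
        }
        this.minStamp = lastRunMillis;
        this.maxStamp = nowMillis;
    }

    // Mirrors the half-open semantics of Scan.setTimeRange(minStamp, maxStamp).
    boolean includes(long cellTimestamp) {
        return cellTimestamp >= minStamp && cellTimestamp < maxStamp;
    }
}

// With the HBase client on the classpath, the window would be applied as:
//   Scan scan = new Scan();
//   scan.setTimeRange(window.minStamp, window.maxStamp);
```

Because the upper bound is exclusive, passing the same instant as the next run's lower bound neither skips a cell nor reads it twice.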
-
Re: split table data into two or more tables
ramkrishna vasudevan 2013-02-09, 10:42
To your question about whether you can write a mapper that sends only the columns that you need: yes, of course you can.
See the example in Importer.java. It shows how a simple copytable can be implemented. Do something similar, but before creating the new Put for the new table, check the KVs and then decide.

Hope this helps.

Regards
Ram
-
Re: split table data into two or more tables
alxsss@... 2013-02-21, 02:27
Hello,

I see 0.94.5 has already been released, so I wondered how I can solve the issue that we have. In more detail, we have a table with billions of records. Most of the mapreduce jobs that we run select from this table the records that have a given value in family mk. For example,

get 'mytable', 'row1', 'mk'
COLUMN          CELL
 mk:_genmrk_    timestamp=1360869679003, value=1360869340-1376304115
 mk:_updmrk_    timestamp=1360869376272, value=1360869340-1376304115
 mk:dist

The map of a mapreduce job goes over all records and checks whether _genmrk_ is equal to the given value. So my question is: is it possible to select all records with mk:_genmrk_ = myvalue and feed them to the map of the mapreduce job instead of iterating over all records?

Thanks in advance.
Alex.
-
Re: split table data into two or more tables
ramkrishna vasudevan 2013-02-21, 04:29
The example is Import.java in the package org.apache.hadoop.hbase.mapreduce. It comes along with the source code.

Have you tried using SingleColumnValueFilter? Note that if you search the entire table, all the regions still have to be scanned, but with this filter only the rows that satisfy the specified condition are returned. And since you are going with MapReduce, the mapper tasks run in parallel on the regions.

Regards
Ram
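One detail of SingleColumnValueFilter matters for this use case. By default a row that does not contain the tested column at all passes the filter, so such rows still reach the mapper unless setFilterIfMissing(true) is called. The plain-Java model below illustrates that rule. ColumnValueCheck is a hypothetical stand-in that represents a row as a map from "family:qualifier" to a string value. It is not HBase code and only covers the equality comparison.

```java
import java.util.Map;

// Hypothetical in-memory model of an equality check on one column; not HBase code.
class ColumnValueCheck {
    private final String column;           // e.g. "mk:_genmrk_"
    private final String expected;         // value the column must equal
    private final boolean filterIfMissing; // drop rows that lack the column?

    ColumnValueCheck(String column, String expected, boolean filterIfMissing) {
        this.column = column;
        this.expected = expected;
        this.filterIfMissing = filterIfMissing;
    }

    // Returns true if the row would be passed on to the mapper.
    boolean accepts(Map<String, String> row) {
        String actual = row.get(column);
        if (actual == null) {
            // Column absent: the row passes unless filterIfMissing is set.
            return !filterIfMissing;
        }
        return expected.equals(actual);
    }
}

// With the HBase client on the classpath, the equivalent scan setup would be:
//   SingleColumnValueFilter f = new SingleColumnValueFilter(
//       Bytes.toBytes("mk"), Bytes.toBytes("_genmrk_"),
//       CompareFilter.CompareOp.EQUAL, Bytes.toBytes(myValue));
//   f.setFilterIfMissing(true);
//   scan.setFilter(f);
```

The filter is evaluated on the region servers, so every region is still read from disk; the saving is in the rows that no longer travel to the map tasks.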