|
Vibhav Mundra
2013-01-25, 09:10
Luke Lu
2013-01-25, 17:31
Vibhav Mundra
2013-01-25, 17:59
Adrien Mogenet
2013-01-25, 18:04
Jean-Marc Spaggiari
2013-01-25, 18:06
Vibhav Mundra
2013-01-25, 18:14
Jean-Marc Spaggiari
2013-01-25, 18:23
lars hofhansl
2013-01-25, 22:00
lars hofhansl
2013-01-25, 23:56
Alok Kumar
2013-01-26, 06:07
Shashwat Shriparv
2013-01-25, 19:13
Vibhav Mundra
2013-01-25, 19:25
Shashwat Shriparv
2013-01-25, 19:31
Vibhav Mundra
2013-01-25, 19:37
|
-
Hbase scans taking a lot of time
Vibhav Mundra 2013-01-25, 09:10
I am facing a very strange problem with HBase.

This is what I did:
a) Created a table using pre-partitioned splits.
b) The column families are compressed with LZO.
c) With this configuration I am able to populate about 2 million rows per minute into HBase.
d) I created a table with roughly 300 million rows, which took about 3 hours to populate; the data size is 25 GB.
e) But when I query the data, the performance is very bad. This is what I am seeing: high CPU, no disk I/O, and network I/O at around 6~7 MB/s.

Because of this, scanning the table through Hive takes ages: around 24 hours for a full scan. Any idea how to debug this?

-Vibhav
-
Re: Hbase scans taking a lot of time
Luke Lu 2013-01-25, 17:31
You'll have this problem if you have a large number of column families being scanned/populated at the same time. Make sure the data you scan/populate frequently is in the same column family (you can have many columns in a column family). Unlike BigTable/Hypertable, which have the concept of locality/access groups, HBase always stores column families in separate files, essentially making a column family not only a logical grouping mechanism but also a physical locality group.
-
Re: Hbase scans taking a lot of time
Vibhav Mundra 2013-01-25, 17:59
The number of column families I have is 13, which I guess is okay?

-Vibhav
-
Re: Hbase scans taking a lot of time
Adrien Mogenet 2013-01-25, 18:04
Definitely not, you should keep it under 3 maximum. Keep in mind that 1 CF == 1 Store == at least that many big files to read.

--
Adrien Mogenet
http://www.mogenet.me
-
Re: Hbase scans taking a lot of time
Jean-Marc Spaggiari 2013-01-25, 18:06
Hi Vibhav,

Do you really need 13 different column families? Can't you find a way to bundle them into 1 or 2 CFs at most? Maybe by prefixing the column name?

That might help...

JM
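The prefixing idea above can be sketched in plain Python, without any HBase client. This is only an illustration under stated assumptions: the merged family name `d`, the `_` separator, and the helper names `merge_families` and `columns_with_prefix` are all hypothetical, and old family names are assumed not to contain the separator.

```python
# Hypothetical illustration: fold several column families into one by
# prefixing each qualifier with the name of the family it used to live in.

SINGLE_FAMILY = "d"  # assumed name for the merged column family
SEPARATOR = "_"      # assumed separator; old family names must not contain it


def merge_families(row):
    """Map {(family, qualifier): value} to {(SINGLE_FAMILY, family_qualifier): value}."""
    merged = {}
    for (family, qualifier), value in row.items():
        merged[(SINGLE_FAMILY, f"{family}{SEPARATOR}{qualifier}")] = value
    return merged


def columns_with_prefix(merged_row, old_family):
    """Return the cells that used to belong to old_family, keyed by original qualifier."""
    prefix = f"{old_family}{SEPARATOR}"
    return {
        qualifier[len(prefix):]: value
        for (_, qualifier), value in merged_row.items()
        if qualifier.startswith(prefix)
    }


if __name__ == "__main__":
    row = {
        ("profile", "name"): "alice",
        ("profile", "city"): "paris",
        ("stats", "visits"): "42",
    }
    merged = merge_families(row)
    print(sorted(merged))
    print(columns_with_prefix(merged, "profile"))
```

The logical grouping survives as a qualifier prefix, while all cells of a row end up in one family and therefore in one set of store files.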
-
Re: Hbase scans taking a lot of time
Vibhav Mundra 2013-01-25, 18:14
This is what I was thinking; sorry for my ignorance.

I want to use the columnar property of HBase so that only the required columns are accessed. For this I kept a large number of column families.

But I still don't understand what is happening, as there is no disk I/O, only high CPU and some network activity. Why is the scan taking more time than it took to populate HBase?

-Vibhav
-
Re: Hbase scans taking a lot of time
Jean-Marc Spaggiari 2013-01-25, 18:23
You're better off laying out the data based on the way you will access it.

If you always read data from columns A, B, C and D together, then bundle them in a single column. And all of that in a single CF...

JM
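One way to read the "bundle them in a single column" advice is to serialize the fields that are always read together into one cell value. The sketch below assumes JSON as the encoding, which the thread does not specify; `pack_columns` and `unpack_columns` are hypothetical helper names.

```python
import json


def pack_columns(values):
    """Serialize fields that are always read together into a single cell value (bytes)."""
    # sort_keys gives a stable encoding; compact separators keep the cell small.
    return json.dumps(values, sort_keys=True, separators=(",", ":")).encode("utf-8")


def unpack_columns(cell):
    """Inverse of pack_columns: decode one cell back into its fields."""
    return json.loads(cell.decode("utf-8"))


if __name__ == "__main__":
    cell = pack_columns({"A": 1, "B": "x", "C": 3.5, "D": None})
    print(cell)
    print(unpack_columns(cell))
```

The trade-off is that a bundled cell can only be read or rewritten as a whole, so this fits only fields that really are accessed together.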
-
Re: Hbase scans taking a lot of time
lars hofhansl 2013-01-25, 22:00
Enable scan batching in Hive.
You're probably performing 300m RPC requests, i.e. you're mostly measuring network latency.

-- Lars
-
Re: Hbase scans taking a lot of time
lars hofhansl 2013-01-25, 23:56
Sorry, I meant scan caching (not batching).
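The arithmetic behind the RPC hypothesis can be checked with the numbers from the original post (about 300 million rows, about 24 hours). The sketch below assumes one client round trip returns up to `caching` rows; the function names are illustrative, not part of any HBase or Hive API.

```python
ROWS = 300_000_000        # row count reported in the thread
SCAN_SECONDS = 24 * 3600  # scan duration reported in the thread


def rpc_count(rows, caching):
    """Round trips needed if each one returns up to `caching` rows (ceiling division)."""
    return (rows + caching - 1) // caching


def millis_per_row(rows, total_seconds):
    """Average wall-clock time spent per row, in milliseconds."""
    return total_seconds * 1000 / rows


if __name__ == "__main__":
    for caching in (1, 100, 1000, 30000):
        print(f"caching={caching:>5}: {rpc_count(ROWS, caching):>11,} round trips")
    print(f"observed: {millis_per_row(ROWS, SCAN_SECONDS):.3f} ms per row")
```

With the reported figures the scan spends roughly 0.29 ms per row, which is in the range of a single network round trip and is consistent with one RPC per row. Later in the thread the poster reports that raising the caching settings did not help, so this remains a hypothesis to verify (for example by checking whether the setting actually reaches the scanner) rather than a confirmed diagnosis.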
-
Re: Hbase scans taking a lot of time
Alok Kumar 2013-01-26, 06:07
Vibhav,

Hive submits a map-reduce job to the HDFS cluster. How many nodes does your cluster have?

>> Because of this, if I scan the entries of the table using Hive it is taking ages.

Do you have an 'order by' or 'group by' clause in your query? Queries take longer to execute with these clauses.

Try HBase Filters if they fit your needs. They would be comparatively faster, with limitations (no 'order by' or 'group by', no joins).

Regards,
Alok
-
Re: Hbase scans taking a lot of time
Shashwat Shriparv 2013-01-25, 19:13
Try to use caching for the query.

Regards,
Shashwat Shriparv
-
Re: Hbase scans taking a lot of time
Vibhav Mundra 2013-01-25, 19:25
I did use the following, but it didn't help either.

SET hbase.client.scanner.caching=30000;
SET hive.hbase.client.scanner.caching=30000;

-Vibhav
-
Re: Hbase scans taking a lot of time
Shashwat Shriparv 2013-01-25, 19:31
I would suggest you look into caching techniques.

Regards,
Shashwat Shriparv
-
Re: Hbase scans taking a lot of time
Vibhav Mundra 2013-01-25, 19:37
I am new to HBase-Hive.

Am I missing something? It would be great if you could point me to some documents about caching.

-Vibhav
|