HBase >> mail # user >> Add Columnsize Filter for Scan Operation


Re: Add Columnsize Filter for Scan Operation
For streaming responses, there is this JIRA:

HBASE-8691 High-Throughput Streaming Scan API
On Thu, Oct 24, 2013 at 9:53 AM, Dhaval Shah <[EMAIL PROTECTED]> wrote:

> Jean, if we don't call setBatch on the scan, the MR job does cause HBase to
> crash with an OOME. We have run into this in the past as well. Basically, the
> problem is this: say I have a region server with 12GB of RAM and a row of size
> 20GB (an extreme example; in practice, HBase runs out of memory well before
> 20GB). If I query the entire row, HBase does not have enough memory to
> hold/process it for the response.
>
> In practice, if your setCaching value is > 1, then the aggregate size of
> all the cached rows growing too big can also cause the same issue.
>
> I think one way we can solve this issue is to make the HBase server serve
> responses in a streaming fashion somehow (I'm not exactly sure about the
> details of how this would work, but if it has to hold the entire row in
> memory, it's going to be bound by the HBase heap size).
>
> Regards,
> Dhaval
>
>
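To make the memory concern described above concrete, here is a rough back-of-the-envelope model. It is plain Java and does not use the HBase API: the class name, method names, and the sample numbers (20 million cells of about 1 KB each, a 12 GiB heap) are assumptions for illustration only. It treats the caching value as the number of results returned per call and the batch value as the maximum number of cells per result, and it ignores per-cell overhead.

```java
// Hypothetical model, not HBase code: estimates how many bytes a server
// would need to hold in order to answer a single scanner call.
final class ScanMemoryModel {

    /**
     * @param cellsPerRow  number of columns (cells) in each row
     * @param avgCellBytes assumed average size of one cell in bytes
     * @param caching      results returned per call (models setCaching)
     * @param batch        max cells per result (models setBatch); <= 0 means
     *                     no batching, so a whole row forms one result
     */
    static long bytesPerCall(long cellsPerRow, long avgCellBytes, int caching, int batch) {
        long cellsPerResult = (batch > 0) ? Math.min((long) batch, cellsPerRow) : cellsPerRow;
        long bytesPerResult = Math.multiplyExact(cellsPerResult, avgCellBytes);
        return Math.multiplyExact(bytesPerResult, (long) caching);
    }

    static boolean fitsInHeap(long bytes, long heapBytes) {
        return bytes <= heapBytes;
    }

    public static void main(String[] args) {
        long heap = 12L * 1024 * 1024 * 1024;   // assumed 12 GiB heap
        long cells = 20_000_000L;               // assumed very wide row
        long cellSize = 1024L;                  // assumed ~1 KB per cell

        long unbatched = bytesPerCall(cells, cellSize, 1, 0);
        long batched = bytesPerCall(cells, cellSize, 100, 1000);

        System.out.println("no batch:   " + unbatched + " bytes, fits=" + fitsInHeap(unbatched, heap));
        System.out.println("batch=1000: " + batched + " bytes, fits=" + fitsInHeap(batched, heap));
    }
}
```

Under these assumptions the unbatched request exceeds the heap on its own, while a batch limit caps the per-call footprint regardless of how wide the row is. A larger caching value multiplies whichever per-result size applies.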
> ________________________________
>  From: Jean-Marc Spaggiari <[EMAIL PROTECTED]>
> To: user <[EMAIL PROTECTED]>
> Sent: Thursday, 24 October 2013 12:37 PM
> Subject: Re: Add Columnsize Filter for Scan Operation
>
>
> If the MR job crashes because of the number of columns, then we have an
> issue that we need to fix ;) Please open a JIRA and provide details if you
> are facing that.
>
> Thanks,
>
> JM
>
>
>
> 2013/10/24 John <[EMAIL PROTECTED]>
>
> > @Jean-Marc: Sure, I can do that, but that's a little bit complicated
> > because the rows sometimes have millions of columns and I have to handle
> > them in different batches, because otherwise HBase crashes. Maybe I will
> > try it later, but first I want to try the API version. It works okay so
> > far, but I want to improve it a little bit.
> >
> > @Ted: I tried to modify it, but I have no idea how exactly to do this. I
> > have to count the number of columns in that filter (that obviously works
> > with the count field). But there is no method that is called after
> > iterating over all elements, so I cannot return the Drop ReturnCode in the
> > filterKeyValue method because I don't know which element is the last one.
> > Any ideas?
> >
> > regards
> >
> >
> > 2013/10/24 Ted Yu <[EMAIL PROTECTED]>
> >
> > > Please take a look at
> > > src/main/java/org/apache/hadoop/hbase/filter/ColumnCountGetFilter.java :
> > >
> > >  * Simple filter that returns first N columns on row only.
> > >
> > > You can modify the filter to suit your needs.
> > >
> > > Cheers
> > >
> > >
> > > On Thu, Oct 24, 2013 at 7:52 AM, John <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm currently writing an HBase Java program that iterates over every
> > > > row in a table. I have to modify some rows if the column count (the
> > > > number of columns in the row) is bigger than 25000.
> > > >
> > > > Here is my source code: http://pastebin.com/njqG6ry6
> > > >
> > > > Is there any way to add a filter to the scan operation and load only
> > > > rows where the column count is bigger than 25k?
> > > >
> > > > Currently I check the size on the client, but for that I have to load
> > > > every row to the client side. It would be better if the unwanted rows
> > > > were already filtered out on the server side.
> > > >
> > > > thanks
> > > >
> > > > John
> > > >
> > >
> >
>
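The question quoted above, how a filter can decide about a row when it only sees one column at a time, can be sketched with a small stand-alone model. This is plain Java, not the HBase Filter API: the `CellFilter` interface, its method names, and the in-memory "table" are all hypothetical. The idea is a per-cell callback that only counts, plus a separate end-of-row callback that makes the keep/drop decision, with the state reset before each row. HBase's own Filter interface has `filterRow()` and `reset()` methods that appear to fill the same roles, but treat that mapping as an assumption to verify against the version in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a row-level filter; not HBase code.
final class ColumnCountSketch {

    /** Stand-in for a server-side filter with per-cell and per-row hooks. */
    interface CellFilter {
        void reset();                          // called before each row
        boolean includeCell(String qualifier); // called once per column
        boolean dropRow();                     // called after the last column
    }

    /** Keeps only rows that have more than `threshold` columns. */
    static final class MinColumnCountFilter implements CellFilter {
        private final int threshold;
        private int count;

        MinColumnCountFilter(int threshold) { this.threshold = threshold; }

        @Override public void reset() { count = 0; }

        @Override public boolean includeCell(String qualifier) {
            count++;        // only count here; the decision comes later
            return true;
        }

        @Override public boolean dropRow() { return count <= threshold; }
    }

    /** Simulates a scan: buffers each row, then asks the filter to keep or drop it. */
    static Map<String, List<String>> scan(Map<String, List<String>> table, CellFilter filter) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : table.entrySet()) {
            filter.reset();
            List<String> kept = new ArrayList<>();
            for (String qualifier : row.getValue()) {
                if (filter.includeCell(qualifier)) {
                    kept.add(qualifier);
                }
            }
            if (!filter.dropRow()) {
                out.put(row.getKey(), kept);
            }
        }
        return out;
    }
}
```

Note that this model buffers the whole row before deciding, so it would not by itself avoid the memory problem with very wide rows discussed earlier in the thread; it only keeps small rows from being shipped to the client.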