HBase, mail # user - Add Columnsize Filter for Scan Operation


Re: Add Columnsize Filter for Scan Operation
Jean-Marc Spaggiari 2013-10-24, 16:37
If the MR job crashes because of the number of columns, then we have an issue
that we need to fix ;) Please open a JIRA and provide details if you are facing
that.

Thanks,

JM
2013/10/24 John <[EMAIL PROTECTED]>

> @Jean-Marc: Sure, I can do that, but that's a little complicated because
> the rows sometimes have millions of columns and I have to handle them
> in separate batches, because otherwise HBase crashes. Maybe I will try it
> later, but first I want to try the API version. It works okay so far, but I
> want to improve it a little.
>
> @Ted: I tried to modify it, but I have no idea how exactly to do this. I have
> to count the number of columns in that filter (which obviously works with the
> count field). But there is no method that is called after iterating over
> all elements, so I cannot return the drop ReturnCode in the filterKeyValue
> method because I don't know when I have reached the last one. Any ideas?
>
> regards
>
>
> 2013/10/24 Ted Yu <[EMAIL PROTECTED]>
>
> > Please take a look at
> > src/main/java/org/apache/hadoop/hbase/filter/ColumnCountGetFilter.java :
> >
> >  * Simple filter that returns first N columns on row only.
> >
> > You can modify the filter to suit your needs.
> >
> > Cheers
> >
> >
> > On Thu, Oct 24, 2013 at 7:52 AM, John <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > >
> > > I'm currently writing an HBase Java program that iterates over every
> > > row in a table. I have to modify some rows if the column count (the
> > > number of columns in the row) is greater than 25000.
> > >
> > > Here is my source code: http://pastebin.com/njqG6ry6
> > >
> > > Is there any way to add a Filter to the scan operation and load only
> > > rows where the column count is greater than 25k?
> > >
> > > Currently I check the size on the client, but for that I have to load
> > > every row to the client side. It would be better if the unwanted rows
> > > were already filtered out on the "server" side.
> > >
> > > thanks
> > >
> > > John
> > >
> >
>
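
For reference, the per-row counting discussed above can be sketched as a plain-Java model. As far as I recall, HBase's Filter API offers a row-level hook, filterRow(), that runs after the last KeyValue of a row, plus reset() before each new row; that pairing is what would let a filter count columns and then veto the whole row. The classes below (MinColumnCountFilter, MinColumnCountSketch) are hypothetical and do not depend on HBase; they only imitate that call order, so treat this as a sketch to adapt to FilterBase, not as a working HBase filter.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, HBase-free model of a filter that keeps only wide rows.
 * Method names imitate the reset / filterKeyValue / filterRow hooks.
 */
class MinColumnCountFilter {
    private final long minColumns; // rows with this many columns or fewer are dropped
    private long count;            // cells seen so far in the current row

    MinColumnCountFilter(long minColumns) {
        this.minColumns = minColumns;
    }

    /** Called before each new row: forget the previous row's count. */
    void reset() {
        count = 0;
    }

    /**
     * Called once per cell. It only counts; a real filter would return
     * an "include" code here, because the decision is deferred to filterRow().
     */
    void filterKeyValue(String qualifier) {
        count++;
    }

    /** Called after the last cell of the row; true means "drop the whole row". */
    boolean filterRow() {
        return count <= minColumns;
    }
}

class MinColumnCountSketch {
    /** Drives the filter the way a scanner would: reset, every cell, row decision. */
    static List<String> rowsAbove(long threshold, String[] rowKeys, int[] columnCounts) {
        MinColumnCountFilter filter = new MinColumnCountFilter(threshold);
        List<String> kept = new ArrayList<>();
        for (int r = 0; r < rowKeys.length; r++) {
            filter.reset();
            for (int c = 0; c < columnCounts[r]; c++) {
                filter.filterKeyValue("col" + c);
            }
            if (!filter.filterRow()) {
                kept.add(rowKeys[r]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] keys = {"row-a", "row-b", "row-c"};
        int[] counts = {10, 25001, 25000};
        System.out.println(rowsAbove(25000, keys, counts)); // prints [row-b]
    }
}
```

One caveat, stated from memory rather than from this thread: a filter that decides at row level needs the whole row assembled on the server, and HBase may refuse to combine such a filter with Scan batching. That matters for rows with millions of columns, so it is worth checking against the HBase version in use.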