HBase >> mail # user >> Re: Best technique for doing lookup with Secondary Index


Re: Best technique for doing lookup with Secondary Index
Hi Anoop,

Yes, I use bulk loading for loading table A. I wrote my own mapper since
ImportTsv won't meet my requirements. :) No, I don't call HTable#put()
from my mapper. I was thinking about trying out calling HTable#put() from
my mapper to see the outcome.

I meant to say that when we use an MR job (e.g. ImportTsv) the WAL is not
used. Sorry if I misunderstood someone.

Thanks,
Anil

On Thu, Oct 25, 2012 at 9:06 PM, Anoop Sam John <[EMAIL PROTECTED]> wrote:

> Hi Anil,
>               I was a bit confused after seeing your reply.
> You use bulk loading?  You created your own mapper?  You call HTable#put()
> from mappers?
>
> I think there was confusion in another thread too... I was referring to
> the HFileOutputReducer. There is also a TableOutputFormat... With
> TableOutputFormat the job does puts to the HTable, so there writing to
> the WAL applies...
>
>
> [HFileOutputReducer]: As we discussed in another thread, in the bulk-load
> approach the MR job creates KVs and writes them out as HFiles. Yes, these
> will contain all the meta information, trailer, etc. Finally, the HBase
> cluster only needs to be contacted to load these HFile(s) into the
> cluster, under the corresponding regions. This will be the fastest way to
> bulk load huge amounts of data...
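
The bulk-load path described above can be sketched roughly as follows. This is a hypothetical outline against the 0.94-era HBase API, not compilable as-is: `MyKVMapper` and the paths are made-up placeholders, and the class/method names should be checked against your HBase version.

```java
// Sketch: configure an MR job to write HFiles, then load them into HBase.
Job job = new Job(conf, "bulk-load-table-A");
job.setMapperClass(MyKVMapper.class);   // emits ImmutableBytesWritable / KeyValue
HTable table = new HTable(conf, "A");
// Sets the reducer, partitioner, and output format so the generated
// HFiles line up with the table's current region boundaries.
HFileOutputFormat.configureIncrementalLoad(job, table);
FileOutputFormat.setOutputPath(job, new Path("/tmp/bulkload/A"));
if (job.waitForCompletion(true)) {
    // Move the generated HFiles under the corresponding regions.
    new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/tmp/bulkload/A"), table);
}
```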
>
>
> -Anoop-
> ________________________________________
> From: anil gupta [[EMAIL PROTECTED]]
> Sent: Friday, October 26, 2012 3:40 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Best technique for doing lookup with Secondary Index
>
> Anoop: In the prePut hook you call HTable#put()?
> Anil: Yes, I call HTable#put() in prePut. Is there a better way of doing it?
>
> Anoop: Why make the network calls from the server side here, then?
> Anil: I thought this was a cleaner approach since I am using the
> BulkLoader. I decided not to run two jobs since I am generating a
> UniqueIdentifier at runtime in the bulkloader.
>
> Anoop: Can it not be handled from the client alone?
> Anil: I cannot handle it from the client since I am using the BulkLoader.
> Is it a good idea to create an HTable instance on "B" and do the put in
> my mapper? I might try this idea.
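
The dual write that such a prePut hook performs can be simulated with plain sorted maps standing in for the two HTables. This is a pure-Java sketch, not HBase coprocessor code; the table layout follows the thread, but the key values are invented for illustration.

```java
import java.util.TreeMap;

public class DualWriteSketch {
    // Stand-ins for HBase tables A (events) and B (secondary index).
    static final TreeMap<String, String> tableA = new TreeMap<>();
    static final TreeMap<String, String> tableB = new TreeMap<>();

    // Mimics a prePut-style hook: every put to A also writes the
    // inverted key into B, whose value is the rowkey of A.
    static void putEvent(String customerId, String eventId,
                         String timestamp, String value) {
        String rowkeyA = customerId + eventId;    // <customer_Id><event_ID>
        String rowkeyB = timestamp + customerId;  // <event_timestamp><customer_ID>
        tableA.put(rowkeyA, value);
        tableB.put(rowkeyB, rowkeyA);
    }

    public static void main(String[] args) {
        putEvent("cust42", "evt001", "20121026", "login");
        System.out.println(tableB.get("20121026cust42")); // -> cust42evt001
    }
}
```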
>
> Anoop: You can have a look at the Lily project.
> Anil: It's a little late for us to evaluate Lily now, and at present we
> don't need a complex secondary index since our data is immutable.
>
> Ram: What is rowkey B here?
> Anil: Suppose I am storing customer events in table A. I have two
> requirements for querying the data:
> 1. Query customer events on the basis of customer_Id and event_ID.
> 2. Query customer events on the basis of event_timestamp and customer_ID.
>
> 70% of the querying is done by query #1, so I will make
> <customer_Id><event_ID> the row key of table A.
> Now, in order to support fast results for query #2, I need to create a
> secondary index on A. I store that secondary index in B; the rowkey of B
> is <event_timestamp><customer_ID>. Every row stores the corresponding
> rowkey of A.
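
The two composite rowkeys above can be sketched with fixed-width string fields so that lexicographic ordering follows the leading field. The field widths here are invented for illustration; a real schema would more likely build raw byte keys with HBase's Bytes utility.

```java
public class RowkeySketch {
    // Left-pad a numeric field to a fixed width so the concatenated
    // key sorts correctly on its leading component.
    static String pad(String s, int width) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) sb.insert(0, '0');
        return sb.toString();
    }

    // Table A: <customer_Id><event_ID> -- serves query #1.
    static String rowkeyA(long customerId, long eventId) {
        return pad(Long.toString(customerId), 10) + pad(Long.toString(eventId), 10);
    }

    // Table B: <event_timestamp><customer_ID> -- serves query #2.
    static String rowkeyB(long timestamp, long customerId) {
        return pad(Long.toString(timestamp), 13) + pad(Long.toString(customerId), 10);
    }

    public static void main(String[] args) {
        System.out.println(rowkeyA(42, 7));             // -> 00000000420000000007
        System.out.println(rowkeyB(1351262045000L, 42));
    }
}
```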
>
> Ram: How is the startRow determined for every query?
> Anil: It's determined by very simple application logic.
>
> Thanks,
> Anil Gupta
>
> On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
> [EMAIL PROTECTED]> wrote:
>
> > Just out of curiosity,
> > > The secondary index is stored in table "B" as rowkey B --> family:<rowkey A>
> > what is rowkey B here?
> > > 1. Scan the secondary table by using prefix filter and startRow.
> > How is the startRow determined for every query?
> >
> > Regards
> > Ram
> >
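
The two-step lookup under discussion (prefix-scan table B with a startRow, then fetch the matching rows from table A) can be simulated over sorted maps. This is a pure-Java sketch of the idea, not actual HBase Scan/PrefixFilter code, and the sample keys are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SecondaryLookupSketch {
    // Prefix scan over the secondary table: the startRow is the prefix
    // itself, and the scan stops once a key no longer matches it.
    static List<String> lookup(TreeMap<String, String> tableB,
                               TreeMap<String, String> tableA,
                               String prefix) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> e : tableB.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;  // past the prefix range
            results.add(tableA.get(e.getValue()));      // value is a rowkey of A
        }
        return results;
    }

    public static void main(String[] args) {
        TreeMap<String, String> a = new TreeMap<>();
        TreeMap<String, String> b = new TreeMap<>();
        a.put("cust42evt001", "login");
        b.put("20121026cust42", "cust42evt001");
        System.out.println(lookup(b, a, "20121026")); // -> [login]
    }
}
```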
> > > -----Original Message-----
> > > From: Anoop Sam John [mailto:[EMAIL PROTECTED]]
> > > Sent: Thursday, October 25, 2012 10:15 AM
> > > To: [EMAIL PROTECTED]
> > > Subject: RE: Best technique for doing lookup with Secondary Index
> > >
> > > >I build the secondary table "B" using a prePut RegionObserver.
> > >
> > > Anil,
> > >        In the prePut hook you call HTable#put()?  Why make the network
> > > calls from the server side here, then? Can it not be handled from the
> > > client alone? You can have a look at the Lily project.   Thoughts after seeing your idea on put

Thanks & Regards,
Anil Gupta