Re: Designing table with auto increment key
Thanks everyone for the excellent ideas.

Ryan - I kinda understand your suggestion to a point.  If time permits,
please explain further.

What you are suggesting is to create a table with 99 rows with keys 'c_1',
'c_2'... thru 'c_99'.  Row c_1 would generate ids 1, 101, 201.. so on, and
row c_99 would generate 99, 199, & so on.  I got it this far.
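
The striping described above can be sketched as follows. This is only an illustration of the arithmetic, not Ryan's actual code: the class name StripedCounterSketch and the in-memory AtomicLong array are assumptions standing in for the 99 HBase counter rows, where each call would instead be one atomic increment by 100 on the chosen row.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the striped counter rows c_1 .. c_99.
final class StripedCounterSketch {
    static final int STRIPES = 99; // rows c_1 .. c_99
    static final int SKIP = 100;   // every row advances in steps of 100

    // counters[i - 1] holds the last ID handed out by row c_i.
    // Seeding it with (i - SKIP) makes the first increment return i.
    private final AtomicLong[] counters = new AtomicLong[STRIPES];

    StripedCounterSketch() {
        for (int i = 1; i <= STRIPES; i++) {
            counters[i - 1] = new AtomicLong(i - SKIP);
        }
    }

    // Returns the next ID from row c_<stripe>. Against HBase this would be
    // a single atomic increment of that row's counter cell by SKIP.
    long nextId(int stripe) {
        if (stripe < 1 || stripe > STRIPES) {
            throw new IllegalArgumentException("stripe must be in 1.." + STRIPES);
        }
        return counters[stripe - 1].addAndGet(SKIP);
    }

    public static void main(String[] args) {
        StripedCounterSketch ids = new StripedCounterSketch();
        System.out.println(ids.nextId(1));  // 1
        System.out.println(ids.nextId(1));  // 101
        System.out.println(ids.nextId(99)); // 99
        System.out.println(ids.nextId(99)); // 199
    }
}
```

Because every stripe starts at a different offset and all stripes use the same step, two stripes can never produce the same ID, whichever stripe a caller picks.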

But hypothetically speaking, let's say I am running a MapReduce to process a
huge log file.  Each line of the log would be passed to a Map function.
I am trying to figure out how I would distribute the load evenly amongst
c_1 thru c_99.  Please explain.

On Sun, Feb 13, 2011 at 10:18 PM, Ryan Rawson <[EMAIL PROTECTED]> wrote:

> you can also stripe, eg:
>
> c_1 starts at 1, skip=100
> c_2 starts at 2, skip=100
> c_$i starts at $i, skip=100 for 3..99
>
> now you have 100x speed/parallelism.  If single regionserver
> assignment becomes a problem, use multiple tables.
>
> On Sun, Feb 13, 2011 at 10:12 PM, Lars George <[EMAIL PROTECTED]>
> wrote:
> > Hi SS,
> >
> > Some people who do not need strictly contiguous IDs also use block
> > increments of, say, 100. Each app server then gets 100 IDs to hand out,
> > and in case it dies it gets its next assigned 100 IDs and leaves a
> > small gap behind. That way you can take the pressure off the counter if
> > that is going to be an issue for you. Depends on your insert frequency,
> > obviously.
> >
> > Lars
> >
> > On Sun, Feb 13, 2011 at 7:10 PM, Something Something
> > <[EMAIL PROTECTED]> wrote:
> >> Hello,
> >>
> >> Can you please tell me if this is the proper way of designing a table
> >> that's got an auto increment key?  If there's a better way please let
> >> me know that as well.
> >>
> >> After reading the mail archives, I learned that the best way is to use
> >> the 'incrementColumnValue' method of HTable.
> >>
> >> So hypothetically speaking let's say I have to create a "User -> Orders"
> >> relationship.  Every time a user creates an order we will assign a
> >> system generated (auto increment) id as the primary key for the order.
> >>
> >> I am thinking I could do this:
> >>
> >> 1)  Create a table of Ids for various objects such as "Order".  It will
> >> have just a single row with key "1" and column families for various
> >> objects.  When it's time to add a new order for a user I can do
> >> something like this:
> >>
> >> HTable tableIds = new HTable("IDs");
> >> // The row key is already known, so no Get is needed before incrementing.
> >> byte[] row = Bytes.toBytes("1");
> >> // incrementColumnValue takes byte[] row, family and qualifier arguments.
> >> long newOrderId = tableIds.incrementColumnValue(row,
> >>     Bytes.toBytes("orders"), Bytes.toBytes("orderId"), 1);
> >>
> >> // In future I could use the same table for other objects as follows
> >> // long newInvoiceId = tableIds.incrementColumnValue(row,
> >> //     Bytes.toBytes("invoices"), Bytes.toBytes("invoiceId"), 1);
> >>
> >> 2)  Once the newOrderId is retrieved I can add the info about order to
> >> UserOrder table with a key of format:  userId + "*" + newOrderId.  The
> >> "info" family of this table will have columns such as "orderAmount" ,
> >> "orderDate" etc.
> >>
> >>
> >> As per the documentation, the 'incrementColumnValue' is done in an
> >> exclusive and serial fashion for each row with a rowlock.  In other
> >> words, even in a multi-threaded environment we are guaranteed to get a
> >> unique key per thread, correct?
> >>
> >> Is this a correct/good design for a table that needs auto increment key?
> >>  Please let me know.  Thanks.
> >>
> >
>
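
For readers following the thread, the block-increment idea Lars describes in the quoted message can be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from the thread: BlockIdAllocatorSketch is a hypothetical name, and the shared AtomicLong stands in for the single HBase counter cell, which each app server would advance by 100 in one atomic increment.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-app-server allocator that reserves IDs in blocks of 100.
final class BlockIdAllocatorSketch {
    static final int BLOCK_SIZE = 100;

    // Stand-in for the shared counter; with HBase this would be one
    // counter cell incremented by BLOCK_SIZE per reservation.
    private final AtomicLong sharedCounter;

    private long next = 0;  // next ID to hand out from the current block
    private long limit = 0; // first ID past the current block (exclusive)

    BlockIdAllocatorSketch(AtomicLong sharedCounter) {
        this.sharedCounter = sharedCounter;
    }

    // Hands out IDs locally and only touches the shared counter
    // when the current block of 100 is used up.
    synchronized long nextId() {
        if (next >= limit) {
            long end = sharedCounter.addAndGet(BLOCK_SIZE); // last ID of new block
            next = end - BLOCK_SIZE + 1;
            limit = end + 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        AtomicLong shared = new AtomicLong(0);
        BlockIdAllocatorSketch serverA = new BlockIdAllocatorSketch(shared);
        BlockIdAllocatorSketch serverB = new BlockIdAllocatorSketch(shared);
        System.out.println(serverA.nextId()); // 1
        System.out.println(serverB.nextId()); // 101
        System.out.println(serverA.nextId()); // 2

        // A restarted server reserves a fresh block; IDs 3..100 stay unused.
        BlockIdAllocatorSketch restartedA = new BlockIdAllocatorSketch(shared);
        System.out.println(restartedA.nextId()); // 201
    }
}
```

The shared counter is hit once per 100 IDs instead of once per ID, at the cost of the small gaps Lars mentions whenever a server dies with part of its block unused.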