Notes from yesterday's Contributors Pow Wow [WAS --> Re: HBase Developer's Pow-wow]
Stack 2012-09-12, 18:21
Here are notes from yesterday's contributors' meetup:
The notes below are spotty. I kept forgetting to take them. I also
was unable to keep up w/ the rate of exchange so in many parts the
reporting drops or a speaker's nuanced argument is crassly rendered.
We'll hire a stenographer for the next one.
We started at 2:00PM. The meeting lasted till almost 6:00PM.
Devaraj Das of HortonWorks, our host, welcomed everyone.
Jimmy Xiang presented on recent changes to Assignment Manager in
Good Q&A during the presentation and after included:
+ What kind of tests do we have in place for the new AM patches?
+ We should have tests to ensure we don't lose performance
+ Make sure we don't lose operator facility; e.g. ability to override
assignment state from shell
A general question was posed: what do we all think of the current
state of the Assignment Manager? Could we make it pluggable (Francis
Liu is looking at making the AM pluggable so he can add "Groups")? Will
we ever be able to make the AM rock solid?
It's complex. Is it hard making it pluggable? Or how about adding
support for different kinds of policies?
Jon Hsieh: Rules that were there originally in design, are they being
Elliott: AM stuff bled into the Master; stuff is bleeding all over.
Need to clean up Master.
Enis: Splitting state is split between zk, RS, and HM.... too complex.
Could AM/Master just do it? Some one entity should be the source of
truth [for region assignment].
Lars Hofhansl: We need to write out the state machine and just rewrite
the AM or hack code to get same result
Andrew Purtell: We should do both. Testable changes. We have gone
through a bunch of master rewrites and suspect that another rewrite
would just land us with a new set of issues.
Lars: If we had AM state machine, then could [do both rewrite and/or
patch it to a state of robustness].
Jon Hsieh: Is Master2 close to its original design? Maybe we should
do assignment in another way? Maybe Ram[krishna] understands the AM
but it seems like no one else here really does. Yeah, Ram is the AM
(he can assign to us how to fix it all).
Ted Yu: AM should be able to do colocation for secondary indices and
Francis Liu: On what region grouping is, if could do different AM,
could then assign tables to a group, could make it so they don't
affect another application running in a different group all on a big
cluster. Same for workloads. We're thinking of doing grouping first,
then attack multi-tenancy later.
LarsH: Do you need the whole thing pluggable or do you want to rewrite it?
Francis: Pluggable would be nice. In the past have subclassed AM to
add functionality (would like to avoid that).
JDCryans: Could you build grouping on top of HBase by just disabling
the balancer and use the move command?
Elliott: Could you do your own balancer? Would that work?
Francis: Balancer is given a plan only
Andrew: Why not do as Karthik suggested in the past, and just run
Francis: It's too complicated
Ted: (Said something about explaining Francis's situation)
Jacques: Seems like you want a placement strategy only? Or do you
need to change timeouts, etc? Or is it just placement?
Elliott: Would it work if we added more facility to the balancer?
Maybe make it more pluggable with more levers?
Andrew: Yeah, balancer might be way to go for 2ndary indices because
want to colocate regions.
Enis presented on the new Integration Test patch:
Andrew: Intends to use IT internally. Is looking at porting some of
his internal use cases; web crawling, etc., to use the IT framework.
Enis then presented on HBase on Windows work: See above slides.
There are scripts comparable to the Linux bash scripts to run
instead when launching HBase on Windows, etc.
Patches are on their way in.
Jesse Yates presented on 2ndary indices:
A discussion ensued (Lots of questions and comments during this talk)
Jon: We need to add a checkAndMutate... to complete our current CAS
family of methods.
Lars: Index per region means need to farm out to all regions
querying... need to do both [this and the index that is non-aligned on
Jon: Cassandra 2ndary indexing does per node... turns into
optimization on scanning whole table
Matt Corgan: We have all sorts of indices [at our shop] and all on the
same table; we write the main table and secondary indices all from the
client... it seems to keep up. It's hard to predefine types.
Client-side does... doesn't have to be strongly-consistent in his
case.. likes it w/o schema... that its all just kvs. This keeps it
Jesse: What if you lose a write part-way through? Analytics use case.
Matt: Building kvs all in a put and then at read time doing reconciles...
Lars: Giving tools to make it so you can store floating point as
sortable bytes...etc....give you building blocks to help you build
your secondary indices as you need them [rather than prescribe a
single secondary index soln.]
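
[Editor's sketch, not from the meeting: one way the "floating point as
sortable bytes" building block could look. The class and method names
below are made up for illustration and are not an HBase API. The idea
is to remap the IEEE 754 bits so that unsigned, byte-by-byte comparison
(how row keys sort) agrees with numeric order: flip all bits of
negative values and only the sign bit of non-negative ones. NaN and
-0.0 handling is whatever falls out of the bit pattern.]

```java
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only; not part of HBase.
public final class SortableDouble {
    private SortableDouble() {}

    // Encode a double as 8 big-endian bytes whose unsigned
    // lexicographic order matches the numeric order of the inputs.
    public static byte[] encode(double value) {
        long bits = Double.doubleToLongBits(value);
        // Negative: mask is all ones (flip every bit).
        // Non-negative: mask is only the sign bit.
        bits ^= (bits >> 63) | Long.MIN_VALUE;
        return ByteBuffer.allocate(8).putLong(bits).array();
    }

    // Inverse of encode().
    public static double decode(byte[] bytes) {
        long bits = ByteBuffer.wrap(bytes).getLong();
        // Top bit set means the original was non-negative.
        bits ^= (~bits >> 63) | Long.MIN_VALUE;
        return Double.longBitsToDouble(bits);
    }

    // Unsigned byte-by-byte comparison, the way sorted keys compare.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

[With something like this, an index row key built from
SortableDouble.encode(price) would scan in price order; the same trick
extends to floats and signed integers.]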
Enis: If we have these building blocks, then we could have hive/pig go
against these codecs
Lars: Add tools, api and facility to hbase.. you do a bunch of puts, a
tool that makes sure all applied....Or some way of getting back all
timestamps back for the index puts and then use the returned
timestamps to write main table. Figuring what little building blocks
to add to HBase....
Matt: All client side, is that right?
Jon: What about building indices on qualifiers?
Elliott: Secondary indices could be built on replication,
where a service makes sure all indices are updated
Lars: Could do fixup at read time.
Jacques: .... wanted to step back (missed most of his question)....
wanted to learn more about what are the use cases people have built
2ndary indices for.