I'm looking at Panthera, I'll check out Kiji too. Inferring the schema from
the first record and creating a table is what is done in Voldemort's
build/push job, so I'll look into that.
Russell Jurney http://datasyndrome.com
On Jan 30, 2013, at 6:33 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
Great question. Kiji is more strongly typed than systems like MongoDB.
While your schema can evolve (using Avro evolution) without structurally
updating existing data, you still need to specify your Avro schemas in a
data dictionary. It's challenging to author systems in Java (as is typical
of HBase/HDFS/MapReduce-facing applications) without some strong typing in
the persistence layer. You wind up reading a lot of other people's code to
figure out what types were written -- assuming you can find the code (or
the hbase columns) in the first place.
You can create table schemas either "manually" by filling out a JSON /
Avro-based table layout specification, or you can use the DDL shell which
lets you CREATE TABLE, ALTER TABLE, etc. in a pretty quick way. Once the
table's set up, you can write to it. I think the DDL shell included
with the bento box makes this a reasonably low-overhead process.
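For a sense of what that looks like, here is a minimal sketch of a table definition in the Kiji DDL shell (the table, family, and column names are hypothetical, chosen only for illustration):

```sql
CREATE TABLE users WITH DESCRIPTION 'example user table'
ROW KEY FORMAT HASHED
WITH LOCALITY GROUP default WITH DESCRIPTION 'main storage' (
  FAMILY info WITH DESCRIPTION 'basic user fields' (
    name "string",
    email "string"
  )
);
```

Column types here are Avro schemas given as JSON literals, which is what ties the DDL back to the Avro data dictionary described above.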
We don't currently have any Pig integration. We've made some initial
proof-of-concept progress on a StorageHandler that lets Hive query Kiji,
but it's not in a ready state yet. Someone (you? :) could write a Pig
integration; Pig already supports Avro, I think. And you could even make it
analyze the first output tuple and use that to infer types/column names to
set up a result table with the appropriate table schema by invoking the DDL shell.
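That first-tuple inference step could be sketched roughly like this. This is illustrative Python, not actual Kiji or Pig API; the function name and type mapping are assumptions, and a real integration would emit DDL or an Avro layout instead of a plain dict:

```python
def infer_avro_schema(record, name="inferred_record"):
    """Build an Avro-style record schema (as a dict) from one sample record,
    mapping native Python types to Avro primitive type names."""
    type_map = {
        str: "string",
        bool: "boolean",  # check bool before int in general; dict keys are exact types here
        int: "long",
        float: "double",
        bytes: "bytes",
    }
    fields = []
    for field_name, value in record.items():
        # Fall back to "string" for types we don't recognize.
        avro_type = type_map.get(type(value), "string")
        fields.append({"name": field_name, "type": avro_type})
    return {"type": "record", "name": name, "fields": fields}

# Example: infer a schema from the first tuple of a result set.
first_tuple = {"user_id": 42, "email": "a@example.com", "score": 0.5}
schema = infer_avro_schema(first_tuple)
```

A real Pig StoreFunc would do this once, on the first tuple, then issue the corresponding CREATE TABLE through the DDL layer before writing.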
Sorry I don't have a "magic wand" answer for you -- for the use cases we
target, these sorts of setup costs often pay off in the long run, so that's
the case we've optimized the design around. Let me know if there's anything
else I can help with.
On Wed, Jan 30, 2013 at 5:48 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> Aaron - is there a way to create a Kiji table from Pig? I'm in the habit
> of not specifying schemas with Voldemort and MongoDB, just storing a Pig
> relation and the schema is set in the store. If I can arrange that somehow,
> I'm all over Kiji. Panthera is a fork :/
> On Wed, Jan 30, 2013 at 3:20 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
>> Hi ccleve,
>> I'd definitely urge you to try out Kiji -- we who work on it think it's a
>> pretty good fit for this specific use case. If you've got further questions
>> about Kiji and how to use it, please send them to me, or ask the kiji user
>> mailing list: http://www.kiji.org/getinvolved#Mailing_Lists
>> - Aaron
>> On Tue, Jan 29, 2013 at 3:24 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>> Avro and Trevni files do not support record update or delete.
>>> For large changing datasets you might use Kiji (http://www.kiji.org/)
>>> to store Avro data in HBase.
>>> On Mon, Jan 28, 2013 at 12:00 PM, ccleve <[EMAIL PROTECTED]> wrote:
>>> > I've gone through the documentation, but haven't been able to get an
>>> > answer: is Avro, or specifically Trevni, only for read-only data?
>>> > Is it possible to update or delete records?
>>> > If records can be deleted, is there any code that will merge row sets
>>> > to get rid of the unused space?
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.