Avro >> mail # user >> Is Avro/Trevni strictly read-only?


Re: Is Avro/Trevni strictly read-only?
Interesting -- thanks for the link! Let me know if you have any more Kiji
questions.
Cheers
- Aaron
On Wed, Jan 30, 2013 at 6:49 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> I'm looking at Panthera, and I'll check out Kiji too. Inferring the schema
> from the first record and creating a table is what is done in Voldemort's
> build/push job, so I'll look into that.
>
>
> https://github.com/voldemort/voldemort/wiki/Build-and-Push-Jobs-for-Voldemort-Read-Only-Stores
>
> Russell Jurney http://datasyndrome.com
>
> On Jan 30, 2013, at 6:33 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
>
> Hi Russell,
>
> Great question.  Kiji is more strongly typed than systems like MongoDB.
> While your schema can evolve (using Avro evolution) without structurally
> updating existing data, you still need to specify your Avro schemas in a
> data dictionary. It's challenging to author systems in Java (as is typical
> of HBase/HDFS/MapReduce-facing applications) without some strong typing in
> the persistence layer. You wind up reading a lot of other people's code to
> figure out what types were written -- assuming you can find the code (or
> the HBase columns) in the first place.
>
> You can create table schemas either "manually" by filling out a JSON /
> Avro-based table layout specification, or you can use the DDL shell which
> lets you CREATE TABLE, ALTER TABLE, etc. in a pretty quick way. Once the
> table's set up, then you can write to it.  I think the DDL shell included
> with the bento box makes this a reasonably low-overhead process.
>
> We don't currently have any Pig integration. We've made some initial
> proof-of-concept progress on a StorageHandler that lets Hive query Kiji,
> but it's not in a ready state yet. Someone (you? :) could write a Pig
> integration; Pig already supports Avro I think. And you could even make it
> analyze the first output tuple and use that to infer types/column names to
> set up a result table with the appropriate table schema by invoking the DDL
> procedurally.
>
> Sorry I don't have a "magic wand" answer for you -- for the use cases we
> target, these sorts of setup costs often pay off in the long run, so that's
> the case we've optimized the design around. Let me know if there's anything
> else I can help with.
> Thanks,
> - Aaron
>
>
> On Wed, Jan 30, 2013 at 5:48 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
>
>> Aaron - is there a way to create a Kiji table from Pig? I'm in the habit
>> of not specifying schemas with Voldemort and MongoDB: I just store a Pig
>> relation, and the schema is set in the store. If I can arrange that somehow,
>> I'm all over Kiji. Panthera is a fork :/
>>
>>
>> On Wed, Jan 30, 2013 at 3:20 PM, Aaron Kimball <[EMAIL PROTECTED]> wrote:
>>
>>> Hi ccleve,
>>>
>>> I'd definitely urge you to try out Kiji -- we who work on it think it's
>>> a pretty good fit for this specific use case. If you've got further
>>> questions about Kiji and how to use it, please send them to me, or ask the
>>> Kiji user mailing list: http://www.kiji.org/getinvolved#Mailing_Lists
>>>
>>> cheers,
>>> - Aaron
>>>
>>>
>>> On Tue, Jan 29, 2013 at 3:24 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>>
>>>> Avro and Trevni files do not support record update or delete.
>>>>
>>>> For large changing datasets you might use Kiji (http://www.kiji.org/)
>>>> to store Avro data in HBase.
>>>>
>>>> Doug
>>>>
>>>> On Mon, Jan 28, 2013 at 12:00 PM, ccleve <[EMAIL PROTECTED]> wrote:
>>>> > I've gone through the documentation, but haven't been able to get a
>>>> > definite answer: is Avro, or specifically Trevni, only for read-only data?
>>>> >
>>>> > Is it possible to update or delete records?
>>>> >
>>>> > If records can be deleted, is there any code that will merge row sets
>>>> > to get rid of the unused space?
>>
>> --
>> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
>>
>
>
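
The idea raised in the thread, analyzing the first output tuple to infer types and column names, can be sketched roughly as follows. This is only an illustration under stated assumptions: the tuple is modeled as a plain Python dict, the function names (infer_avro_type, infer_record_schema) and the type mapping are hypothetical, and the output is a generic Avro record schema rather than a Kiji table layout or DDL statement, whose syntax is not shown in this thread.

```python
import json

# Assumed mapping from Python value types to Avro primitive type names.
# bool is listed before int because bool is a subclass of int in Python.
_AVRO_TYPES = [
    (bool, "boolean"),
    (int, "long"),
    (float, "double"),
    (str, "string"),
    (bytes, "bytes"),
]


def infer_avro_type(value):
    """Return the Avro primitive type name assumed for a single value."""
    if value is None:
        return "null"
    for py_type, avro_type in _AVRO_TYPES:
        if isinstance(value, py_type):
            return avro_type
    raise TypeError("no Avro mapping for " + type(value).__name__)


def infer_record_schema(first_record, name="InferredRecord"):
    """Build an Avro record schema (as a dict) from the first record seen."""
    return {
        "type": "record",
        "name": name,
        "fields": [
            {"name": field, "type": infer_avro_type(value)}
            for field, value in first_record.items()
        ],
    }


if __name__ == "__main__":
    # Hypothetical first tuple of a job's output.
    first_tuple = {"user_id": 42, "email": "a@example.com", "active": True}
    print(json.dumps(infer_record_schema(first_tuple), indent=2))
```

Inferring from a single record is fragile: a field that happens to be null or absent in the first tuple gets the wrong type or no column at all, which is one reason a declared schema can pay off over time.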