-
Re: Avro vs Json (Tatu Saloranta, 2012-08-12, 22:26)
I would ask this of a specific subset of users: those with actual
experience using both, so they can compare the approaches. If you ask
someone who has only used one, all you learn is that both can be made
to work well enough. Which, of course, may be enough for your needs. :-)

-+ Tatu +-

On Sun, Aug 12, 2012 at 10:32 AM, Harsh J <[EMAIL PROTECTED]> wrote:
> Moving this to the user@avro lists. Please use the right lists for the
> best answers and the right people.
>
> I'd pick Avro out of the two - it is very well designed for typed data
> and has a very good implementation of the serializer/deserializer,
> aside from the schema advantages. FWIW, Avro has a tojson CLI tool to
> dump Avro binary format out as JSON structures, which would be of help
> if you seek readability and/or integration with apps/systems that
> already depend on JSON.
>
> On Sun, Aug 12, 2012 at 10:41 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
>> We get data in Json format. I was initially thinking of simply storing
>> Json in hdfs for processing. I see there is Avro that does a similar
>> thing but most likely stores it in a more optimized format. I wanted
>> to get users' opinions on which one is better.
>
> --
> Harsh J
-
Re: Avro vs Json (Russell Jurney, 2012-08-12, 22:34)
You'll need to compress JSON yourself; Avro can compress itself. Avro
represents more types, so with JSON you'll need to serialize any types
beyond what JSON supports, either with annotations or by convention.
JSON is simpler.

Short answer: use JSON if its types are expressive enough for your
data and you don't mind compressing it yourself. Avro has more types,
carries the schema on board, and compresses itself.

Russell Jurney
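To make the "compress it yourself" and "by convention" points concrete, here is a minimal sketch using only the Python standard library. The `$type` tagging convention and the helper names (`encode_extra`, `decode_extra`, `write_records`, `read_records`) are assumptions made up for illustration; they are not part of JSON, Avro, or any message in this thread.

```python
import base64
import gzip
import io
import json
from datetime import datetime, timezone


def encode_extra(value):
    """Hypothetical convention: wrap types JSON lacks in a tagged object."""
    if isinstance(value, bytes):
        return {"$type": "bytes", "value": base64.b64encode(value).decode("ascii")}
    if isinstance(value, datetime):
        return {"$type": "datetime", "value": value.isoformat()}
    raise TypeError(f"cannot serialize {type(value).__name__}")


def decode_extra(obj):
    """Reverse the tagging convention; untagged objects pass through."""
    tag = obj.get("$type")
    if tag == "bytes":
        return base64.b64decode(obj["value"])
    if tag == "datetime":
        return datetime.fromisoformat(obj["value"])
    return obj


def write_records(records):
    """Serialize records as gzip-compressed, newline-delimited JSON."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for record in records:
            line = json.dumps(record, default=encode_extra) + "\n"
            gz.write(line.encode("utf-8"))
    return buf.getvalue()


def read_records(blob):
    """Decompress and parse what write_records produced."""
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as gz:
        text = gz.read().decode("utf-8")
    return [json.loads(line, object_hook=decode_extra) for line in text.splitlines()]


records = [
    {
        "id": 1,
        "payload": b"\x00\x01",
        "seen": datetime(2012, 8, 12, 22, 34, tzinfo=timezone.utc),
    }
]
blob = write_records(records)
restored = read_records(blob)
```

Every producer and consumer has to agree on the tagging convention out of band, which is the extra work being described for JSON.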
-
Re: Avro vs Json (Bill Graham, 2012-08-13, 02:42)
The benefit of having a schema associated with your data should not be
understated. When debating whether to use JSON or some other data
serialization format that has a schema (like Avro), I think you should
choose the latter. The schema support alone will pay dividends over the
long run.
-
Re: Avro vs Json (Russell Jurney, 2012-08-13, 03:03)
To be fair, you can test types as you parse JSON. But only a few.

Avro schemas even include comments... huge win.

Russell Jurney
http://datasyndrome.com
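As a rough illustration of both remarks, the sketch below writes a record schema in Avro's JSON schema syntax, using the `doc` attribute for comments, and then checks parsed JSON against it by hand. The `User` schema, the `PYTHON_TYPES` mapping, and the `check` function are hypothetical; this does not use the Avro library, and the hand-rolled check only covers a few primitive types.

```python
import json

# Hypothetical record schema in Avro's JSON schema syntax; "doc" carries comments.
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "doc": "One row per registered user.",
    "fields": [
        {"name": "id", "type": "long", "doc": "Internal user id."},
        {"name": "email", "type": "string", "doc": "Login address."},
        {"name": "score", "type": "double", "doc": "Ranking score, 0.0 to 1.0."},
    ],
}

# Parsed JSON only distinguishes a handful of types, so map a few
# Avro primitive names onto the Python types json.loads can return.
PYTHON_TYPES = {"long": int, "string": str, "double": float, "boolean": bool}


def check(record, schema):
    """Return a list of problems found when comparing record to schema."""
    errors = []
    for field in schema["fields"]:
        name = field["name"]
        expected = PYTHON_TYPES[field["type"]]
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            errors.append(
                f"{name}: expected {field['type']}, got {type(value).__name__}"
            )
    return errors


good = json.loads('{"id": 7, "email": "a@example.com", "score": 0.5}')
bad = json.loads('{"id": "7", "email": "a@example.com"}')

good_errors = check(good, USER_SCHEMA)
bad_errors = check(bad, USER_SCHEMA)
```

Note the limits of checking at parse time: JSON has one number type, so a `score` written as `1` would parse as an integer and fail the `double` check here, and types such as bytes or fixed cannot be told apart from strings at all.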
-
Re: Avro vs Json (Knoke, Jeff, 2012-08-13, 12:46)
-
Re: Avro vs Json (Knoke, Jeff, 2012-08-13, 12:49)
-
Re: Avro vs Json (Tatu Saloranta, 2012-08-13, 17:47)
On Sun, Aug 12, 2012 at 7:42 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
> The benefit of having a schema associated with your data should not be
> understated. I think when debating whether to use JSON or some other data
> serialization format that has a schema (like Avro), you should choose the
> latter. The schema support alone will pay dividends over the long run.

I would argue it is one of those things that is overstated because it
is intuitively attractive.

It is worth keeping in mind that an explicit external schema is another
cost, not just in designing but also in maintaining the system. As such,
it is most useful for closely coupled internal systems, where one
controls both ends. This may be the case for computing pipelines a
single team owns.

Put another way: both the benefits and the costs of schemas accumulate
over the long run, and the ratio ultimately determines which one wins.
Yet that ratio is very hard to forecast in advance. What can be said is
that maintaining no schema is cheaper than maintaining a schema. The
value of a schema, on the other hand, is much harder to estimate a
priori.

-+ Tatu +-
-
Re: Avro vs Json (Tatu Saloranta, 2012-08-13, 17:50)
On Sun, Aug 12, 2012 at 8:03 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> To be fair, you can test types as you parse JSON. But only a few.
...

The difference between formats typed by an external, explicit schema
and schema-free formats (optional schema, as in JSON) is similar to
that between statically and dynamically typed languages. Testing and
handling differ, as do the trade-offs.

-+ Tatu +-
-
Re: Avro vs Json (Bill Graham, 2012-08-13, 22:59)
> It is worth keeping in mind that an explicit external schema is another
> cost, not just in designing but also in maintaining the system. As such,
> it is most useful for closely coupled internal systems, where one
> controls both ends. This may be the case for computing pipelines a
> single team owns.

Our experience has been quite the opposite. When the developer
producing data was the same as the developer writing code to consume
it, JSON worked fine, since the developer knew what fields to expect.
As our company grew, this turned into tribal knowledge and the approach
did not scale. That's when having schemas is critical: when one team
produces data and many others consume it. The cost is that the producer
needs to publish the schema for others to discover.
-
Re: Avro vs Json (Russell Jurney, 2012-08-14, 00:31)
This is consistent with my experience. As a user of HDFS, I would find
data produced by others and not know the semantics well enough to use
it. On-board schemas, with comments, make this data more usable,
although a system like HCatalog is useful in facilitating this kind of
discovery. Avro enables and encourages the preparation of shared data
sets among users, which saves cycles and improves productivity.

Russell Jurney
http://datasyndrome.com
-
Re: Avro vs Json (Tatu Saloranta, 2012-08-14, 02:33)
On Mon, Aug 13, 2012 at 3:59 PM, Bill Graham <[EMAIL PROTECTED]> wrote:
>> It is worth keeping in mind that an explicit external schema is another
>> cost, not just in designing but also in maintaining the system. As such,
>> it is most useful for closely coupled internal systems, where one
>> controls both ends. This may be the case for computing pipelines a
>> single team owns.
>
> Our experience has been quite the opposite. When the developer producing
> data was the same as the developer writing code to consume it, JSON worked
> fine, since the developer knew what fields to expect. As our company grew,
> this turned into tribal knowledge and the approach did not scale. That's
> when having schemas is critical: when one team produces data and many others
> consume it. The cost is that the producer needs to publish the schema for
> others to discover.

Interesting, good point.

I was rather thinking of the main cost being in maintenance, i.e. if
and when the format changes, not so much the upfront effort (although
that is more visible). That cost depends on the amount of change, if
any, as well as on the effort for other systems to adapt.

Avro does have better support for schema evolution, at least in theory,
so that could help too.

-+ Tatu +-
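For readers unfamiliar with what schema evolution buys in practice, here is a simplified sketch of the idea behind Avro's reader/writer schema resolution for records: fields the reader does not know are dropped, and fields the writer never wrote are filled from the reader's declared default. The `Event` schemas and the `resolve` function are hypothetical and written in plain Python; this is not the Avro library and ignores type promotion, aliases, and nested types.

```python
# Hypothetical schema the data was originally written with.
WRITER_V1 = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "legacy_flag", "type": "boolean"},
    ],
}

# Hypothetical newer schema used by the consumer: one field dropped,
# one field added with a default so that old data stays readable.
READER_V2 = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "source", "type": "string", "default": "unknown"},
    ],
}


def resolve(record, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {field["name"] for field in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field exists on both sides: carry the written value over.
            resolved[name] = record[name]
        elif "default" in field:
            # Field is new to the reader: fall back to its default.
            resolved[name] = field["default"]
        else:
            raise ValueError(
                f"reader field {name!r} has no default and was not written"
            )
    # Writer-only fields (here legacy_flag) are simply not copied.
    return resolved


old_record = {"id": 42, "legacy_flag": True}
new_view = resolve(old_record, WRITER_V1, READER_V2)
```

The maintenance cost raised above shows up in the `ValueError` branch: a field added without a default makes previously written data unreadable under the new schema, so producers and consumers still need to coordinate such changes.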