Hive, mail # user - JSON format files versus AVRO


Re: JSON format files versus AVRO
Sushanth Sowmyan 2013-10-08, 18:39
Have you had a look at the JsonSerDe in hcatalog to see if it suits your
need?

It does not support the format you are suggesting directly, but if you made
the unique ID part of the JSON object, so that each line was a single JSON
record, it would. It is designed to be used with text tables.
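
As a rough sketch of that reshaping (this is not part of the SerDe; the
function name embed_id and the key name "unique_id" are made up for
illustration, and it assumes the ID contains no whitespace and the payload
is a JSON object), something like this in Python would turn each
"<unique_id> <JSON-string>" line into one JSON record per line:

```python
import json


def embed_id(line, id_key="unique_id"):
    """Convert '<unique_id> <JSON-string>' into a single-line JSON record.

    Assumptions (hypothetical, not from the original logs):
    - the ID and the JSON payload are separated by whitespace,
    - the ID itself contains no whitespace,
    - the payload is a JSON object (not an array or scalar).
    """
    # Split on the first run of whitespace only, so spaces inside the
    # JSON payload are left untouched.
    unique_id, payload = line.strip().split(None, 1)
    record = json.loads(payload)
    if not isinstance(record, dict):
        raise ValueError("payload must be a JSON object")
    # Fold the ID into the object so the line is self-describing.
    record[id_key] = unique_id
    # json.dumps emits no newlines by default, so one record stays on one line.
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(embed_id('abc123    {"url": "/home", "status": 200}'))
```

Each output line is then a self-contained JSON record with the ID as an
ordinary field, which is the one-record-per-line shape described above.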

Also, even if it proves not to be exactly what you want, it already
provides a serializer/deserializer.

On Oct 7, 2013 4:41 PM, "Sanjay Subramanian" <[EMAIL PROTECTED]> wrote:

> Sorry if the subject sounds really stupid!
>
> Basically, I am re-architecting our web log record format.
>
> Currently we have a "multiple lines = 1 record" format (I have Hadoop
> jobs that parse the files and create columnar output for Hive tables):
>
>  [begin_unique_id]
> Pipe delimited Blah………………..
> Pipe delimited Blah………………..
> Pipe delimited Blah………………..
> Pipe delimited Blah………………..
> Pipe delimited Blah………………..
> [end_unique_id]
>
>
> I have created JSON serializers that will log records in the following
> way going forward:
>
>  <unique_id>     <JSON-string>
>
> This is the plan:
> - Store the records in a two-column table in Hive
> - Write JSON deserializers as Hive UDFs that will take these tables and
>   create Hive tables pertaining to specific requirements
> - Modify the current aggregation scripts in Hive
>
> I was looking at the Avro format, but I don't see the value of using Avro
> when I feel JSON gives me pretty much the same thing?
>
> Please poke holes in my thinking! Rip me apart!
>
>
>   Thanks
> Regards
>
>  sanjay