Hive, mail # user - JSON format files versus AVRO


JSON format files versus AVRO
Sanjay Subramanian 2013-10-07, 23:40
Sorry if the subject sounds really stupid!

Basically, I am re-architecting our web log record format.

Currently we have a "multiple lines = 1 record" format (I have Hadoop jobs that parse the files and create columnar output for Hive tables):

[begin_unique_id]
Pipe delimited Blah....................
Pipe delimited Blah....................
Pipe delimited Blah....................
Pipe delimited Blah....................
Pipe delimited Blah....................
[end_unique_id]
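A minimal sketch of what the current parsing job has to do, assuming (hypothetically) that the markers literally look like "[begin_<id>]" and "[end_<id>]" and that each inner line is pipe-delimited. The parser must carry state across lines, which is one reason this layout needs a custom job rather than a plain line-oriented reader. parse_records is a hypothetical helper, not the existing job.

```python
import re
from typing import Iterable, Iterator

# Hypothetical marker syntax: "[begin_<id>]" opens a record, "[end_<id>]" closes it.
BEGIN = re.compile(r"^\[begin_(.+)\]$")
END = re.compile(r"^\[end_(.+)\]$")


def parse_records(lines: Iterable[str]) -> Iterator[tuple[str, list[list[str]]]]:
    """Group the pipe-delimited lines between begin/end markers into one record per id."""
    current_id = None
    rows: list[list[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if current_id is None:
            m = BEGIN.match(line)
            if m:
                current_id, rows = m.group(1), []
            continue  # skip stray lines that sit outside any record
        m = END.match(line)
        if m:
            if m.group(1) != current_id:
                raise ValueError(f"end marker {m.group(1)!r} does not match {current_id!r}")
            yield current_id, rows
            current_id = None
        else:
            rows.append(line.split("|"))
    if current_id is not None:
        raise ValueError(f"record {current_id!r} was never closed")
```

Because one record spans several lines, an input split that starts in the middle of a record needs extra handling; a one-line-per-record layout avoids that.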
I have created JSON serializers that, going forward, will log records in the following format:
<unique_id>     <JSON-string>
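A sketch of that one-line format, assuming a tab separates the id from the JSON string (the exact separator is an assumption). serialize_record is a hypothetical helper, not the actual serializer. The point it shows: the JSON encoder escapes embedded tabs and newlines, so every record stays on exactly one line.

```python
import json


def serialize_record(unique_id: str, record: dict) -> str:
    """Render one log record as a single '<unique_id>\\t<JSON-string>' line."""
    if "\t" in unique_id or "\n" in unique_id:
        raise ValueError("unique_id must not contain tabs or newlines")
    # json.dumps escapes control characters such as \n and \t inside strings,
    # so the JSON column can never break the one-line-per-record layout.
    payload = json.dumps(record, separators=(",", ":"), sort_keys=True)
    return f"{unique_id}\t{payload}"
```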

This is the plan:
- I will store the records in a two-column table in Hive
- Write JSON deserializers as Hive UDFs that take these tables and create Hive tables for specific requirements
- Modify the current aggregation scripts in Hive
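The second step of the plan is essentially a projection: parse the JSON column and pull out the fields a given table needs. A hypothetical Python sketch of that logic, assuming the same tab-separated layout and top-level keys only. Hive also ships the built-in get_json_object and json_tuple functions, which may cover some of these projections without a custom UDF.

```python
import json
from typing import Iterable, Iterator, Sequence


def project(lines: Iterable[str], fields: Sequence[str]) -> Iterator[list]:
    """Turn '<unique_id>\\t<JSON-string>' rows into [unique_id, field1, field2, ...] rows.

    Only top-level keys are looked up; a missing key becomes None,
    roughly like a NULL column in the resulting Hive table.
    """
    for line in lines:
        unique_id, _, payload = line.rstrip("\n").partition("\t")
        record = json.loads(payload)
        yield [unique_id] + [record.get(f) for f in fields]
```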

I was looking at the Avro format, but I don't see the value of using Avro when I feel JSON gives me pretty much the same thing?
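One concrete difference: JSON repeats every field name in every record, while Avro's binary encoding writes only the values, in schema order, and keeps the schema once per file. Below is a hand-rolled sketch of Avro's binary encoding for the long and string types (zig-zag varints and length-prefixed UTF-8, per the Avro specification), using a hypothetical four-field web-log schema. It encodes single records only; a real Avro data file also has a header containing the schema, sync markers, and optional block compression, so in practice you would use the official Avro library or Hive's AvroSerDe rather than code like this.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Avro 'long' encoding: zig-zag map signed -> unsigned, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # assumes n fits in a signed 64-bit long
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro 'string' encoding: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


# Hypothetical schema for a web-log record: (field name, Avro primitive type).
SCHEMA = [("unique_id", "string"), ("path", "string"), ("status", "long"), ("bytes", "long")]


def encode_record(record: dict) -> bytes:
    """Encode fields in schema order; field names are NOT written, only values."""
    out = bytearray()
    for name, typ in SCHEMA:
        value = record[name]
        if typ == "long":
            out += zigzag_varint(value)
        elif typ == "string":
            out += encode_string(value)
        else:
            raise ValueError(f"type {typ!r} is not handled in this sketch")
    return bytes(out)


rec = {"unique_id": "abc123", "path": "/index.html", "status": 200, "bytes": 5120}
as_json = json.dumps(rec, separators=(",", ":")).encode("utf-8")
as_avro = encode_record(rec)
print(len(as_json), len(as_avro))
```

Running it prints the compact-JSON size followed by the Avro-encoded size for the same record; the gap grows with the number and length of field names. Avro also supports schema evolution (for example, adding a field with a default value), which hand-written JSON readers have to handle themselves.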

Please poke holes in my thinking! Rip me apart!
Thanks
Regards

sanjay
