Hive, mail # user - Snappy with Hive

Snappy with Hive
Sanjay Subramanian 2013-05-21, 23:30
Hi guys

I have an MR job that writes Snappy-compressed output files.
My table definition is as follows:
CREATE EXTERNAL TABLE IF NOT EXISTS outpdir_header_hive_only (
  hbase_pk STRING,
  header_servername_donotquery STRING,
  header_date_donotquery STRING,
  header_id STRING,
  header_hbpk STRING,
  header_channelId INT,
  header_searchAnnotation STRING,
  header_continuedSearchFlag INT,
  header_prodLow INT,
  header_prodTotal INT,
  header_sort INT,
  header_view INT,
  header_adNodes INT,
  header_spellingSuggestion STRING,
  header_queryType INT,
  header_nodeId INT,
  header_pinpointPtitleId INT,
  header_firedSearchRules STRING,
  header_rbAbsentSellers INT,
  header_shuffled INT,
  header_searchSessionId STRING,
  header_normalizationFlag STRING,
  header_relatedItemResultCount INT,
  header_unrankedSelectedPtitleIds INT,
  header_normKeyword STRING,
  header_kplEntry INT,
  header_isSaved STRING,
  header_rawProfileScore DOUBLE,
  header_normalizedProfileScore INT,
  header_scorerInfo STRING,
  header_contextNode INT,
  header_fbId STRING,
  norm_stem_keyword STRING,
  attrs_origNodeId INT,
  attrs_mfrId INT,
  attrs_sellerId INT,
  attrs_otherAttrs STRING,
  attrs_ptitleId INT,
  cached_date STRING,
  cached_recordId STRING,
  cached_visitorId STRING,
  cached_visit_id STRING,
  cached_appStyle STRING,
  cached_publisherId INT,
  cached_IP STRING,
  cached_source STRING,
  cached_refkw STRING,
  cached_pixeled INT,
  cached_searchRefineAttrImps STRING,
  cached_pageType STRING,
  cached_zipCode STRING,
  cached_zipType STRING,
  cached_perpage INT)
PARTITIONED BY (header_date STRING, header_servername STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
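
Because the table is EXTERNAL and partitioned, Hive only sees the MR job's output once each output directory is registered as a partition. A minimal sketch of that step, where the partition values ('2013-05-21', 'server01') and the HDFS path are hypothetical placeholders for one run of the job:

```sql
-- Hypothetical partition values and output path; substitute the MR job's real output directory.
ALTER TABLE outpdir_header_hive_only ADD IF NOT EXISTS
  PARTITION (header_date = '2013-05-21', header_servername = 'server01')
  LOCATION '/user/hive/example/outpdir/2013-05-21/server01';

-- Quick sanity check that rows from the Snappy files in that directory are readable.
SELECT hbase_pk, header_id
FROM outpdir_header_hive_only
WHERE header_date = '2013-05-21' AND header_servername = 'server01'
LIMIT 10;
```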

Do I need to specify an INPUTFORMAT directive for the Hive table to read Snappy-compressed files?
For example, for LZO it is:
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
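
For comparison with the LZO directive, here is a sketch assuming the job writes plain Snappy-compressed text files (names ending in .snappy) and that org.apache.hadoop.io.compress.SnappyCodec is registered in io.compression.codecs on the cluster. Hive's default TEXTFILE storage uses Hadoop's TextInputFormat, which picks the decompression codec from each file's extension, so spelling the classes out as below should behave the same as leaving the STORED AS clause off. The table name snappy_text_example and its shortened column list are hypothetical. Note that, unlike indexed LZO, Snappy text files are not splittable, so each file is read by a single mapper.

```sql
-- Equivalent to the default STORED AS TEXTFILE; written out only to contrast
-- with the LZO-specific input format. Hypothetical table, columns shortened.
CREATE EXTERNAL TABLE IF NOT EXISTS snappy_text_example (
  hbase_pk STRING,
  header_id STRING)
PARTITIONED BY (header_date STRING, header_servername STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```
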
For Hive scripts that read Snappy files and write Snappy-compressed output to Hive tables, are the following settings enough?
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
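
Put together, a Hive script that reads the Snappy-backed table and writes Snappy output might look like the sketch below. It assumes a hypothetical target table outpdir_header_snappy_copy created with CREATE TABLE ... LIKE, and uses dynamic partitioning, which needs the two hive.exec.dynamic.partition settings. The mapred.* property names are the ones from the question; on Hadoop 2 they are deprecated aliases of mapreduce.output.fileoutputformat.compress.codec and mapreduce.output.fileoutputformat.compress.type but should still be honored.

```sql
-- Compress the final job output with Snappy (settings from the question).
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- Only affects SequenceFile output; ignored for TEXTFILE tables like this one.
SET mapred.output.compression.type=BLOCK;

-- Let the INSERT create partitions from the header_date/header_servername values.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical target table with the same schema, partitioning and storage format.
CREATE TABLE IF NOT EXISTS outpdir_header_snappy_copy LIKE outpdir_header_hive_only;

-- SELECT * returns the partition columns last, matching the PARTITION clause order.
INSERT OVERWRITE TABLE outpdir_header_snappy_copy
PARTITION (header_date, header_servername)
SELECT * FROM outpdir_header_hive_only
WHERE header_date = '2013-05-21';
```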

bejoy_ks@... 2013-05-23, 14:31
Sanjay Subramanian 2013-05-23, 16:49