Re: Special characters in web log file causing issues
You may have to remove the non-printable characters first, save the result to an intermediate file, and then load that file into Hive:

tr -cd '[:print:]\r\n\t'

Or, if you have the strings utility, use that; it outputs only printable characters.
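
If tr is not available, or the cleanup needs to live inside a script, the same filtering can be sketched in Python. This is only a rough equivalent of the tr command above and assumes plain ASCII (C locale) semantics for "printable", so every non-ASCII byte is dropped, including whatever the log holds where ¿¿ is displayed. The names strip_nonprintable and clean_file are made up for this illustration.

```python
# Bytes to keep: printable ASCII (0x20-0x7E) plus tab, line feed and
# carriage return, mirroring the set given to tr above (assumes C locale).
KEEP = bytes(range(0x20, 0x7F)) + b"\t\n\r"

# Every other byte value gets deleted.
DELETE = bytes(b for b in range(256) if b not in KEEP)


def strip_nonprintable(data: bytes) -> bytes:
    """Return data with every byte outside KEEP removed."""
    return data.translate(None, DELETE)


def clean_file(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> None:
    """Copy src_path to dst_path in chunks, dropping non-printable bytes."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(strip_nonprintable(chunk))
```

Calling clean_file("weblog.txt", "weblog_clean.txt") (both file names are placeholders) would write the intermediate file, which can then be loaded into Hive. Working on raw bytes avoids having to guess the encoding of the log, at the cost of losing any legitimate non-ASCII text.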

From: Raj Hadoop <[EMAIL PROTECTED]>
Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, Raj Hadoop <[EMAIL PROTECTED]>
Date: Monday, July 8, 2013 1:52 PM
To: Hive <[EMAIL PROTECTED]>
Subject: Special characters in web log file causing issues
Hi,

The log file that I am trying to load through Hive has some special characters.

The field is shown below, including the special characters (displayed as ¿¿).

    Shockwave Flash;Chrome Remote Desktop Viewer;Native Client;Chrome PDF Viewer;Adobe Acrobat;Microsoft Office 2010;Motive Plug-
    in;Motive Management Plug-in;Google Update;Java(TM) Platform SE 7 U21;McAfee SiteAdvisor;McAfee Virtual Technician;Windows     Live¿¿ Photo Gallery;McAfee SecurityCenter;Silverlig
The special characters above cause the record to be terminated early, and the rest of it is loaded as another line. How can I avoid this type of issue and load the data properly? Any suggestions, please.

Thanks,
Raj
