Sqoop >> mail # user >> Dropping embedded newlines for csv


David Kincaid 2012-09-20, 17:55
Jarek Jarcec Cecho 2012-09-20, 18:03
Chalcy 2012-09-20, 18:04
Chalcy 2012-09-20, 18:07
Jarek Jarcec Cecho 2012-09-20, 18:17
Chalcy 2012-09-20, 18:23
Re: Dropping embedded newlines for csv
Well, I have to admit that it's quite confusing, especially since the user guide mentions this command-line parameter only in the Hive section.

I would prefer not to change the parameter name for now, as that would break backward compatibility. Instead, we might consider adjusting our User Guide. Please file a JIRA if you feel the same way.

Jarcec

On Thu, Sep 20, 2012 at 02:23:35PM -0400, Chalcy wrote:
> I got that, Jarcec.  If the parameter does not need Hive, then why call
> it --hive-drop-import-delims?  It could instead be called
> --drop-import-delims, right?
>
> Having hive in the name causes confusion :)  That was my point.
>
> Sorry I did not spell your name right, Jarcec.
>
> --Chalcy
> On Thu, Sep 20, 2012 at 2:17 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
>
> > Hi Chalcy,
> > I'm glad that you're enjoying sqoop a lot :-)
> >
> > I'm sorry for the confusion I've mistakenly caused. The name of the parameter
> > is --hive-drop-import-delims in all cases. What I meant is that this
> > argument can be used independently of the --hive-import argument, so you
> > can drop the Hive delimiters (\n, \r, \01) and still import data directly
> > into HDFS without any other Hive interaction. I believe that you do not
> > even need a Hive installation to do so. I hope this helps to clarify
> > the confusion a bit.
> >
> > Jarcec
> >
> > On Thu, Sep 20, 2012 at 02:07:16PM -0400, Chalcy wrote:
> > > Hi Jarec,
> > >
> > > I did not know that hive-drop-import-delims works without hive-import.  In
> > > that case, do we want to call this parameter just --drop-import-delims
> > > instead of --hive-drop-import-delims?
> > >
> > > Thanks,
> > > Chalcy
> > >
> > > On Thu, Sep 20, 2012 at 2:04 PM, Chalcy <[EMAIL PROTECTED]> wrote:
> > >
> > > > I use hive-drop-import-delims for Hive imports; that was the problem
> > > > I had to solve a year ago.  Since you want the data in HDFS, you can use
> > > > a workaround: do a Hive import and then use the underlying HDFS
> > > > directory, such as /user/hive/warehouse/mynewlineremoveddata.
> > > >
> > > > Sqoop is a great tool.  I use Sqoop for all database imports.
> > > >
> > > > Thanks,
> > > > Chalcy
> > > >
> > > >
> > > > On Thu, Sep 20, 2012 at 1:55 PM, David Kincaid <[EMAIL PROTECTED]> wrote:
> > > >
> > > > >> I'm brand new to Sqoop and am working on importing data from an
> > > > >> Oracle database into HDFS. It is going to solve a number of problems
> > > > >> I've been trying to solve, so I'm really excited about it. I have it
> > > > >> working great right now except for one thing. One of the columns in
> > > > >> one of the tables has newline characters in it. I'm importing to
> > > > >> comma-delimited files and need to strip out those embedded newline
> > > > >> characters, since the tool I'm reading the .csv files with isn't
> > > > >> handling them well.
> > > >>
> > > > >> I saw the option --hive-drop-import-delims, which is exactly what I
> > > > >> want, but I assume that only works when importing to Hive. How have
> > > > >> others solved this problem?
> > > >>
> > > >> Thanks,
> > > >> Dave
> > > >>
> > > >
> > > >
> >
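
For anyone landing on this thread later, here is a rough sketch of the kind of import David describes. It follows Jarcec's point above that --hive-drop-import-delims can be given without --hive-import. The JDBC URL, user name, table name, and target directory are hypothetical placeholders, not values from this thread. The script only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Sketch only: the connection string, user, table, and target directory
# are hypothetical placeholders. Replace them with your own values.
set -- import \
  --connect "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL" \
  --username "DAVE" \
  -P \
  --table "MY_TABLE" \
  --target-dir "/user/dave/my_table_csv" \
  --fields-terminated-by "," \
  --hive-drop-import-delims

# There is deliberately no --hive-import above: per the thread, the data
# is written as plain comma-delimited files in HDFS while the Hive
# delimiter characters are still dropped from string fields.
CMD="sqoop $*"

# Print the command for review. To actually run the import, invoke
# sqoop "$@" in place of the echo.
echo "$CMD"
```

Whether this behaves the same way on your Sqoop version is worth checking on a small table first, since the user guide documents the option only in the Hive section.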
David Kincaid 2012-09-20, 20:24