Re: sqoop merge question
Oh, got it. Thanks, Jarcec! I'll try it and let you know if I get it working.

On Tue, Oct 23, 2012 at 12:58 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:

> Hi Chalcy,
> let me try :-)
>
> Merge takes two directories on HDFS and updates logical rows in their
> files. Those files are most likely CSV files without any additional
> metadata. However, Sqoop needs that metadata: how many columns are
> there? What are the column names and data types? What are the
> delimiters? Normally this information is retrieved from the database,
> but in the merge case there is no database connection (as you correctly
> guessed). You therefore need to supply a previously generated class.
>
> Does that help in understanding the issue you're facing?
>
> Jarcec
>
> On Tue, Oct 23, 2012 at 12:00:55PM -0400, Chalcy wrote:
> > Hi Jarcec,
> >
> > If we are merging two HDFS datasets, I do not understand why we would
> > need a database connection. Could you explain?
> >
> > Thanks,
> > Chalcy
> >
> >
> > On Tue, Oct 23, 2012 at 10:59 AM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Chalcy,
> > > Sqoop needs to be able to parse the files you're trying to merge, as
> > > newer entries must be updated. Usually Sqoop generates a special class
> > > for this purpose based on the connection in use; however, in the merge
> > > case there is no connection to the database, so you need to specify
> > > such a class manually. This class is generated automatically by the
> > > import tool and can also be generated manually with the codegen tool
> > > [1]. You can find more information about those two arguments to the
> > > merge tool in our user guide [2].
> > >
> > > Jarcec
> > >
> > > Links:
> > > 1: http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_codegen_literal
> > > 2: http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_merge_literal
> > >
> > > On Tue, Oct 23, 2012 at 09:41:07AM -0400, Chalcy wrote:
> > > > Hello Sqoop users,
> > > >
> > > > I tried to use sqoop merge and understand all the parameters except
> > > > --class-name and --jar-file. What should they be? Sqoop errors out
> > > > if I do not specify them.
> > > >
> > > > The command I am using is:
> > > > sqoop merge --new-data user/hadoop/testincrement --onto
> > > > /user/hadoop/exisitngdata --target-dir /user/hadoop/mergeddir
> > > > --merge-key rowid
> > > >
> > > > Thanks,
> > > > Chalcy
> > >
>
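
For readers landing on this thread, the two steps Jarcec describes (generate the record class with codegen, then hand it to merge) might look roughly like the dry-run sketch below. It only prints the commands. The JDBC URL, table name, class name, and jar location are hypothetical placeholders and do not come from the thread; in particular, the name of the jar that codegen writes is an assumption, so check the codegen log output for the actual path. The HDFS paths and merge key are copied verbatim from the original command.

```shell
#!/bin/sh
# Dry-run sketch: prints the two Sqoop commands instead of running them.

# Hypothetical placeholders (assumptions, not from the thread).
JDBC_URL="jdbc:mysql://dbhost/mydb"
TABLE="mytable"
CLASS_NAME="MyRecord"
BIN_DIR="./codegen"
# Assumed jar name; verify against the path codegen reports in its log.
JAR_FILE="${BIN_DIR}/${CLASS_NAME}.jar"

# Step 1: generate the record class while a database connection is available.
codegen_cmd() {
  echo "sqoop codegen --connect $JDBC_URL --table $TABLE --class-name $CLASS_NAME --bindir $BIN_DIR"
}

# Step 2: merge the two HDFS directories; no database connection is used here,
# so the generated class and its jar supply the column metadata.
# Paths and merge key are copied from the original command in this thread.
merge_cmd() {
  echo "sqoop merge --new-data user/hadoop/testincrement --onto /user/hadoop/exisitngdata --target-dir /user/hadoop/mergeddir --merge-key rowid --class-name $CLASS_NAME --jar-file $JAR_FILE"
}

codegen_cmd
merge_cmd
```

If the data was originally brought in with the import tool, the class generated during that import can be reused, and step 1 can be skipped.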