-
sqoop merge question
Chalcy, 2012-10-23, 13:41
Hello Sqoop users,
I tried to use sqoop merge and understand all the parameters except --class-name and --jar-file. What should those be? Sqoop errors out if I do not specify them.

The command I am using is:

    sqoop merge --new-data user/hadoop/testincrement --onto /user/hadoop/exisitngdata --target-dir /user/hadoop/mergeddir --merge-key rowid

Thanks,
Chalcy
-
Re: sqoop merge question
Jarek Jarcec Cecho, 2012-10-23, 14:59
Hi Chalcy,
Sqoop needs to be able to parse the files you're trying to merge, because existing entries must be replaced by their newer versions. Usually Sqoop generates a special class for this purpose based on the connection in use. In the merge case, however, there is no connection to the database, so you need to specify that class manually. The class is generated automatically when you run the import tool, and it can also be generated manually with the codegen tool [1]. You can find more information about those two arguments in the merge tool section of our user guide [2].

Jarcec

Links:
1: http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_codegen_literal
2: http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_merge_literal
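A minimal sketch of the two-step workflow described in this reply: generate the record class with codegen, then pass it to merge. The JDBC URL, table name, class name, and output directory are hypothetical placeholders, not values from the thread. The jar location under the --bindir directory is also an assumption, so check the codegen log output for the actual path. The script only prints the commands so they can be reviewed before running them on a cluster.

```shell
#!/bin/sh
# Hypothetical placeholders (not from the thread): adjust to your setup.
JDBC_URL="jdbc:mysql://db.example.com/mydb"
TABLE="mytable"
CLASS_NAME="MyRecord"
BIN_DIR="/tmp/sqoop-codegen"

# Step 1: codegen connects to the database and generates the record class.
# This is the only step that needs a database connection.
codegen_cmd() {
  echo "sqoop codegen --connect $JDBC_URL --table $TABLE --class-name $CLASS_NAME --bindir $BIN_DIR"
}

# Step 2: merge reuses that class to parse both HDFS directories.
# The HDFS paths and merge key are copied verbatim from the original question.
# Assumption: codegen wrote the jar to $BIN_DIR/$CLASS_NAME.jar.
merge_cmd() {
  echo "sqoop merge --new-data user/hadoop/testincrement --onto /user/hadoop/exisitngdata --target-dir /user/hadoop/mergeddir --merge-key rowid --jar-file $BIN_DIR/$CLASS_NAME.jar --class-name $CLASS_NAME"
}

# Print both commands for review; run them by hand once they look right.
codegen_cmd
merge_cmd
```

If the data was originally brought in with the import tool, the class generated during that import can be reused instead of running codegen again.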
-
Re: sqoop merge question
Chalcy, 2012-10-23, 16:00
Hi Jarcec,
If we are merging two HDFS datasets, I do not understand why we would need a database connection. Could you explain?

Thanks,
Chalcy
-
Re: sqoop merge question
Jarek Jarcec Cecho, 2012-10-23, 16:58
Hi Chalcy,
let me try :-)

Merge takes two directories on HDFS and updates logical rows in those files. Those files are most likely CSV files without any additional metadata. However, Sqoop needs that metadata: for example, how many columns are there? What are the column names and data types? What are the delimiters? Normally, such information is retrieved from the database. In the merge case, however, there is no connection to the database (as you correctly guessed), so you need to supply a previously generated class.

Does that help in understanding the issue you're facing?

Jarcec
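To make the metadata point concrete, here is a toy illustration in plain shell and awk. It is not how Sqoop implements merge; it only shows that a key-based merge of delimited files cannot work without knowing the delimiter and which column holds the key. The sample rows, the DELIM and KEY_COL variables, and the file names are made up for the example.

```shell
#!/bin/sh
# Toy illustration only: NOT Sqoop's implementation.
# Sample data is invented; column 1 plays the role of the merge key.
workdir=$(mktemp -d)
printf '1,alice,10\n2,bob,20\n' > "$workdir/old.csv"
printf '2,bob,25\n3,carol,30\n' > "$workdir/new.csv"

DELIM=","    # field delimiter: metadata the files themselves do not carry
KEY_COL=1    # position of the merge key column: also external metadata

# awk reads the old file first, then the new one. A later row with the
# same key overwrites the earlier one, so newer rows win. The result is
# sorted by key because awk's array iteration order is unspecified.
merged=$(awk -F"$DELIM" -v k="$KEY_COL" \
  '{ rows[$k] = $0 } END { for (key in rows) print rows[key] }' \
  "$workdir/old.csv" "$workdir/new.csv" | sort -t"$DELIM" -k"$KEY_COL,$KEY_COL")

echo "$merged"
rm -rf "$workdir"
```

With a different delimiter or key position the same files would merge differently, which is why the merge tool needs the generated class to describe the record layout.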
-
Re: sqoop merge question
Chalcy, 2012-10-23, 18:22
Oh, got it. Thanks, Jarcec! I'll try it and let you know if I get it working.