Sqoop import Format Issue
Vineet Mishra 2013-01-18, 11:20
Hello all,

I am using Sqoop to query a database as input for my Map/Reduce job, and I am struggling with two issues:

1) Is there a way to pass the Sqoop import result directly to the Map phase for processing, rather than storing it on the file system first and then processing it?

If that cannot be done, then:

2) I want the file produced by the sqoop import command to be a single-line, comma-separated file, but my output comes out as a multi-row file. I am fetching a single column of values from the database, which arrives in this format:

1234
1235
1564
1674
1546
2546
5653
6434

The output I was expecting was:

1234,1235,1564,1674,1546,2546,5653,6434

and so on.

Urgent! Sqoop developers/users, please reply!

--
Thanks and Regards
Vineet Mishra
Re: Sqoop import Format Issue
Vineet Mishra 2013-01-21, 11:20
Hello all,

I am using Sqoop to write data back to a database from HDFS, and I am currently stuck on an issue. The scenario is as follows.

I have a Postgres database with a table TestSqoop consisting of three fields, *id integer SERIAL, volume numeric(8,2), zip integer*, where id is the auto-increment field. I have a data file in HDFS from which I want to insert only the last two columns (everything except id, which is auto-incremented). Currently my data file contains the data in this format:

1,28.00,10005
2,403.71,90210
3,20.02,95014

Here I am specifying a value even for id, which I don't want, since it should be auto-incremented. Kindly let me know if there is a way to omit the id value from my data file so that it auto-increments in the database.

Urgent!
-- Thanks and Regards Vineet Mishra
Re: Sqoop import Format Issue
Jarek Jarcec Cecho 2013-01-22, 00:49
Hi Vineet, I'm afraid that Sqoop does not currently support this use case.
Jarcec
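One possible workaround, sketched here outside of Sqoop, is to preprocess the data file so that it no longer carries the id value. The helper name drop_leading_field is hypothetical, and the sketch assumes plain comma-separated rows with no quoted commas. Whether the export can then be restricted to the volume and zip columns depends on the Sqoop version and is not confirmed in this thread.

```python
def drop_leading_field(line, delimiter=","):
    """Return the row without its first field (hypothetical helper)."""
    # partition() splits on the first delimiter only, so the remaining
    # fields are passed through untouched.
    _, _, rest = line.partition(delimiter)
    return rest


# Sample rows taken from the question: id, volume, zip.
rows = ["1,28.00,10005", "2,403.71,90210", "3,20.02,95014"]
stripped = [drop_leading_field(row) for row in rows]
print("\n".join(stripped))
```

The resulting rows contain only volume and zip, which matches the two-column layout used later in the thread.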
Re: Sqoop import Format Issue
Vineet Mishra 2013-01-21, 07:31
Hello everyone,

I am working with the *Sqoop Export* command. While trying to export data from HDFS to a PostgreSQL database using the command

sudo -u hdfs sqoop export --connect jdbc:postgresql://localhost/mydatabase --username vineet -P -m 1 --table testsqoop --export-dir /user/hdfs/Sqoop/Sqooptest --input-fields-terminated-by '\0001'

I got the following error:

13/01/21 08:18:23 INFO manager.SqlManager: Using default fetchSize of 1000
13/01/21 08:18:23 INFO tool.CodeGenTool: Beginning code generation
13/01/21 08:18:23 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "testsqoop" AS t LIMIT 1
13/01/21 08:18:23 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop/libexec/..
13/01/21 08:18:23 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop/libexec/../hadoop-core.jar
Note: /tmp/sqoop-hdfs/compile/32830bc954c996cde42e46f6a8599883/sqooptest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/01/21 08:18:23 ERROR orm.CompilationManager: Could not make directory: /home/serendio/.
13/01/21 08:18:23 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/32830bc954c996cde42e46f6a8599883/sqooptest.jar
13/01/21 08:18:23 INFO mapreduce.ExportJobBase: Beginning export of testsqoop
13/01/21 08:18:25 INFO input.FileInputFormat: Total input paths to process : 1
13/01/21 08:18:25 INFO input.FileInputFormat: Total input paths to process : 1
13/01/21 08:18:25 INFO mapred.JobClient: Running job: job_201301150939_0042
13/01/21 08:18:26 INFO mapred.JobClient: map 0% reduce 0%
13/01/21 08:18:36 INFO mapred.JobClient: Task Id : attempt_201301150939_0042_m_000000_0, Status : FAILED
java.lang.NumberFormatException
    at java.math.BigDecimal.<init>(BigDecimal.java:459)
    at java.math.BigDecimal.<init>(BigDecimal.java:728)
    at sqooptest.__loadFromFields(sqooptest.java:191)
    at sqooptest.parse(sqooptest.java:143)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:77)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:36)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

I searched online but could not find a relevant solution. Has anybody run into this issue?

Thanks in advance!
-- Thanks and Regards Vineet Mishra
Re: Sqoop import Format Issue
abhijeet gaikwad 2013-01-21, 08:38
This looks like a data type mismatch, e.g. trying to insert character data into a numeric column.

It could also mean that you are specifying the wrong delimiter.

Can you share two sample rows from the data on HDFS, along with the table schema?

Thanks,
Abhijeet
Re: Sqoop import Format Issue
Vineet Mishra 2013-01-21, 08:43
On Mon, Jan 21, 2013 at 2:08 PM, abhijeet gaikwad <[EMAIL PROTECTED]> wrote:
> and table sche

Sure, here are the data rows:

28.00,10005,\0001
403.71,90210,\0001

I even tried wrapping the values in single quotes (inverted commas), but without success.

This is the table schema:

 Column |     Type     | Modifiers | Storage | Description
--------+--------------+-----------+---------+-------------
 volume | numeric(8,2) |           | main    |
 zip    | integer      |           | plain   |

--
Thanks and Regards
Vineet Mishra
Re: Sqoop import Format Issue
abhijeet gaikwad 2013-01-21, 11:53
Your fields are terminated by commas, but your command specifies a different delimiter: '\0001'.
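To illustrate why a wrong delimiter ends in a number-format failure, here is a small Python sketch. It is not Sqoop's generated code: parse_row is a hypothetical helper, and the sample row is simplified to the two table columns. Splitting a comma-separated row on the Ctrl-A character leaves the whole row as a single field, which cannot be parsed as a decimal, much like the BigDecimal error in the stack trace.

```python
from decimal import Decimal, InvalidOperation


def parse_row(line, delimiter):
    """Split a row and parse it as (volume, zip); hypothetical helper."""
    fields = line.split(delimiter)
    volume = Decimal(fields[0])  # fails if the field still contains commas
    zip_code = int(fields[1])
    return volume, zip_code


row = "28.00,10005"

# Matching delimiter: the row splits into two parseable fields.
print(parse_row(row, ","))

# Mismatched delimiter (Ctrl-A): the row stays one field and parsing fails.
try:
    parse_row(row, "\x01")
except InvalidOperation as exc:
    print("parse failed:", type(exc).__name__)
```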
Thanks,
Abhijeet
Re: Sqoop import Format Issue
Vineet Mishra 2013-01-21, 12:01
Yes Abhijeet, I removed '\0001' and it is now working fine. The remaining issue is the database id field: as long as I pass the *id* value in the data file it works well, but I cannot get hold of the id value, and unlike MySQL, Postgres does not even accept a null value for the auto-increment column.

--
Thanks and Regards
Vineet Mishra
Re: Sqoop import Format Issue
Vineet Mishra 2013-01-18, 12:31
Hello all,

Can we customize the *--split-by* option in the Sqoop import statement so that the output is a single comma-delimited file with a fixed maximum number of values in each row? For example, when fetching a single column from the DB, the output should look like:

1234,1235,1564,1674
1546,2546,5653,6434
6543,6534,7763,6567

The single file can be achieved with the option *-m 1*; each row would then hold a maximum number of comma-separated values, in this case 4.

Kindly let me know if any of you has worked on this case.

Thanks in advance!
-- Thanks and Regards Vineet Mishra
Re: Sqoop import Format Issue
abhijeet gaikwad 2013-01-18, 12:41
This cannot be achieved with Sqoop.
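Since Sqoop itself does not produce this layout, one option is a post-processing step on the imported file. The following is a minimal Python sketch under that assumption; chunk_rows is a hypothetical helper, and the values are the sample numbers from the question rather than real import output.

```python
def chunk_rows(values, per_row=4):
    """Yield comma-separated rows holding at most per_row values each."""
    for start in range(0, len(values), per_row):
        yield ",".join(values[start:start + per_row])


# Values as they would appear, one per line, in the imported file.
imported = [
    "1234", "1235", "1564", "1674",
    "1546", "2546", "5653", "6434",
    "6543", "6534", "7763", "6567",
]

for row in chunk_rows(imported):
    print(row)
```

If the number of values is not a multiple of per_row, the last row is simply shorter.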
Thanks,
Abhijeet
Re: Sqoop import Format Issue
abhijeet gaikwad 2013-01-18, 11:50
Sqoop does not facilitate what you seek in #1. You will have to build an ad-hoc system for that!

For #2, if you have only one column in your table, add this option to your sqoop command:

--lines-terminated-by ','
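The effect of swapping the line terminator can be modelled in a few lines of Python. This is only a model of the idea, not Sqoop's writer: render is a hypothetical helper that writes each single-column record followed by the chosen terminator. In this model the output ends with a trailing comma, which a consumer may need to strip.

```python
def render(values, line_terminator="\n"):
    """Write each single-column record followed by the terminator."""
    return "".join(value + line_terminator for value in values)


values = ["1234", "1235", "1564", "1674"]

default_output = render(values)                     # one value per line
comma_output = render(values, line_terminator=",")  # all values on one line

print(repr(default_output))
print(repr(comma_output))
```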
Thanks,
Abhijeet
Re: Sqoop import Format Issue
Vineet Mishra 2013-01-18, 12:06
Thanks, Abhijeet, for your response.

Can the Sqoop option you suggested, *--lines-terminated-by ','*, be used to produce the same output with one small change: each row should hold at most 4 values, and the following values should continue on the next line, and so on, like this:

1234,1235,1564,1674
1546,2546,5653,6434
6543,6534,7763,6567

and so on.

--
Thanks and Regards
Vineet Mishra