Creating compressed data with Sqoop
Santosh Achhra, 2013-01-07, 11:34

Hello,

I am trying to import data from a table, and I would like the final data to be compressed on HDFS to save space. I am executing the command below. It completes successfully and reports no errors, but when I look at the final data with the hadoop ls command, it is not in compressed format in HDFS.

    sqoop --options-file /export/home/sqoop/connect.parm --table TEST --split-by F1 --compression-codec org.apache.hadoop.io.compress.SnappyCodec -z

Am I missing something?

I would also like to know whether I can import data into Hive in compressed format. I executed the command below; in this case too, the data in HDFS is not in compressed format, and the describe table command in Hive says that the table is not compressed.

    sqoop --options-file /export/home/sqoop/connect.parm --table TEST --split-by F1 --hive-import -m 1 --compress --compression-codec org.apache.hadoop.io.compress.GzipCodec

Good wishes, always!
Santosh
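
A quick way to judge from a directory listing whether text output was compressed is to look at the part-file names: Hadoop's text output formats normally append the codec's default extension (.gz, .snappy, .bz2, .deflate). The sketch below is a minimal helper under that assumption; the function name and the sample file names are hypothetical and do not come from the thread, and container formats such as SequenceFiles compress internally without changing the file name.

```python
# Default file extensions of common Hadoop compression codecs.
CODEC_EXTENSIONS = {
    ".gz": "GzipCodec",
    ".snappy": "SnappyCodec",
    ".bz2": "BZip2Codec",
    ".deflate": "DefaultCodec",
}


def codec_for(filename):
    """Return the codec suggested by the file extension, or None if there is none."""
    for extension, codec in CODEC_EXTENSIONS.items():
        if filename.endswith(extension):
            return codec
    return None


# Hypothetical part-file names, as they might appear in a listing.
for name in ["part-m-00000", "part-m-00000.snappy", "part-m-00001.gz"]:
    print(name, "->", codec_for(name) or "no codec extension")
```

A name without a codec extension is only a hint that the text output was written uncompressed; it is not proof.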
-
Re: Creating compressed data with Sqoop
Jarek Jarcec Cecho, 2013-01-07, 12:31

Hi Santosh,

Sqoop delegates compression and decompression to the MapReduce framework, so Sqoop options can be overridden by your MapReduce configuration (for example, by a setting that says MapReduce output can't be compressed). Would you mind sharing with us your:

* Hadoop version
* Sqoop version
* TaskTracker configuration file (mapred-site.xml)
* Job XML (the configuration file) for the job generated by Sqoop

I'm particularly looking for:

* io.compression.codecs - must contain the codecs you're using with Sqoop
* mapred.compress.map.output - must be set to true at the job XML level and must not be set to final false in the TaskTracker configuration

Jarcec
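
The "final" override mentioned above can be checked mechanically. The following is a minimal sketch, assuming a standard Hadoop site file in which a property carries `<final>true</final>` when jobs are not allowed to override it. The sample XML is hypothetical and is not the poster's configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapred-site.xml content, used only for illustration.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
</configuration>"""


def final_properties(xml_text):
    """Return {name: value} for every property marked <final>true</final>."""
    root = ET.fromstring(xml_text)
    locked = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        final = prop.findtext("final", default="false").strip().lower()
        if final == "true":
            locked[name] = value
    return locked


print(final_properties(SAMPLE_SITE_XML))
```

Any compression-related property that shows up in the result with the value false would explain why a job-level setting such as Sqoop's --compress has no effect.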
-
Re: Creating compressed data with Sqoop
Santosh Achhra, 2013-01-08, 05:21

Thank you Jarcec.

Here are the details.

Hadoop version:

Hadoop 2.0.0-cdh4.1.0
Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.1.0/src/hadoop-common-project/hadoop-common -r 5c0a0bddbc2aaff30a8624b5980cd4a2e1b68d18
Compiled by jenkins on Sat Sep 29 11:26:20 PDT 2012
From source with checksum 95f5c7f30b4030f1f327758e7b2bd61f

Sqoop version:

Sqoop 1.4.1-cdh4.1.0
git commit id 10df2d6359a84f8877d63134b867a2ee718a2ca9
Compiled by jenkins on Sat Sep 29 12:11:42 PDT 2012

TaskTracker config file:

<property>
  <name>mapred.output.compress</name>
  <value>false</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>

I was not able to locate the job XML file (configuration file). Could you please let me know where to look for it?

Good wishes, always!
Santosh
-
Re: Creating compressed data with Sqoop
Jarek Jarcec Cecho, 2013-01-08, 07:36

Hi Santosh,

thank you very much for sharing the additional information with us. You can download the job XML from the JobTracker web UI by clicking on your job's details (the JobId column on the "main" JobTracker dashboard) and following the "JobConf" link at the top of the page. You should get a web page that lists key-value pairs in a two-column table.

Jarcec
-
Re: Creating compressed data with Sqoop
Santosh Achhra, 2013-01-08, 08:01

Thank you Jarcec,

I am posting the values for a few rows. Are you looking for any specific row? Thanks once again for your help.

mapred.compress.map.output       true
io.seqfile.lazydecompress        true
io.compression.codecs            org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec
mapred.output.compression.type   BLOCK
mapred.output.compression.codec  org.apache.hadoop.io.compress.SnappyCodec
io.seqfile.compress.blocksize    1000000
dfs.image.compress               false
mapred.output.compress           true
dfs.image.compression.codec      org.apache.hadoop.io.compress.DefaultCodec

Good wishes, always!
Santosh
-
Re: Creating compressed data with Sqoop
Jarek Jarcec Cecho, 2013-01-08, 11:09

I do not see any discrepancy here or in the TaskTracker configuration, so the MapReduce output compression should be working. The properties:

* io.compression.codecs - contains the codecs you were using
* mapred.output.compress - is set to true and is not set to final in the TT config
* mapred.output.compression.codec - seems to be set correctly

Are you sure that your TaskTrackers were started with this configuration? Would you mind sharing the map task output, just to be sure that there isn't anything suspicious?

Jarcec
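
The three checks listed above can be written down as a small script. This is a minimal sketch, not part of Sqoop or Hadoop: the function name is hypothetical, and the dictionary simply repeats the job configuration values posted earlier in the thread.

```python
def compression_problems(conf, codec):
    """Return a list of problems; an empty list means all three checks pass."""
    problems = []
    available = [c.strip() for c in conf.get("io.compression.codecs", "").split(",") if c.strip()]
    if codec not in available:
        problems.append("codec missing from io.compression.codecs")
    if conf.get("mapred.output.compress", "false").lower() != "true":
        problems.append("mapred.output.compress is not true")
    if conf.get("mapred.output.compression.codec") != codec:
        problems.append("mapred.output.compression.codec does not match")
    return problems


SNAPPY = "org.apache.hadoop.io.compress.SnappyCodec"

# Job configuration values as posted in the thread.
job_conf = {
    "io.compression.codecs": ",".join([
        "org.apache.hadoop.io.compress.DefaultCodec",
        "org.apache.hadoop.io.compress.GzipCodec",
        "org.apache.hadoop.io.compress.BZip2Codec",
        "org.apache.hadoop.io.compress.DeflateCodec",
        "org.apache.hadoop.io.compress.SnappyCodec",
    ]),
    "mapred.output.compress": "true",
    "mapred.output.compression.codec": SNAPPY,
}

print(compression_problems(job_conf, SNAPPY))
```

For the posted values the list comes back empty, which matches the conclusion that the job configuration itself looks correct.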
-
Re: Creating compressed data with Sqoop
Santosh Achhra, 2013-01-08, 14:33

Hi Jarcec,

I appreciate your time on this. Here is the log captured from the terminal after I executed the sqoop command:

13/01/08 13:27:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/01/08 13:27:42 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
13/01/08 13:27:42 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
13/01/08 13:27:42 INFO manager.SqlManager: Using default fetchSize of 1000
13/01/08 13:27:42 INFO tool.CodeGenTool: Beginning code generation
13/01/08 13:27:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM TABLE AS t WHERE 1=0
13/01/08 13:27:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM TABLE AS t WHERE 1=0
13/01/08 13:27:42 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
Note: /tmp/sqoop-saachhra/compile/d347fd96a56c490d589ea9522e92d7a0/optqc5d_TABLE.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/01/08 13:27:44 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-saachhra/compile/d347fd96a56c490d589ea9522e92d7a0/TABLE.jar
13/01/08 13:27:44 INFO mapreduce.ImportJobBase: Beginning import of TABLE
13/01/08 13:27:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM TABLE AS t WHERE 1=0
13/01/08 13:27:44 WARN snappy.LoadSnappy: Snappy native library is available
13/01/08 13:27:44 INFO snappy.LoadSnappy: Snappy native library loaded
13/01/08 13:27:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/01/08 13:27:46 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 1148 for saachhra on 10.207.37.75:8020
13/01/08 13:27:46 INFO security.TokenCache: Got dt for hdfs://sl73caeq02.visa.com:8020;uri=10.207.37.75:8020;t.service=10.207.37.75:8020
13/01/08 13:27:46 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 1149 for saachhra on 10.207.37.75:8020
13/01/08 13:27:46 INFO security.TokenCache: Got dt for hdfs://sl73caeq02.visa.com:8020/user/saachhra/.staging/job_201301072354_0114/libjars/ant-eclipse-1.0-jvm1.2.jar;uri=10.207.37.75:8020;t.service=10.207.37.75:8020
13/01/08 13:27:47 INFO mapred.JobClient: Running job: job_201301072354_0114
13/01/08 13:27:48 INFO mapred.JobClient: map 0% reduce 0%
13/01/08 13:27:59 INFO mapred.JobClient: map 100% reduce 0%
13/01/08 13:28:00 INFO mapred.JobClient: Job complete: job_201301072354_0114
13/01/08 13:28:00 INFO mapred.JobClient: Counters: 23
13/01/08 13:28:00 INFO mapred.JobClient: File System Counters
13/01/08 13:28:00 INFO mapred.JobClient: FILE: Number of bytes read=0
13/01/08 13:28:00 INFO mapred.JobClient: FILE: Number of bytes written=170459
13/01/08 13:28:00 INFO mapred.JobClient: FILE: Number of read operations=0
13/01/08 13:28:00 INFO mapred.JobClient: FILE: Number of large read operations=0
13/01/08 13:28:00 INFO mapred.JobClient: FILE: Number of write operations=0
13/01/08 13:28:00 INFO mapred.JobClient: HDFS: Number of bytes read=87
13/01/08 13:28:00 INFO mapred.JobClient: HDFS: Number of bytes written=7863800
13/01/08 13:28:00 INFO mapred.JobClient: HDFS: Number of read operations=1
13/01/08 13:28:00 INFO mapred.JobClient: HDFS: Number of large read operations=0
13/01/08 13:28:00 INFO mapred.JobClient: HDFS: Number of write operations=1
13/01/08 13:28:00 INFO mapred.JobClient: Job Counters
13/01/08 13:28:00 INFO mapred.JobClient: Launched map tasks=1
13/01/08 13:28:00 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=9118
13/01/08 13:28:00 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
13/01/08 13:28:00 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/01/08 13:28:00 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/08 13:28:00 INFO mapred.JobClient: Map-Reduce Framework
13/01/08 13:28:00 INFO mapred.JobClient: Map input records=14363
13/01/08 13:28:00 INFO mapred.JobClient: Map output records=14363
13/01/08 13:28:00 INFO mapred.JobClient: Input split bytes=87
13/01/08 13:28:00 INFO mapred.JobClient: Spilled Records=0
13/01/08 13:28:00 INFO mapred.JobClient: CPU time spent (ms)=4530
13/01/08 13:28:00 INFO mapred.JobClient: Physical memory (bytes) snapshot=303534080
13/01/08 13:28:00 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1868890112
13/01/08 13:28:00 INFO mapred.JobClient: Total committed heap usage (bytes)=757792768
13/01/08 13:28:00 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 15.6271 seconds (0 bytes/sec)
13/01/08 13:28:00 INFO mapreduce.ImportJobBase: Retrieved 14363 records.
13/01/08 13:28:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM TABLE AS t WHERE 1=0
13/01/08 13:28:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM TABLE AS t WHERE 1=0
13/01/08 13:28:00 INFO hive.HiveImport: Removing temporary files from import process: hdfs://sl73caeq02.visa.com:8020/user/saachhra/TABLE/_logs
13/01/08 13:28:00 INFO hive.HiveImport: Loading uploaded data into Hive
13/01/08 13:28:02 INFO hive.HiveImport: Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
13/01/08 13:28:02 INFO hive.HiveImport: Hive history file=/tmp/saachhra/hive_job_log_saachhra_201301081328_1669317999.txt
13/01/08 13:28:05 INFO hive.HiveImport: OK
13/01/08 13:28:05 INFO hive.HiveImport: Time taken: 2.487 seconds
13/01/08 13:28:06 INFO hive.HiveImport: Loading data to table TABLE
13/01/08 13:28:06 INFO hive.HiveImport: OK
13/01/08 13:28:06 INFO hive.HiveImport: Time taken: 0.918 seconds
13/01/08 13:28:06 INFO hive.HiveImport: Hive import complete.
13/01/08 13:28:06 INFO hive.HiveImport: Export directory is empty, removing it.
sl73caeq02.visa.com:/export/home/saachhra:>

Good wishes, always!
Santosh
-
Re: Creating compressed data with Sqoop
Jarek Jarcec Cecho, 2013-01-08, 15:43

Hi Santosh,

thank you for the log. I do not see anything suspicious. Would you mind also providing the log from one map task? In this particular case, that means a map task of job job_201301072354_0114, which Sqoop generated.

Jarcec
-
Re: Creating compressed data with Sqoop
Santosh Achhra, 2013-01-08, 16:09

Hi Jarcec,

Could you please let me know where to look for the log you are requesting? In my previous mail I provided the full log that was displayed on the terminal screen after executing the sqoop command.

Good wishes, always!
Santosh
-
Re: Creating compressed data with Sqoop
Jarek Jarcec Cecho, 2013-01-09, 10:01

Hi Santosh,

if you go to your JobTracker web UI, you will see the running, failed, and history jobs. If you follow the job ID link, you'll get to the job summary page, where you last got the job XML file. When you now click on "map" in the table with running/failed map tasks, you'll get a list of your map tasks. Choose one arbitrary map task and click on it. Please send us the file that you get by clicking "All" in the "Task logs" column for one arbitrary task attempt.

Jarcec
-
Re: Creating compressed data with scooopSantosh Achhra 2013-01-09, 11:45
Hi Jarcec,
Hopefully this is log which you had requested. all MAP task list for 0035_1357731518922_saachhra Task Id Start Time Finish Time Error task_201301081859_0035_m_000000 9/01 11:38:46 9/01 11:38:52 (6sec) After I click on "all MAP task list for 0035_1357731518922_saachhra", this is what I get Hadoop Job 0114_1357651667175_saachhra on History Viewer User: saachhra JobName: TABLE.jar JobConf: hdfs://host:8020/user/saachhra/.staging/job_201301072354_0114/job.xml Job-ACLs: All users are allowed Submitted At: 8-Jan-2013 13:27:47 Launched At: 8-Jan-2013 13:27:47 (0sec) Finished At: 8-Jan-2013 13:28:00 (12sec) Status: SUCCESS Analyse This Job Kind Total Tasks(successful+failed+killed) Successful tasks Failed tasks Killed tasks Start Time Finish Time Setup 1 1 0 0 8-Jan-2013 13:27:50 8-Jan-2013 13:27:52 (1sec) Map 1 1 0 0 8-Jan-2013 13:27:53 8-Jan-2013 13:27:58 (5sec) Reduce 0 0 0 0 Cleanup 1 1 0 0 8-Jan-2013 13:27:58 8-Jan-2013 13:28:00 (2sec) Counter Map Reduce Total File System Counters FILE: Number of bytes read 0 0 0 FILE: Number of bytes written 0 0 170,459 FILE: Number of read operations 0 0 0 FILE: Number of large read operations 0 0 0 FILE: Number of write operations 0 0 0 HDFS: Number of bytes read 0 0 87 HDFS: Number of bytes written 0 0 7,863,800 HDFS: Number of read operations 0 0 1 HDFS: Number of large read operations 0 0 0 HDFS: Number of write operations 0 0 1 Job Counters Launched map tasks 0 0 1 Total time spent by all maps in occupied slots (ms) 0 0 9,118 Total time spent by all reduces in occupied slots (ms) 0 0 0 Total time spent by all maps waiting after reserving slots (ms) 0 0 0 Total time spent by all reduces waiting after reserving slots (ms) 0 0 0 Map-Reduce Framework Map input records 0 0 14,363 Map output records 0 0 14,363 Input split bytes 0 0 87 Spilled Records 0 0 0 CPU time spent (ms) 0 0 4,530 Physical memory (bytes) snapshot 0 0 303,534,080 Virtual memory (bytes) snapshot 0 0 1,868,890,112 Total committed heap usage (bytes) 0 0 757,792,768 
Good wishes, always!
Santosh

On Wed, Jan 9, 2013 at 6:01 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
> your JobTracker web ui, you will see running/failed/history jobs. If you
> will follow the id link, you'll get to job page summary where you lastly
> got the job XML file
-
Re: Creating compressed data with scooop
Jarek Jarcec Cecho 2013-01-10, 09:40
Hi Santosh,
almost :-). When you click the "map" link on this page, you should get a page listing the map tasks. When you click on any one of those tasks, you should get a table with all of its task attempts. On that page you should see a link to the log.

Jarcec

On Wed, Jan 09, 2013 at 07:45:44PM +0800, Santosh Achhra wrote:
> Hi Jarcec,
>
> Hopefully this is log which you had requested.
-
Re: Creating compressed data with scooop
Santosh Achhra 2013-01-10, 12:47
Hi Jarcec,
Are you referring to the task logs in the screen below? When I click on "All", "Last 4KB" or "Last 8KB", the page times out :-( Not sure why.

task_201301072354_0114_m_000000 attempts for 0114_1357651667175_saachhra<http://sl73caeq02.visa.com:50030/jobdetailshistory.jsp?logFile=file:/hadoop1/logs/hadoop/history/done/sl73caeq02.visa.com_1357602885590_/2013/00/08/000000/job_201301072354_0114_1357651667175_saachhra_optqc5d.TQC_ELIG_TRAN_HIST.jar>

Task Id:      attempt_201301072354_0114_m_000000_0
Start Time:   8/01 13:27:53
Finish Time:  8/01 13:27:58 (5sec)
Host:         /default/host
Error:
Task Logs:    Last 4KB<http://sl73caeq03.visa.com:50060/tasklog?attemptid=attempt_201301072354_0114_m_000000_0&start=-4097>
              Last 8KB<http://sl73caeq03.visa.com:50060/tasklog?attemptid=attempt_201301072354_0114_m_000000_0&start=-8193>
              All<http://sl73caeq03.visa.com:50060/tasklog?attemptid=attempt_201301072354_0114_m_000000_0&all=true>
Counters:     18<http://sl73caeq02.visa.com:50030/taskstatshistory.jsp?attemptid=attempt_201301072354_0114_m_000000_0&logFile=file:/hadoop1/logs/hadoop/history/done/sl73caeq02.visa.com_1357602885590_/2013/00/08/000000/job_201301072354_0114_1357651667175_saachhra_optqc5d.TQC_ELIG_TRAN_HIST.jar>

However, I found the location on Linux where it was trying to look. I am afraid I won't be able to attach the file or put its full contents in this email :-( Is this what you were looking for (see below)?
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
  <source>mapred-site.xml</source>
  <source>/data/mapred/jt/jobTracker/job_201301072354_0114.xml</source>
</property>
<property>
  <name>io.seqfile.lazydecompress</name>
  <value>true</value>
  <source>core-default.xml</source>
  <source>/data/mapred/jt/jobTracker/job_201301072354_0114.xml</source>
</property>
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  <source>core-site.xml</source>
  <source>/data/mapred/jt/jobTracker/job_201301072354_0114.xml</source>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
  <source>mapred-site.xml</source>
  <source>/data/mapred/jt/jobTracker/job_201301072354_0114.xml</source>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
  <source>programatically</source>
  <source>/data/mapred/jt/jobTracker/job_201301072354_0114.xml</source>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  <source>mapred-site.xml</source>
  <source>/data/mapred/jt/jobTracker/job_201301072354_0114.xml</source>
</property>
<property>
  <name>io.seqfile.compress.blocksize</name>
  <value>1000000</value>
  <source>core-default.xml</source>
  <source>/data/mapred/jt/jobTracker/job_201301072354_0114.xml</source>
</property>
<property>
  <name>dfs.image.compress</name>
  <value>false</value>
  <source>hdfs-default.xml</source>
  <source>/data/mapred/jt/jobTracker/job_201301072354_0114.xml</source>
</property>
<property>
  <name>dfs.image.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  <source>hdfs-default.xml</source>
  <source>/data/mapred/jt/jobTracker/job_201301072354_0114.xml</source>
</property>

Good wishes, always!
Santosh

On Thu, Jan 10, 2013 at 5:40 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:
> Hi Santosh,
> almost :-). When you click on this page on the "map" link you should get
> page with map tasks. When you click on one arbitrary task you should get
> table with all corresponding task attempts. On this page you should see
> link to log.
>
> Jarcec
>
> On Wed, Jan 09, 2013 at 07:45:44PM +0800, Santosh Achhra wrote:
> > Hi Jarcec,
> >
> > Hopefully this is log which you had requested.
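
For anyone who wants to pull the compression-related entries out of a job XML without copying them by hand, here is a minimal Python sketch. It assumes the file has the usual Hadoop layout (a `<configuration>` root wrapping `<property>` elements with `<name>`, `<value>` and optional `<source>` children). The inline sample is a cut-down illustration built from values quoted in this thread, and the function name `compression_properties` is made up for this example. It only lists what the file contains and does not judge whether the settings are correct.

```python
import xml.etree.ElementTree as ET

# Cut-down sample in the shape of a Hadoop job XML. The <configuration>
# root and the mapred.job.name entry are included for illustration only.
SAMPLE_JOB_XML = """<configuration>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <source>mapred-site.xml</source>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
    <source>programatically</source>
  </property>
  <property>
    <name>mapred.job.name</name>
    <value>TABLE.jar</value>
  </property>
</configuration>"""


def compression_properties(xml_text, needle="compress"):
    """Return {name: (value, [sources])} for properties whose name contains needle."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        if needle in name:
            value = prop.findtext("value", default="")
            # A property may carry zero or more <source> children.
            sources = [s.text or "" for s in prop.findall("source")]
            found[name] = (value, sources)
    return found


if __name__ == "__main__":
    props = compression_properties(SAMPLE_JOB_XML)
    for name in sorted(props):
        value, sources = props[name]
        print(f"{name} = {value}  (from: {', '.join(sources)})")
```

To run it against a real file, read the job XML into a string (for example with `open(path).read()`) and pass that to `compression_properties`; the path is whatever location your JobTracker reports.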
|