Sqoop user mailing list: Creating compressed data with Sqoop


Santosh Achhra 2013-01-07, 11:34
Jarek Jarcec Cecho 2013-01-07, 12:31
Re: Creating compressed data with Sqoop
Thank you Jarcec.

Here are the details:

*Hadoop Version:*
Hadoop 2.0.0-cdh4.1.0
Subversion
file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.1.0/src/hadoop-common-project/hadoop-common
-r 5c0a0bddbc2aaff30a8624b5980cd4a2e1b68d18
Compiled by jenkins on Sat Sep 29 11:26:20 PDT 2012
From source with checksum 95f5c7f30b4030f1f327758e7b2bd61f

*Sqoop Version:*
Sqoop 1.4.1-cdh4.1.0
git commit id 10df2d6359a84f8877d63134b867a2ee718a2ca9
Compiled by jenkins on Sat Sep 29 12:11:42 PDT 2012

*Task Tracker Config file (mapred-site.xml):*
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
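[Editor's note: for contrast, if Sqoop's -z/--compress were taking effect, the Job XML generated for the import would be expected to override the job-level output settings roughly as below. This is an illustrative sketch using the MR1-era property names already shown in this thread, not an excerpt from the actual job file.]

  <!-- Sketch (assumed values, not from the real job.xml): job-level
       overrides one would expect when -z/--compress with SnappyCodec
       is honored by the Sqoop-generated MapReduce job -->
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>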

I was not able to locate the Job XML file (configuration file). Could you
please let me know where to look for it?

Good wishes, always!
Santosh
On Mon, Jan 7, 2013 at 8:31 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:

> Hi Santosh,
> Sqoop delegates compression/decompression to the MapReduce framework, so
> Sqoop's options might be overridden by your MapReduce configuration (for
> example, by a setting that forbids compressed MapReduce output).
>
> Would you mind sharing with us your:
>
> * Hadoop version
> * Sqoop version
> * TaskTracker configuration file (mapred-site.xml)
> * Job XML (~ configuration file) for the job generated by Sqoop
>
> I'm particularly looking for:
>
> * io.compression.codecs - Must contain the codecs you're using with Sqoop
> * mapred.compress.map.output - Must be set to true at the Job XML level and
> must not be marked final with value false in the TaskTracker configuration
>
> Jarcec
>
> On Mon, Jan 07, 2013 at 07:34:35PM +0800, Santosh Achhra wrote:
> > Hello,
> >
> > I am trying to import data from a table, and I would like the final data
> > to be compressed on HDFS, which would help save some space.
> >
> > I am executing the command mentioned below. It completes successfully and
> > I don't see any error reported; however, when I look at the final data
> > with the hadoop fs -ls command, it is not in compressed format on HDFS.
> >
> > sqoop --options-file /export/home/sqoop/connect.parm --table TEST
> >  --split-by F1  --compression-codec
> > org.apache.hadoop.io.compress.SnappyCodec -z
> >
> > Am I missing something?
> >
> > Also, I would like to know if I can import data into Hive in compressed
> > format. I executed the command mentioned below; in this case, too, the
> > data in HDFS is not in compressed format, and the describe table command
> > in Hive says that the table is not compressed.
> >
> > sqoop --options-file /export/home/sqoop/connect.parm --table TEST
> >  --split-by F1  --hive-import   -m 1  --compress --compression-codec
> > org.apache.hadoop.io.compress.GzipCodec
> >
> > Good wishes, always!
> > Santosh
>
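[Editor's note: to make the two checklist items in the quoted message concrete, here is a sketch of how they might look in the configuration files. All values below are assumptions for illustration, not taken from this cluster; the <final> element is the mechanism that would prevent a Sqoop job's Job XML from overriding a TaskTracker-level setting.]

<!-- core-site.xml: io.compression.codecs must list the codec Sqoop uses
     (illustrative value; an actual cluster may list more codecs) -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

<!-- mapred-site.xml: a property marked final here cannot be overridden
     by the Job XML, which is the failure mode described above
     (assumed example, not this cluster's actual setting) -->
<property>
  <name>mapred.compress.map.output</name>
  <value>false</value>
  <final>true</final>
</property>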
Further messages in this thread:

Jarek Jarcec Cecho 2013-01-08, 07:36
Santosh Achhra 2013-01-08, 08:01
Jarek Jarcec Cecho 2013-01-08, 11:09
Santosh Achhra 2013-01-08, 14:33
Jarek Jarcec Cecho 2013-01-08, 15:43
Santosh Achhra 2013-01-08, 16:09
Jarek Jarcec Cecho 2013-01-09, 10:01
Santosh Achhra 2013-01-09, 11:45
Jarek Jarcec Cecho 2013-01-10, 09:40
Santosh Achhra 2013-01-10, 12:47