Sqoop, mail # user - Creating compressed data with Sqoop


Re: Creating compressed data with Sqoop
Santosh Achhra 2013-01-08, 05:21
Thank you Jarcec.

Here are the details

*Hadoop Version:*
Hadoop 2.0.0-cdh4.1.0
Subversion
file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.1.0/src/hadoop-common-project/hadoop-common
-r 5c0a0bddbc2aaff30a8624b5980cd4a2e1b68d18
Compiled by jenkins on Sat Sep 29 11:26:20 PDT 2012
From source with checksum 95f5c7f30b4030f1f327758e7b2bd61f

*Sqoop Version*
Sqoop 1.4.1-cdh4.1.0
git commit id 10df2d6359a84f8877d63134b867a2ee718a2ca9
Compiled by jenkins on Sat Sep 29 12:11:42 PDT 2012

*Task Tracker Config file*
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>

I was not able to locate the Job XML file (configuration file). Could you
please let me know where to look for it?

Good wishes,always !
Santosh
On Mon, Jan 7, 2013 at 8:31 PM, Jarek Jarcec Cecho <[EMAIL PROTECTED]> wrote:

> Hi Santosh,
> Sqoop delegates compression and decompression to the MapReduce framework, so
> Sqoop options might be overridden by your MapReduce configuration (for
> example, by a setting that forbids compressing MapReduce output).
>
> Would you mind sharing with us your:
>
> * Hadoop version
> * Sqoop version
> * TaskTracker configuration file (mapred-site.xml)
> * Job XML (~ configuration file) for the job generated by Sqoop
>
> I'm particularly looking for:
>
> * io.compression.codecs - Must contain the codecs you're using with Sqoop
> * mapred.compress.map.output - Must be set to true at the Job XML level and
> must not be set to false and marked final in the TaskTracker configuration
>
> Jarcec
>
> On Mon, Jan 07, 2013 at 07:34:35PM +0800, Santosh Achhra wrote:
> > Hello,
> >
> > I am trying to import data from a table, and I would like the final data to be
> > compressed on HDFS, which would help save some space.
> >
> > I am executing the command below.
> > It completes successfully and I don't see any error reported;
> > however, when I look at the final data using the hadoop ls command, it is not in
> > compressed format in HDFS.
> >
> > sqoop --options-file /export/home/sqoop/connect.parm --table TEST
> >  --split-by F1  --compression-codec
> > org.apache.hadoop.io.compress.SnappyCodec -z
> >
> > Am I missing something?
> >
> > Also, I would like to know if I can import data into Hive in compressed
> > format. I executed the command below; in this case too, the data in HDFS
> > is not in compressed format, and the describe table command in Hive says that
> > the table is not compressed.
> >
> > sqoop --options-file /export/home/sqoop/connect.parm --table TEST
> >  --split-by F1  --hive-import   -m 1  --compress --compression-codec
> > org.apache.hadoop.io.compress.GzipCodec
> >
> > Good wishes,always !
> > Santosh
>
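The configuration check requested in the thread (which compression properties are set, and whether any of them is marked final) can be sketched as a small script. This is only an illustration and not part of Sqoop or Hadoop: the function names compression_settings and locked_off and the SAMPLE document are hypothetical, and the property list is simply the set of names mentioned in this thread. It assumes the standard Hadoop configuration layout of <property> elements with <name>, <value>, and an optional <final> child, and would be run separately against mapred-site.xml, core-site.xml (where io.compression.codecs usually lives), or a job XML file.

```python
import xml.etree.ElementTree as ET

# Property names taken from this thread; extend the tuple as needed.
COMPRESSION_KEYS = (
    "io.compression.codecs",
    "mapred.output.compress",
    "mapred.output.compression.type",
    "mapred.output.compression.codec",
    "mapred.compress.map.output",
    "mapred.map.output.compression.codec",
)


def compression_settings(xml_text):
    """Return {name: (value, is_final)} for the compression properties
    found in a Hadoop-style configuration XML string."""
    root = ET.fromstring(xml_text)
    settings = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in COMPRESSION_KEYS:
            value = (prop.findtext("value") or "").strip()
            is_final = (prop.findtext("final") or "").strip().lower() == "true"
            settings[name] = (value, is_final)
    return settings


def locked_off(settings, key):
    """True when `key` is set to false and marked final, meaning a job
    submitted by a client cannot override it."""
    value, is_final = settings.get(key, ("", False))
    return is_final and value.lower() == "false"


# Hypothetical sample mirroring two of the properties quoted in the thread.
SAMPLE = """<configuration>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    found = compression_settings(SAMPLE)
    for name, (value, is_final) in sorted(found.items()):
        print(f"{name} = {value} (final={is_final})")
    print("mapred.output.compress locked off:",
          locked_off(found, "mapred.output.compress"))
```

To check a real file, read its contents and pass the text to compression_settings; a property reported with final=True and value false is one that job-level options cannot switch back on.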