-
Output compression not working on hive-trunk (r802989)
Saurabh Nanda 2009-08-13, 04:14

I migrated from Hive-0.3 to Hive-trunk (r802989, compiled against Hadoop 0.18.3) and copied over metastore_db and the conf directory. Output compression used to work with my earlier Hive installation, but it seems to have stopped working now. Are the configuration parameters different from Hive-0.3?

"set -v" on Hive-trunk shows the following relevant configuration parameters:

    mapred.output.compress=false
    hive.exec.compress.intermediate=false
    hive.exec.compress.output=true
    mapred.output.compression.type=BLOCK
    mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
    mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
    io.seqfile.compress.blocksize=1000000
    io.seqfile.lazydecompress=true
    mapred.compress.map.output=false
    io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec

What am I missing?

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
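
A long "set -v" dump is easier to compare between two installations once it is cut down to the compression-related keys. The following is only a minimal sketch, assuming the dump has been saved as plain key=value lines; the class and method names are hypothetical and nothing here is part of Hive or Hadoop. It filters the settings and makes no claim about how Hive combines them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive or Hadoop.
class CompressionSettingsSketch {

    // Keep only the key=value lines whose key mentions "compress",
    // preserving the order in which they appear in the dump.
    static Map<String, String> compressionSettings(String setOutput) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : setOutput.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // not a key=value line
            }
            String key = line.substring(0, eq).trim();
            if (key.contains("compress")) {
                result.put(key, line.substring(eq + 1).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two lines taken from the message above plus one unrelated, made-up line.
        String sample = "mapred.output.compress=false\n"
                + "hive.exec.compress.output=true\n"
                + "some.unrelated.key=42\n";
        compressionSettings(sample)
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running the same filter over the dumps from the old and the new installation and comparing the two results by eye is usually enough to spot a key that changed.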
-
Re: Output compression not working on hive-trunk (r802989)
Saurabh Nanda 2009-08-13, 05:52

I've even tried setting "mapred.output.compress=true" in hadoop-site.xml and restarting the cluster, but in vain.

How do I get compression to work in Hive-trunk? Is it something to do with the Hive query as well? Here's what I'm trying:

    from raw_compressed
    insert overwrite table raw partition (dt='2009-04-02')
    select transform(line) using 'parse_logs.rb' as ip_address, aid, uid,
    ts, method, uri, response, referer, user_agent, cookies, ptime

Saurabh.
-
Re: Output compression not working on hive-trunk (r802989)
Saurabh Nanda 2009-08-13, 12:20

Strangely, when I'm using JDBC, all new data/partitions are compressed. However, when I'm using the CLI, everything is uncompressed no matter what I do.

Saurabh.
-
Re: Output compression not working on hive-trunk (r802989)
Zheng Shao 2009-08-13, 19:08

Hi Saurabh,

hive.exec.compress.output=true is the correct option. Can you post the "insert" command that you ran which produced non-compressed results? Is the output in TextFileFormat or SequenceFileFormat?

Zheng
-
Re: Output compression not working on hive-trunk (r802989)
Saurabh Nanda 2009-08-14, 04:06

> hive.exec.compress.output=true is the correct option. Can you post the
> "insert" command that you run which produced non-compressed results?
> Is the output in TextFileFormat or SequenceFileFormat?

Here's the query. raw_compressed is a SequenceFile table with raw lines. raw is a SequenceFile table with separate columns for each data field.

    from raw_compressed
    insert overwrite table raw partition (dt='2009-04-02')
    select transform(line) using 'parse_logs.rb' as ip_address, aid, uid,
    ts, method, uri, response, referer, user_agent, cookies, ptime

Saurabh.
-
Re: Output compression not working on hive-trunk (r802989)
Zheng Shao 2009-08-14, 04:21

What is the average file size in table raw?

Can you put a log line in FileSinkOperator.java:107? That will tell us whether compression is turned on or not.

Zheng
-
Re: Output compression not working on hive-trunk (r802989)
Saurabh Nanda 2009-08-14, 05:24

Files in table raw_compressed start with this header:

    SEQ|"org.apache.hadoop.io.BytesWritable|org.apache.hadoop.io.Text||'org.apache.hadoop.io.compress.GzipCodec

Files in table raw start with this header:

    SEQ|"org.apache.hadoop.io.BytesWritable|org.apache.hadoop.io.Text

File size for raw_compressed: 250 MB
File size for raw: 2150 MB

After "boolean isCompressed = conf.getCompressed();" should I put "LOG.info("Compression config is:" + isCompressed);"?

Saurabh.
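
The two headers above can also be compared programmatically rather than by eye. What follows is a rough sketch, not Hadoop's own reader: it assumes the SequenceFile header layout for format version 5 or later as I understand it (the bytes "SEQ", a version byte, the key and value class names as length-prefixed strings, a "compressed" flag, a "block compressed" flag, then the codec class name only when the compressed flag is set). It further assumes every class name is shorter than 128 bytes, so that each length fits in a single byte. The class name SeqHeaderSketch, the sample() helper, and the version byte 6 used in it are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical header inspector; it does not use or replace Hadoop's SequenceFile.Reader.
class SeqHeaderSketch {
    final String keyClass;
    final String valueClass;
    final boolean compressed;
    final boolean blockCompressed;
    final String codec; // null when the compressed flag is not set

    private SeqHeaderSketch(String keyClass, String valueClass,
                            boolean compressed, boolean blockCompressed, String codec) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
        this.compressed = compressed;
        this.blockCompressed = blockCompressed;
        this.codec = codec;
    }

    // Parse the first bytes of a file under the layout assumptions stated above.
    static SeqHeaderSketch parse(byte[] header) {
        ByteBuffer in = ByteBuffer.wrap(header);
        if (in.get() != 'S' || in.get() != 'E' || in.get() != 'Q') {
            throw new IllegalArgumentException("missing SEQ magic bytes");
        }
        int version = in.get();
        if (version < 5) {
            throw new IllegalArgumentException("sketch only handles header version 5 or later");
        }
        String key = readShortString(in);
        String value = readShortString(in);
        boolean compressed = in.get() != 0;
        boolean block = in.get() != 0;
        String codec = compressed ? readShortString(in) : null;
        return new SeqHeaderSketch(key, value, compressed, block, codec);
    }

    // Reads a string whose length prefix is a single non-negative byte (assumption).
    private static String readShortString(ByteBuffer in) {
        int len = in.get();
        if (len < 0) {
            throw new IllegalArgumentException("multi-byte length prefix not handled");
        }
        byte[] buf = new byte[len];
        in.get(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Builds an in-memory header for trying the parser without a cluster.
    static byte[] sample(boolean compressed, boolean blockCompressed, String codec) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('S');
        out.write('E');
        out.write('Q');
        out.write(6); // version byte, an assumption for this sketch
        writeShortString(out, "org.apache.hadoop.io.BytesWritable");
        writeShortString(out, "org.apache.hadoop.io.Text");
        out.write(compressed ? 1 : 0);
        out.write(blockCompressed ? 1 : 0);
        if (compressed) {
            writeShortString(out, codec);
        }
        return out.toByteArray();
    }

    private static void writeShortString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b.length);
        out.write(b, 0, b.length);
    }
}
```

Under these assumptions, the `"` and `'` characters in the pasted headers are simply the length bytes 34 and 39, which match the lengths of the BytesWritable and GzipCodec class names.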
-
Re: Output compression not working on hive-trunk (r802989)
Saurabh Nanda 2009-08-14, 05:28

I can't find the previous log entry anywhere -- LOG.info("Writing to temp file: FS " + outPath); -- where should I be looking? Should I configure log4j differently for LOG.info to show up?

Saurabh.
-
Re: Output compression not working on hive-trunk (r802989)
Saurabh Nanda 2009-08-14, 05:29

Sorry, found it. It's in the task logs for the reduce jobs.

Saurabh.
-
Re: Output compression not working on hive-trunk (r802989)
Saurabh Nanda 2009-08-14, 06:34

The query is being split into two map/reduce jobs. The first job consists of 16 map tasks (no reduce tasks). The relevant log output is given below:

    2009-08-14 11:29:38,245 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/_tmp.attempt_200908131050_0218_m_000000_0
    2009-08-14 11:29:38,246 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression configuration is:true
    2009-08-14 11:29:38,347 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
    2009-08-14 11:29:38,358 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 6 FS initialized
    2009-08-14 11:29:38,358 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 6 FS

The second job consists of 16 map tasks and 3 reduce tasks. None of the map tasks contain any log output from FileSinkOperator. The reduce tasks contain the following relevant log output:

    2009-08-14 11:38:13,553 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 3 FS
    2009-08-14 11:38:13,553 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 3 FS
    2009-08-14 11:38:13,604 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/_tmp.attempt_200908131050_0219_r_000000_0
    2009-08-14 11:38:13,605 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression configuration is:false
    2009-08-14 11:38:43,128 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 3 FS initialized
    2009-08-14 11:38:43,128 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 3 FS

You can see that compression is "on" for the first map/reduce job, but "off" for the second one. Did I forget to set any configuration parameter?

Saurabh.
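
With many task attempts, reading each task log by hand gets tedious. Below is a small sketch, assuming the task logs have been concatenated into one string with one log entry per line, and that they contain the two messages quoted above: "Writing to temp file: FS ..." and the "Compression configuration is:" line that was added for this debugging session. The class name and the pairing logic (each flag is attributed to the most recent temp-file line) are my own assumptions, not anything Hive provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log scanner for the debugging output discussed in this thread.
class CompressionLogScanSketch {
    // Captures the task attempt id embedded in the temp file path.
    private static final Pattern TEMP_FILE =
            Pattern.compile("Writing to temp file: FS \\S*?(attempt_\\w+)");
    // Captures the value printed by the added debugging statement.
    private static final Pattern FLAG =
            Pattern.compile("Compression configuration is:(true|false)");

    // Returns attempt id -> compression flag, in order of appearance.
    static Map<String, Boolean> scan(String log) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        String currentAttempt = null;
        for (String line : log.split("\\R")) {
            Matcher temp = TEMP_FILE.matcher(line);
            if (temp.find()) {
                currentAttempt = temp.group(1);
                continue;
            }
            Matcher flag = FLAG.matcher(line);
            if (flag.find() && currentAttempt != null) {
                result.put(currentAttempt, Boolean.valueOf(flag.group(1)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened versions of the log entries quoted above.
        String log = "INFO FileSinkOperator: Writing to temp file: FS "
                + "hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/"
                + "_tmp.attempt_200908131050_0218_m_000000_0\n"
                + "INFO FileSinkOperator: Compression configuration is:true\n"
                + "INFO FileSinkOperator: Writing to temp file: FS "
                + "hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/"
                + "_tmp.attempt_200908131050_0219_r_000000_0\n"
                + "INFO FileSinkOperator: Compression configuration is:false\n";
        scan(log).forEach((attempt, on) -> System.out.println(attempt + " compressed=" + on));
    }
}
```

The "_m_" or "_r_" part of each attempt id then shows at a glance whether an uncompressed output came from a map task or a reduce task.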
-
Re: Output compression not working on hive-trunk (r802989)
Zheng Shao 2009-08-14, 07:09

Great. We are one step closer to the root cause.

Can you print out a log line here as well? This is the place where we fill in the compression option.

SemanticAnalyzer.java:2711:

    Operator output = putOpInsertMap(
      OperatorFactory.getAndMakeChild(
        new fileSinkDesc(queryTmpdir, table_desc,
                         conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT), currentTableId),
        fsRS, input), inputRR);

Zheng
-
Re: Output compression not working on hive-trunk (r802989)
Saurabh Nanda 2009-08-14, 07:34

I'm changing the LOG.debug statement to the following --

    LOG.info("Created FileSink Plan for clause: " + dest + "dest_path: "
             + dest_path + " row schema: "
             + inputRR.toString() + ". HiveConf.ConfVars.COMPRESSRESULT="
             + conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT));

Saurabh.
-
Re: Output compression not working on hive-trunk (r802989)
Saurabh Nanda 2009-08-14, 09:01

I changed the log statement, rebuilt Hive, and re-ran the insert query. I didn't find this log entry anywhere. Where exactly should I be looking for this log entry?

Saurabh.
-
Re: Output compression not working on hive-trunk (r802989)
Saurabh Nanda 2009-08-14, 09:03

The statement I changed was in the function genFileSinkPlan() and was on line 2571, not 2711.

Saurabh.
-
Re: Output compression not working on hive-trunk (r802989)Zheng Shao 2009-08-14, 10:03
Should be in /mnt/<yourname>/hive.log
This is specified in conf/hive-log4j.properties

Zheng

On Fri, Aug 14, 2009 at 2:03 AM, Saurabh Nanda<[EMAIL PROTECTED]> wrote:
> The statement I changed was in the function genFileSinkPlan() and was on
> line 2571 not 2711
>
> Saurabh.
>
> On Fri, Aug 14, 2009 at 2:31 PM, Saurabh Nanda <[EMAIL PROTECTED]> wrote:
>>
>> I changed the log statement, rebuilt Hive, and re-ran the insert query. I
>> didn't find this log entry anywhere. Where exactly should I be looking for
>> this log entry?
>>
>> Saurabh.
>>
>> On Fri, Aug 14, 2009 at 1:04 PM, Saurabh Nanda <[EMAIL PROTECTED]> wrote:
>>>
>>> I'm changing the LOG.debug statement to the following --
>>>
>>> LOG.info("Created FileSink Plan for clause: " + dest + "dest_path: "
>>>     + dest_path + " row schema: "
>>>     + inputRR.toString() + ". HiveConf.ConfVars.COMPRESSRESULT=" +
>>>     conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT));
>>>
>>> Saurabh.
>>>
>>> On Fri, Aug 14, 2009 at 12:39 PM, Zheng Shao <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Great. We are one step closer to the root cause.
>>>>
>>>> Can you print out a log line here as well? This is the place that we
>>>> fill in the compression option.
>>>>
>>>> SemanticAnalyzer.java:2711:
>>>> Operator output = putOpInsertMap(
>>>>     OperatorFactory.getAndMakeChild(
>>>>         new fileSinkDesc(queryTmpdir, table_desc,
>>>>             conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT), currentTableId),
>>>>         fsRS, input), inputRR);
>>>>
>>>> Zheng
>>>>
>>>> On Thu, Aug 13, 2009 at 11:34 PM, Saurabh Nanda<[EMAIL PROTECTED]> wrote:
>>>> > The query is being split into two map/reduce jobs. The first job consists of
>>>> > 16 map tasks (no reduce job). The relevant log output is given below:
>>>> >
>>>> > 2009-08-14 11:29:38,245 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/_tmp.attempt_200908131050_0218_m_000000_0
>>>> > 2009-08-14 11:29:38,246 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression configuration is:true
>>>> > 2009-08-14 11:29:38,347 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>>>> > 2009-08-14 11:29:38,358 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 6 FS initialized
>>>> > 2009-08-14 11:29:38,358 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 6 FS
>>>> >
>>>> > The second job consists of 16 map tasks & 3 reduce tasks. None of the map
>>>> > tasks contain any log output from FileSinkOperator. The reduce tasks contain
>>>> > the following relevant log output:
>>>> >
>>>> > 2009-08-14 11:38:13,553 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 3 FS
>>>> > 2009-08-14 11:38:13,553 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 3 FS
>>>> > 2009-08-14 11:38:13,604 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/_tmp.attempt_200908131050_0219_r_000000_0
>>>> > 2009-08-14 11:38:13,605 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression configuration is:false
>>>> > 2009-08-14 11:38:43,128 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 3 FS initialized
>>>> > 2009-08-14 11:38:43,128 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 3 FS
>>>> >
>>>> > You can see, that compression is "on" for the first map/reduce job, but
>>>> > "off" for the second one. Did I forget to set any configuration parameter?
>>>> >
>>>> > Saurabh.
>>>> > --
>>>> > http://nandz.blogspot.com
>>>> > http://foodieforlife.blogspot.com

Yours,
Zheng
-
Re: Output compression not working on hive-trunk (r802989)Saurabh Nanda 2009-08-17, 06:58
I still can't find the log output anywhere.
*The log file is in /tmp/ct-admin/hive.log for me. The only contents in the log file are:*

2009-08-17 11:18:18,018 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-08-17 11:26:45,380 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

*Here's the exact change I made in SemanticAnalyzer.java:*

    Operator output = putOpInsertMap(
        OperatorFactory.getAndMakeChild(
            new fileSinkDesc(queryTmpdir, table_desc,
                conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT), currentTableId),
            fsRS, input), inputRR);

    LOG.info("Created FileSink Plan for clause: " + dest + "dest_path: "
        + dest_path + " row schema: "
        + inputRR.toString()
        + ". HiveConf.ConfVars.COMPRESSRESULT="
        + conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT));

*Here's what conf/hive-log4j.properties looks like:*

    # Define some default values that can be overridden by system properties
    hive.root.logger=WARN,DRFA
    hive.log.dir=/tmp/${user.name}
    hive.log.file=hive.log

    # Define the root logger to the system property "hadoop.root.logger".
    log4j.rootLogger=${hive.root.logger}, EventCounter

    # Logging Threshold
    log4j.threshhold=ALL

    #
    # Daily Rolling File Appender
    #

    log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.DRFA.File=${hive.log.dir}/${hive.log.file}

    # Rollver at midnight
    log4j.appender.DRFA.DatePattern=.yyyy-MM-dd

    # 30-day backup
    #log4j.appender.DRFA.MaxBackupIndex=30
    log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout

    # Pattern format: Date LogLevel LoggerName LogMessage
    #log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    # Debugging Pattern format
    log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

    #
    # console
    # Add "console" to rootlogger above if you want to use this
    #

    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

    #custom logging levels
    # log4j.logger.root=DEBUG

    #
    # Event Counter Appender
    # Sends counts of logging messages at different severity levels to Hadoop Metrics.
    #
    log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter

    log4j.category.DataNucleus=ERROR,DRFA
    log4j.category.Datastore=ERROR,DRFA
    log4j.category.Datastore.Schema=ERROR,DRFA
    log4j.category.JPOX.Datastore=ERROR,DRFA
    log4j.category.JPOX.Plugin=ERROR,DRFA
    log4j.category.JPOX.MetaData=ERROR,DRFA
    log4j.category.JPOX.Query=ERROR,DRFA
    log4j.category.JPOX.General=ERROR,DRFA
    log4j.category.JPOX.Enhancer=ERROR,DRFA

What is going wrong?

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
-
Re: Output compression not working on hive-trunk (r802989)Zheng Shao 2009-08-17, 07:26
The default log level is WARN. Please change it to INFO.

hive.root.logger=INFO,DRFA

Of course you can also use LOG.warn() in your test code.

Zheng
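
A minimal sketch of why the LOG.info() line never reached hive.log while the root logger was at WARN. It assumes nothing about Hive itself: java.util.logging stands in for log4j here so that the example runs without extra jars (its WARNING level plays the role of log4j's WARN), and the class and logger names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogLevelSketch {
    // True if a logger whose threshold is loggerLevel would emit a message at msgLevel.
    public static boolean wouldLog(Level loggerLevel, Level msgLevel) {
        // Hypothetical logger name; one logger per threshold keeps the checks independent.
        Logger log = Logger.getLogger("sketch." + loggerLevel.getName());
        log.setLevel(loggerLevel);
        return log.isLoggable(msgLevel);
    }

    public static void main(String[] args) {
        // Threshold WARN, message logged with info(): filtered out.
        System.out.println("WARN threshold, info(): " + wouldLog(Level.WARNING, Level.INFO));
        // Threshold WARN, message logged with warn(): kept.
        System.out.println("WARN threshold, warn(): " + wouldLog(Level.WARNING, Level.WARNING));
        // Threshold INFO, message logged with info(): kept.
        System.out.println("INFO threshold, info(): " + wouldLog(Level.INFO, Level.INFO));
    }
}
```

The three checks correspond to the situations in this thread: the original setup, the LOG.warn() workaround, and the hive.root.logger=INFO,DRFA change.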
-
Re: Output compression not working on hive-trunk (r802989)Saurabh Nanda 2009-08-17, 07:56
Strange. The compression configuration log entry was also logged at INFO level, but I could see it in the task logs:

LOG.info("Compression configuration is:" + isCompressed);

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
-
Re: Output compression not working on hive-trunk (r802989)Saurabh Nanda 2009-08-17, 07:58
Here's the log output:

2009-08-17 13:26:42,183 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan for clause: insclause-0dest_path: hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row schema: {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2: string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5: string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8: string)(_col9,_col9: string)(_col10,_col10: int)} . HiveConf.ConfVars.COMPRESSRESULT=true

Is the SemanticAnalyzer run more than once in the lifetime of a job? Should I be looking for another log entry like this one?

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
-
Re: Output compression not working on hive-trunk (r802989)Saurabh Nanda 2009-08-17, 12:17
Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong? I can do some more digging and logging if required.

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
-
Re: Output compression not working on hive-trunk (r802989)Saurabh Nanda 2009-08-18, 04:13
Any clues anyone?
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
-
Re: Output compression not working on hive-trunk (r802989)Zheng Shao 2009-08-18, 05:49
Hi Saurabh,

So the compression flag is correct when the plan is generated. When you run the query, you should see "plan = xxx.xml" in the log file. Can you open that file (in HDFS) and see whether the compression flag is on or not?

Zheng
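
A minimal sketch of the check described above, for a plan file that has already been copied out of HDFS to local disk. It assumes the plan is XML in the java.beans.XMLEncoder layout, where a flag appears as `<void property="compressed"><boolean>true</boolean></void>`; that layout, the property name, and the class name `PlanCompressionCheck` are assumptions for illustration, not something confirmed in this thread. If the file really is written by XMLEncoder, a property left at its default value is omitted, so a missing flag would most likely mean "false".

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PlanCompressionCheck {
    // Returns the text of every <void property="compressed"> element, in document order.
    public static List<String> compressedFlags(String planXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(planXml)));
        List<String> flags = new ArrayList<>();
        NodeList voids = doc.getElementsByTagName("void");
        for (int i = 0; i < voids.getLength(); i++) {
            Element e = (Element) voids.item(i);
            // Assumed layout: the flag value is the text of the nested <boolean> element.
            if ("compressed".equals(e.getAttribute("property"))) {
                flags.add(e.getTextContent().trim());
            }
        }
        return flags;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a local copy of the plan file.
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<String> flags = compressedFlags(xml);
        System.out.println(flags.isEmpty()
                ? "no 'compressed' property found"
                : "compressed flags: " + flags);
    }
}
```

Running it once per plan file (one per map/reduce job) makes it easy to compare the flag across jobs.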
-
Re: Output compression not working on hive-trunk (r802989)Saurabh Nanda 2009-08-18, 10:21
Is this what you're talking about -- http://pastebin.ca/1533627 ? Seems like compression is on.

Is there any difference in how CLI queries and JDBC queries are treated?

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
-
Re: Output compression not working on hive-trunk (r802989)Saurabh Nanda 2009-08-19, 05:46
Any clue? Has anyone else tried to replicate this? Is this really a bug or am I doing something obviously stupid?

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
-
Re: Output compression not working on hive-trunk (r802989)Saurabh Nanda 2009-08-20, 19:43
Is anyone else facing this issue?
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
-
RE: Output compression not working on hive-trunk (r802989)Ashish Thusoo 2009-08-20, 20:05
Hi Saurabh,

Can you give a simple reproducible test case for this (unless you have already done so)?

Thanks,
Ashish
-
Re: Output compression not working on hive-trunk (r802989)Saurabh Nanda 2009-08-21, 05:34
How do I do that? Just steps to reproduce the error, or a formal Java-based test case?

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
-
Re: Output compression not working on hive-trunk (r802989)Zheng Shao 2009-08-21, 06:00
Hi Saurabh,

Sorry for the delay on this. We have been busy with production this week.

I don't think there is much difference between CLI queries and JDBC queries.

Yes, this is what I am talking about. Since your query has 2 map-reduce jobs, there will be two .xml files.
Can you show us the second one? Does the second one also contain "<...>compressed<...>true<...>" in the FileSinkOperator section?

Zheng
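The check described above (open each plan .xml file and look for the compressed flag) could be scripted roughly as follows. This is only a sketch under assumptions that the thread does not confirm: that the plan has been copied out of HDFS and read as text, and that it is XMLEncoder-style XML in which a "compressed" property wraps a <boolean> element. The function name compressed_flags and the sample fragment are hypothetical.

```python
import re
from typing import List

# Assumed layout (not confirmed by the thread): a "compressed" property
# that wraps a <boolean> element, as in XMLEncoder-style output.
FLAG_PATTERN = re.compile(
    r'property="compressed"\s*>\s*<boolean>\s*(true|false)\s*</boolean>',
    re.IGNORECASE,
)


def compressed_flags(plan_xml: str) -> List[bool]:
    """Return every compressed flag found in the plan text, in document order."""
    return [m.group(1).lower() == "true" for m in FLAG_PATTERN.finditer(plan_xml)]


# Hypothetical fragment for illustration only; a real plan is much larger.
SAMPLE_PLAN = """
<object class="FileSinkOperator">
  <void property="conf">
    <void property="compressed"><boolean>true</boolean></void>
  </void>
</object>
"""

if __name__ == "__main__":
    flags = compressed_flags(SAMPLE_PLAN)
    print(flags if flags else "no compressed flag found in this plan")
```

Running the function once per plan file (one per map-reduce job) makes it easy to compare the two jobs; an empty list for a file means no such flag was found in it.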
-
Re: Output compression not working on hive-trunk (r802989)Saurabh Nanda 2009-08-25, 07:19
Hi Zheng,
Here's the plan for the second map-reduce job -- http://pastebin.com/m59d5a84b
I don't see compression anywhere.

Saurabh.
-
Re: Output compression not working on hive-trunk (r802989)Zheng Shao 2009-08-25, 08:23
Hi Saurabh,
Finally I found the line of code. See https://issues.apache.org/jira/browse/HIVE-794 for details.
Can you help make a patch for that?

Zheng
-
Re: Output compression not working on hive-trunk (r802989)Saurabh Nanda 2009-08-25, 12:18
Hey Zheng, thanks for fixing the issue. I've commented on https://issues.apache.org/jira/browse/HIVE-794 with the results of applying the change.

Do you really need a patch for a one-line change?

Saurabh.
|