Re: Re: bz2 Splits. - Hive - [mail # user]
... Makes sense. Better compression brought down a count(1) query from 100+ sec down to 40sec. The ETL phase is now taking 510sec as opposed to 700sec earlier.  Do you also compress a...
   Author: Saurabh Nanda, 2009-07-28, 06:08
Re: Re: bz2 Splits. - Hive - [mail # user]
... I've set mapred.output.compression.type and changed io.seqfile.compress.blocksize to 100,000,000 (100MB) and now 3600 MB files are down to 260MB!  Is such high compression recomme...
   Author: Saurabh Nanda, 2009-07-28, 05:38
Re: Re: bz2 Splits. - Hive - [mail # user]
... Here's the exact snippet from my shell script. Do I have to set these configuration parameters directly in the hadoop configuration file:      ${HIVE_COMMAND} -e "set hi...
   Author: Saurabh Nanda, 2009-07-28, 05:02
Re: bz2 Splits. - Hive - [mail # user]
... Is there a configuration parameter which controls this? Is it io.seqfile.compress.blocksize? It was set to 1,000,000 in hadoop-default.xml, which is approx 1MB.  Saurabh. http://n...
   Author: Saurabh Nanda, 2009-07-28, 04:13
Re: counting different regexes in a single pass - Hive - [mail # user]
...I think you can do that with regex replace and GROUP BY. Something like this- select replaced_col, count(1) from (select regex_replace(original_col, '.*(text1|text2|text3).*', '$1') as repla...
   Author: Saurabh Nanda, 2009-07-27, 19:00
Re: counting different regexes in a single pass - Hive - [mail # user]
...I think you can do that with regex replace and GROUP BY. Something like this- select replaced_col, count(1) from (select regex_replace(original_col, '.*(text1|text2|text3).*', '$1') as repla...
   Author: Saurabh Nanda, 2009-07-27, 18:59
Re: Re: bz2 Splits. - Hive - [mail # user]
...Why is there such a *big* difference in compression ratios between the gzip utility and Hive?  Uncompressed file size: approx 3500 MB Gzip utility: approx 250 MB org.apache.hadoop.io.co...
   Author: Saurabh Nanda, 2009-07-27, 15:38
Re: Re: bz2 Splits. - Hive - [mail # user]
...  http://wiki.apache.org/hadoop/CompressedStorage (please QC and correct where wrong) http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL?action=diff http://wiki.apache.org/hadoop/Hiv...
   Author: Saurabh Nanda, 2009-07-27, 10:51
Re: Re: bz2 Splits. - Hive - [mail # user]
...One last question here. If both, TextFile and SequenceFile can be compressed, then what's the advantage of the SequenceFile format?  Is it that a compressed file can be split into chunk...
   Author: Saurabh Nanda, 2009-07-27, 10:06
Re: Re: bz2 Splits. - Hive - [mail # user]
...Some more stats, if anyone's interested. I ran all the three tables (described above) through my ETL query (as described in http://nandz.blogspot.com/2009/07/using-hive-for-weblog-analysis.h...
   Author: Saurabh Nanda, 2009-07-27, 08:29