Results 71 to 80 of 112 (0.06s).
Re: Re: bz2 Splits. - Hive - [mail # user]
|
|
... Makes sense. Better compression brought down a count(1) query from 100+ sec down to 40sec. The ETL phase is now taking 510sec as opposed to 700sec earlier. Do you also compress a...
|
|
|
Author: Saurabh Nanda,
2009-07-28, 06:08
|
|
|
Re: Re: bz2 Splits. - Hive - [mail # user]
|
|
... I've set mapred.output.compression.type and changed io.seqfile.compress.blocksize to 100,000,000 (100MB) and now 3600 MB files are down to 260MB! Is such high compression recomme...
|
|
|
Author: Saurabh Nanda,
2009-07-28, 05:38
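
The effect described in this message (a larger compression block size giving much smaller files) can be illustrated with a rough sketch. This is not Hadoop's SequenceFile writer: it assumes Python's zlib as a stand-in codec, uses synthetic log-like data, and the helper name `compressed_size` is hypothetical. It only shows the general tendency that compressing data in larger independent blocks gives the compressor more context and less per-block overhead.

```python
import zlib


def compressed_size(data: bytes, block_size: int) -> int:
    """Total size when each block_size chunk is compressed independently."""
    total = 0
    for start in range(0, len(data), block_size):
        total += len(zlib.compress(data[start:start + block_size]))
    return total


# Synthetic, repetitive log-like input (made up for illustration,
# not the data discussed in the thread).
lines = [
    f"2009-07-28 05:{i % 60:02d}:00 GET /page/{i % 97} 200 user{i % 13}\n"
    for i in range(20000)
]
data = "".join(lines).encode("ascii")

small_blocks = compressed_size(data, 512)        # many tiny blocks
large_blocks = compressed_size(data, 64 * 1024)  # fewer, larger blocks

print("uncompressed:", len(data))
print("512-byte blocks:", small_blocks)
print("64 KB blocks:", large_blocks)
```

The sizes printed depend on the input. Whether a block size as large as the one in the message is advisable is the open question the poster raises, and this sketch does not answer it.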
|
|
|
Re: Re: bz2 Splits. - Hive - [mail # user]
|
|
... Here's the exact snippet from my shell script. Do I have to set these configuration parameters directly in the hadoop configuration file: ${HIVE_COMMAND} -e "set hi...
|
|
|
Author: Saurabh Nanda,
2009-07-28, 05:02
|
|
|
Re: bz2 Splits. - Hive - [mail # user]
|
|
... Is there a configuration parameter which controls this? Is it io.seqfile.compress.blocksize? It was set to 1,000,000 in hadoop-default.xml, which is approx 1MB. Saurabh. http://n...
|
|
|
Author: Saurabh Nanda,
2009-07-28, 04:13
|
|
|
Re: counting different regexes in a single pass - Hive - [mail # user]
|
|
...I think you can do that with regex replace and GROUP BY. Something like this- select replaced_col, count(1) from (select regex_replace(original_col, '.*(text1|text2|text3).*', '$1') as repla...
|
|
|
Author: Saurabh Nanda,
2009-07-27, 19:00
|
|
|
Re: counting different regexes in a single pass - Hive - [mail # user]
|
|
...I think you can do that with regex replace and GROUP BY. Something like this- select replaced_col, count(1) from (select regex_replace(original_col, '.*(text1|text2|text3).*', '$1') as repla...
|
|
|
Author: Saurabh Nanda,
2009-07-27, 18:59
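
The replace-then-group idea in this message can be sketched outside Hive. The following is a minimal Python illustration, not the poster's query: the function name `count_matches` and the sample rows are hypothetical, and Python's `re.sub` stands in for the SQL replace call (Hive's built-in function for this is spelled `regexp_replace`). Each row is collapsed to the alternative it matched, and the collapsed values are counted in a single pass.

```python
import re
from collections import Counter

# Same pattern shape as in the message: capture whichever alternative matched.
PATTERN = re.compile(r".*(text1|text2|text3).*")


def count_matches(rows):
    """Count rows per matched alternative in one pass over the input."""
    counts = Counter()
    for row in rows:
        # A row that does not match is left unchanged by the substitution,
        # so it ends up grouped under its original value.
        key = PATTERN.sub(r"\1", row, count=1)
        counts[key] += 1
    return counts


rows = ["a text1 b", "text2", "x text1", "nothing", "see text3 here"]
counts = count_matches(rows)
for key, n in counts.items():
    print(key, n)
```

Because the leading `.*` is greedy, a row containing several of the alternatives is counted under the last one that appears in it. Non-matching rows would normally be filtered out before grouping.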
|
|
|
Re: Re: bz2 Splits. - Hive - [mail # user]
|
|
...Why is there such a *big* difference in compression ratios between the gzip utility and Hive? Uncompressed file size: approx 3500 MB Gzip utility: approx 250 MB org.apache.hadoop.io.co...
|
|
|
Author: Saurabh Nanda,
2009-07-27, 15:38
|
|
|
Re: Re: bz2 Splits. - Hive - [mail # user]
|
|
... http://wiki.apache.org/hadoop/CompressedStorage (please QC and correct where wrong) http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL?action=diff http://wiki.apache.org/hadoop/Hiv...
|
|
|
Author: Saurabh Nanda,
2009-07-27, 10:51
|
|
|
Re: Re: bz2 Splits. - Hive - [mail # user]
|
|
...One last question here. If both TextFile and SequenceFile can be compressed, then what's the advantage of the SequenceFile format? Is it that a compressed file can be split into chunk...
|
|
|
Author: Saurabh Nanda,
2009-07-27, 10:06
|
|
|
Re: Re: bz2 Splits. - Hive - [mail # user]
|
|
...Some more stats, if anyone's interested. I ran all the three tables (described above) through my ETL query (as described in http://nandz.blogspot.com/2009/07/using-hive-for-weblog-analysis.h...
|
|
|
Author: Saurabh Nanda,
2009-07-27, 08:29
|
|
|
|