Hive, mail # dev - Review Request: Improve RCFile::sync(long) by 10x


Re: Review Request: Improve RCFile::sync(long) by 10x
Ashutosh Chauhan 2013-04-26, 15:13

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10795/#review19770
-----------------------------------------------------------

Ship it!

- Ashutosh Chauhan
On April 26, 2013, 11:25 a.m., Gopal V wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10795/
> -----------------------------------------------------------
>
> (Updated April 26, 2013, 11:25 a.m.)
>
>
> Review request for hive, Ashutosh Chauhan and Gunther Hagleitner.
>
>
> Description
> -------
>
> Speed up RCFile::sync() by reading large blocks of data from HDFS rather than using readByte() on the input stream.
>
> This improves the loop behaviour and reduces the number of calls on the synchronized read() methods within HDFS, resulting in a 10x performance boost to this function.
>
> In wall-clock terms, a call that takes up to a second drops below 100 ms, because the data is read in 512-byte chunks instead of 1 byte at a time.
>
>
> This addresses bug HIVE-4423.
>     https://issues.apache.org/jira/browse/HIVE-4423
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java d3d98d0
>
> Diff: https://reviews.apache.org/r/10795/diff/
>
>
> Testing
> -------
>
> ant test -Dtestcase=TestRCFile -Dmodule=ql
> ant test -Dtestcase=TestCliDriver -Dqfile_regex=.*rcfile.* -Dmodule=ql
>
> And benchmarking with count(1) on the store_sales rcfile table at scale=10
>
> before: 43.8, after: 39.5
>
>
> Thanks,
>
> Gopal V
>
>
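
For readers following the description above, here is a minimal sketch of the general technique: scanning a stream for a marker in 512-byte chunks rather than with one readByte() call per byte. It is not the patch under review. The class name ChunkedSyncScan, the method findMarker, and the 16-byte marker in main are hypothetical names and values chosen for illustration, and a plain java.io.InputStream stands in for the HDFS input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not the code from RCFile.java.
class ChunkedSyncScan {

    // Returns the stream offset of the first occurrence of marker, or -1 if absent.
    // Data is pulled in chunkSize-byte reads instead of one read() call per byte.
    static long findMarker(InputStream in, byte[] marker, int chunkSize) throws IOException {
        if (marker.length == 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("marker and chunkSize must be non-empty/positive");
        }
        int overlap = marker.length - 1;          // tail kept so a marker split across chunks is still seen
        byte[] buf = new byte[overlap + chunkSize];
        int held = 0;                             // bytes carried over from the previous chunk
        long base = 0;                            // stream offset of buf[0]
        while (true) {
            int n = in.read(buf, held, chunkSize); // one bulk read per chunk
            if (n < 0) {
                return -1;                        // end of stream, marker not found
            }
            int valid = held + n;
            for (int i = 0; i + marker.length <= valid; i++) {
                if (matchesAt(buf, i, marker)) {
                    return base + i;
                }
            }
            // Keep the last (marker.length - 1) bytes; a match may start there.
            int keep = Math.min(overlap, valid);
            System.arraycopy(buf, valid - keep, buf, 0, keep);
            base += valid - keep;
            held = keep;
        }
    }

    private static boolean matchesAt(byte[] buf, int pos, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buf[pos + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] marker = new byte[16];             // assumed marker length, for illustration
        for (int i = 0; i < marker.length; i++) {
            marker[i] = (byte) (i + 1);
        }
        byte[] data = new byte[2000];
        // Place the marker so that it straddles the first 512-byte chunk boundary.
        System.arraycopy(marker, 0, data, 505, marker.length);
        long offset = findMarker(new ByteArrayInputStream(data), marker, 512);
        System.out.println("marker found at offset " + offset);
    }
}
```

The carried-over tail is the one design point worth noting: without it, a marker that begins near the end of one chunk and finishes in the next would be missed. On a stream whose read() is synchronized, each bulk read replaces up to 512 single-byte calls, which is the effect the description attributes the speed-up to.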