Hadoop, mail # dev - [VOTE] port HADOOP-6218 (Split TFile by Record Sequence Number) to hadoop 0.20/0.21


[VOTE] port HADOOP-6218 (Split TFile by Record Sequence Number) to hadoop 0.20/0.21
Hong Tang 2009-10-12, 22:55
HADOOP-6218 exposed the internal "Location" object as a global Record
Sequence Number (RecNum). The feature is useful in a number of ways:
(1) it supports progress reporting for upper layers (object file, zebra);
(2) a secondary index can use RecNum as a cursor; (3) it supports aligned
splits across multiple parallel TFiles. Given that TFile is still in the
early stages of adoption, I suggest that we backport the patch to
Hadoop 0.20/0.21 now.

-Hong
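
To illustrate use case (3) above, here is a minimal sketch of how aligned splits could be computed once every record has a global sequence number. It assumes the parallel files all hold the same number of records in the same order, so one set of RecNum ranges applies to each of them. The class RecNumSplits and its Range type are hypothetical names made up for this example; they are not part of Hadoop or of the HADOOP-6218 patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Hadoop class): cuts [0, totalRecords) into RecNum ranges. */
public class RecNumSplits {

    /** Half-open range [begin, end) of record sequence numbers. */
    public static final class Range {
        public final long begin;
        public final long end;

        Range(long begin, long end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + ", " + end + ")";
        }
    }

    /**
     * Divides totalRecords into numSplits contiguous ranges whose sizes
     * differ by at most one record. Because the ranges are expressed in
     * record numbers rather than byte offsets, the same list can be
     * applied to every parallel file and the splits stay row-aligned.
     */
    public static List<Range> split(long totalRecords, int numSplits) {
        if (totalRecords < 0 || numSplits <= 0) {
            throw new IllegalArgumentException("need totalRecords >= 0 and numSplits > 0");
        }
        List<Range> ranges = new ArrayList<>();
        long base = totalRecords / numSplits;   // minimum records per split
        long extra = totalRecords % numSplits;  // the first 'extra' splits get one more
        long begin = 0;
        for (int i = 0; i < numSplits; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new Range(begin, begin + size));
            begin += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Ten records cut into three splits.
        for (Range r : split(10, 3)) {
            System.out.println(r);
        }
    }
}
```

Each task would then scan the same RecNum range in every parallel file. A byte-offset split cannot guarantee this, since the same record generally sits at a different offset in each file.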