2) Number of records getting filtered: We don't have a counter specifically for this, but you can estimate it by comparing the map/reduce input/output record counts before and after the filter-by. If you use a visualization tool such as Lipstick, the input/output records of each MR job are displayed in the DAG.
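As a rough illustration of that estimate, here is a minimal Python sketch. The function name and the counter values are made up for the example; in practice you would read the map input/output record counts from the job counters or from the DAG view.

```python
def estimate_filtered(map_input_records: int, map_output_records: int) -> int:
    """Estimate rows dropped by a filter as input records minus output records.

    Only meaningful when the filter is the sole operation changing the
    record count in that phase.
    """
    return map_input_records - map_output_records


# Hypothetical counter values copied by hand from a job's counters page.
dropped = estimate_filtered(1_000_000, 640_000)
print(dropped)
```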
On Tue, Jun 10, 2014 at 7:49 AM, Abhishek Agarwal <[EMAIL PROTECTED]> wrote:
2) is the approach to go for if there is only one filter on the map side. However, if the map also runs other operations, such as flatten or additional filters, you cannot attribute the difference between map input and output records to a particular filter operation.
On Tue, Jun 10, 2014 at 8:30 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
Regards, Abhishek Agarwal
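To make the caveat concrete, here is a small Python simulation of a map phase that flattens a bag and then filters. It is only a sketch: the data, the threshold, and the function name are invented for illustration, and it does not call any Pig or Hadoop API.

```python
def run_map_side(rows, min_value):
    """Simulate FLATTEN followed by FILTER in one map phase, tracking counts."""
    map_input = len(rows)
    # FLATTEN: each (key, bag) row becomes one row per bag element.
    flattened = [(key, v) for key, bag in rows for v in bag]
    # FILTER: keep only rows whose value meets the threshold.
    kept = [(key, v) for key, v in flattened if v >= min_value]
    return {
        "map_input": map_input,
        "map_output": len(kept),
        "actually_filtered": len(flattened) - len(kept),
    }


rows = [("a", [1, 5, 9]), ("b", [2, 8]), ("c", [7])]
stats = run_map_side(rows, min_value=5)
counter_diff = stats["map_input"] - stats["map_output"]
print(stats, counter_diff)
```

Here the filter drops 2 rows, but map input minus map output comes out to -1, because flatten increased the row count before the filter ran. The two framework counters alone cannot separate the two effects.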