Hadoop, mail # user - how to figure out the range of a split that failed?


Re: how to figure out the range of a split that failed?
edward choi 2010-07-06, 05:07
Thanks for the tip.
I have actually already tried your method. The command I wrote is shown below:

cerr << "reporter:counter:SkippingTaskCounters,MapProcessedRecords,1\n";

This actually produced some skipped records in the skip folder. But the problem
is that the skipped records' text was all messed up, so I couldn't recycle
them. The broken text is at the end of this mail.
I don't know the reason. Maybe it's because I wrote some other information
to the error stream as well, such as the document ID (the command below is the
one I used):

cerr << "Processing: " << docID << endl;

Anyway, if I make any progress, I will post an update here.
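
For reference, here is a minimal sketch (not from the original mail) of a C++
streaming mapper that puts both stderr lines together: the per-record diagnostic
and the skip-mode counter. The tab-separated "docID<TAB>text" input format is
only an assumption for illustration.

#include <iostream>
#include <string>

int main() {
    std::string line;
    while (std::getline(std::cin, line)) {
        // Hypothetical input format: everything before the first tab is the document ID.
        std::string docID = line.substr(0, line.find('\t'));

        // Diagnostic output; this goes to the task's stderr log, not to a counter.
        std::cerr << "Processing: " << docID << std::endl;

        // ... real per-record work would go here; emit key<TAB>value on stdout ...
        std::cout << docID << "\t" << 1 << "\n";

        // Report one processed record so skip mode can narrow down a bad record on retry.
        std::cerr << "reporter:counter:SkippingTaskCounters,MapProcessedRecords,1\n";
    }
    return 0;
}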

Broken Text:
SEQ ... org.apache.hadoop.io.LongWritable ... org.apache.hadoop.io.Text ... org.apache.hadoop.io.compress.DefaultCodec
(the remainder of the dump is unreadable binary/compressed data, omitted here)
2010/7/6 Sharad Agarwal <[EMAIL PROTECTED]>

> To be precise, you have to write the following to the error stream:
> for map:
> reporter:counter:SkippingTaskCounters,MapProcessedRecords,<count>
>
> for reduce:
> reporter:counter:SkippingTaskCounters,ReduceProcessedGroups,<count>
>
>
> edward choi wrote:
>
>> Thanks for the response. I went to the web page you pointed me to, as well
>> as several other pages that I found.
>> I am still not sure if I got it right.
>> If I am trying to increment COUNTER_MAP_PROCESS_RECORDS using Hadoop
>> Streaming, is the example below the correct way to do it? (assuming that I
>> am using C++)
>>
>> example:
>> cerr << "reporter:counter:counters,linecount,1" << endl;
>>
>>
>>
>
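
For the reduce side that Sharad mentions above, a streaming reducer would
increment ReduceProcessedGroups once per key group. The following is only an
illustrative sketch, assuming the default tab-separated, key-sorted streaming
reducer input; the per-key counting itself is hypothetical.

#include <iostream>
#include <string>

int main() {
    std::string line, prevKey;
    bool haveKey = false;
    long count = 0;  // hypothetical per-key aggregate

    while (std::getline(std::cin, line)) {
        std::string key = line.substr(0, line.find('\t'));
        if (haveKey && key != prevKey) {
            // Finished one key group: emit its result and report progress.
            std::cout << prevKey << "\t" << count << "\n";
            std::cerr << "reporter:counter:SkippingTaskCounters,ReduceProcessedGroups,1\n";
            count = 0;
        }
        prevKey = key;
        haveKey = true;
        ++count;
    }
    if (haveKey) {
        std::cout << prevKey << "\t" << count << "\n";
        std::cerr << "reporter:counter:SkippingTaskCounters,ReduceProcessedGroups,1\n";
    }
    return 0;
}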