Accumulo, mail # user - Reduce task failing on job with error java.lang.IllegalStateException: Keys appended out-of-order


Thread:
  Andrew Catterall 2012-12-06, 14:03
  William Slacum 2012-12-06, 14:07
  William Slacum 2012-12-06, 14:08
  Chris Burrell 2012-12-06, 14:15
  Josh Elser 2012-12-06, 15:15
  Chris Burrell 2012-12-06, 18:35
Re: Reduce task failing on job with error java.lang.IllegalStateException: Keys appended out-of-order
Josh Elser 2012-12-07, 03:33
Yup, the AccumuloOutputFormat essentially uses a BatchWriter.
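A minimal sketch of live ingest through AccumuloOutputFormat, assuming the
1.5-era API (the instance name, credentials, and "myTable" are placeholders,
and method names drift a little between versions):

    // Configure live ingest: AccumuloOutputFormat feeds Mutations to a
    // shared BatchWriter, so the job does not need to sort anything.
    Job job = Job.getInstance(conf);
    job.setOutputFormatClass(AccumuloOutputFormat.class);
    AccumuloOutputFormat.setConnectorInfo(job, "user", new PasswordToken("secret"));
    AccumuloOutputFormat.setZooKeeperInstance(job, "myInstance", "zk1:2181");
    AccumuloOutputFormat.setDefaultTableName(job, "myTable");
    AccumuloOutputFormat.setBatchWriterOptions(job, new BatchWriterConfig()
        .setMaxMemory(50 * 1024 * 1024)   // client-side buffer before flushing
        .setMaxWriteThreads(4));          // concurrent writes to tablet servers

    // In the map or reduce task, Mutations may be emitted in any order:
    Mutation m = new Mutation("rowId");
    m.put("fam", "qual", new ColumnVisibility("myVis"), "value");
    context.write(new Text("myTable"), m);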

I'm all in favor of fixing any inconsistencies or ambiguous areas
that might exist in the manual.

"high-speed" is a very subjective term, I do agree. It falls back to the
common assumption that a MapReduce job can sort resulting KeyValue pairs
quicker than Accumulo can for a given table (which is usually the case).

Please feel free to make a ticket, submit a patch with some wording
changes, etc. Making the end-user experience better is the ultimate goal.
On Thu, Dec 6, 2012 at 1:35 PM, Chris Burrell <[EMAIL PROTECTED]> wrote:

> Thanks. By limitation, I didn't mean drawback. I completely agree that
> this is a very useful feature. I was just trying to understand the
> requirements for the various "high-speed ingest" methods outlined in the
> Accumulo User Manual.
>
> Can I suggest that we add a bit more detail to the Accumulo User Manual?
> In particular, the two points mentioned above.
>
>    - The AccumuloFileOutputFormat writes the file format used internally
>    by Accumulo to store its files.
>    - As a result of the above, the MapReduce job is required to create
>    the keys (rowId, columnFamily, columnQualifier, timestamp) in
>    lexicographical order (a sketch follows below).
>
> Presumably the MapReduce Ingest (AccumuloOutputFormat) uses the
> BatchWriters in the background?
> Chris
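A minimal sketch of the second bullet above, assuming a made-up input record
format of "family:qualifier<TAB>value" and reusing the "myVis" visibility from
this thread (class and field names are hypothetical): the reducer buffers one
row's columns in a TreeMap so Keys reach AccumuloFileOutputFormat already
sorted, while MapReduce itself keeps the row keys sorted between map and
reduce.

    // Hypothetical bulk-ingest reducer: collect one row's columns, sort
    // them as Accumulo Keys, then emit them in order.
    public static class SortingReducer extends Reducer<Text, Text, Key, Value> {
      @Override
      protected void reduce(Text row, Iterable<Text> cols, Context ctx)
          throws IOException, InterruptedException {
        TreeMap<Key, Value> sorted = new TreeMap<Key, Value>(); // Key is Comparable
        for (Text col : cols) {
          String[] parts = col.toString().split("\t", 2);  // "fam:qual\tvalue" (assumed)
          String[] fq = parts[0].split(":", 2);
          sorted.put(new Key(row.toString(), fq[0], fq[1], "myVis"),
              new Value(parts[1].getBytes()));
        }
        for (Map.Entry<Key, Value> e : sorted.entrySet())
          ctx.write(e.getKey(), e.getValue());  // appended in sorted order
      }
    }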
>
>
>
> On 6 December 2012 15:15, Josh Elser <[EMAIL PROTECTED]> wrote:
>
>>  The point of bulk-ingest is that you can perform this work "out of band"
>> from Accumulo. You can perform the work "somewhere else" and just tell
>> Accumulo to bring files online. The only potential work Accumulo has to do
>> at that point is maintain the internal tree of files (merging and splitting
>> as the table is configured). Given that we have this massively popular tool
>> for performing distributed sorting (cough MapReduce cough), I don't agree
>> with your assertion.
>>
>> If you don't want to be burdened with sorting output during the ingest
>> task, use live ingest (BatchWriters). For reasonable data flows, live
>> ingest tends to be faster; however, bulk ingest provides the ability to
>> scale to much larger flows of data while not tanking Accumulo.
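A rough sketch of that "bring files online" step against the 1.5-era client
API (the table name, paths, and credentials are placeholders):

    // Register the job's sorted RFiles with Accumulo after the MR job ends.
    Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
        .getConnector("user", new PasswordToken("secret"));
    conn.tableOperations().importDirectory(
        "myTable",
        "/bulk/output",    // directory of RFiles produced by the MR job
        "/bulk/failures",  // must exist and be empty; rejected files land here
        false);            // setTime: keep the timestamps already in the files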
>>
>>
>> On 12/6/12 9:15 AM, Chris Burrell wrote:
>>
>> Is this a limitation of the bulk ingest approach? Does the MapReduce job
>> need to give the data to the AccumuloFileOutputFormat in
>> a lexicographically sorted manner? If so, is this not a rather big
>> limitation of the approach, since you need to ensure your data arrives
>> from your various data sources in a form such that the Accumulo keys are
>> sorted?
>>
>>  This seems to suggest that although the bulk ingest itself would be very
>> quick, you would lose most of that time sorting and adapting the source
>> files in the MR job.
>>
>>  Chris
>>
>>
>>
>> On 6 December 2012 14:08, William Slacum <[EMAIL PROTECTED]> wrote:
>>
>>> Excuse me, 'col3' sorts lexicographically *after* 'col16'.
>>>
>>>
>>>  On Thu, Dec 6, 2012 at 9:07 AM, William Slacum <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> 'col3' sorts lexicographically before 'col16'. You'll either need to
>>>> encode your numerics or zero pad them.
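A quick illustration of the zero-padding fix (the width of 5 is arbitrary):

    // 'col16' sorts before 'col3' byte-wise because '1' < '3'.
    // Fixed-width padding restores the intended numeric order:
    String q3  = String.format("col%05d", 3);   // "col00003"
    String q16 = String.format("col%05d", 16);  // "col00016" -- now after q3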
>>>>
>>>>
>>>> On Thu, Dec 6, 2012 at 9:03 AM, Andrew Catterall <
>>>> [EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>  I am trying to run a bulk ingest to import data into Accumulo, but it
>>>>> is failing in the reduce task with the error below:
>>>>>
>>>>>
>>>>>
>>>>> java.lang.IllegalStateException: Keys appended out-of-order.  New key
>>>>> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:col3
>>>>> [myVis] 9223372036854775807 false, previous key
>>>>> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:col16
>>>>> [myVis] 9223372036854775807 false
>>>>>
>>>>>         at
>>>>> org.apache.accumulo.core.file.rfile.RFile$Writer.append(RFile.java:378)
>>>>>
>>>>>
>>>>>
>>>>> Could this be caused by the order in which the writes are being done?
Michael Flester 2012-12-07, 03:34