Accumulo user mailing list: Reduce task failing on job with error java.lang.IllegalStateException: Keys appended out-of-order


Earlier messages in thread:
Andrew Catterall 2012-12-06, 14:03
William Slacum 2012-12-06, 14:07
William Slacum 2012-12-06, 14:08

Re: Reduce task failing on job with error java.lang.IllegalStateException: Keys appended out-of-order
Is this a limitation of the bulk ingest approach? Does the MapReduce job
need to hand the data to AccumuloFileOutputFormat in lexicographically
sorted order? If so, isn't that a rather big limitation of the approach,
since you need to ensure the data arriving from your various sources is in
a form such that the Accumulo keys end up sorted?
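
Just to convince myself about the ordering, a quick throwaway check shows why
'col16' comes before 'col3', and how zero-padding the qualifiers (as William
suggested) restores the intended order:

    public class QualifierOrder {
        public static void main(String[] args) {
            // Unpadded: comparison is character by character, and '1' < '3',
            // so "col16" sorts before "col3".
            System.out.println("col16".compareTo("col3") < 0);    // true
            // Zero-padded qualifiers sort in numeric order again.
            System.out.println(String.format("col%03d", 3));      // col003
            System.out.println(String.format("col%03d", 16));     // col016
            System.out.println("col003".compareTo("col016") < 0); // true
        }
    }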

This seems to suggest that although the bulk ingest itself would be very
quick, you would lose most of that gain sorting and adapting the source
files in the MR job.
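
I suppose one workaround would be to sort each row's cells inside the reduce
call itself rather than pre-sorting the source files; an untested sketch,
assuming each reduce call sees every column for its row:

    // Untested sketch: collect the row's cells in a TreeMap so they come out
    // in Key order (Key implements Comparable), then write them in one pass.
    public void reduce(Text key, Iterable<Text> keyValues, Context output)
            throws IOException, InterruptedException {
        TreeMap<Key, Value> sorted = new TreeMap<Key, Value>();   // java.util.TreeMap
        for (Text keyValue : keyValues) {
            String[] values = keyValue.toString().split("\\|");
            sorted.put(new Key(key, new Text("foo"), new Text(values[0]), new Text("myVis")),
                       new Value(values[1].getBytes(), 0, values[1].length()));
        }
        for (Map.Entry<Key, Value> cell : sorted.entrySet()) {
            output.write(cell.getKey(), cell.getValue());
        }
    }

As long as a single row does not have a huge number of columns, that per-row
sort should be cheap compared with reworking the source files.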

Chris

On 6 December 2012 14:08, William Slacum <[EMAIL PROTECTED]> wrote:

> Excuse me, 'col3' sorts lexicographically *after* 'col16'.
>
>
> On Thu, Dec 6, 2012 at 9:07 AM, William Slacum <[EMAIL PROTECTED]> wrote:
>
>> 'col3' sorts lexicographically before 'col16'. You'll either need to
>> encode your numerics or zero-pad them.
>>
>>
>> On Thu, Dec 6, 2012 at 9:03 AM, Andrew Catterall <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>>
>>> I am trying to run a bulk ingest to import data into Accumulo but it is
>>> failing at the reduce task with the below error:
>>>
>>>
>>>
>>> java.lang.IllegalStateException: Keys appended out-of-order.  New key
>>> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:col3 [myVis] 9223372036854775807 false,
>>> previous key client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:col16 [myVis] 9223372036854775807 false
>>>
>>>         at org.apache.accumulo.core.file.rfile.RFile$Writer.append(RFile.java:378)
>>>
>>>
>>>
>>> Could this be caused by the order in which the writes are being done?
>>>
>>>
>>> *-- Background*
>>>
>>> The input file is a tab-separated file.  A sample row would look like:
>>>
>>> Data1    Data2    Data3    Data4    Data5    …             DataN
>>>
>>>
>>>
>>> The map parses each row of data into a Map<String, String>.
>>> This will contain the following:
>>>
>>> Col1       Data1
>>>
>>> Col2       Data2
>>>
>>> Col3       Data3
>>>
>>> …
>>>
>>> ColN      DataN
>>>
>>>
>>> An outputKey is then generated for this row in the format *
>>> client@timeStamp@randomUUID*
>>>
>>> Then for each entry in the Map<String, String> an outputValue is generated
>>> in the format *ColN|DataN*
>>>
>>> The outputKey and outputValue are written to Context.
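>>>
>>> Simplified, the map side looks roughly like this (the client id and
>>> timestamp handling below are illustrative rather than the exact code):
>>>
>>>       public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {
>>>
>>>          public void map(LongWritable offset, Text line, Context output)
>>>                 throws IOException, InterruptedException {
>>>
>>>                 // split the tab-separated row into its columns
>>>                 String[] fields = line.toString().split("\t");
>>>
>>>                 // outputKey in the format client@timeStamp@randomUUID
>>>                 String timeStamp = new SimpleDateFormat("yyyyMMddHHmmss").format(new Date());
>>>                 Text outputKey = new Text("client@" + timeStamp + "@" + UUID.randomUUID());
>>>
>>>                 // one outputValue per column, in the format ColN|DataN
>>>                 for (int i = 0; i < fields.length; i++) {
>>>                      output.write(outputKey, new Text("Col" + (i + 1) + "|" + fields[i]));
>>>                 }
>>>          }
>>>       }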
>>>
>>>
>>>
>>> This completes successfully; however, the reduce task fails.
>>>
>>>
>>> My ReduceClass is as follows:
>>>
>>>
>>>
>>>       public static class ReduceClass extends Reducer<Text,Text,Key,Value> {
>>>
>>>          public void reduce(Text key, Iterable<Text> keyValues, Context output)
>>>                 throws IOException, InterruptedException {
>>>
>>>                 // for each value belonging to the key
>>>                 for (Text keyValue : keyValues) {
>>>
>>>                      // split the keyValue into Col and Data
>>>                      String[] values = keyValue.toString().split("\\|");
>>>
>>>                      // Generate key
>>>                      Key outputKey = new Key(key, new Text("foo"), new Text(values[0]), new Text("myVis"));
>>>
>>>                      // Generate value
>>>                      Value outputValue = new Value(values[1].getBytes(), 0, values[1].length());
>>>
>>>                      // Write to context
>>>                      output.write(outputKey, outputValue);
>>>                 }
>>>          }
>>>       }
>>>
>>>
>>>
>>>
>>> *-- Expected output*
>>>
>>>
>>>
>>> I am expecting the contents of the Accumulo table to be as follows:
>>>
>>>
>>>
>>> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:Col1 [myVis] Data1
>>> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:Col2 [myVis] Data2
>>> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:Col3 [myVis] Data3
>>> client@20121206123059@0014efca-d8e8-492e-83cb-e5b6b7c49f7a foo:Col4 [

Later messages in thread:
Josh Elser 2012-12-06, 15:15
Chris Burrell 2012-12-06, 18:35
Josh Elser 2012-12-07, 03:33
Michael Flester 2012-12-07, 03:34