Accumulo, mail # user - Mappers for Accumulo


Re: Mappers for Accumulo
William Slacum 2013-03-12, 17:21
Depending on the size of the tablet, you can lower the split threshold
and/or set new split points on the table.
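
For the case in the quoted message below, where one large range has to become a List<Range>, here is a minimal sketch in plain Java of cutting a row range at a set of split points. The class and method names (RangeSplitter, split) are hypothetical, rows are modeled as plain Strings, and the sketch assumes ASCII row IDs so that Java's String ordering agrees with the table's sort order. It has no Accumulo dependency; each returned pair would still need to be wrapped in an Accumulo Range, with whatever inclusive/exclusive end handling suits your data, before being handed to AccumuloInputFormat.setRanges().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class RangeSplitter {

    // Cuts the span from startRow to endRow at every split point that
    // falls strictly between the two rows. Each element is {lower, upper}.
    static List<String[]> split(String startRow, String endRow,
                                SortedSet<String> splitPoints) {
        List<String[]> pieces = new ArrayList<>();
        String lower = startRow;
        // subSet(from, to) is from-inclusive and to-exclusive.
        for (String point : splitPoints.subSet(startRow, endRow)) {
            if (point.equals(startRow)) {
                continue; // a cut at the start row would create an empty piece
            }
            pieces.add(new String[] {lower, point});
            lower = point;
        }
        // The final piece always ends at the caller's end row.
        pieces.add(new String[] {lower, endRow});
        return pieces;
    }

    public static void main(String[] args) {
        // Hypothetical split points; in practice these could be the
        // table's own split points or ones chosen by hand.
        SortedSet<String> splits = new TreeSet<>();
        splits.add("def");
        splits.add("jkl");
        splits.add("xyz");
        for (String[] piece : split("abc", "mno", splits)) {
            System.out.println(piece[0] + " .. " + piece[1]);
        }
    }
}
```

Split points outside the requested range are ignored, so a caller who passes the first and last row of the table still gets one piece per split point in between rather than a single range.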

On Mon, Mar 11, 2013 at 5:39 PM, Aji Janis <[EMAIL PROTECTED]> wrote:

> So we realized that all the data for the table of interest fits on one
> tablet (a HUGE tablet, isn't it?), i.e. we always had ONE mapper. So we said
> let's split the table by range so that we can have more mappers. The next
> problem: what if someone passes the first row as the start of the range and
> the last row as the end? Now I am back to one mapper. So what I need is some
> way to take in a range and split it into a List<Range>.
>
>
>
> On Mon, Mar 11, 2013 at 5:13 PM, William Slacum <
> [EMAIL PROTECTED]> wrote:
>
>> So you want both auto adjusting and not auto adjusting depending on the
>> size of a range? I suppose you could lift the code for doing the adjusting,
>> and do some introspection on the ranges (such as "how many tablets do I have
>> in this range?") and apply as necessary.
>>
>>
>> On Mon, Mar 11, 2013 at 4:47 PM, Aji Janis <[EMAIL PROTECTED]> wrote:
>>
>>> So it looks like a List<Range> is what I need so that I can have one
>>> mapper per range. However, a more interesting scenario is when, given a
>>> big range, I want to split it into multiple ranges. In other words, suppose
>>> my row IDs are 1_hello, 2_hello, ..., 9_hello, 10_hello and the range given
>>> is 2 to 5, but I want one mapper per integer, so 4 mappers in this case. Any
>>> ideas on how I can accomplish that?
>>>
>>>
>>> Thanks all for suggestions.
>>>
>>>
>>> On Fri, Mar 8, 2013 at 7:02 PM, Keith Turner <[EMAIL PROTECTED]> wrote:
>>>
>>>> On Fri, Mar 8, 2013 at 4:17 PM, Aji Janis <[EMAIL PROTECTED]> wrote:
>>>> > Thank you. Follow up question.
>>>> >
>>>> > Would this enforce one mapper per range even if all the data (from
>>>> > three ranges) is on one node/tablet?
>>>>
>>>> Look at disableAutoAdjustRanges(). This determines whether it creates a
>>>> mapper per tablet per range OR per range.
>>>>
>>>>
>>>> >
>>>> >
>>>> >
>>>> > On Fri, Mar 8, 2013 at 1:17 PM, Mike Hugo <[EMAIL PROTECTED]> wrote:
>>>> >>
>>>> >> See AccumuloInputFormat
>>>> >>
>>>> >> ArrayList<Range> ranges = new ArrayList<Range>();
>>>> >> // populate array list of row ranges ...
>>>> >> AccumuloInputFormat.setRanges(job, ranges);
>>>> >>
>>>> >>
>>>> >> You should get one mapper per range.
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Fri, Mar 8, 2013 at 12:11 PM, Aji Janis <[EMAIL PROTECTED]> wrote:
>>>> >>>
>>>> >>> Hello,
>>>> >>>
>>>> >>>  I am trying to figure out how I can configure the number of mappers
>>>> >>> (if it's even possible) based on an Accumulo row range. My Accumulo
>>>> >>> row ID uses the format:
>>>> >>>
>>>> >>> abc/1
>>>> >>> abc/2
>>>> >>> ...
>>>> >>> def/3
>>>> >>> ....
>>>> >>> xyz/13...
>>>> >>>
>>>> >>> If I want to specify three ranges, [abc/1 to abc/3], [def/1 to def/5],
>>>> >>> and [jkl/13 to klm/15], and have one mapper work on each range, is
>>>> >>> there a way I can do this? How do I even set up my MapReduce job to
>>>> >>> accept these ranges? Thank you for all feedback.
>>>> >>>
>>>> >>>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>
>
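
For the earlier question in this thread about getting one mapper per integer (rows such as 2_hello through 5_hello), a rough sketch of one possible approach is to build one row prefix per integer and turn each prefix into its own range. The helper below is hypothetical (PrefixPlanner and rowPrefixes are made-up names) and uses only plain Java. It assumes every row ID starts with the integer followed by an underscore, and that your Accumulo version offers a way to build a range from a row prefix, such as Range.prefix(). Rows sort lexicographically, so 10_hello lands between 1_hello and 2_hello; working with one prefix per integer avoids depending on numeric order.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixPlanner {

    // Returns one row prefix per integer in [first, last], e.g. "2_", "3_".
    static List<String> rowPrefixes(int first, int last) {
        List<String> prefixes = new ArrayList<>();
        for (int i = first; i <= last; i++) {
            prefixes.add(i + "_");
        }
        return prefixes;
    }

    public static void main(String[] args) {
        // Each prefix would become one range, and so one mapper if the
        // input format is configured to create a mapper per range.
        for (String prefix : rowPrefixes(2, 5)) {
            System.out.println(prefix);
        }
    }
}
```

The resulting ranges would then go into the list passed to AccumuloInputFormat.setRanges(job, ranges), together with the disableAutoAdjustRanges() setting mentioned above if one mapper per range is wanted regardless of tablet layout.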