re: "data from raw data file into hbase table"
One approach is bulk loading.
If he's talking about using an HBase table as the source of an MR job, then
On 5/25/12 2:35 AM, "Florin P" <[EMAIL PROTECTED]> wrote:
>I've read Lars George's blog
>At the end of the article, he mentioned: "In the next post I will show you
>how to import data from a raw data
>file into a HBase table and how you eventually process the data in the
>HBase table. We will address questions like how many mappers and/or
>reducers are needed and how can I improve import and processing
>performance." I searched the blog for these topics, but it seems
>there is no related article. Do you know if he (or you) covered these
>subjects in a different post or book? In particular, I am interested in:
>1. How can I set the number of mappers?
>2. Can the number of mappers be set per region server? If yes, how?
>3. How can a large number of mappers affect data locality?
>4. Is this algorithm for computing the number of mappers
>(https://issues.apache.org/jira/browse/HBASE-1172) still available?
>"The number of mappers specified when using TableInputFormat is strictly
>followed if less than total regions on the input table. If greater, the
>number of regions is used.
>This will modify the splitting algorithm to do the following:
> * Specify 0 mappers when you want # mappers = # regions
> * If you specify fewer mappers than regions, will use exactly the number
>you specify based on the current algorithm
> * If you specify more mappers than regions, will divide regions up by
>determining [start,X) [X,end). The number of mappers will always be a
>multiple of number of regions. This is so we do not have scanners
>spanning multiple regions.
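The [start,X) [X,end) split in the last bullet can be sketched in plain Java. This is a minimal, hypothetical sketch, not HBase code: the class and method names (`RegionKeySplitter`, `boundaries`) are made up for illustration, and it assumes the region has non-empty start and end row keys (in HBase the first region's start key and the last region's end key are empty, which this does not handle). Row keys are compared as unsigned bytes in lexicographic order, as HBase does; the sketch zero-pads both keys to the same length, treats them as unsigned big-endian integers, and interpolates n-1 cut points, so n = 2 yields the single X.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: cuts one region's key range
// [start, end) into n contiguous sub-ranges so n mappers can each scan a slice.
class RegionKeySplitter {

    // Unsigned lexicographic comparison, the order HBase uses for row keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Right-pads a key with 0x00 bytes to exactly len bytes.
    static byte[] pad(byte[] key, int len) {
        byte[] out = new byte[len];
        System.arraycopy(key, 0, out, 0, key.length);
        return out;
    }

    // Writes a non-negative value back as exactly len big-endian bytes,
    // dropping the leading sign byte that toByteArray() may add.
    static byte[] toFixedBytes(BigInteger v, int len) {
        byte[] raw = v.toByteArray();
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    // Returns the n-1 cut points X1 < ... < X(n-1) strictly inside (start, end).
    // Assumes non-empty keys, start < end (still distinct once zero-padded),
    // and 2 <= n <= 256.
    static List<byte[]> boundaries(byte[] start, byte[] end, int n) {
        // One spare byte of precision leaves room between adjacent keys.
        int len = Math.max(start.length, end.length) + 1;
        // Equal-length byte arrays sort exactly like their unsigned
        // big-endian integer values, so interpolate on integers.
        BigInteger s = new BigInteger(1, pad(start, len));
        BigInteger e = new BigInteger(1, pad(end, len));
        BigInteger width = e.subtract(s);
        List<byte[]> cuts = new ArrayList<>();
        for (int i = 1; i < n; i++) {
            BigInteger step = width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n));
            cuts.add(toFixedBytes(s.add(step), len));
        }
        return cuts;
    }

    // Hex rendering of a key, for printing.
    static String hex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] start = "a".getBytes(StandardCharsets.US_ASCII);
        byte[] end = "c".getBytes(StandardCharsets.US_ASCII);
        // n = 2 gives the single X of [start, X) [X, end).
        for (byte[] x : boundaries(start, end, 2)) {
            System.out.println(hex(x)); // prints 6200: "b" plus one padding byte
        }
    }
}
```

The spare padding byte is a design choice: without it, adjacent keys such as "m" and "n" would leave no integer between them to cut at. Each slice stays inside one region, which matches the ticket's stated goal of not having scanners span multiple regions.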
>There is an additional issue in that the default number of mappers
>in JobConf is set to 1. That means if a user does not explicitly set the
>number of map tasks, a single mapper will be used."
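The mapper-count rule quoted from HBASE-1172, including the effect of the JobConf default of 1, can be written as a small function. This is a hypothetical sketch of the rule as the ticket describes it, not the actual TableInputFormat source; whether, and in which release, the change landed is not established here. Two details are assumptions: when more mappers than regions are requested, the per-region count is rounded down, and when fewer are requested, contiguous regions are grouped as evenly as possible (the ticket only says "the current algorithm"). The names `MapperPlanner`, `mapperCount` and `groupRegions` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the mapper-count rule quoted from HBASE-1172;
// not the TableInputFormat source. Regions are numbered 0..regions-1 in
// key order, and regions >= 1 is assumed.
class MapperPlanner {

    // Number of map tasks the quoted rule yields for a requested count.
    static int mapperCount(int requested, int regions) {
        if (requested <= 0) {
            return regions; // 0 means "one mapper per region"
        }
        if (requested <= regions) {
            return requested; // followed exactly
        }
        // Assumption: round down, so the result stays a multiple of regions.
        return (requested / regions) * regions;
    }

    // Fewer mappers than regions: give each mapper a contiguous run of regions,
    // as evenly as possible (assumed grouping). Each entry is [first, last).
    static List<int[]> groupRegions(int mappers, int regions) {
        List<int[]> groups = new ArrayList<>();
        int base = regions / mappers;
        int extra = regions % mappers; // the first `extra` mappers get one more region
        int next = 0;
        for (int m = 0; m < mappers; m++) {
            int size = base + (m < extra ? 1 : 0);
            groups.add(new int[] {next, next + size});
            next += size;
        }
        return groups;
    }

    public static void main(String[] args) {
        int regions = 10;
        System.out.println(mapperCount(0, regions));  // 10: one per region
        System.out.println(mapperCount(4, regions));  // 4
        System.out.println(mapperCount(25, regions)); // 20: two slices per region
        // Leaving JobConf's map-task count at its default of 1 gives one mapper.
        System.out.println(mapperCount(1, regions));  // 1
        for (int[] g : groupRegions(4, regions)) {
            System.out.println(g[0] + ".." + g[1]); // 0..3, 3..6, 6..8, 8..10
        }
    }
}
```

The last `mapperCount` call illustrates the "additional issue" above: under this rule, a job that never sets the map-task count ends up with a single mapper unless the default is overridden (for example with 0, meaning one mapper per region).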
>I look forward to your answers. Thank you.
>Kind regards, Florin