HBase, mail # user - Re: importing a large table


Re: importing a large table
Rita 2012-03-31, 20:26
Heh. Thanks for the links. I already read the Do's and Don'ts :-). The video's
volume is rather low.
I am already using LZO as my compression method. My regions are set to 30 GB
in resident memory.
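
For reference, those two settings can be read back with the 0.90-era Java
client API (the table name "mytable" below is just a placeholder):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.util.Bytes;

  public class DescribeTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("mytable"));
      // MAX_FILESIZE is the per-region split threshold for this table.
      System.out.println("max region file size: " + htd.getMaxFileSize());
      // Compression is configured per column family.
      for (HColumnDescriptor family : htd.getFamilies()) {
        System.out.println(family.getNameAsString() + " compression: "
            + family.getCompressionType());
      }
    }
  }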
On Sat, Mar 31, 2012 at 1:19 PM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:

> Well, doing some calculations: you have 18 TB of data divided into 9200
> regions, which is approximately 2 GB per region. Is this correct?
>
> Well, my first advice is to disable the automatic split mechanism in HBase.
> It is better to do the splitting manually; otherwise you will end up with an
> insane number of regions in a short time.
>
> The second is to enable compression (Gzip, LZO, Snappy) across your HBase
> cluster. This gives you less data to work with and less network
> overhead.
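>
> A rough sketch of both ideas with the 0.90-era Java client API (the table
> name, family name, and split points below are only placeholders):
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.HColumnDescriptor;
>   import org.apache.hadoop.hbase.HTableDescriptor;
>   import org.apache.hadoop.hbase.client.HBaseAdmin;
>   import org.apache.hadoop.hbase.io.hfile.Compression;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   public class CreatePreSplitTable {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = HBaseConfiguration.create();
>       HBaseAdmin admin = new HBaseAdmin(conf);
>
>       // Compression is set per column family.
>       HColumnDescriptor family = new HColumnDescriptor("cf");
>       family.setCompressionType(Compression.Algorithm.LZO); // or GZ
>
>       HTableDescriptor table = new HTableDescriptor("mytable");
>       table.addFamily(family);
>       // A very large MAX_FILESIZE keeps the automatic split mechanism from
>       // kicking in, so region boundaries stay under manual control.
>       table.setMaxFileSize(100L * 1024 * 1024 * 1024);
>
>       // Pre-split on boundaries taken from the key spread of the old table.
>       byte[][] splitKeys = new byte[][] {
>           Bytes.toBytes("row-0333333333"),
>           Bytes.toBytes("row-0666666666"),
>       };
>       admin.createTable(table, splitKeys);
>     }
>   }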
>
> Omer, one of the Software Engineers at the LA Hadoop User Group, gave an
> excellent talk about HBase called "HBase Do's and Don'ts". I recommend
> that you watch it.
>
> See the post first on Cloudera's blog:
> http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
>
> - Video:
> http://www.meetup.com/LA-HUG/pages/Video_from_April_13th_HBASE_DO%27S_and_DON%27TS/
>
>
>
> On 3/31/2012 5:33 AM, Rita wrote:
>
>> I have close to 9200 regions. Is there an example I can follow? Or are
>> there tools to do this already?
>>
>>
>>
>> On Fri, Mar 30, 2012 at 10:11 AM, Marcos Ortiz <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>>    On 03/30/2012 04:54 AM, Rita wrote:
>>
>>>    Thanks for the responses. I am using 0.90.4-cdh3. I exported the table
>>>    using the HBase exporter. Yes, the previous table still exists, but on
>>>    a different cluster. My region servers are large, close to 12 GB in size.
>>>
>>    What is the total number of your regions?
>>
>>>    I want to understand HFiles. We export the table as a series of
>>>    HFiles and then import them?
>>>
>>    Yes. The simplest way to do this is with TableOutputFormat, but if you
>>    use HFileOutputFormat instead, the process will be more efficient,
>>    because this feature (bulk loads) uses less CPU and network. With a
>>    MapReduce job, you prepare your data using HFileOutputFormat (Hadoop's
>>    TotalOrderPartitioner class is used to partition the map output into
>>    disjoint ranges of the key space, corresponding to the key ranges of
>>    the regions in the table).
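>>
>>    A rough sketch of such a preparation job (the class and mapper names,
>>    table name, and paths below are only placeholders; it assumes the table
>>    was exported with the stock Export tool, so the input is SequenceFiles
>>    of ImmutableBytesWritable/Result pairs):
>>
>>      import java.io.IOException;
>>      import org.apache.hadoop.conf.Configuration;
>>      import org.apache.hadoop.fs.Path;
>>      import org.apache.hadoop.hbase.HBaseConfiguration;
>>      import org.apache.hadoop.hbase.KeyValue;
>>      import org.apache.hadoop.hbase.client.HTable;
>>      import org.apache.hadoop.hbase.client.Result;
>>      import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>>      import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
>>      import org.apache.hadoop.mapreduce.Job;
>>      import org.apache.hadoop.mapreduce.Mapper;
>>      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>>      import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
>>      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>
>>      public class PrepareHFiles {
>>        // Turns each exported Result back into its individual KeyValues.
>>        static class ExportedResultMapper extends
>>            Mapper<ImmutableBytesWritable, Result, ImmutableBytesWritable, KeyValue> {
>>          protected void map(ImmutableBytesWritable row, Result result, Context context)
>>              throws IOException, InterruptedException {
>>            for (KeyValue kv : result.raw()) {
>>              context.write(row, kv);
>>            }
>>          }
>>        }
>>
>>        public static void main(String[] args) throws Exception {
>>          Configuration conf = HBaseConfiguration.create();
>>          Job job = new Job(conf, "prepare-hfiles");
>>          job.setJarByClass(PrepareHFiles.class);
>>          job.setInputFormatClass(SequenceFileInputFormat.class);
>>          job.setMapperClass(ExportedResultMapper.class);
>>          job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>>          job.setMapOutputValueClass(KeyValue.class);
>>          FileInputFormat.addInputPath(job, new Path("/exports/mytable"));
>>          FileOutputFormat.setOutputPath(job, new Path("/tmp/mytable-hfiles"));
>>
>>          // Wires in HFileOutputFormat plus the TotalOrderPartitioner, with
>>          // one reducer per region of the (already created) target table.
>>          HTable table = new HTable(conf, "mytable");
>>          HFileOutputFormat.configureIncrementalLoad(job, table);
>>
>>          System.exit(job.waitForCompletion(true) ? 0 : 1);
>>        }
>>      }
>>
>>    Once the job finishes, the generated HFiles are moved into the table
>>    with the completebulkload tool (LoadIncrementalHFiles).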
>>
>>
>>>    What is the difference between that and the regular MR export job?
>>>
>>    The main difference from regular MR jobs is the output: instead of
>>    using the classic output formats like TextOutputFormat,
>>    MultipleOutputFormat, or SequenceFileOutputFormat, you use
>>    HFileOutputFormat, which writes HBase's native data file format
>>    (HFile).
>>
>>>    The idea sounds good because it seems simple on the surface :-)
>>>
>>
>>
>>>    On Fri, Mar 30, 2012 at 12:08 AM, Stack <[EMAIL PROTECTED]> wrote:
>>>
>>>>    On Thu, Mar 29, 2012 at 7:57 PM, Rita <[EMAIL PROTECTED]> wrote:
>>>>
>>>>>    Hello,
>>>>>
>>>>>    I am importing a 40+ billion row table which I exported several months
>>>>>    ago. The data size is close to 18 TB on HDFS (3x replication).
>>>>>
>>>>    Does the table from back then still exist?  Or do you remember what
>>>>    the key spread was like?  Could you precreate the old table?
>>>>
>>>>>    My problem is that when I try to import it with MapReduce it takes a
>>>>>    few days -- which is ok -- however, when the job fails for whatever
>>>>>    reason, I have to restart everything. Is it possible to import the
>>>>>    table in chunks, like import 1/3, 2/3, and then finally 3/3 of the table?
>>>>>
>>>>>     Yeah.  Funny how the plug gets pulled on the rack when the three