HBase >> mail # user >> Re: Hbase import Tsv performance (slow import)


Anoop John 2012-10-24, 05:11
Re: Hbase import Tsv performance (slow import)
Hi Anil,

With bulk loading, the data is not put into HBase one row at a time. The
MR job creates the KVs and writes them to its output files in sorted order,
in the same layout as an HFile. The files are then loaded into HBase in a
final step, and only that final step involves the HBase RegionServers. So
the WAL has no role there. Does that make it clear? The data is still
present as raw data in the txt or csv file, so it can be loaded again if a
RegionServer fails  :)

-Anoop-
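
To make the difference between the two write paths concrete, here is a toy
model in Python. It is only a sketch: ToyRegionServer and build_sorted_file
are hypothetical names made up for illustration, not HBase APIs, and the
real HFile format and region handling are far more involved. It assumes
tab-separated "rowkey<TAB>value" input lines.

```python
# Toy model only: these names are hypothetical and are not HBase APIs.

class ToyRegionServer:
    def __init__(self):
        self.wal = []          # write-ahead log: one entry per put
        self.memstore = {}     # in-memory buffer, flushed to a file later
        self.store_files = []  # immutable sorted files (HFile stand-ins)

    def put(self, row, value):
        """Normal write path: log the edit first, then buffer it in memory."""
        self.wal.append((row, value))
        self.memstore[row] = value

    def flush(self):
        """Write the memstore out as a sorted file; logged edits can then go."""
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}
            self.wal.clear()

    def bulk_load(self, sorted_file):
        """Bulk-load path: adopt a file that was already written in order."""
        self.store_files.append(sorted_file)


def build_sorted_file(raw_lines, sep="\t"):
    """Stand-in for the MR job: parse raw text and emit sorted key/values."""
    return sorted(tuple(line.split(sep, 1)) for line in raw_lines)


raw = ["row3\tc", "row1\ta", "row2\tb"]

# Path 1: one put per line; every edit passes through the WAL and memstore.
rs_puts = ToyRegionServer()
wal_writes = 0
for line in raw:
    row, value = line.split("\t", 1)
    rs_puts.put(row, value)
    wal_writes += 1
rs_puts.flush()

# Path 2: the sorted file is built outside the server, then handed over.
rs_bulk = ToyRegionServer()
rs_bulk.bulk_load(build_sorted_file(raw))
bulk_wal_writes = len(rs_bulk.wal)
```

Both servers end up holding the same sorted file, but only the first path
touched the WAL. In the second path the raw input is the recovery source:
if the final load fails, the sorted file can be rebuilt from it.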

On Wed, Oct 24, 2012 at 10:41 AM, Anoop John <[EMAIL PROTECTED]> wrote:

> Hi Anil
>
>
>
> On Wed, Oct 24, 2012 at 10:39 AM, anil gupta <[EMAIL PROTECTED]> wrote:
>
>> Hi Anoop,
>>
>> As per your last email, did you mean that the WAL is not used with the
>> HBase Bulk Loader? If so, how do we ensure "no data loss" in case of a
>> RegionServer failure?
>>
>> Thanks,
>> Anil Gupta
>>
>> On Tue, Oct 23, 2012 at 9:55 PM, ramkrishna vasudevan <
>> [EMAIL PROTECTED]> wrote:
>>
>> > As Kevin suggested we can make use of bulk load that goes through WAL and
>> > Memstore.  Or the second option will be to use the output of mappers to
>> > create HFiles directly.
>> >
>> > Regards
>> > Ram
>> >
>> > On Wed, Oct 24, 2012 at 8:59 AM, Anoop John <[EMAIL PROTECTED]> wrote:
>> >
>> > > Hi
>> > >     Using the ImportTSV tool you are trying to bulk load your data. Can
>> > > you check and tell us how many mappers and reducers there were? Out of
>> > > the total time, how much was taken by the mapper phase and how much by
>> > > the reducer phase? This seems like an MR-related issue (maybe some conf
>> > > issue). In this bulk load case most of the work is done by the MR job.
>> > > It will read the raw data, convert it into Puts and write them to
>> > > HFiles. The MR output is the HFiles themselves. The next part of
>> > > ImportTSV will just put the HFiles under the table's region store.
>> > > There won't be any WAL usage in this bulk load.
>> > >
>> > > -Anoop-
>> > >
>> > > On Tue, Oct 23, 2012 at 9:18 PM, Nick maillard <
>> > > [EMAIL PROTECTED]> wrote:
>> > >
>> > > > Hi everyone
>> > > >
>> > > > I'm starting with HBase and testing it for our needs. I have set up a
>> > > > Hadoop cluster of three machines and an HBase cluster on top of the
>> > > > same three machines, one master and two slaves.
>> > > >
>> > > > I am testing the import of a 5GB csv file with the importTsv tool. I
>> > > > copy the file into HDFS and use the importTsv tool to import it into
>> > > > HBase.
>> > > >
>> > > > Right now it takes a little over an hour to complete. It creates
>> > > > around 2 million entries in one table with a single family.
>> > > > If I use bulk uploading it goes down to 20 minutes.
>> > > >
>> > > > My Hadoop job has 21 map tasks, but they all seem to take a very long
>> > > > time to finish, and many tasks end up timing out.
>> > > >
>> > > > I am wondering what I have missed in my configuration. I have followed
>> > > > the different prerequisites in the documentation, but I am really
>> > > > unsure what is causing this slowdown. If I run the wordcount example
>> > > > on the same file it takes only minutes to complete, so I am guessing
>> > > > the issue lies in my HBase configuration.
>> > > >
>> > > > Any help or pointers would be appreciated.
>> > > >
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>>
>
>