HBase, mail # user - Problem doing bulk load programmatically


Re: Problem doing bulk load programmatically
Harsh J 2012-05-24, 14:26
Sakin,

The bulk load method runs up to "hbase.loadincremental.threads.max"
threads (configurable), with the default being the number of CPUs
available on the machine.
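For reference, a sketch of setting that knob before constructing the loader (the property name is as above; everything else assumes a reachable cluster and the CDH3-era API):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

// Sketch: cap the number of bulk-load threads instead of one per CPU.
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.loadincremental.threads.max", 4);
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
```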

On Thu, May 24, 2012 at 3:10 PM, sakin cali <[EMAIL PROTECTED]> wrote:
> Thank you,
>
> As suggested, the problem was with HDFS;
> putting the DFS configuration directly into the Configuration object solved
> my problem
>
> --------------
> Configuration conf = HBaseConfiguration.create();
> conf.set("fs.default.name", "hdfs://master:54310");
>
> ..
> ..
> FileSystem fs = FileSystem.get(conf);
>
> -------------
>
> I have one more question,
> I have to get the execution time for bulk load
> Simply I get the time before and after the doBulkLoad method,
> But doBulkLoad spawns threads inside, Although I get a time difference, it
> will not be real execution time...
> What do you think?
>
> -------------
> start = System.currentTimeMillis();
> loader.doBulkLoad(dir,table);
> end = System.currentTimeMillis();
> diff= end-start;
> -------------
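For what it's worth, doBulkLoad blocks until its worker threads have finished, so the wall-clock difference above does cover their work. A small sketch of the same measurement with a monotonic clock (System.nanoTime is unaffected by system clock adjustments; Thread.sleep stands in for doBulkLoad here):

```java
// Sketch: time a blocking call with a monotonic clock.
public class TimeIt {
    // Returns elapsed wall-clock milliseconds for a blocking task,
    // e.g. () -> loader.doBulkLoad(dir, table).
    static long timeMillis(Runnable task) {
        long start = System.nanoTime(); // monotonic, unlike currentTimeMillis
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("elapsed ms: " + elapsed);
    }
}
```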
>
> On 24 May 2012 11:13, shashwat shriparv <[EMAIL PROTECTED]> wrote:
>
>> Your HBase is not able to communicate with Hadoop. Before starting HBase,
>> copy core-site.xml, hdfs-site.xml and mapred-site.xml from the Hadoop conf
>> dir to the HBase conf dir, then start HBase and try again. You can also
>> copy the Hadoop core jar from Hadoop into the HBase lib dir.
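As an alternative to copying the files, the same site files can be loaded into the client's Configuration at runtime (a sketch; the /etc/hadoop/conf paths are illustrative and depend on the installation):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch: pull Hadoop's site files into the HBase client configuration
// directly, instead of copying them into the HBase conf dir.
Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
```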
>>
>> Regards
>> ∞
>>
>> Shashwat Shriparv
>>
>>
>>
>>
>> On Thu, May 24, 2012 at 1:41 PM, shashwat shriparv <
>> [EMAIL PROTECTED]> wrote:
>>
>> > Check whether HDFS is running and you can access it, and whether Hadoop
>> > can access the file; try putting the file on HDFS first and then try
>> > again.
>> >
>> > Regards
>> >
>> > ∞
>> > Shashwat Shriparv
>> >
>> >
>> >
>> > On Thu, May 24, 2012 at 12:35 PM, sakin cali <[EMAIL PROTECTED]
>> >wrote:
>> >
>> >> Something new...
>> >>
>> >> When I start MiniDfsCluster and use its filesystem in my code,
>> >> bulk load finishes successfully..
>> >>
>> >> dfs = new MiniDFSCluster(conf, 2, true, (String[]) null);
>> >> // set file system to the mini dfs just started up
>> >> FileSystem fs = dfs.getFileSystem();
>> >>
>> >> I think I have a problem accessing the running HDFS...
>> >>
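One quick way to check that suspicion is to print which filesystem the client Configuration actually resolves to (a sketch; if it prints file:/// rather than an hdfs:// URI, the client is silently falling back to the local filesystem):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch: show the default filesystem the client will use.
Configuration conf = HBaseConfiguration.create();
FileSystem fs = FileSystem.get(conf);
System.out.println(fs.getUri()); // expect hdfs://..., not file:///
```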
>> >>
>> >> On 24 May 2012 09:12, sakin cali <[EMAIL PROTECTED]> wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I am trying to generate an HFile and load it into HBase
>> >> > programmatically. (importtsv does not fit my requirements)
>> >> >
>> >> > I have modified the TestLoadIncrementalHFiles class and am trying to
>> >> > load an example HFile.
>> >> >
>> >> > But I get an error at the last step:
>> >> >
>> >> > --------------------------------------
>> >> > java.io.IOException: java.io.IOException: Failed rename of
>> >> > file:/tmp/hbase/test/myfam/_tmp/mytable,1.top to
>> >> > file:/tmp/hbase-hbase/hbase/mytable/742248ed8cda9fe0dce2f345149fa8d5/myfam/8964877901933124837
>> >> >   at org.apache.hadoop.hbase.regionserver.StoreFile.rename(StoreFile.java:512)
>> >> > --------------------------------------
>> >> >
>> >> > Although this error occurred, the rename seems to have succeeded:
>> >> > when I checked the md5sums, both files have the same value...
>> >> >
>> >> > I could not figure out the problem,
>> >> > any ideas?
>> >> >
>> >> > I have attached test code and logs.
>> >> >
>> >> > I'm using Cloudera's VirtualBox VM image as the testing environment
>> >> > (64-bit CentOS, CDH3u4)
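For context, the programmatic path being exercised boils down to something like this (a sketch against the CDH3-era API; the table name and HFile directory are placeholders taken from the error above):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

// Sketch: load pre-generated HFiles into an existing table.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "mytable");
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
loader.doBulkLoad(new Path("/tmp/hbase/test"), table); // dir of family subdirs
```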
>> >> >
>> >> >
>> >> >

--
Harsh J