
HBase, mail # user - Bulk loading a CSV file into HBase


Re: Bulk loading a CSV file into HBase
Harsh J 2012-05-28, 12:36
Anil,

Sorry for the late bump, but just for your reference, this is caused by:
https://issues.apache.org/jira/browse/HADOOP-7995

On Fri, Mar 9, 2012 at 9:59 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> Hi Lakshman,
>
> As per your last email, updating the doc seems to be the easy and
> right approach.
>
> Thanks,
> Anil Gupta
>
> On Fri, Mar 9, 2012 at 12:20 AM, Laxman <[EMAIL PROTECTED]> wrote:
>
>> Hi Anil,
>>
>> > Instead of invoking "parser.parse(opts, args, true);", if somehow we can
>> > invoke "parser.parse(opts, args, false);", then all will be good. I
>> > haven't looked at the API to know whether that is possible.
>>
>> Changing to parser.parse(opts, args, false) solves this problem.
>> I think we need to consider the following before going for this change.
>>
>> This involves a behavior change in legacy Hadoop code.
>> Directly changing from true to false may cause a behavioral compatibility
>> issue.
>>
>> Also, setting it to false may not be correct all the time.
>>
>> Case #1: Java
>> "java -Dprop1=val1 <Class> arg1 arg2" is different from
>> "java <Class> arg1 arg2 -Dprop1=val1"
>>
>> In this case it looks like parser.parse(opts, args, true) is correct.
>>
>>
>> Case #2: Linux
>> "ls -l /home" is the same as "ls /home -l"
>>
>> In this case it looks like parser.parse(opts, args, false) is correct.
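The trade-off above comes from the stopAtNonOption flag of commons-cli's Parser.parse(Options, String[], boolean). The stdlib-only Java sketch below is a simplified model of that flag, not the commons-cli or GenericOptionsParser code: it only recognizes -Dkey=value tokens, and the class, record, and method names are hypothetical. With true, a -D that follows the first plain argument is passed through as an ordinary argument (the Case #1 behavior); with false, it is still collected as a property (the Case #2 behavior).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StopAtNonOptionDemo {

    /** Hypothetical result type: collected -D properties plus leftover arguments. */
    record Parsed(Map<String, String> props, List<String> rest) {}

    /**
     * Simplified model of stop-at-non-option parsing (not the commons-cli code).
     * Only "-Dkey=value" tokens count as options. When stopAtNonOption is true,
     * the first non-option token ends option processing, and every later token,
     * including any "-D...", is kept as a plain argument.
     */
    static Parsed parse(String[] args, boolean stopAtNonOption) {
        Map<String, String> props = new LinkedHashMap<>();
        List<String> rest = new ArrayList<>();
        boolean stopped = false;
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (!stopped && arg.startsWith("-D") && eq > 2) {
                props.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                rest.add(arg);
                if (stopAtNonOption) {
                    stopped = true; // everything after this is left unparsed
                }
            }
        }
        return new Parsed(props, rest);
    }

    public static void main(String[] argv) {
        String[] args = {"arg1", "arg2", "-Dprop1=val1"};
        for (boolean stop : new boolean[] {true, false}) {
            Parsed p = parse(args, stop);
            System.out.println("stopAtNonOption=" + stop
                    + ": props=" + p.props() + " rest=" + p.rest());
        }
    }
}
```

Running main shows the same argument list yielding no properties under true and prop1=val1 under false, which is exactly why flipping the flag changes behavior for existing callers.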
>>
>> >> This is probably too late IIRC
>> I hope Stack also meant the same point here.
>>
>> > Could you please tell me the meaning of "IIRC"?
>> IIRC - If I Recall/Remember Correctly
>>
>> --
>> Regards,
>> Laxman
>>
>> > -----Original Message-----
>> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
>> > anil gupta
>> > Sent: Friday, March 09, 2012 3:12 AM
>> > To: [EMAIL PROTECTED]
>> > Subject: Re: Bulk loading a CSV file into HBase
>> >
>> > Yeah, after digging further into the code: line #374 in
>> > GenericOptionsParser.java, "commandLine = parser.parse(opts, args, true);",
>> > is the culprit. Nice find, Shrijeet. That answers my question. :)
>> >
>> > Stack:
>> > Could you please tell me the meaning of "IIRC"? Updating the document is
>> > good, but given the behavior of parse(), other -D options will also be
>> > ignored if the table name is followed by any -D option.
>> > Duplicating the GOP functionality does not seem to be a good idea.
>> > Maybe, instead of invoking "parser.parse(opts, args, true);", if somehow we
>> > can invoke "parser.parse(opts, args, false);", then all will be good. I
>> > haven't looked at the API to know whether that is possible. This is just
>> > food for thought.
>> >
>> > Thanks,
>> > Anil
>> >
>> >
>> >
>> > On Thu, Mar 8, 2012 at 12:06 PM, Shrijeet Paliwal <[EMAIL PROTECTED]> wrote:
>> >
>> > > GenericOptionsParser stops parsing the arguments as soon as the first
>> > > non-option argument is encountered (refer:
>> > > http://commons.apache.org/cli/api-1.2/org/apache/commons/cli/Parser.html#parse(org.apache.commons.cli.Options, java.lang.String[], boolean)).
>> > >
>> > > So in this case, as soon as the parser sees the table name argument, it
>> > > ignores all other properties specified with -D. Note that it not only
>> > > ignores the separator; it is also ignoring the importtsv.skip.bad.lines
>> > > option in your run that failed.
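To make the consequence concrete, here is a hypothetical ImportTsv-style argument list run through a stop-at-first-non-option scan. The property names importtsv.columns, importtsv.separator and importtsv.skip.bad.lines are ImportTsv properties; the table name mytable and the path /tmp/input.csv are placeholders, and honoredProperties is an illustrative helper, not part of Hadoop or HBase.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ImportTsvArgOrder {

    /**
     * Hypothetical helper: returns the -Dkey=value properties that a parser
     * which stops at the first non-option argument would actually pick up.
     */
    static Map<String, String> honoredProperties(String... args) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("-D")) {
                break; // first non-option (e.g. the table name): stop scanning
            }
            int eq = arg.indexOf('=');
            if (eq > 2) {
                props.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return props;
    }

    public static void main(String[] argv) {
        // Table name placed before the remaining -D options: only the first -D is honored.
        Map<String, String> tableFirst = honoredProperties(
                "-Dimporttsv.columns=HBASE_ROW_KEY,cf:c1",
                "mytable",
                "-Dimporttsv.separator=,",
                "-Dimporttsv.skip.bad.lines=false",
                "/tmp/input.csv");
        // All -D options before the table name: every property is honored.
        Map<String, String> optionsFirst = honoredProperties(
                "-Dimporttsv.columns=HBASE_ROW_KEY,cf:c1",
                "-Dimporttsv.separator=,",
                "-Dimporttsv.skip.bad.lines=false",
                "mytable",
                "/tmp/input.csv");
        System.out.println("table name first:  " + tableFirst);
        System.out.println("-D options first: " + optionsFirst);
    }
}
```

In the first ordering both the separator and skip.bad.lines settings are silently lost, matching what Shrijeet describes; moving every -D ahead of the table name avoids the problem without any code change.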
>> > >
>> > >
>> > >
>> > > On Thu, Mar 8, 2012 at 11:27 AM, Stack <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > On Thu, Mar 8, 2012 at 11:14 AM, anil gupta <[EMAIL PROTECTED]> wrote:
>> > > > > 1. Update the HBase bulk load documentation and specify that the
>> > > > > separator argument should come right after the program name.
>> > > >
>> > > > This would help.
>> > > >
>> > > > > 2. Fix the problem in the code itself by handling the separator argument
>> > > > > explicitly. (Still, I am wondering why only the separator value is not
>> > > > > being set in the jobconf automatically if it is not provided right after
>> > > > > the program name?)
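Option 2 could look roughly like the sketch below: after generic parsing, scan the leftover arguments for a trailing -Dimporttsv.separator=... token and apply it by hand. This is a hypothetical illustration, not ImportTsv code; a plain Map stands in for the Hadoop Configuration, and extractSeparator is an invented name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExplicitSeparatorHandling {

    // ImportTsv's property name for the field separator.
    static final String SEPARATOR_KEY = "importtsv.separator";

    /**
     * Hypothetical post-processing step: looks through the arguments left
     * unparsed by a stop-at-non-option parser for "-Dimporttsv.separator=...",
     * stores its value in conf (a Map standing in for a Hadoop Configuration),
     * and returns the remaining positional arguments in their original order.
     */
    static List<String> extractSeparator(List<String> remaining, Map<String, String> conf) {
        String prefix = "-D" + SEPARATOR_KEY + "=";
        List<String> positional = new ArrayList<>();
        for (String arg : remaining) {
            if (arg.startsWith(prefix)) {
                conf.put(SEPARATOR_KEY, arg.substring(prefix.length()));
            } else {
                positional.add(arg);
            }
        }
        return positional;
    }

    public static void main(String[] argv) {
        Map<String, String> conf = new HashMap<>();
        // Leftover arguments after the parser stopped at the table name (placeholders).
        List<String> leftover = List.of("mytable", "-Dimporttsv.separator=,", "/tmp/input.csv");
        List<String> positional = extractSeparator(leftover, conf);
        System.out.println("conf=" + conf + " positional=" + positional);
    }
}
```

As Shrijeet points out, other trailing -D options such as importtsv.skip.bad.lines are dropped in the same way, so special-casing only the separator would not cover them; handling all of them would amount to duplicating GenericOptionsParser, which is why the thread leans toward documenting the argument order instead.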
>> > > > >
>>
Harsh J