HBase >> mail # user >> Bulk loading a CSV file into HBase


Re: Bulk loading a CSV file into HBase
Anil,

Sorry for the late bump, but just for your reference, this is because of:
https://issues.apache.org/jira/browse/HADOOP-7995

On Fri, Mar 9, 2012 at 9:59 PM, anil gupta <[EMAIL PROTECTED]> wrote:
> Hi Lakshman,
>
> As per your last email, updating the doc seems to be the easy and right
> approach.
>
> Thanks,
> Anil Gupta
>
> On Fri, Mar 9, 2012 at 12:20 AM, Laxman <[EMAIL PROTECTED]> wrote:
>
>> Hi Anil,
>>
>> > instead of invoking "parser.parse(opts, args, true);", if we can
>> > somehow invoke "parser.parse(opts, args, false);" then all will be good.
>> > I haven't looked at the API to see whether that is possible.
>>
>> Changing to parser.parse(opts, args, false) solves this problem.
>> However, I think we need to consider the following before making this change.
>>
>> It changes the behavior of legacy Hadoop code, so switching directly from
>> true to false may cause a behavioral compatibility issue.
>>
>> Also, setting it to false may not be correct in all cases.
>>
>> Case #1: java
>> "java -Dprop1=val1 <Class> arg1 arg2" is different from "java <Class> arg1
>> arg2 -Dprop1=val1".
>>
>> In this case, parser.parse(opts, args, true) looks correct.
>>
>>
>> Case #2: linux
>> "ls -l /home" is the same as "ls /home -l".
>>
>> In this case, parser.parse(opts, args, false) looks correct.
>>
>> >> This is probably too late IIRC
>> Hope, Stack also meant the same point here.
>>
>> > Could you please tell me the meaning of "IIRC"?
>> IIRC - If I Recall/Remember Correctly
>>
>> --
>> Regards,
>> Laxman
>>
>> > -----Original Message-----
>> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
>> > anil gupta
>> > Sent: Friday, March 09, 2012 3:12 AM
>> > To: [EMAIL PROTECTED]
>> > Subject: Re: Bulk loading a CSV file into HBase
>> >
>> > Yeah, after digging further into the code: Line#374 in
>> > GenericOptionsParser.java, "commandLine = parser.parse(opts, args,
>> > true);",
>> > is the culprit. Nice find, Shrijeet. That answers my question. :)
>> >
>> > Stack:
>> > Could you please tell me the meaning of "IIRC"? Updating the document
>> > is good, but given the behavior of parse(), any other -D option will
>> > also be ignored if it follows the table name.
>> > Duplicating the GOP functionality does not seem to be a good idea.
>> > Maybe, instead of invoking "parser.parse(opts, args, true);", if we can
>> > somehow invoke "parser.parse(opts, args, false);" then all will be good.
>> > I haven't looked at the API to see whether that is possible. This is
>> > just food for thought.
>> >
>> > Thanks,
>> > Anil
>> >
>> >
>> >
>> > On Thu, Mar 8, 2012 at 12:06 PM, Shrijeet Paliwal
>> > <[EMAIL PROTECTED]> wrote:
>> >
>> > > GenericOptionsParser stops parsing the arguments as soon as the first
>> > > non-option argument is encountered (refer to:
>> > >
>> > > http://commons.apache.org/cli/api-1.2/org/apache/commons/cli/Parser.html#parse(org.apache.commons.cli.Options, java.lang.String[], boolean))
>> > >
>> > > So in this case, as soon as the parser sees the table name arg, it
>> > > ignores all other properties specified with the -D option. Note that it
>> > > ignores not only the separator but also the importtsv.skip.bad.lines
>> > > option in your run, which failed.
>> > >
>> > >
>> > >
>> > > On Thu, Mar 8, 2012 at 11:27 AM, Stack <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > On Thu, Mar 8, 2012 at 11:14 AM, anil gupta <[EMAIL PROTECTED]> wrote:
>> > > > > 1. Update the HBase bulk load documentation and specify that the
>> > > > > separator argument should come right after the program name.
>> > > >
>> > > > This would help.
>> > > >
>> > > > > 2. Fix the problem in the code itself by handling the separator
>> > > > > argument explicitly. (Still, I am wondering why only the separator
>> > > > > value is not set in the jobconf automatically if it is not provided
>> > > > > right after the program name.)
>> > > > >
>> > > > >
>>
Harsh J
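
For readers following the thread, the effect of the third argument to parse() discussed above can be illustrated with a small self-contained sketch. This is a toy model only: it does not call Commons CLI or GenericOptionsParser, and the class name, method name, table name and input path are all hypothetical. It assumes that only arguments of the form -Dkey=value count as options, which is a simplification of what the real parser accepts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "stop at first non-option" parsing. NOT the Commons CLI code. */
final class StopAtNonOptionSketch {

    /** The -D properties that were recognised, plus the arguments left over. */
    static final class Result {
        final Map<String, String> props = new LinkedHashMap<>();
        final List<String> leftover = new ArrayList<>();
    }

    /**
     * Assumption: only "-Dkey=value" is treated as an option here.
     * When stopAtNonOption is true, everything after the first non-option
     * argument is passed through untouched, as described in the thread.
     */
    static Result parse(String[] args, boolean stopAtNonOption) {
        Result r = new Result();
        boolean stopped = false;
        for (String arg : args) {
            int eq = arg.indexOf('=');
            boolean isProp = arg.startsWith("-D") && eq > 2;
            if (!stopped && isProp) {
                r.props.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                r.leftover.add(arg);
                if (stopAtNonOption) {
                    stopped = true; // later -D arguments are no longer parsed
                }
            }
        }
        return r;
    }

    public static void main(String[] unused) {
        // Hypothetical invocation: table name first, -D options after it.
        String[] tableFirst = {
            "mytable", "-Dimporttsv.skip.bad.lines=false", "/input/dir"
        };
        // Same arguments with the -D option moved in front of the table name.
        String[] optionsFirst = {
            "-Dimporttsv.skip.bad.lines=false", "mytable", "/input/dir"
        };

        System.out.println(parse(tableFirst, true).props);   // no properties set
        System.out.println(parse(tableFirst, false).props);  // property is picked up
        System.out.println(parse(optionsFirst, true).props); // property is picked up
    }
}
```

In this model, the first call mirrors the failing run (the -D option after the table name never reaches the configuration), the second mirrors the proposed change to false, and the third mirrors the documentation fix of putting -D options right after the program name.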