-
Inserting Data from CSV into HBase
Savant, Keshav 2012-03-02, 12:54
Hi All,

I am looking for a way to map my existing CSV files to an HBase table. Basically, I want only one value per column family (just like a column in an RDBMS).

To illustrate, suppose I define an HBase table as

    create 'inventory', 'item', 'supplier', 'quantity'

(here the table name is inventory, and it has three column families named item, supplier and quantity).

Now I want to load N CSV files in the following format into this HBase table:

    Burger,abc confectionary,100
    Pizza,xyz bakers,50
    ...

The number of lines in a CSV and the number of CSVs are both dynamic, and loading will be a continuous process.

Is there any way to achieve this? I tried SampleUploader, as given at http://svn.apache.org/repos/asf/hbase/trunk/src/examples/mapreduce/org/apache/hadoop/hbase/mapreduce/SampleUploader.java, but it did not work: the program ran successfully, yet no data was populated in the HBase table.

Please suggest; any help is appreciated.

Kind regards,
Keshav C Savant

_____________ The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.
-
Re: Inserting Data from CSV into HBase
Harsh J 2012-03-02, 13:21
Hi,

You may use the importtsv tool and the bulk-load utilities in HBase to do this quickly and easily.

This is detailed at http://hbase.apache.org/bulk-loads.html (see the section about importtsv near the bottom) and also under the section "Using the importtsv tool" on page 460 of Lars George's "HBase: The Definitive Guide" (O'Reilly).

Also, when you say something didn't work, please supply any errors you encountered and the configuration you used. It's hard to help without those.

-- Harsh J
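For the inventory table in the original question, one practical wrinkle is that importtsv needs one field per line to act as the row key (it refuses to run without HBASE_ROW_KEY in the column list, as reported later in this thread), while the sample CSV has exactly three fields for three column families. A minimal sketch of one way around this is to prepend a key to each line before importing. The add_row_key helper and the "<prefix>-<line number>" key format are assumptions made up for illustration, not something proposed in the thread.

```shell
#!/bin/sh
# add_row_key PREFIX < csv
# Prepends "<PREFIX>-<line number>," to every line read from stdin.
# The key format is hypothetical; pick one that suits your access pattern
# and stays unique across all the files you load.
add_row_key() {
    awk -v prefix="$1" '{ print prefix "-" NR "," $0 }'
}

# Example with the two sample lines from the original question.
printf 'Burger,abc confectionary,100\nPizza,xyz bakers,50\n' | add_row_key file1
```

With keyed lines like these, a column list of HBASE_ROW_KEY,item,supplier,quantity would give one value per column family.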
-
RE: Inserting Data from CSV into HBase
Savant, Keshav 2012-03-02, 13:31
Hi Harsh,

Thanks for your response. I don't get any error using the code at that URL. I will get back to you after analyzing the tools you suggested.

Thanks again.

Kind regards,
Keshav C Savant
-
RE: Inserting Data from CSV into HBase
Savant, Keshav 2012-03-06, 10:02
Hi,

I tried bulk uploading and it ran well with TSV files. We first ran importtsv and then completebulkload; after these two steps I can scan my HBase table and see the data. I can also see the data when I browse the HDFS of my Hadoop cluster in a web browser.

But when I try to upload the CSVs in a folder, every line of my CSV files is reported as a bad line. I use the following command to upload the CSVs from my local file system to HDFS:

    HADOOP_CLASSPATH=`hbase classpath` $HADOOP_HOME/bin/hadoop jar /hbase_home/hbase-0.92.0/hbase-0.92.0.jar importtsv -Dimporttsv.bulk.output=/my_output_dir -Dimporttsv.columns=HBASE_ROW_KEY,SerialNumber,Column1,Column2 my_table file:/my_csv/data.txt '-Dimporttsv.separator=,'

My CSV file has the following format:

    1,data11,data12
    2,data21,data22
    3,data31,data32
    .....

And my HBase table has 3 columns.

What exactly is the problem, and how can it be resolved?

Kind regards,
Keshav
-
Re: Inserting Data from CSV into HBase
Harsh J 2012-03-06, 11:59
Hi,

Can you share the exact error message/stack trace you get?

One observation, though: your data has just 3 columns, but you are specifying four elements in the columns argument of the importer:

    Data: "3,data31,data32" (3 elements)
    Cols passed: "-Dimporttsv.columns=HBASE_ROW_KEY,SerialNumber,Column1,Column2" (4 elements)

This may be why importtsv complains about bad lines.

-- Harsh J
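The mismatch described above can be checked before a job is submitted. The following is a rough pre-flight sketch: the check_fields helper is hypothetical, it does a plain split on the separator (no quoted-field handling), and it does not reproduce importtsv's own parsing rules, which may treat short lines differently from one version to another.

```shell
#!/bin/sh
# check_fields COLUMNS SEPARATOR < file
# Prints every input line whose field count differs from the number of
# entries in the comma-separated COLUMNS list, and returns non-zero if
# any such line is found.
check_fields() {
    awk -F"$2" -v cols="$1" '
        BEGIN { expected = split(cols, parts, ",") }
        NF != expected {
            printf "line %d: %d fields, expected %d\n", NR, NF, expected
            bad = 1
        }
        END { exit bad }
    '
}

# Sample data and column list taken from the messages above.
printf '1,data11,data12\n2,data21,data22\n' |
    check_fields 'HBASE_ROW_KEY,SerialNumber,Column1,Column2' ',' ||
    echo "field counts do not match the column list"
```

A clean result here only rules out one cause; as the rest of the thread shows, the bad lines in this case persisted after the column list was corrected.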
-
RE: Inserting Data from CSV into HBase
Savant, Keshav 2012-03-06, 12:49
Hi Harsh,

If I omit HBASE_ROW_KEY, importtsv says I must specify HBASE_ROW_KEY and does not go any further.

With HBASE_ROW_KEY added to the column names I get no error, but the following output:

12/03/07 02:52:30 INFO mapreduce.HFileOutputFormat: Looking up current regions for table org.apache.hadoop.hbase.client.HTable@6bade9
12/03/07 02:52:30 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce partitions to match current region count
12/03/07 02:52:30 INFO mapreduce.HFileOutputFormat: Writing partition information to hdfs://master/user/hadoop/partitions_1331088750667
12/03/07 02:52:30 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/03/07 02:52:30 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/03/07 02:52:30 INFO compress.CodecPool: Got brand-new compressor
12/03/07 02:52:31 INFO mapreduce.HFileOutputFormat: Incremental table output configured.
12/03/07 02:52:31 INFO input.FileInputFormat: Total input paths to process : 1
12/03/07 02:52:32 INFO mapred.JobClient: Running job: job_201203062231_0002
12/03/07 02:52:33 INFO mapred.JobClient: map 0% reduce 0%
12/03/07 02:52:49 INFO mapred.JobClient: map 100% reduce 0%
12/03/07 02:53:01 INFO mapred.JobClient: map 100% reduce 100%
12/03/07 02:53:06 INFO mapred.JobClient: Job complete: job_201203062231_0002
12/03/07 02:53:06 INFO mapred.JobClient: Counters: 25
12/03/07 02:53:06 INFO mapred.JobClient: Job Counters
12/03/07 02:53:06 INFO mapred.JobClient: Launched reduce tasks=1
12/03/07 02:53:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=13349
12/03/07 02:53:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/07 02:53:06 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/03/07 02:53:06 INFO mapred.JobClient: Rack-local map tasks=1
12/03/07 02:53:06 INFO mapred.JobClient: Launched map tasks=1
12/03/07 02:53:06 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10467
12/03/07 02:53:06 INFO mapred.JobClient: ImportTsv
12/03/07 02:53:06 INFO mapred.JobClient: Bad Lines=6
12/03/07 02:53:06 INFO mapred.JobClient: File Output Format Counters
12/03/07 02:53:06 INFO mapred.JobClient: Bytes Written=0
12/03/07 02:53:06 INFO mapred.JobClient: FileSystemCounters
12/03/07 02:53:06 INFO mapred.JobClient: FILE_BYTES_READ=282
12/03/07 02:53:06 INFO mapred.JobClient: HDFS_BYTES_READ=102
12/03/07 02:53:06 INFO mapred.JobClient: FILE_BYTES_WRITTEN=65025
12/03/07 02:53:06 INFO mapred.JobClient: File Input Format Counters
12/03/07 02:53:06 INFO mapred.JobClient: Bytes Read=123
12/03/07 02:53:06 INFO mapred.JobClient: Map-Reduce Framework
12/03/07 02:53:06 INFO mapred.JobClient: Reduce input groups=0
12/03/07 02:53:06 INFO mapred.JobClient: Map output materialized bytes=6
12/03/07 02:53:06 INFO mapred.JobClient: Combine output records=0
12/03/07 02:53:06 INFO mapred.JobClient: Map input records=6
12/03/07 02:53:06 INFO mapred.JobClient: Reduce shuffle bytes=0
12/03/07 02:53:06 INFO mapred.JobClient: Reduce output records=0
12/03/07 02:53:06 INFO mapred.JobClient: Spilled Records=0
12/03/07 02:53:06 INFO mapred.JobClient: Map output bytes=0
12/03/07 02:53:06 INFO mapred.JobClient: Combine input records=0
12/03/07 02:53:06 INFO mapred.JobClient: Map output records=0
12/03/07 02:53:06 INFO mapred.JobClient: SPLIT_RAW_BYTES=102
12/03/07 02:53:06 INFO mapred.JobClient: Reduce input records=0

Kind regards,
Keshav C Savant
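The job above finishes without an error, so the failure only shows up in its counters: "Bad Lines=6" equals "Map input records=6" and "Map output records=0". A small sketch for pulling those counters out of a captured job log is shown below; the counter_value helper is hypothetical, and the counter names are the ones printed in the log above.

```shell
#!/bin/sh
# counter_value NAME < job_log
# Prints the numeric value of the last "NAME=<number>" entry in the log.
counter_value() {
    sed -n "s/.*$1=\([0-9][0-9]*\).*/\1/p" | tail -n 1
}

# Three counter lines copied from the job output above.
log='12/03/07 02:53:06 INFO mapred.JobClient: Bad Lines=6
12/03/07 02:53:06 INFO mapred.JobClient: Map input records=6
12/03/07 02:53:06 INFO mapred.JobClient: Map output records=0'

bad=$(printf '%s\n' "$log" | counter_value 'Bad Lines')
total=$(printf '%s\n' "$log" | counter_value 'Map input records')
if [ "$bad" -eq "$total" ]; then
    echo "all $total input lines were rejected"
fi
```

When every input line is rejected, the cause is more likely to be something that affects the whole file (such as the separator) than a problem with individual rows.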
-
Re: Inserting Data from CSV into HBase
victor.hong@... 2012-03-06, 14:37
Did you try adding a comma at the end of each line, just to see what happens?
-
Re: Inserting Data from CSV into HBase
Anil Gupta 2012-03-06, 23:58
Hi Keshav,

There seems to be a problem with bulk load when importing data from a CSV file. I ran into the same problem yesterday and posted about it on the mailing list. I got pulled into another task at work, so I have not been able to devote much time to it. I have identified the problem but still need to work out the fix. I will post the solution once I have it.

Best Regards,
Anil
-
RE: Inserting Data from CSV into HBase
Savant, Keshav 2012-03-07, 06:59
Hi Harsh/Anil/Victor,
I was able to run the importtsv tool on my sample CSV (on my local file system, NOT on HDFS) with a small tweak to my command: I placed the argument '-Dimporttsv.separator=,' first, ahead of all the other arguments, and it ran successfully. Here is the command:

    HADOOP_CLASSPATH=`hbase classpath` $HADOOP_HOME/bin/hadoop jar /hbase_home/hbase-0.92.0/hbase-0.92.0.jar importtsv '-Dimporttsv.separator=,' -Dimporttsv.bulk.output=/myHbaseOutputDataDir -Dimporttsv.columns=HBASE_ROW_KEY,Name,Asset Tblassetscsv file:/myHbaseDataDir
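A likely explanation for why the position matters, offered here as an assumption rather than something confirmed in this thread, is that -D options are only read up to the first positional argument (the table name), so a separator option placed after the input path would be ignored and the default tab separator used. The sketch below flags such options; misplaced_options is a hypothetical helper that looks only at the arguments following "importtsv" and handles only the "-Dkey=value" form.

```shell
#!/bin/sh
# misplaced_options ARG...
# Prints every -D option that appears after the first positional argument.
# Assumption: options in that position are not applied by the tool.
misplaced_options() {
    seen_positional=0
    for arg in "$@"; do
        case "$arg" in
            -D*) [ "$seen_positional" -eq 1 ] && printf '%s\n' "$arg" ;;
            *)   seen_positional=1 ;;
        esac
    done
    return 0
}

# Arguments of the earlier, failing command from this thread.
misplaced_options \
    -Dimporttsv.bulk.output=/my_output_dir \
    -Dimporttsv.columns=HBASE_ROW_KEY,SerialNumber,Column1,Column2 \
    my_table file:/my_csv/data.txt '-Dimporttsv.separator=,'
```

For the failing command this prints the separator option; for the working command above it prints nothing.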
An extract from my CSV file is below:

    1,user1,desktop
    2,user2,ipad
    3,user3,Mc Air
    ...

After running importtsv I ran the completebulkload tool; the data was then placed in the HBase table and could be scanned. The completebulkload command looks like this:

    HADOOP_CLASSPATH=`hbase classpath` $HADOOP_HOME/bin/hadoop jar /hbase_home/hbase-0.92.0/hbase-0.92.0.jar completebulkload /myHbaseOutputDataDir Tblassetscsv

For the information of new users: after running importtsv, the data (in a mix of text and binary format) can be seen by browsing HDFS through the Hadoop name node web application; it will be in the output directory we specified (in our case myHbaseOutputDataDir). After running completebulkload, the data is moved to the hbase/<table_name> directory on HDFS, where it can be seen the same way. (Correct me if I am wrong.)

My sincere thanks to Harsh for sharing his valuable thoughts.

To reiterate, this whole exercise was a sample/demo run; now we will focus on the real problem and its solution. I will also look into the source code of the importtsv utility so that, if required, I can change it to suit my problem domain.

Kind regards,
Keshav C Savant

-----Original Message-----
From: Savant, Keshav
Sent: Wednesday, March 07, 2012 11:10 AM
To: 'Harsh J'
Subject: RE: Inserting Data from CSV into HBase
Hi Harsh,

This did not work; I am still facing the same bad-line issue. I also tried appending an extra comma at the end of each line, but that did not work either. In our case TSVs work fine, but CSVs do NOT.

Kind regards,
Keshav

-----Original Message-----
From: Harsh J [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 06, 2012 8:18 PM
To: Savant, Keshav
Subject: Re: Inserting Data from CSV into HBase

Hi,

What you need is just:

    "-Dimporttsv.columns=HBASE_ROW_KEY,Column1,Column2"

SerialNumber becomes your key.
Harsh J