-
Hive error when loading csv data. Sandeep Reddy P 2012-06-26, 19:07
Hi all,
I have a CSV file with 46 columns, but I'm getting an error when I do some analysis on that data type. For simplification I have taken 3 columns, and now my CSV looks like this:

c,zxy,xyz
d,"abc,def",abcd

I created a table for this data using:

hive> create table test3(
    > f1 string,
    > f2 string,
    > f3 string)
    > row format delimited
    > fields terminated by ",";
OK
Time taken: 0.143 seconds
hive> load data local inpath '/home/training/a.csv'
    > into table test3;
Copying data from file:/home/training/a.csv
Copying file: file:/home/training/a.csv
Loading data to table default.test3
OK
Time taken: 0.276 seconds
hive> select * from test3;
OK
c zxy xyz
d "abc def"
Time taken: 0.156 seconds

When I do select f2 from test3; my results are:

OK
zxy
"abc

but this should be abc,def. When I open the same CSV file with Microsoft Excel I get abc,def. How should I solve this error?

--
Thanks,
sandeep
-
Re: Hive error when loading csv data. Harsh J 2012-06-26, 19:51
Hive's delimited-fields-format record reader does not handle quoted text that carries the same delimiter within it. Excel supports such records, so it reads the file fine.

You will need to create your table with a custom InputFormat class that can handle this (try using OpenCSV readers, they support this), instead of relying on Hive to do this for you. If you're successful in your approach, please also consider contributing something back to Hive/Pig to help others.

--
Harsh J
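To make the difference concrete, here is a minimal Python sketch (not Hive code) that contrasts a plain split on the delimiter with a quote-aware CSV parser on the second sample row. The helper names `naive_split` and `csv_split` are made up for illustration; the plain split only approximates how a delimiter-only reader treats the line, it is not Hive's actual implementation.

```python
import csv
import io

# Second row of the sample file from the original post.
line = 'd,"abc,def",abcd'


def naive_split(record):
    """Split on every comma, ignoring quotes (roughly what a
    delimiter-only reader does)."""
    return record.split(",")


def csv_split(record):
    """Parse one record with a quote-aware CSV reader."""
    return next(csv.reader(io.StringIO(record)))


print(naive_split(line))  # four pieces; the quoted field is cut in two
print(csv_split(line))    # three fields; the comma inside quotes survives
```

With a three-column table, the first three pieces of the plain split are what end up in f1, f2 and f3, which matches the `"abc` seen in the query output.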
-
Re: Hive error when loading csv data. Michael Segel 2012-06-26, 20:00
Alternatively, you could write a simple script to convert the CSV to a pipe-delimited file so that "abc,def" will be abc,def.
-
Re: Hive error when loading csv data. Sandeep Reddy P 2012-06-26, 21:30
Thanks for the reply.
I didn't get that, Michael. My f2 should be "abc,def".

--
Thanks,
sandeep
-
Re: Hive error when loading csv data. Hitesh Shah 2012-06-26, 22:01
Michael's suggestion was to change your data to:

c|zxy|xyz
d|abc,def|abcd

and then use "|" as the delimiter.

-- Hitesh
-
Re: Hive error when loading csv data. Michel Segel 2012-06-26, 22:48
Yup. I just didn't add the quotes.

Sent from a remote device. Please excuse any typos...

Mike Segel
-
Re: Hive error when loading csv data. Sandeep Reddy P 2012-06-27, 01:58
If I do that, my data will be d|"abc|def"|abcd, so my problem is not solved.

--
Thanks,
sandeep
-
Re: Hive error when loading csv data. Michel Segel 2012-06-27, 02:11
What I am suggesting is to write a simple script, maybe using Python, where you replace the commas that are used as field delimiters.

Mike Segel
-
Re: Hive error when loading csv data. Michel Segel 2012-06-27, 02:13
Sorry,
I was saying that you can write a Python script that replaces the delimiter with a | and ignores the commas within quotes.

Mike Segel
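A script along those lines could look roughly like the sketch below. It is only one possible approach, using Python's standard `csv` module to do the quote-aware parsing. The function names `convert_lines` and `convert_text`, the script name and the output file name are assumptions for illustration, not something from the thread. It drops the surrounding quotes and refuses to write a field that already contains the new delimiter or a line break, since either would break a delimiter-only reader again.

```python
import csv
import io
import sys


def convert_lines(src, dst, out_delim="|"):
    """Read quoted, comma-separated records from src and write them to dst
    with out_delim between fields. Commas inside quotes are kept as data."""
    for row in csv.reader(src):
        for field in row:
            # A field holding the new delimiter or a line break would
            # be split incorrectly when the output is read back.
            if out_delim in field or "\n" in field or "\r" in field:
                raise ValueError("field not safe for output: %r" % field)
        dst.write(out_delim.join(row) + "\n")


def convert_text(text, out_delim="|"):
    """Convenience wrapper for trying the conversion on an in-memory string."""
    out = io.StringIO()
    convert_lines(io.StringIO(text, newline=""), out, out_delim)
    return out.getvalue()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python csv_to_pipe.py a.csv a.psv
    with open(sys.argv[1], newline="") as src, \
            open(sys.argv[2], "w", newline="") as dst:
        convert_lines(src, dst)
```

The converted file can then be loaded into a table declared with fields terminated by "|", as suggested earlier in the thread.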
For simplification i have taken 3 columns >>>> and >>>>>> now my csv is like >>>>>> c,zxy,xyz >>>>>> d,"abc,def",abcd >>>>>> >>>>>> i have created table for this data using, >>>>>> hive> create table test3( >>>>>>> f1 string, >>>>>>> f2 string, >>>>>>> f3 string) >>>>>>> row format delimited >>>>>>> fields terminated by ","; >>>>>> OK >>>>>> Time taken: 0.143 seconds >>>>>> hive> load data local inpath '/home/training/a.csv' >>>>>>> into table test3; >>>>>> Copying data from file:/home/training/a.csv >>>>>> Copying file: file:/home/training/a.csv >>>>>> Loading data to table default.test3 >>>>>> OK >>>>>> Time taken: 0.276 seconds >>>>>> hive> select * from test3; >>>>>> OK >>>>>> c zxy xyz >>>>>> d "abc def" >>>>>> Time taken: 0.156 seconds >>>>>> >>>>>> When i do select f2 from test3; >>>>>> my results are, >>>>>> OK >>>>>> zxy >>>>>> "abc >>>>>> but this should be abc,def >>>>>> When i open the same csv file with Microsoft Excel i got abc,def >>>>>> How should i solve this error?? >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Thanks, >>>>>> sandeep >>>>>> >>>>>> -- >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Harsh J >>>>> >>>> >>>> >>> >>> >>> -- >>> Thanks, >>> sandeep >> > > > > -- > Thanks, > sandeep
-
Re: Hive error when loading csv data. Sandeep Reddy P 2012-06-27, 02:52
Thanks, Michael. Sorry I didn't get that sooner. I'll try that and reply back.

--
Thanks,
sandeep
-
Re: Hive error when loading csv data. ramakanth reddy 2012-06-27, 12:39
Hi,
Can anyone help me get started with Hadoop in single-node and cluster environments? Please send me some useful links.

On Wed, Jun 27, 2012 at 4:50 PM, Subir S <[EMAIL PROTECTED]> wrote:
> Pig has this CSVExcelStorage [1] and CSVLoader [2] as part of PiggyBank. It
> may help.
>
> [1] http://pig.apache.org/docs/r0.9.2/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html
> [2] http://pig.apache.org/docs/r0.9.2/api/org/apache/pig/piggybank/storage/CSVLoader.html
>
> CCed pig user-list also.

Thanks & Regards,
Ramakanth,
+91-8884035968.