Pig >> mail # user >> Re: Hive error when loading csv data.


Official Apache instructions for 1.0 -
http://hadoop.apache.org/common/docs/r1.0.3/single_node_setup.html

If you want to try it out on a single node on Amazon EC2 -
Instructions for the HDP distro -
http://hortonworks.com/community/virtual-sandbox/

If you want a wizard-based guided install on a single node, you can use
HDP for that as well - http://hortonworks.com/download/

Thanks,
Thejas

On 6/27/12 8:38 AM, Ruslan Al-Fakikh wrote:
> Hi,
>
> You may try Cloudera's pseudo-distributed mode
> https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+in+Pseudo-Distributed+Mode
> You may also try Cloudera's demo VM
> https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM
>
> Regards,
> Ruslan Al-Fakikh
>
> On Wed, Jun 27, 2012 at 4:39 PM, ramakanth reddy
> <[EMAIL PROTECTED]>  wrote:
>> Hi
>>
>> Can anyone help me get started with Hadoop in single-node and cluster
>> environments? Please send me some useful links.
>>
>> On Wed, Jun 27, 2012 at 4:50 PM, Subir S<[EMAIL PROTECTED]>  wrote:
>>
>>> Pig has CSVExcelStorage [1] and CSVLoader [2] as part of PiggyBank. They
>>> may help.
>>>
>>> [1]
>>>
>>> http://pig.apache.org/docs/r0.9.2/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html
>>> [2]
>>>
>>> http://pig.apache.org/docs/r0.9.2/api/org/apache/pig/piggybank/storage/CSVLoader.html
>>>
>>> CCed pig user-list also.
>>>
>>>
>>> On Wed, Jun 27, 2012 at 8:22 AM, Sandeep Reddy P <[EMAIL PROTECTED]> wrote:
>>>
>>>> Thanks Michael. Sorry, I didn't get that at first. I'll try that and reply
>>>> back.
>>>>
>>>> On Tue, Jun 26, 2012 at 10:13 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Sorry,
>>>>> I was saying that you can write a python script that replaces the
>>>>> delimiter with a | and ignores the commas within quotes.
>>>>>
>>>>>
>>>>> Sent from a remote device. Please excuse any typos...
>>>>>
>>>>> Mike Segel
>>>>>
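[Editor's note] Mike's pipe-conversion idea can be sketched in Python. This is a hypothetical illustration, not code from the thread: the standard csv module parses the quoted fields, and each record is rewritten pipe-delimited, so the comma inside "abc,def" survives without quoting. It assumes no field contains a literal | character.

```python
import csv
import sys

def csv_to_pipe(in_path, out_path):
    """Rewrite a comma-delimited file as pipe-delimited.

    csv.reader is quote-aware, so a field like "abc,def" stays one field.
    Assumes no field contains a literal '|'.
    """
    with open(in_path, newline="") as src, open(out_path, "w") as dst:
        for row in csv.reader(src):
            dst.write("|".join(row) + "\n")

if __name__ == "__main__" and len(sys.argv) == 3:
    csv_to_pipe(sys.argv[1], sys.argv[2])
```

A row like d,"abc,def",abcd then becomes d|abc,def|abcd, which Hive can load with | as the field terminator.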
>>>>> On Jun 26, 2012, at 8:58 PM, Sandeep Reddy P <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> If I do that, my data will be d|"abc|def"|abcd, so my problem is not
>>>>>> solved.
>>>>>>
>>>>>> On Tue, Jun 26, 2012 at 6:48 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Yup. I just didn't add the quotes.
>>>>>>>
>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>
>>>>>>> Mike Segel
>>>>>>>
>>>>>>> On Jun 26, 2012, at 4:30 PM, Sandeep Reddy P <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> Thanks for the reply.
>>>>>>>> I didn't get that, Michael. My f2 should be "abc,def".
>>>>>>>>
>>>>>>>> On Tue, Jun 26, 2012 at 4:00 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>> Alternatively you could write a simple script to convert the CSV to a
>>>>>>>>> pipe-delimited file so that "abc,def" will be abc,def.
>>>>>>>>>
>>>>>>>>> On Jun 26, 2012, at 2:51 PM, Harsh J wrote:
>>>>>>>>>
>>>>>>>>>> Hive's delimited-fields-format record reader does not handle quoted
>>>>>>>>>> text that carries the same delimiter within it. Excel supports such
>>>>>>>>>> records, so it reads them fine.
>>>>>>>>>>
>>>>>>>>>> You will need to create your table with a custom InputFormat class
>>>>>>>>>> that can handle this (try using OpenCSV readers; they support this),
>>>>>>>>>> instead of relying on Hive to do it for you. If you're successful in
>>>>>>>>>> your approach, please also consider contributing something back to
>>>>>>>>>> Hive/Pig to help others.
>>>>>>>>>>
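[Editor's note] Harsh's point is easy to demonstrate outside Hive. A plain split on the delimiter (roughly what a quote-unaware delimited reader does) breaks the quoted field, while a quote-aware parser — OpenCSV in Java, or Python's csv module in this illustrative sketch — keeps it intact:

```python
import csv
import io

line = 'd,"abc,def",abcd'

# Quote-unaware: split on every comma, breaking the quoted field in two.
naive = line.split(",")

# Quote-aware: csv.reader treats "abc,def" as a single field.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['d', '"abc', 'def"', 'abcd'] -> 4 fields instead of 3
print(parsed)  # ['d', 'abc,def', 'abcd']
```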
>>>>>>>>>> On Wed, Jun 27, 2012 at 12:37 AM, Sandeep Reddy P
>>>>>>>>>> <[EMAIL PROTECTED]>  wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>> I have a CSV file with 46 columns, but I'm getting an error when I do
>>>>>>>>>>> some analysis on that data. For simplification I have taken 3 columns,
>>>>>>>>>>> and now my CSV is like:
>>>>>>>>>>> c,zxy,xyz
>>>>>>>>>>> d,"abc,def",abcd
>>>>>>>>>>>
>>>>>>>>>>> I have created a table for this data using,