Re: DATA not storing as comma-separted
Yogesh -- based on the log info you provided, it seems like your input
data is not tab-delimited, which is the default delimiter when using
PigStorage.  As a result, your 3 space-separated fields are being
pulled as one into name:chararray, and then can't be split out
again when you try to store results into HDFS.

Either override the default delimiter (by explicitly specifying
PigStorage(' ') in your call to LOAD) or change your input data to be
tab-delimited.
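For example, assuming the file really is space-separated into name, roll,
and mssg fields, a session along these lines should work (the output path
is just illustrative):

```pig
-- Load with an explicit space delimiter instead of PigStorage's default tab
A = LOAD '/hello/demotry.txt' USING PigStorage(' ')
    AS (name:chararray, roll:int, mssg:chararray);

-- Store the result comma-separated
STORE A INTO '/hello/demotry_out' USING PigStorage(',');
```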

Norbert


On Wed, Jul 25, 2012 at 9:07 PM, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> Why are you trying 0.7, yogesh? It's ancient at this point.
>
> " Unable to create input splits for: file:///hello/demotry.txt "
> implies the file does not exist.
>
> Can you show a whole session in which you load data, store it using
> PigStorage(','), cat it, and it comes out wrong?
> So far I've been unable to reproduce your results.
>
> D
>
> On Wed, Jul 25, 2012 at 7:09 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>> Hello Yogesh,
>>
>>        Also add these lines, export PIG_CLASSPATH=/HADOOP_HOME/conf &
>> export HADOOP_CONF_DIR=/HADOOP_HOME/conf, and see if it works for you.
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>> On Wed, Jul 25, 2012 at 6:01 PM,  <[EMAIL PROTECTED]> wrote:
>>> Hi mohammad,
>>>
>>> when I try the command
>>>
>>> Pig
>>>
>>> its shows error for 0.7.0 version
>>>
>>> mediaadmin$ pig
>>> 12/07/25 17:54:15 INFO pig.Main: Logging error messages to: /users/mediaadmin/pig_1343219055229.log
>>> 2012-07-25 17:54:15,451 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
>>>
>>> and this .log file doesn't exist in /users/mediaadmin/
>>>
>>> Why is that? I have set these properties in the pig-0.7.0/bin/pig file.
>>>
>>> ---------------------------------------------------------------------
>>> # The Pig command script
>>> #
>>> # Environment Variables
>>> #
>>>     export JAVA_HOME=/Library/Java/Home
>>> #
>>> #     PIG_CLASSPATH Extra Java CLASSPATH entries.
>>> #
>>>       export HADOOP_HOME=/HADOOP/hadoop-0.20.2
>>>
>>>         export HADOOP_CONF_DIR=/HADOOP/hadoop-0.20.2/conf
>>>
>>> #     PIG_HEAPSIZE    The maximum amount of heap to use, in MB.
>>> #                                        Default is 1000.
>>> #
>>> #     PIG_OPTS            Extra Java runtime options.
>>> #
>>>      export PIG_CONF_DIR=/HADOOP/pig-0.7.0/conf
>>> #
>>> #     PIG_ROOT_LOGGER The root appender. Default is INFO,console
>>> #
>>> #     PIG_HADOOP_VERSION Version of hadoop to run with.    Default is 20 (0.20).
>>>
>>> ----------------------------------------------------------------
>>>
>>>
>>>
>>>
>>> ________________________________________
>>> From: Mohammad Tariq [[EMAIL PROTECTED]]
>>> Sent: Wednesday, July 25, 2012 5:34 PM
>>> To: [EMAIL PROTECTED]
>>> Subject: Re: DATA not storing as comma-separted
>>>
>>> Also, it would help to go to the MapReduce web UI and have a look
>>> at the details of the job corresponding to this query.
>>>
>>> Regards,
>>>     Mohammad Tariq
>>>
>>>
>>> On Wed, Jul 25, 2012 at 5:31 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>>> I have worked with pig-0.7.0 once and it was working fine. Try to see
>>>> if there is anything interesting in the log files. Also, if possible,
>>>> share 2-3 lines of your file..I'll give it a try on my machine.
>>>>
>>>> Regards,
>>>>     Mohammad Tariq
>>>>
>>>>
>>>> On Wed, Jul 25, 2012 at 5:20 PM,  <[EMAIL PROTECTED]> wrote:
>>>>> Hi Mohammad,
>>>>>
>>>>> I have switched from pig-0.10.0 to 0.7.0 and it's a horrible experience.
>>>>> I do perform
>>>>>
>>>>> grunt> A = load '/hello/demotry.txt'
>>>>>>> as (name:chararray, roll:int, mssg:chararray);
>>>>>
>>>>> grunt> dump A;
>>>>>
>>>>> it shows this error:
>>>>>
>>>>> grunt> dump A;
>>>>> 2012-07-25 17:20:34,081 [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A