Pig >> mail # user >> loading data


Andrew Hammond 2011-03-10, 23:44
Erik Paulson 2008-03-24, 22:07
Benjamin Reed 2008-03-24, 22:20
Erik Paulson 2008-03-31, 22:10
RE: Loading data
This works: '\u007c' (the Unicode escape for the '|' character)

grunt> a = load '/homes/amiry/tmp/abc.txt' using PigStorage();
grunt> dump a;
(1|2|3)
grunt> b = load '/homes/amiry/tmp/abc.txt' using PigStorage('\u007c');
grunt> dump b;
(1, 2, 3)
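
A possible explanation for why the escape works, sketched in plain Java rather than Pig: java.util.regex reads the six-character text \u007c as the literal character U+007C, so it never gets treated as the alternation operator. This assumes PigStorage hands its argument to the Java regex engine unchanged, which is not verified here. The class name UnicodeEscapeDelimiter and the split helper are made up for illustration and are not part of Pig.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of Pig.
class UnicodeEscapeDelimiter {
    // Split a line on a delimiter that is interpreted as a regular expression.
    static String[] split(String line, String delimiterRegex) {
        return line.split(delimiterRegex);
    }

    public static void main(String[] args) {
        // The Java literal "\\u007c" is the six-character text backslash-u-0-0-7-c.
        // The regex engine reads that text as the literal pipe character.
        String[] fields = split("1|2|3", "\\u007c");
        System.out.println(Arrays.toString(fields)); // prints [1, 2, 3]
    }
}
```

The three fields match the (1, 2, 3) tuple in the dump above.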

-Amir

-----Original Message-----
From: Benjamin Reed [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 24, 2008 3:20 PM
To: [EMAIL PROTECTED]
Cc: Erik Paulson
Subject: Re: Loading data

PigStorage splits on a regular expression, with the syntax defined in:

http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#sum

It looks like you might need to specify PigStorage('[|]'), since a bare '|' is the regex alternation operator rather than a literal pipe.
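
To see the difference outside of Pig, here is a minimal Java sketch that splits the same sample line with several delimiter patterns. It only demonstrates java.util.regex behavior and makes no claim about how PigStorage or the Pig parser handles the string before it reaches the regex engine. The class name PipeSplitDemo and its split helper are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Pig.
class PipeSplitDemo {
    // Split a line on a delimiter that is interpreted as a regular expression.
    static String[] split(String line, String delimiterRegex) {
        return Pattern.compile(delimiterRegex).split(line);
    }

    public static void main(String[] args) {
        String line = "first|second|third";

        // A bare | is alternation between two empty patterns, so it matches
        // the empty string between characters and the line falls apart.
        System.out.println(Arrays.toString(split(line, "|")));

        // Inside a character class, | is an ordinary character.
        System.out.println(Arrays.toString(split(line, "[|]")));

        // A backslash escape also makes it literal (doubled in Java source).
        System.out.println(Arrays.toString(split(line, "\\|")));

        // Pattern.quote turns any string into a literal pattern.
        System.out.println(Arrays.toString(split(line, Pattern.quote("|"))));
    }
}
```

The last three calls each yield the three fields first, second, third. The first call yields single characters, similar to the broken dump in the original question.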

And yes, Pig does process directories just like Hadoop.

ben

On Monday 24 March 2008 15:07:39 Erik Paulson wrote:
> Hello all -
>
> I'm trying to load data that is separated by '|' characters, using the
> PigStorage layer (using today's SVN)
>
> From following the code in Tuple, I think I'm doing this right, but
> maybe something in the parser is eating my character separators?
>
>
>
> grunt> cat /tmp/pipeseperated
> first|second|third
> grunt> cat /tmp/commaseperated
> first,second,third
> grunt> pipedata = load '/tmp/pipeseperated' using PigStorage('\\|');
> grunt> commadata = load '/tmp/commaseperated' using PigStorage(',');
> grunt> dump pipedata
> (, f, i, r, s, t, |, s, e, c, o, n, d, |, t, h, i, r, d, )
> grunt> dump commadata;
> (first, second, third)
> grunt> trytwo = load '/tmp/pipeseperated' using PigStorage('|');
> grunt> dump trytwo
> (, f, i, r, s, t, |, s, e, c, o, n, d, |, t, h, i, r, d, )
>
>
> And a second question: in Hadoop, it's customary to give a path to a
> directory containing all of the input files - is the same thing doable
> in Pig?
>
> Thanks!
>
> -Erik