Pig >> mail # user >> loading data


Andrew Hammond 2011-03-10, 23:44
Erik Paulson 2008-03-24, 22:07
Benjamin Reed 2008-03-24, 22:20
Erik Paulson 2008-03-31, 22:10
RE: Loading data
This works: '\u007c'

grunt> a = load '/homes/amiry/tmp/abc.txt' using PigStorage();
grunt> dump a;
(1|2|3)
grunt> b = load '/homes/amiry/tmp/abc.txt' using PigStorage('\u007c');
grunt> dump b;
(1, 2, 3)

-Amir

-----Original Message-----
From: Benjamin Reed [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 24, 2008 3:20 PM
To: [EMAIL PROTECTED]
Cc: Erik Paulson
Subject: Re: Loading data

PigStorage uses regex for splitting as defined in:

http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#sum

It looks like you might need to specify PigStorage('[|]').
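
To illustrate the regex point above, here is a minimal sketch in plain Java using java.util.regex.Pattern. It is not Pig's own code: the class name PipeSplitDemo and the helper splitOn are hypothetical, and it only shows how a bare '|' behaves as a regex compared with '[|]' or an escaped pipe. How the Pig parser handles quoting and backslashes before the pattern reaches the regex engine is a separate matter that this sketch does not cover.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Pig.
class PipeSplitDemo {
    // Split a line on a regex delimiter; limit -1 keeps trailing empty fields.
    static String[] splitOn(String line, String regex) {
        return Pattern.compile(regex).split(line, -1);
    }

    public static void main(String[] args) {
        String line = "first|second|third";

        // A bare "|" is regex alternation between two empty patterns, so it
        // matches the empty string everywhere and breaks the line into
        // single characters instead of three fields.
        System.out.println(Arrays.toString(splitOn(line, "|")));

        // Inside a character class the pipe is a literal character.
        System.out.println(Arrays.toString(splitOn(line, "[|]")));

        // An escaped pipe is also a literal (the Java source needs "\\|").
        System.out.println(Arrays.toString(splitOn(line, "\\|")));

        // Pattern.quote turns any delimiter string into a literal pattern.
        System.out.println(Arrays.toString(splitOn(line, Pattern.quote("|"))));
    }
}
```

The character-class form avoids backslashes entirely, which may be why it is the safer choice when the delimiter has to pass through another parser first.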

And yes, pig does process directories just like hadoop.

ben

On Monday 24 March 2008 15:07:39 Erik Paulson wrote:
> Hello all -
>
> I'm trying to load data that is separated by '|' characters, using the
> PigStorage layer (using today's SVN)
>
> From following the code in Tuple, I think I'm doing this right, but
> maybe something in the parser is eating my character separators?
>
>
>
> grunt> cat /tmp/pipeseperated
> first|second|third
> grunt> cat /tmp/commaseperated
> first,second,third
> grunt> pipedata = load '/tmp/pipeseperated' using PigStorage('\\|');
> grunt> commadata = load '/tmp/commaseperated' using PigStorage(',');
> grunt> dump pipedata
> (, f, i, r, s, t, |, s, e, c, o, n, d, |, t, h, i, r, d, )
> grunt> dump commadata;
> (first, second, third)
> grunt> trytwo = load '/tmp/pipeseperated' using PigStorage('|');
> grunt> dump trytwo
> (, f, i, r, s, t, |, s, e, c, o, n, d, |, t, h, i, r, d, )
>
>
> And a second question: in Hadoop, it's customary to give a path to a
> directory containing all of the input files - is the same thing doable
> in Pig?
>
> Thanks!
>
> -Erik