Pig, mail # user - Pig write to single file


Re: Pig write to single file
Mike Sukmanowsky 2013-05-01, 17:17
How many output files are you getting?  You can use SET DEFAULT_PARALLEL 1;
so you don't have to specify parallelism on each reduce phase.
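
For example, a minimal sketch along the lines of your script (same load/store
paths assumed):

SET DEFAULT_PARALLEL 1;   -- every reduce-side operator now uses one reducer

rows   = LOAD '$input';
unique = DISTINCT rows;   -- no per-statement PARALLEL clause needed

STORE unique INTO '$output';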

In general though, I wouldn't recommend forcing your output into one file
(parallelism is good).  Just write a shell/python/ruby/perl script that
concatenates the part files after the full job executes.
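
For example, something like this once the Pig job finishes (assuming the
output directory is on HDFS; -getmerge concatenates the part files into a
single local file):

#!/bin/sh
# Pull all part-* files from the job's output directory into one local file.
hadoop fs -getmerge "$output" merged_output.txt

That way the job keeps its parallelism and you only pay for the merge at the
very end.
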
On Wed, May 1, 2013 at 12:51 PM, Mark <[EMAIL PROTECTED]> wrote:

> Thought I understood how to output to a single file, but it doesn't seem to
> be working. Anything I'm missing here?
>
>
> -- Dedupe and store
>
> rows   = LOAD '$input';
> unique = DISTINCT rows PARALLEL 1;
>
> STORE unique INTO '$output';
>
>
>
--
Mike Sukmanowsky

Product Lead, http://parse.ly
989 Avenue of the Americas, 3rd Floor
New York, NY  10018
p: +1 (416) 953-4248
e: [EMAIL PROTECTED]