Pig, mail # user - Pig write to single file


Re: Pig write to single file
Mark 2013-05-01, 17:21
What I'm doing is, at the end of each day, I dedupe and store all my log files in LZO format in an archive directory. I thought that since LZO is splittable and Hadoop likes larger files, this would be best. Is this not the case?
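
A minimal sketch of that daily flow (paths here are placeholders, and it assumes the hadoop-lzo LzopCodec is installed on the cluster):

    -- Compress PigStorage output (requires a codec such as hadoop-lzo's LzopCodec)
    SET output.compression.enabled true;
    SET output.compression.codec com.hadoop.compression.lzo.LzopCodec;

    logs   = LOAD '/logs/2013-05-01' USING PigStorage();
    unique = DISTINCT logs;
    STORE unique INTO '/archive/2013-05-01' USING PigStorage();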

And to answer your question, there seem to be two files around 800 MB in size.

On May 1, 2013, at 10:17 AM, Mike Sukmanowsky <[EMAIL PROTECTED]> wrote:

> How many output files are you getting?  You can use SET DEFAULT_PARALLEL 1;
> so you don't have to specify parallelism on each reduce phase.
>
> In general though, I wouldn't recommend forcing your output into one file
> (parallelism is good).  Just write a shell/python/ruby/perl script that
> appends the files after the full job executes.
>
>
> On Wed, May 1, 2013 at 12:51 PM, Mark <[EMAIL PROTECTED]> wrote:
>
>> Thought I understood how to output to a single file, but it doesn't seem to
>> be working. Anything I'm missing here?
>>
>>
>> -- Dedupe and store
>>
>> rows   = LOAD '$input';
>> unique = DISTINCT rows PARALLEL 1;
>>
>> STORE unique INTO '$output';
>>
>>
>>
>
>
> --
> Mike Sukmanowsky
>
> Product Lead, http://parse.ly
> 989 Avenue of the Americas, 3rd Floor
> New York, NY  10018
> p: +1 (416) 953-4248
> e: [EMAIL PROTECTED]
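
A rough sketch of the append-after-the-job approach described above, using hadoop fs -getmerge (the script name dedupe.pig and all paths are placeholders):

    #!/bin/sh
    # Run the dedupe job with full parallelism...
    pig -param input=/logs/2013-05-01 -param output=/tmp/deduped dedupe.pig
    # ...then concatenate the part files into one file on the local filesystem
    # (push it back to HDFS with hadoop fs -put if needed).
    hadoop fs -getmerge /tmp/deduped deduped-2013-05-01.log

And the single-reducer variant, setting the default once instead of on each statement:

    SET DEFAULT_PARALLEL 1;

    rows   = LOAD '$input';
    unique = DISTINCT rows;  -- one reducer, so one output part file

    STORE unique INTO '$output';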