Pig >> mail # user >> Set block size of output


Johannes Schwenk 2012-10-15, 10:04
Joe Crobak 2012-10-22, 02:01
Re: Set block size of output
On 22.10.2012 04:01, Joe Crobak wrote:
> Hi Johannes,
>
> HDFS block size is controlled by the property 'dfs.blocksize'. You should
> be able to use `set` to control this within your pig script:
> http://pig.apache.org/docs/r0.10.0/cmds.html#set I think that it should
> also work to pass that in via PIG_OPTS, e.g.
> PIG_OPTS='-Ddfs.blocksize=1048576'

Hi Joe,

Thanks, this works well. By the way, the property is called dfs.block.size.

Now, is it possible to set this on a per-STORE-statement basis? If I
have two STORE statements and want the first of them to use the default
block size and the second a very small block size, I would expect
something like this to work:
[...]
STORE a INTO '/user/schwenk/out/a';
SET dfs.block.size 2048;
STORE b INTO '/user/schwenk/out/b';
To my surprise, the files in out/a also had a block size of only 2 KB!

What can I do? Do I have to write my own storage function for this?
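
The result above suggests that the SET value ended up applying to the
whole script rather than only to the STORE that follows it. One possible
workaround that needs no custom storage function is to put each STORE
into its own script and run the scripts separately, passing the block
size through PIG_OPTS as suggested earlier in the thread. This is only a
minimal sketch under those assumptions: the script names store_a.pig and
store_b.pig, the helper run_with_block_size, and the PIG_CMD variable
are hypothetical and not part of Pig.

```shell
#!/bin/sh
# Sketch: one Pig invocation per STORE, so each run gets its own
# dfs.block.size. Nothing here is provided by Pig itself.

# PIG_CMD is an illustrative variable; it defaults to the regular
# `pig` launcher and can be overridden for a dry run.
PIG_CMD="${PIG_CMD:-pig}"

# run_with_block_size BYTES SCRIPT
# Sets dfs.block.size for this single invocation via PIG_OPTS.
run_with_block_size() {
    PIG_OPTS="-Ddfs.block.size=$1" $PIG_CMD "$2"
}

# Intended use (script names are hypothetical):
#   $PIG_CMD store_a.pig                   # cluster default block size
#   run_with_block_size 2048 store_b.pig   # 2 KB blocks for out/b only
```

The cost of this approach is that any work shared by the two outputs is
computed twice. The block size that was actually written can be checked
with `hadoop fs -stat %o` on one of the output part files.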

Thanks,
Johannes

> HTH,
> Joe
>
> On Mon, Oct 15, 2012 at 6:04 AM, Johannes Schwenk <
> [EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> I would like to set the HDFS block size of my Pig script's output files.
>> How do I do that? I tried to use
>>
>> PIG_OPTS="-Dpig.path.block.size=1048576";
>>
>> which seemed to me the only appropriate option I could find.
>>
>> Thanks for any hints!
>> Johannes Schwenk
>

Johannes Schwenk

--
Software Developer (Reporting)
________________________________________________________

ADITION technologies AG
Schwarzwaldstraße 78b
79117 Freiburg

http://www.adition.com

T +49 / (0)761 / 88147 - 30
F +49 / (0)761 / 88147 - 77
SUPPORT +49  / (0)1805 - ADITION

(Landline rate 14 ct/min; mobile rates max. 42 ct/min)

Registered at Düsseldorf District Court (Amtsgericht Düsseldorf) under HRB 54076
Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
Chairman of the Supervisory Board: Daniel Raimer, attorney at law
VAT ID No.: DE 218 858 434

Johannes Schwenk 2012-10-15, 11:53