Pig >> mail # user >> Best Practice: store depending on data content


Re: Best Practice: store depending on data content
You can use the MultiStorage store func from piggybank:
http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/MultiStorage.html
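A minimal sketch of how that could look for the script in the question, assuming the piggybank jar sits at a hypothetical path and that AggregatesValues has Key as its first field. As I read the javadoc, MultiStorage takes the parent output path and the 0-based index of the field to split on (passed as a string), and writes each tuple into a subdirectory named after that field's value. Optional extra arguments set compression and the field delimiter. Note that it writes delimited text, not Avro.

```pig
-- Hypothetical location of the piggybank jar; adjust for your installation.
REGISTER /path/to/piggybank.jar;

-- AggregatesValues is produced by the "[.... doing stuff....]" part of the
-- question; Key is assumed to be field 0.
-- Each distinct Key value gets its own subdirectory under /my/output/path.
STORE AggregatesValues INTO '/my/output/path'
    USING org.apache.pig.piggybank.storage.MultiStorage('/my/output/path', '0');
```

Because MultiStorage routes every tuple by the value of the split field, the GROUP ... BY Key step from the question is not needed just to get one output location per key.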

Or, if you want something more flexible that also keeps metadata, use
HCatalog. Specify the keys you want to partition on as the partition
keys of the table, then use HCatStorer() to store the data.
See http://incubator.apache.org/hcatalog/docs/r0.4.0/index.html
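A rough sketch of the HCatalog route, assuming a hypothetical table named results that was created beforehand with key as its partition column, and that Pig runs with the HCatalog jars on its classpath. When HCatStorer is called without a partition specification, HCatalog should take the partition value from each record's key field (dynamic partitioning). Check the HCatalog 0.4 documentation linked above for the exact requirements of your version.

```pig
-- Hypothetical table, created beforehand (e.g. with the hcat CLI):
--   CREATE TABLE results (value bigint) PARTITIONED BY (key string);
-- AggregatesValues is assumed to have fields (value, key) matching that schema.
-- With no partition spec given, each record lands in the partition named by
-- its own key value.
STORE AggregatesValues INTO 'results'
    USING org.apache.hcatalog.pig.HCatStorer();
```

Compared with MultiStorage, this leaves the directory layout and storage format to HCatalog, and other tools (Hive, MapReduce via HCatalog) can then read the partitions by key.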

Thanks,
Thejas

On 6/22/12 4:54 AM, Markus Resch wrote:
> Hey everyone,
>
> We're doing some aggregation. The result contains a key, and we want
> a single output file for each key value. Is it possible to store files
> like this, in particular with the output path depending on the key's value?
>
> Example:
> Input = LOAD 'my/data.avro' USING AvroStorage;
> [.... doing stuff....]
> Output = GROUP AggregatesValues BY Key;
> FOREACH Output Store * into
> '/my/output/path/by/$Output.Key/Result.avro'
>
> I know this example does not work, but is anything similar
> possible? And if, as I assume, it is not: is there some framework in the
> Hadoop world that can do this?
>
>
> Thanks
>
> Markus
>
>