Pig, mail # user - Fallback for output data storage


Re: Fallback for output data storage
Alan Gates 2012-08-23, 15:01
You can simply store the data twice at the end of your script.  Pig will split the output and send it to both stores.  A failure in the DB storage shouldn't fail the HDFS storage (but test this first to make sure I'm correct).

So your script would look like:

A = load ...
store Z into 'db' using DBStorage();
store Z into '/data/fallback';
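
A slightly fuller sketch of the same idea, in case it helps. It assumes the piggybank DBStorage function and a MySQL JDBC driver; the jar paths, input path, schema, relation names, JDBC URL, credentials, and table are all made up for illustration, so substitute your vendor's storage function and your own settings. The sketch does not settle whether a DB failure leaves the HDFS copy intact, so that still needs testing.

```pig
-- Hypothetical jar locations; adjust to your installation.
REGISTER /path/to/piggybank.jar;
REGISTER /path/to/mysql-connector-java.jar;

-- Hypothetical input and aggregation standing in for the real script.
raw     = LOAD '/data/input' USING PigStorage('\t')
          AS (user_id:chararray, amount:long);
grouped = GROUP raw BY user_id;
Z       = FOREACH grouped GENERATE group AS user_id, SUM(raw.amount) AS total;

-- Fallback copy on HDFS.
STORE Z INTO '/data/fallback' USING PigStorage('\t');

-- Copy to the external database (hypothetical URL, credentials, and table).
STORE Z INTO 'db' USING org.apache.pig.piggybank.storage.DBStorage(
    'com.mysql.jdbc.Driver',
    'jdbc:mysql://dbhost/reports',
    'dbuser', 'dbpass',
    'INSERT INTO totals (user_id, total) VALUES (?, ?)');
```

If the DB store does fail, the rows under '/data/fallback' can be loaded by a small follow-up script without redoing the aggregation.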

Alan.

On Aug 23, 2012, at 4:38 AM, Markus Resch wrote:

> Hi everyone,
>
> we are planning to put our aggregation results into an external
> database. To handle a connection failure to that external resource properly,
> we currently store the result on HDFS and sync it to the db afterwards
> with a second Pig script that uses the db manufacturer's Pig storage
> function. We do that because we can hardly afford to redo all the
> aggregations in case of an error at the very end of the aggregation.
>
> If we could define a fallback data storage (e.g. HDFS) that is
> used in case of a connection issue, we could drop
> that entire second step and save a lot of effort.
> Is there anything like this?
>
> Kind Regards
>
> Markus
>