
Sqoop user mailing list - How does one preprocess the data so that they can be exported using sqoop


Re: How does one preprocess the data so that they can be exported using sqoop
Jarek Jarcec Cecho 2013-05-20, 06:37
Hi Matthieu,
Sqoop is currently a highly specialized EL (extract-load) tool and not a generic ETL (extract-transform-load) tool. Thus you need to execute a custom MapReduce/Pig/Hive job that will separate the three different logical tables and prepare the data in a format that Sqoop can process.
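
As a rough illustration of what such a job could look like, here is a minimal single-process Python sketch of the splitting and key-generation step. It is only a sketch under assumptions: each input line is of the form "event:<type> properties:<json>" with a properties blob that parses as JSON, the output is tab-separated, and the file names users.tsv, products.tsv and user_products.tsv are hypothetical. On a real cluster the same logic would live inside the MapReduce/Pig/Hive job mentioned above.

#!/usr/bin/env python
# Single-process sketch of the splitting / key-generation step.
# Assumption: each input line looks like  event:<type> properties:<json>
# and the properties blob parses as JSON; adjust parsing to the real format.
import json
import sys

def split_events(event_path, user_path, product_path, user_product_path):
    user_ids = {}     # user name   -> generated surrogate key
    product_ids = {}  # product ref -> generated surrogate key

    with open(event_path) as events, \
         open(user_path, "w") as users, \
         open(product_path, "w") as products, \
         open(user_product_path, "w") as user_products:

        join_key = 0
        for line in events:
            head, _, props_blob = line.partition(" properties:")
            event_type = head.replace("event:", "").strip()
            props = json.loads(props_blob)

            if event_type == "user":
                if props["name"] not in user_ids:
                    user_ids[props["name"]] = len(user_ids) + 1
                    users.write("%d\t%s\n"
                                % (user_ids[props["name"]], props["name"]))
            elif event_type == "product":
                if props["ref"] not in product_ids:
                    product_ids[props["ref"]] = len(product_ids) + 1
                    products.write("%d\t%s\t%s\n"
                                   % (product_ids[props["ref"]], props["ref"],
                                      props.get("color", "")))
            elif event_type == "productActivity":
                # Join-table row: its own key plus the two foreign keys.
                # Assumes the user and product events were seen before the
                # activity that references them.
                join_key += 1
                user_products.write("%d\t%d\t%d\t%s\n"
                                    % (join_key,
                                       user_ids[props["user"]],
                                       product_ids[props["product"]],
                                       props.get("action", "")))

if __name__ == "__main__":
    split_events(sys.argv[1], "users.tsv", "products.tsv", "user_products.tsv")

Once the data is split like this, you would run roughly one sqoop export per target table, with --export-dir pointing at the corresponding directory in HDFS and --table naming the PostgreSQL table.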

Jarcec

On Fri, May 17, 2013 at 05:44:03PM -0400, Matthieu Labour wrote:
> Hi
>
> I would be grateful for any tips on how to "prepare" the data so they can
> be exported to a PostgreSQL database using sqoop.
>
> As an example:
>
> Provided some files of events. (user events, product events,
> productActivity events)
>
> [file0001]
> event:user properties:{name:"john" ...}
> event:product properties:{ref:123,color:"blue",...
> event:productActivity properties:{user:"john", product:"ref", action:"buy"}
> .....
>
> How does one come up with the primary keys and the user_product join table
> ready to be exported?
>
> In other words:
>
> function(Input:eventfile) => output:[productFile, userFile,
> user_productFile with auto generated primary keys ]
>
> what goes into the function?
>
> I hope this makes sense!
>
> Thank you in advance for any help
>
> -matt