Re: How does one preprocess the data so that they can be exported using sqoop
Hi Matthieu,
Sqoop is currently a highly specialized EL (extract-load) tool, not a generic ETL (extract-transform-load) tool. Thus you need to run a custom MapReduce/Pig/Hive job that separates the three logical tables and prepares the data in a format that Sqoop can process.

Jarcec
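
A minimal sketch of the routing step described above, assuming one event per line in the format from the example quoted below. The class name, named outputs, and paths (EventSplitter, user/product/activity) are illustrative, not anything that ships with Sqoop:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class EventSplitter {

  public static class SplitMapper
      extends Mapper<LongWritable, Text, NullWritable, Text> {
    private MultipleOutputs<NullWritable, Text> out;

    @Override
    protected void setup(Context context) {
      out = new MultipleOutputs<>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String line = value.toString();
      // Route each event line to its own output by the "event:" prefix.
      // The trailing space distinguishes "product" from "productActivity".
      if (line.startsWith("event:user ")) {
        out.write("user", NullWritable.get(), value, "user/part");
      } else if (line.startsWith("event:productActivity ")) {
        out.write("activity", NullWritable.get(), value, "user_product/part");
      } else if (line.startsWith("event:product ")) {
        out.write("product", NullWritable.get(), value, "product/part");
      }
    }

    @Override
    protected void cleanup(Context context)
        throws IOException, InterruptedException {
      out.close();
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "event-splitter");
    job.setJarByClass(EventSplitter.class);
    job.setMapperClass(SplitMapper.class);
    job.setNumReduceTasks(0); // map-only: pure routing, nothing to aggregate
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);
    MultipleOutputs.addNamedOutput(job, "user", TextOutputFormat.class,
        NullWritable.class, Text.class);
    MultipleOutputs.addNamedOutput(job, "product", TextOutputFormat.class,
        NullWritable.class, Text.class);
    MultipleOutputs.addNamedOutput(job, "activity", TextOutputFormat.class,
        NullWritable.class, Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

MultipleOutputs writes each record set under its own base path (user/, product/, user_product/) beneath the job output directory, so every table ends up in a directory that a later sqoop export can point at.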

On Fri, May 17, 2013 at 05:44:03PM -0400, Matthieu Labour wrote:
> Hi
>
> I would be grateful for any tips on how to "prepare" the data so it can
> be exported to a PostgreSQL database using Sqoop.
>
> As an example:
>
> Provided some files of events. (user events, product events,
> productActivity events)
>
> [file0001]
> event:user properties:{name:"john" ...}
> event:product properties:{ref:123, color:"blue", ...}
> event:productActivity properties:{user:"john", product:"ref", action:"buy"}
> .....
>
> How does one come up with the primary keys and the user_product join table
> ready to be exported?
>
> In other words:
>
> function(Input:eventfile) => output:[productFile, userFile,
> user_productFile with auto generated primary keys ]
>
> what goes into function?
>
> I hope this makes sense!
>
> Thank you in advance for any help
>
> -matt
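
On the primary-key part of the question: one approach (a suggestion, not a Sqoop feature) is to derive deterministic surrogate keys by hashing the natural keys already present in the events (the user's name, the product's ref). Every task then computes the same id for the same entity without any coordination, so the user_product join rows can be emitted in the same pass as the entity rows. A minimal sketch, assuming Guava is on the classpath; the class and method names are hypothetical:

import java.nio.charset.StandardCharsets;

import com.google.common.hash.Hashing;

public final class SurrogateKeys {
  private SurrogateKeys() {}

  // Deterministic 64-bit id from a natural key such as the user name "john"
  // or the product ref "123". The same input always yields the same id, so
  // rows produced in different tasks agree without a join or shared counter.
  public static long idFor(String naturalKey) {
    return Hashing.sipHash24()
        .hashString(naturalKey, StandardCharsets.UTF_8)
        .asLong();
  }
}

Once the three directories are written out as delimited text with these ids in place, each table can be pushed with its own sqoop export run (--table and --export-dir pointing at the matching table and directory). If the target schema requires sequential ids instead, the usual fallbacks are a single-reducer numbering pass or letting PostgreSQL assign them from a sequence.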