Bigtop >> mail # user >> A first glance/reminder/hack at the BigPetStore pipeline

Re: A first glance/reminder/hack at the BigPetStore pipeline
On 10/08/2013 03:16 PM, Jay Vyas wrote:
> Hi folks.
> I've been hacking around on the BigPetStore idea. So far I've only got
> the template for the synthetic data set generator:
> https://raw.github.com/jayunit100/hadoop-example-jobs/master/src/main/java/org/bigtop/bigpetstore/PetStoreTransactionGeneratorJob.java
> 1) This is the first-phase implementation: a MapReduce job that will
> generate a synthetic data set of transactions in a pet store.
> It is meant to be configurable, so people can use it to generate as many
> transactions as they want. I will also add more "products" to it.
> 2) The next step will be to flesh out the transaction data and then
> write up aggregations in each of Hive, Pig, and MapReduce. That will serve
> as the ETL blueprint.
> 3) Then comes the interesting part: feeding those ETL'd statistics
> into a data store that Bigtop supports, i.e. Solr
> indices and HBase key-values.
> At that point the sample application will be complete, and the first
> iteration of bigtop.blueprints will be ready to share.
> If anyone has initial thoughts or wants to jump in, let me know. :)
> Jay Vyas
> http://jayunit100.blogspot.com
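The first two phases in the plan above (a configurable generator of synthetic pet-store transactions, then an aggregation over them) can be sketched in plain Java without Hadoop. This is a minimal sketch under stated assumptions, not the code in the linked job: the `Transaction` class, the product names and prices, and the revenue-per-product aggregation are all hypothetical. A real implementation would run the generator as the MapReduce job and express the aggregation in Hive, Pig or MapReduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class PetStorePipelineSketch {

    // Hypothetical product catalogue: names and prices (in cents) are illustrative only.
    static final String[] PRODUCTS = {"dog-food", "cat-food", "fish-food", "leash"};
    static final int[] PRICES_CENTS = {1099, 899, 399, 1500};

    // One synthetic purchase: which customer bought which product, at what price.
    static final class Transaction {
        final int customerId;
        final String product;
        final int priceCents;

        Transaction(int customerId, String product, int priceCents) {
            this.customerId = customerId;
            this.product = product;
            this.priceCents = priceCents;
        }
    }

    // Phase 1: generate a configurable number of transactions.
    // A fixed seed makes runs reproducible, which helps when testing the ETL step.
    static List<Transaction> generate(int count, int customers, long seed) {
        Random random = new Random(seed);
        List<Transaction> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int p = random.nextInt(PRODUCTS.length);
            out.add(new Transaction(random.nextInt(customers), PRODUCTS[p], PRICES_CENTS[p]));
        }
        return out;
    }

    // Phase 2: total revenue per product, the same shape as a Hive
    // "SELECT product, SUM(price) ... GROUP BY product" or a map/reduce pair.
    static Map<String, Long> revenueByProduct(List<Transaction> txs) {
        Map<String, Long> totals = new TreeMap<>();
        for (Transaction t : txs) {
            totals.merge(t.product, (long) t.priceCents, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // The transaction count is the "configurable" knob; 1000 is arbitrary.
        List<Transaction> txs = generate(1000, 50, 42L);
        revenueByProduct(txs).forEach((product, cents) ->
                System.out.println(product + "\t" + cents));
    }
}
```

Keying the aggregate by product is one possible choice; per-customer or per-day totals would follow the same merge pattern, and any of them could later be loaded into Solr or HBase.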
Looks like a great start!
Can't wait to see the following parts.

Some notes:
* Missing license header
* Package name should probably be org.apache.bigtop.blueprint.bigpetstore
* It would be nice to split these classes into separate files
* It would be nice to group instance variables in one place (e.g.
int soFar is declared in the middle, between two methods)
* It would be nice to extract strings such as "Dud Job", "transactions"
or "transaction_files" into constants
* I have spotted some System.out.println calls
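Several of the notes above (license header, constants instead of string literals, fields grouped at the top, logging instead of System.out.println) can be shown together in one small class. This is a hedged sketch: the class name `TransactionGeneratorSettings`, the constant names, and the `recordGenerated` method are hypothetical, only the string values and `soFar` come from the review. It uses java.util.logging to stay self-contained; the project may well prefer whichever logging framework the rest of Bigtop uses. In the real tree the file would also start with `package org.apache.bigtop.blueprint.bigpetstore;`, omitted here so the snippet compiles on its own.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.util.logging.Logger;

public class TransactionGeneratorSettings {

    // String literals from the job pulled into named constants
    // (constant names are hypothetical; the values come from the review).
    public static final String DUD_JOB_NAME = "Dud Job";
    public static final String TRANSACTIONS = "transactions";
    public static final String TRANSACTION_FILES = "transaction_files";

    // A logger replaces System.out.println calls.
    private static final Logger LOG =
            Logger.getLogger(TransactionGeneratorSettings.class.getName());

    // Instance state declared together at the top, not between methods.
    private int soFar = 0;

    // Hypothetical progress counter; the message is logged at FINE level,
    // so it stays hidden under the default INFO logging configuration.
    public int recordGenerated() {
        soFar++;
        LOG.fine(() -> "generated " + soFar + " records so far");
        return soFar;
    }

    public static void main(String[] args) {
        TransactionGeneratorSettings settings = new TransactionGeneratorSettings();
        settings.recordGenerated();
        LOG.info("records generated: " + settings.recordGenerated());
    }
}
```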