Re: A first glance/reminder/hack at the BigPetStore pipeline
Bruno Mahé 2013-10-25, 08:50
On 10/08/2013 03:16 PM, Jay Vyas wrote:
> Hi folks.
> I've been hacking around on the BigPetStore idea. So far I've only got
> the template for the synthetic data set generator:
> 1) This is the "first" phase implementation of a MapReduce job that will
> generate a synthetic data set of transactions in a pet store.
> It is meant to be configurable, so people can use it to generate as many
> transactions as they want. I will also add more "products" to it.
> 2) The next step will be to flesh out the transaction data and then
> write up aggregations in Hive, Pig, and MapReduce. That will serve
> as the ETL blueprint.
> 3) Then comes the interesting part: feeding those ETL'd statistics
> into a Bigtop-supported data store, i.e. Solr indices and HBase
> key-values.
> At that point, the sample application and the first iteration of
> bigtop.blueprints will be ready to share.
> If you have any initial thoughts, or if anyone else wants to jump in, let me know! :)
> Jay Vyas
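For anyone following along, here is a rough sketch of the kind of configurable generator and per-product aggregation described in the first two steps of the plan quoted above. It is plain Java rather than a MapReduce job, and the class name, product list, record layout (id, product, price in cents) and fixed seed are all assumptions for illustration, not the code from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical stand-in for the BigPetStore generator: plain Java, not the MapReduce job itself.
class PetStoreGeneratorSketch {

    // Hypothetical product catalogue; the real job's products are not shown in the thread.
    static final String[] PRODUCTS = {"dog-food", "cat-food", "fish-tank", "leash"};

    /** One synthetic transaction: sequential id, product name and price in cents. */
    static final class Transaction {
        final long id;
        final String product;
        final int priceCents;

        Transaction(long id, String product, int priceCents) {
            this.id = id;
            this.product = product;
            this.priceCents = priceCents;
        }
    }

    /** Generates count transactions; the same seed always yields the same data set. */
    static List<Transaction> generate(int count, long seed) {
        Random random = new Random(seed);
        List<Transaction> out = new ArrayList<>(count);
        for (long i = 0; i < count; i++) {
            String product = PRODUCTS[random.nextInt(PRODUCTS.length)];
            int priceCents = 100 + random.nextInt(4900); // 1.00 to 49.99
            out.add(new Transaction(i, product, priceCents));
        }
        return out;
    }

    /** Toy version of the aggregation step: number of transactions per product. */
    static Map<String, Integer> countByProduct(List<Transaction> transactions) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Transaction t : transactions) {
            counts.merge(t.product, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The transaction count is the configurable knob; default to 1000.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        System.out.println(countByProduct(generate(count, 42L)));
    }
}
```

Passing a larger count (e.g. 100000) as the first argument prints a product-to-count map. In the real pipeline the generation would presumably run inside mappers, and the counting would be one of the Hive, Pig or MapReduce aggregations.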
Looks like a great start!
Can't wait to see the following parts.
* Missing license header
* Package name should probably be org.apache.bigtop.blueprint.bigpetstore
* It would be nice to split all these classes into separate files
* It would be nice to group instance variables in one place (e.g.
int soFar is declared right in the middle, between two methods)
* It would be nice to extract strings such as "Dud Job", "transactions"
or "transaction_files" into constants
* I have spotted some System.out.println calls
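Taken together, the points above could look roughly like the following sketch for a single class. It is hypothetical: the class and constant names are invented, and only the string values and the soFar field come from the review. java.util.logging stands in for whatever logging framework the project settles on, and the package declaration (org.apache.bigtop.blueprint.bigpetstore) is left out so the file compiles on its own. Splitting classes into separate files would simply mean one such file per top-level class.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.util.logging.Logger;

// Hypothetical class illustrating the review points, not code from the thread.
class TransactionCounterSketch {

    // String literals extracted into constants (constant names are invented).
    static final String JOB_NAME = "Dud Job";
    static final String TRANSACTIONS = "transactions";
    static final String TRANSACTION_FILES = "transaction_files";

    private static final Logger LOG = Logger.getLogger(TransactionCounterSketch.class.getName());

    // Instance state declared in one place, at the top, not between methods.
    private int soFar;

    /** Records one more generated transaction and logs progress instead of printing. */
    void recordTransaction() {
        soFar++;
        if (soFar % 1000 == 0) {
            LOG.info(() -> "Generated " + soFar + " " + TRANSACTIONS + " so far");
        }
    }

    int getSoFar() {
        return soFar;
    }

    public static void main(String[] args) {
        TransactionCounterSketch counter = new TransactionCounterSketch();
        for (int i = 0; i < 2500; i++) {
            counter.recordTransaction();
        }
        LOG.info(JOB_NAME + " finished with " + counter.getSoFar() + " " + TRANSACTIONS);
    }
}
```

Using the Supplier overload of Logger.info means the progress message is only built when INFO logging is enabled.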