It mainly depends on the data volume. Hadoop can be used to refine the
data before inserting it into a traditional architecture (like a database).
If you want to write jobs, several solutions have emerged:
* the plain Mapred/MapReduce APIs (the former is the older of the two, but
both are the default Java APIs)
* Hadoop Streaming, which lets you write jobs in Python or other languages
* Cascading/Crunch..., which provide higher-level Java APIs (with
Scalding/Cascalog as Scala/Clojure 'wrappers')
* Pig/Hive if you want a dedicated high-level language (Hive QL is close to
SQL, while Pig has its own dataflow language)
* and then there are commercial products too...
So it really depends on what you want to use it for and what competencies
you (your team, your company) have.
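To make the Hadoop Streaming option concrete, here is a minimal word-count
sketch in Python (function names and file layout are my own illustration,
not from any particular project). Streaming simply runs your script against
stdin/stdout, so the core logic can be tested locally without a cluster:

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming word-count sketch. In practice the mapper and
# reducer would be two separate scripts; they are combined here for brevity.
import sys

def map_line(line):
    """Mapper logic: emit a (word, 1) pair for each word in the line."""
    return [(word, 1) for word in line.split()]

def reduce_pairs(pairs):
    """Reducer logic: sum the counts per word. On a real cluster the
    shuffle phase delivers pairs grouped by key; a plain dict works
    either way for this sketch."""
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts

if __name__ == "__main__":
    # Used as a streaming mapper: read lines from stdin,
    # print tab-separated "word<TAB>1" pairs to stdout.
    for line in sys.stdin:
        for word, n in map_line(line):
            print("%s\t%d" % (word, n))
```

You would then submit it with the streaming jar, roughly
`hadoop jar hadoop-streaming.jar -input in -output out -mapper mapper.py
-reducer reducer.py` (the exact jar path depends on your installation).
The same logic can be tried locally with `cat input.txt | ./mapper.py | sort`
before paying for any EMR time.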
On Wed, Sep 5, 2012 at 10:42 AM, pgaurav <[EMAIL PROTECTED]> wrote:
> Hi Guys,
> I'm 5 days old in the Hadoop world and am trying to evaluate it as a
> long-term solution for our client.
> I could do some r&d on Amazon EC2 / EMR:
> Load the data, text / csv, to S3
> Write your mapper / reducer / Jobclient and upload the jar to s3
> Start a job flow
> I tried two samples: word count and CSV data processing.
> My question is: to further analyse the data (reporting / search), what
> should be done? Do I need to implement it in the Mapper class itself? Do I
> need to dump the data into a database and then write a custom application?
> What is the standard way of analysing the data?
> Sent from the Hadoop core-user mailing list archive at Nabble.com.