-
[Hadoop-Help]About Map-Reduce implementation
Mayur Patil 2013-02-05, 21:31
Hello,
I am new to Hadoop. I am working on a cloud project in which I
have to use Hadoop for MapReduce. I am going to collect logs
from 2-3 machines in different locations.
The logs are in different formats, such as .rtf, .log, and .txt.
Later, I have to convert them to one format and
collect them in one location.
So my question is: which module of Hadoop do I need to study
for this implementation? Or do I need to study the whole
framework?
Seeking guidance.
Thank you!
-- Cheers, Mayur.
-
Re: [Hadoop-Help]About Map-Reduce implementation
Nitin Pawar 2013-02-05, 21:39
Hey Mayur,
If you are collecting logs from multiple servers, you can use Flume for that.
If the logs differ in format, you can read them with TextInputFormat and write them out in whatever format you want for processing later in your project.
The first thing you need to learn is how to set up Hadoop.
Then you can try writing sample Hadoop MapReduce jobs that read from a text file, process it, and write the results to another file.
After that, you can integrate Flume as your log collection mechanism.
Once you have a hold on the system, you can decide which paths to follow based on your requirements for storage, compute time, compute capacity, compression, etc.
-- Nitin Pawar
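As a rough illustration of the "read text, write one common format" step described above, here is a minimal Python sketch of the kind of line-by-line mapper logic one might run, for example through Hadoop Streaming. The two input layouts, the output layout, and the names `normalize` and `map_lines` are all assumptions invented for this example; they do not come from the thread. Note also that .rtf files contain markup rather than plain text, so they would presumably need converting to text before a line-based reader could use them.

```python
import re

# Hypothetical input layouts, invented for this example:
#   "2013-02-05 21:31:07 ERROR disk full"
#   "[05/Feb/2013:21:31:07] WARN: low memory"
PLAIN = re.compile(r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)$")
BRACKETED = re.compile(r"^\[(\d{2})/(\w{3})/(\d{4}):(\d{2}:\d{2}:\d{2})\] (\w+): (.*)$")
MONTHS = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}


def normalize(line):
    """Return 'timestamp<TAB>LEVEL<TAB>message', or None if the line is unrecognized."""
    line = line.strip()
    match = PLAIN.match(line)
    if match:
        year, month, day, clock, level, message = match.groups()
        return f"{year}-{month}-{day}T{clock}\t{level.upper()}\t{message}"
    match = BRACKETED.match(line)
    if match:
        day, month_name, year, clock, level, message = match.groups()
        month = MONTHS.get(month_name)
        if month is None:
            return None
        return f"{year}-{month}-{day}T{clock}\t{level.upper()}\t{message}"
    return None


def map_lines(lines):
    """Map step: yield one normalized record per recognized input line."""
    for line in lines:
        record = normalize(line)
        if record is not None:
            yield record
```

In a streaming-style job, `map_lines` would be fed the lines arriving on standard input and each yielded record would be printed; locally it can be tried on any list of strings.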
-
Re: [Hadoop-Help]About Map-Reduce implementation
Jagat Singh 2013-02-05, 21:42
Hi,
Please read up on the basics of how Hadoop works.
Then get hands-on with MapReduce coding.
The tool made for your use case is Flume, but don't look at it until you have completed the two steps above.
Good luck, and keep us posted.
Regards,
Jagat Singh
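For readers starting on the "hands-on with MapReduce coding" step, the three phases of a job can be imitated locally in plain Python before touching a cluster. This is only a conceptual sketch, not Hadoop's API: the function names and the tab-separated "timestamp, level, message" record layout are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit (level, 1) for each well-formed 'timestamp<TAB>level<TAB>message' record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:
        yield fields[1], 1


def shuffle(pairs):
    """Shuffle: group all values by key, with keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values) for key, values in shuffle(pairs).items()]
```

Calling `run_job` on a handful of sample records returns one `(level, count)` pair per log level; on a real cluster the framework performs the shuffle and distributes the map and reduce calls across nodes.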
-
Re: [Hadoop-Help]About Map-Reduce implementation
Mayur Patil 2013-02-06, 11:26
Thanks to both of you. You solved my problem so easily. I want to
ask one more question. For reference, I have:
1. Hadoop: The Definitive Guide
2. Hadoop in Action
Is this sufficient, or do I need more material to study
your suggested implementation?
-- Cheers, Mayur
-
Re: [Hadoop-Help]About Map-Reduce implementation
Nitin Pawar 2013-02-06, 13:04
That's more than sufficient.
-- Nitin Pawar
-
Re: [Hadoop-Help]About Map-Reduce implementation
Mayur Patil 2013-02-14, 09:39
Hello,
I just read about Pig:
> Pig
> A data flow language and execution environment for exploring very large datasets.
> Pig runs on HDFS and MapReduce clusters.
What is the actual difference between Pig and Flume when it comes to log clustering?
Thank you!
-- Cheers, Mayur.
-
Re: [Hadoop-Help]About Map-Reduce implementation
Prashant Kommireddi 2013-02-14, 09:51
Hi Mayur,
Flume is used for data collection; Pig is used for data processing. For example, if you have a bunch of servers whose logs you want to collect and push to HDFS, you would use Flume. If you then need to run some analysis on that data, you could use Pig.
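The split between the two roles can be pictured with a small Python sketch. It uses neither Flume nor Pig; it only mimics the division of labour, with a collection stage that gathers lines from several hosts into one store and a processing stage that analyses that store. The host names, sample lines, and function names are hypothetical.

```python
def collect(sources):
    """Collection stage (the role Flume plays): gather every host's lines into one store."""
    store = []
    for host, lines in sources.items():
        for line in lines:
            # Tag each line with the host it came from.
            store.append((host, line.rstrip("\n")))
    return store


def analyze(store, keyword):
    """Processing stage (the role Pig plays): count lines per host that contain a keyword."""
    counts = {}
    for host, line in store:
        if keyword in line:
            counts[host] = counts.get(host, 0) + 1
    return counts


# Hypothetical sample input: two hosts with two log lines each.
sources = {
    "web01": ["GET / 200\n", "GET /x 500\n"],
    "db01": ["query ok\n", "error 500\n"],
}
store = collect(sources)
errors_per_host = analyze(store, "500")
```

The point of the separation is that the analysis never needs to know where the lines came from or how they were shipped; it only reads the single collected store.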
-
Re: [Hadoop-Help]About Map-Reduce implementation
Mayur Patil 2013-03-04, 07:20
Hello,
I am slowly beginning to understand how Hadoop works.
I want to collect the logs from three machines,
including the master itself. My small query is:
which mode should I set up for this?
- Standalone Operation
- Pseudo-Distributed Operation
- Fully-Distributed Operation
Seeking guidance.
Thank you!
-- Cheers, Mayur
-
Re: [Hadoop-Help]About Map-Reduce implementation
Jean-Marc Spaggiari 2013-03-08, 03:00
Hi Mayur,
Those three modes are three different ways to use Hadoop; however, the only production mode is the fully distributed one. The other two are more for local testing. How many nodes are you expecting to run Hadoop on?
JM
-
Re: [Hadoop-Help]About Map-Reduce implementation
Mayur Patil 2013-03-09, 15:59
Hello,
Thank you, sir, for your helpful reply.
I studied my requirements further and gained more insight, as follows:
I have to export logs related to Snort and Eucalyptus components from two machines to an rsyslog server.
There are also OS-related logs. So my observations are as follows:
1. I think I just have to reduce the data. As I understand it, Hadoop solves a problem by assigning jobs to worker nodes. In my case, the data is already on the worker nodes, so I think I have to process it on those nodes themselves.
2. I now realise that I have one master node and two worker nodes; one is the web server and the other is the operating system.
Seeking guidance.
Thank you!
I have attached files to help explain the scenario. Please download whichever one you find convenient.
-- Cheers, Mayur.