How to copy log files from remote windows machine to Hadoop cluster


Mahesh Balija 2013-01-17, 10:03
Re: How to copy log files from remote windows machine to Hadoop cluster
Hi Mirko,

           Thanks for your reply. It works for me as well.
           I was able to mount the folder on the master node and configure
Flume so that it can either poll for logs in near real time or retrieve
them periodically.
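
For anyone who finds this thread in the archives, here is roughly what I
mean; a minimal sketch, assuming the share is mounted at /mnt/winlogs and
the NameNode listens at namenode:8020 (both are placeholders, not our real
names):

# One agent: watch the mounted share, write everything into HDFS.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Spooling-directory source: picks up each file dropped into the share.
# Caveat: spooldir expects a file to be complete once it appears there.
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /mnt/winlogs
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# HDFS sink writing plain text, rolling to a new file every 5 minutes.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/winlogs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 300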

Thanks,
Mahesh Balija.
Calsof Labs.

On Thu, Jan 17, 2013 at 5:01 PM, Mirko Kämpf <[EMAIL PROTECTED]> wrote:

> One approach I used in my lab was a "data-gateway":
> a small Linux box that just mounts the Windows shares,
> with a single Flume node on the gateway talking to the
> HDFS cluster. With tail or periodic log rotation you have control
> over all logfiles, depending on your use case: either grab all
> incoming data and buffer it in Flume, or just move all new data
> to the cluster during the night. The gateway can also host Sqoop
> and an HDFS client if needed.
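>
> For illustration, a minimal sketch of such a gateway (the share name,
> mount point, and NameNode host are made up): mount the share read-only,
> e.g. mount -t cifs //winhost/logs /mnt/winlogs -o ro, then run one agent:
>
> # Gateway agent: follow a growing logfile and ship it to HDFS.
> gw.sources = r1
> gw.channels = c1
> gw.sinks = k1
>
> # Exec source: tail -F keeps following the file across rotations.
> # Caveat: the exec source gives no delivery guarantee if the agent dies.
> gw.sources.r1.type = exec
> gw.sources.r1.command = tail -F /mnt/winlogs/app.log
> gw.sources.r1.channels = c1
>
> gw.channels.c1.type = memory
>
> gw.sinks.k1.type = hdfs
> gw.sinks.k1.channel = c1
> gw.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/gateway
> gw.sinks.k1.hdfs.fileType = DataStream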
>
> Mirko
>
>
>
>
> 2013/1/17 Mahesh Balija <[EMAIL PROTECTED]>
>
>> That link only covers installing Flume on a Windows machine (it does not
>> even show configs for pushing logs to the Hadoop cluster). And if I have
>> to collect logs from several clients, I will end up installing it on
>> every client.
>>
>> I have installed Flume successfully on Linux, but I need to configure it
>> in such a way that it gathers the log files from the remote Windows box.
>>
>> Harsh, can you throw some light on this?
>>
>>
>>> On Thu, Jan 17, 2013 at 4:21 PM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:
>>
>>> Yes, it is possible. I haven't tried the Windows+Flume+Hadoop combo
>>> personally, but it should work. You may find this link useful:
>>> http://mapredit.blogspot.in/2012/07/run-flume-13x-on-windows.html
>>> Alex has explained beautifully how to run Flume on a Windows box. If I
>>> get time I'll try to simulate your use case and let you know.
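>>>
>>> In case it helps, a rough sketch of a Windows-side agent (hostname, port,
>>> and path are invented, and I haven't verified this on Windows myself):
>>>
>>> # Agent on the Windows box: read finished files from a spool directory
>>> # and forward the events over Avro to an agent on the Hadoop side.
>>> win.sources = r1
>>> win.channels = c1
>>> win.sinks = k1
>>>
>>> # Backslashes must be escaped in the Java-properties config file.
>>> win.sources.r1.type = spooldir
>>> win.sources.r1.spoolDir = C:\\flume\\spool
>>> win.sources.r1.channels = c1
>>>
>>> win.channels.c1.type = memory
>>>
>>> # The receiving agent needs a matching Avro source:
>>> # type = avro, bind = 0.0.0.0, port = 4141.
>>> win.sinks.k1.type = avro
>>> win.sinks.k1.hostname = collector.example.com
>>> win.sinks.k1.port = 4141
>>> win.sinks.k1.channel = c1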
>>>
>>> BTW, could you please share with us whatever you have tried?
>>>
>>> Warm Regards,
>>> Tariq
>>> https://mtariq.jux.com/
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Thu, Jan 17, 2013 at 4:09 PM, Mahesh Balija <[EMAIL PROTECTED]> wrote:
>>>
>>>> I have studied Flume, but I didn't find anything useful for my case.
>>>> My requirement: there is a directory on a Windows machine in which
>>>> files are generated and kept updated with new logs. I want a tail-like
>>>> mechanism (using the exec source) through which I can push the latest
>>>> updates into the cluster, or else I simply push to the cluster once a
>>>> day using the spooling-directory mechanism.
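>>>>
>>>> To make the two alternatives concrete, this is the kind of config I
>>>> have in mind (paths are placeholders; the logger sink is just for
>>>> testing):
>>>>
>>>> # Both candidate sources side by side; a real agent would use one.
>>>> t1.sources = tailsrc spoolsrc
>>>> t1.channels = c1
>>>> t1.sinks = k1
>>>>
>>>> # Alternative 1: near-real-time tail of a growing file (exec source).
>>>> t1.sources.tailsrc.type = exec
>>>> t1.sources.tailsrc.command = tail -F /data/winlogs/app.log
>>>> t1.sources.tailsrc.channels = c1
>>>>
>>>> # Alternative 2: once-a-day drop of finished files (spooling directory).
>>>> t1.sources.spoolsrc.type = spooldir
>>>> t1.sources.spoolsrc.spoolDir = /data/winlogs/daily
>>>> t1.sources.spoolsrc.channels = c1
>>>>
>>>> t1.channels.c1.type = memory
>>>>
>>>> # Logger sink for testing; swap in an HDFS sink for the real run.
>>>> t1.sinks.k1.type = logger
>>>> t1.sinks.k1.channel = c1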
>>>>
>>>> Can somebody advise whether this is possible using Flume and, if so,
>>>> what configuration is needed specifically for a remote Windows machine?
>>>>
>>>> On Thu, Jan 17, 2013 at 3:48 PM, Mirko Kämpf <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Give Flume (http://flume.apache.org/) a chance to collect your data.
>>>>>
>>>>> Mirko
>>>>>
>>>>>
>>>>>
>>>>> 2013/1/17 sirenfei <[EMAIL PROTECTED]>
>>>>>
>>>>>> FTP auto-upload?
>>>>>>
>>>>>>
>>>>>> 2013/1/17 Mahesh Balija <[EMAIL PROTECTED]>:
>>>>>> > the Hadoop cluster (HDFS) either in synchronous or asynchronous
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>