Hadoop >> mail # user >> Can I write to a compressed file which is located in hdfs?


Re: Can I write to a compressed file which is located in hdfs?
Hi
       I agree with David on that point; you can achieve step 1 of my previous response with Flume, i.e. load the real-time inflow of data into hdfs in compressed format. You can specify a time interval or data size in the Flume collector that determines when to flush data to hdfs.
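
For reference, a Flume flow along those lines (Flume OG, the Cloudera version current at the time) might be configured roughly as below. The node names, log path, and HDFS URL are made up, and the exact source/sink syntax varies between Flume versions, so treat this as a sketch rather than a working config:

```text
# Hypothetical Flume OG flow: an agent tails the app log on each server,
# a collector rolls a new file into HDFS every hour (3600000 ms).
# Compression of the collector output is controlled separately via the
# flume.collector.dfs.compress.codec property.
appserver1 : tail("/var/log/app/app.log") | agentSink("collector1", 35853) ;
collector1 : collectorSource(35853) | collectorSink("hdfs://namenode/logs/%Y-%m-%d/", "app-", 3600000) ;
```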

Regards
Bejoy K S

From handheld, Please excuse typos.

-----Original Message-----
From: David Sinclair <[EMAIL PROTECTED]>
Date: Mon, 6 Feb 2012 09:06:00
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Subject: Re: Can I write to a compressed file which is located in hdfs?

Hi,

You may want to have a look at the Flume project from Cloudera. I use it
for writing data into HDFS.

https://ccp.cloudera.com/display/SUPPORT/Downloads

dave

2012/2/6 Xiaobin She <[EMAIL PROTECTED]>

> hi Bejoy ,
>
> thank you for your reply.
>
> actually I have set up a test cluster which has one namenode/jobtracker
> and two datanode/tasktrackers, and I have run a test on this cluster.
>
> I fetch the log file of one of our modules from the log collector machines
> by rsync, and then I use the hive command line tool to load this log file
> into the hive warehouse, which simply copies the file from the local
> filesystem to hdfs.
>
> And I have run some analysis on this data with hive; all of this ran well.
>
> But now I want to avoid the fetch step that uses rsync, and instead write
> the logs into hdfs files directly from the servers which generate these
> logs.
>
> It seems easy to do this if the file located in hdfs is not compressed.
>
> But how can I write or append logs to a file that is compressed and
> located in hdfs?
>
> Is this possible?
>
> Or is this a bad practice?
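
As background to the question above, appending to an already-compressed file in hdfs is awkward, which is why the advice in this thread is to roll a new compressed file per hour and then upload it (e.g. with `hadoop fs -put`). Below is a minimal local sketch of that "compress an hourly batch" step using plain `java.util.zip`; the class and file names are made up, and the actual upload to hdfs is left out:

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.GZIPOutputStream;

// Sketch: write one hourly batch of log lines into a .gz file that can
// then be copied to HDFS. Rolling a fresh file per hour sidesteps the
// problem of appending to an existing compressed file.
public class LogGzipRoller {

    // Stream the given lines through a GZIPOutputStream into outFile.
    static void writeCompressed(Path outFile, Iterable<String> lines) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(outFile)), "UTF-8")) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("applog-2012020616-", ".gz");
        writeCompressed(out, java.util.Arrays.asList("line one", "line two"));
        // Read the file back to confirm the round trip.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new java.util.zip.GZIPInputStream(Files.newInputStream(out)), "UTF-8"))) {
            System.out.println(r.readLine()); // prints "line one"
        }
        Files.deleteIfExists(out);
    }
}
```

Note that gzip is not splittable by MapReduce, which is why LZO or Snappy are suggested later in the thread for files larger than a block.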
>
> Thanks!
>
>
>
> 2012/2/6 <[EMAIL PROTECTED]>
>
> > Hi
> >     If you have enough log files to fill at least one hdfs block in an
> > hour, you can go ahead as follows:
> > - run a scheduled job every hour that compresses the log files for that
> > hour and stores them on to hdfs (you can use LZO or even Snappy to
> > compress)
> > - if your hive does more frequent analysis on this data, store it
> > PARTITIONED BY (Date, Hour). While loading into hdfs, also follow a
> > directory/sub-directory structure. Once the data is in hdfs, issue an
> > ALTER TABLE ADD PARTITION statement on the corresponding hive table.
> > - in the Hive DDL use the appropriate InputFormat (Hive already has some
> > Apache log InputFormats)
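
The partition layout described above might look like the following Hive DDL. The table name, columns, and directory paths are hypothetical; the row format and InputFormat would need to match the actual log layout:

```sql
-- Hypothetical partitioned table over hourly compressed logs.
CREATE TABLE access_logs (
  ts STRING,
  host STRING,
  request STRING
)
PARTITIONED BY (dt STRING, hr STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- After the hourly job drops e.g. /logs/dt=2012-02-06/hr=16/part.gz
-- into hdfs, register that directory as a partition:
ALTER TABLE access_logs ADD PARTITION (dt='2012-02-06', hr='16')
LOCATION '/logs/dt=2012-02-06/hr=16';
```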
> >
> >
> > Regards
> > Bejoy K S
> >
> > From handheld, Please excuse typos.
> >
> > -----Original Message-----
> > From: Xiaobin She <[EMAIL PROTECTED]>
> > Date: Mon, 6 Feb 2012 16:41:50
> > To: <[EMAIL PROTECTED]>; 佘晓彬<[EMAIL PROTECTED]>
> > Reply-To: [EMAIL PROTECTED]
> > Subject: Re: Can I write to a compressed file which is located in hdfs?
> >
> > sorry, that sentence was wrong,
> >
> > I can't compress these logs every hour and then put them into hdfs.
> >
> > it should be
> >
> > I can compress these logs every hour and then put them into hdfs.
> >
> >
> >
> >
> > 2012/2/6 Xiaobin She <[EMAIL PROTECTED]>
> >
> > >
> > > hi all,
> > >
> > > I'm testing hadoop and hive, and I want to use them in log analysis.
> > >
> > > Here I have a question: can I write/append logs to a compressed file
> > > which is located in hdfs?
> > >
> > > Our system generates lots of log files every day; I can't compress
> > > these logs every hour and then put them into hdfs.
> > >
> > > But what if I want to write logs into files that are already in hdfs
> > > and are compressed?
> > >
> > > If these files were not compressed, then this job seems easy, but how
> > > can I write or append logs into a compressed log file?
> > >
> > > Can I do that?
> > >
> > > Can anyone give me some advice or some examples?
> > >
> > > Thank you very much!
> > >
> > > xiaobin
> > >
> >
> >
>
