-
Can I write to an compressed file which is located in hdfs? Xiaobin She 2012-02-06, 08:40
Hi all,
I'm testing Hadoop and Hive, and I want to use them for log analysis. I have a question: can I write or append logs to a compressed file that is located in HDFS? Our system generates lots of log files every day. I can't compress these logs every hour and then put them into HDFS. But what if I want to write logs into files that are already in HDFS and already compressed? If these files were not compressed, the job would seem easy, but how do I write or append logs to a compressed file? Can I do that? Can anyone give me some advice or some examples? Thank you very much!
xiaobin
-
Re: Can I write to an compressed file which is located in hdfs? Xiaobin She 2012-02-06, 08:41
Sorry, this sentence is wrong:
"I can't compress these logs every hour and then put them into HDFS." It should be: "I can compress these logs every hour and then put them into HDFS."
-
Re: Can I write to an compressed file which is located in hdfs? bejoy.hadoop@... 2012-02-06, 11:53
Hi
If your log files add up to at least one block size in an hour, you can proceed as follows:
- Run a scheduled job every hour that compresses the log files for that hour and stores them in HDFS (you can use LZO or even Snappy to compress).
- If Hive runs frequent analysis on this data, store it as PARTITIONED BY (Date,Hour). While loading into HDFS, follow a matching directory and sub-directory structure. Once the data is in HDFS, issue an ALTER TABLE ADD PARTITION statement on the corresponding Hive table.
- In the Hive DDL, use the appropriate input format (Hive already has some Apache log input format).
Regards
Bejoy K S
From handheld, Please excuse typos.
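The hourly flow described above can be sketched roughly as follows. This is a minimal illustration, not a tested pipeline: the table name `logs`, the partition columns `dt` and `hr` (standing in for Date and Hour), and the base path are hypothetical, and gzip from the Python standard library stands in for LZO or Snappy. The sketch only compresses a local file and builds the target directory and the Hive statement as strings; copying the file into HDFS and running the statement are left out.

```python
import gzip
import shutil
from datetime import datetime


def compress_log(src_path: str) -> str:
    """Gzip a local log file and return the path of the compressed copy."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


def partition_dir(base: str, when: datetime) -> str:
    """Directory / sub-directory layout that mirrors the (date, hour) partitions."""
    return f"{base}/dt={when:%Y-%m-%d}/hr={when:%H}"


def add_partition_statement(table: str, base: str, when: datetime) -> str:
    """Hive statement that registers the directory for one hour as a partition."""
    # Table and column names are hypothetical; adapt them to the real schema.
    return (
        f"ALTER TABLE {table} ADD PARTITION "
        f"(dt='{when:%Y-%m-%d}', hr='{when:%H}') "
        f"LOCATION '{partition_dir(base, when)}'"
    )
```

For example, `add_partition_statement("logs", "/logs", datetime(2012, 2, 6, 8))` builds the statement for the 08:00 hour of 2012-02-06, pointing at `/logs/dt=2012-02-06/hr=08`.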
-
Re: Can I write to an compressed file which is located in hdfs? Xiaobin She 2012-02-06, 12:11
Hi Bejoy,
Thank you for your reply. Actually, I have set up a test cluster with one namenode/jobtracker and two datanode/tasktracker nodes, and I have run a test on it. I fetch the log file of one of our modules from the log collector machines with rsync, and then use the Hive command-line tool to load this log file into the Hive warehouse, which simply copies the file from the local filesystem to HDFS. I have run some analysis on this data with Hive, and it all ran well. But now I want to avoid the rsync fetch step and write the logs into HDFS files directly from the servers that generate them. This seems easy if the file in HDFS is not compressed. But how do I write or append logs to a file that is compressed and located in HDFS? Is this possible? Or is this bad practice? Thanks!
-
Re: Can I write to an compressed file which is located in hdfs? David Sinclair 2012-02-06, 14:06
Hi,
You may want to have a look at the Flume project from Cloudera. I use it for writing data into HDFS.
https://ccp.cloudera.com/display/SUPPORT/Downloads
dave
-
Re: Can I write to an compressed file which is located in hdfs? bejoy.hadoop@... 2012-02-06, 14:23
Hi
I agree with David on this point: you can achieve step 1 of my previous response with Flume, i.e. load the real-time inflow of data into HDFS in compressed format. In the Flume collector you can specify a time interval or a data size that determines when data is flushed to HDFS.
Regards
Bejoy K S
From handheld, Please excuse typos.
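The idea of flushing on a time interval or a data size can be illustrated with a small buffer. This is only a sketch of the general flush-on-size-or-age policy, not Flume's implementation or configuration; the class name `RollingBuffer`, its parameters, and the `sink` callback are hypothetical, and gzip is just a stand-in codec.

```python
import gzip
import time
from typing import Callable, List, Optional


class RollingBuffer:
    """Collect log lines and hand them to `sink` as one gzip blob when either
    the buffered size or the age of the oldest buffered line crosses a limit."""

    def __init__(self, sink: Callable[[bytes], None], max_bytes: int,
                 max_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sink = sink
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.clock = clock
        self.lines: List[bytes] = []
        self.size = 0
        self.first_at: Optional[float] = None

    def append(self, line: bytes) -> None:
        # Remember when the first line of the current batch arrived.
        if self.first_at is None:
            self.first_at = self.clock()
        self.lines.append(line)
        self.size += len(line)
        too_big = self.size >= self.max_bytes
        too_old = self.clock() - self.first_at >= self.max_seconds
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        # Compress the whole batch at once and reset the buffer.
        if not self.lines:
            return
        self.sink(gzip.compress(b"".join(self.lines)))
        self.lines, self.size, self.first_at = [], 0, None
```

Note that this sketch only checks the age limit when a new line arrives; a real collector would also need a timer so that a quiet source still gets flushed.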
-
Re: Can I write to an compressed file which is located in hdfs? Xiaobin She 2012-02-07, 06:24
Hi Bejoy and David,
Thank you for your help. So I can't directly write or append logs to a compressed file in HDFS, right? Can I compress a file that is already in HDFS and has not been compressed? If I can, how do I do that? Thanks!
-
Re: Can I write to an compressed file which is located in hdfs? bejoy.hadoop@... 2012-02-07, 07:53
Hi
AFAIK, it is not possible to append to a compressed file.
If you have files in an HDFS directory and need to compress them (like the files for an hour), you can use MapReduce to do that by setting mapred.output.compress = true and mapred.output.compression.codec = 'theCodecYouPrefer'. You'd get the blocks compressed in the output dir.
You can also use the API to read from standard input:
- Get the Hadoop conf.
- Register the required compression codec.
- Write to a CompressionOutputStream.
You should find a detailed explanation of this in the book 'Hadoop: The Definitive Guide' by Tom White.
Regards
Bejoy K S
From handheld, Please excuse typos.
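The three API steps above rely on Hadoop's codec classes, which need a Hadoop installation to run. As a rough stand-in, the same pattern (read from an input stream, write through a compressing output stream) can be sketched with Python's standard gzip module. This is not the Hadoop API: `compress_stream` is a hypothetical helper, and gzip is just one possible codec.

```python
import gzip
import shutil
from typing import BinaryIO


def compress_stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = 4096) -> None:
    """Copy everything from `src` to `dst`, gzip-compressing on the way."""
    # GzipFile wraps the destination, much like a compressing output stream
    # wraps a raw one; closing it writes the gzip trailer but leaves `dst` open.
    with gzip.GzipFile(fileobj=dst, mode="wb") as out:
        shutil.copyfileobj(src, out, chunk_size)
```

Called as `compress_stream(sys.stdin.buffer, sys.stdout.buffer)` (after `import sys`), it would compress standard input to standard output, which is the shape of the program described in the steps above.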
-
Re: Can I write to an compressed file which is located in hdfs? Xiaobin She 2012-02-07, 09:11
Thank you Bejoy, I will look at that book.
Thanks again!
-
Re: Can I write to an compressed file which is located in hdfs? Adam Brown 2012-02-07, 17:54
Hi Xiaobin,
What build of Hadoop are you using, and what type of compression is being used? Thanks,
Adam Brown
Enablement Engineer
Hortonworks <http://www.hadoopsummit.org/>
-
Re: Can I write to an compressed file which is located in hdfs? Raj Vishwanathan 2012-02-07, 18:06
Hi
Here is a piece of code that does the reverse of what you want: it takes a bunch of compressed files (gzip, in this case) and converts them to text. You can tweak the code to do the reverse.
http://pastebin.com/mBHVHtrm
Raj
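The pastebin code is not reproduced in this thread, so the following is an independent sketch of the two directions mentioned, written with Python's standard library on local files rather than with Hadoop. It is not Raj's code; the function names and the `.gz` naming convention are assumptions made for the example.

```python
import gzip
import shutil
from pathlib import Path
from typing import Iterable, List


def gunzip_files(paths: Iterable[Path]) -> List[Path]:
    """Decompress each `name.gz` into a sibling file `name`; return the new paths."""
    outputs = []
    for path in paths:
        target = path.with_suffix("")  # drop the trailing .gz
        with gzip.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        outputs.append(target)
    return outputs


def gzip_files(paths: Iterable[Path]) -> List[Path]:
    """The reverse direction: compress each file into a sibling `name.gz`."""
    outputs = []
    for path in paths:
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        outputs.append(target)
    return outputs
```

Both functions leave the input files in place, so removing the originals after a successful conversion is up to the caller.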