Felix GV
2013-01-07, 19:21
Neha Narkhede
2013-01-07, 19:29
Felix GV
2013-01-07, 19:32
Matt Lieber
2013-03-14, 21:34
David Arthur
2013-03-15, 03:29
Matthew Rathbone
2013-03-15, 16:20
Craig Lancaster
2013-03-15, 17:44
-
Is anyone able to consume from Kafka 0.7.x and write into Hadoop CDH 4.x?
Felix GV 2013-01-07, 19:21
Hello all,

I haven't been reading the list for the past couple of weeks, as I've been quite busy, but I searched and didn't find any discussion related to my current issue, so I thought I'd ask while I'm still investigating on my own.

We've been running a Kafka 0.7.0 cluster without problems for a while now. A while ago, I played around <http://felixgv.com/post/88/kafka-distributed-incremental-hadoop-consumer/> with importing data from our Kafka cluster into Hadoop, using the simple Kafka consumer located in the contrib directory of the Kafka source, and that worked properly. At the time, the Hadoop cluster I was running was CDH3u3, IIRC.

I'm now revisiting that project with a brand new CDH4.1.2 Hadoop cluster (using MR1, not YARN), and I'm having difficulty getting it to work.

At first, the run-class.sh script in kafka/contrib/hadoop-consumer wasn't using the proper Hadoop jars to connect to my cluster, so I tweaked it to include the output of the `hadoop classpath` command in its classpath. It's now able to connect to my Hadoop cluster, but it's telling me that the versions don't match:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 3
    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy0.getProtocolVersion(Unknown Source)
    ...

(I could give the whole stack trace if you want, but I didn't think it was really relevant.)

So anyway, I've messed around with the kafka/project/build/KafkaProject.scala file so that it uses the "2.0.0-mr1-cdh4.1.2" version of hadoop-core and fetches it from the Cloudera repo. I added the Cloudera repo by putting this line at the beginning of the HadoopConsumerProject class section:

    val clouderaRepo = "Cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

When I run ./sbt update, it fetches the new jars correctly. But when I then run ./sbt package, it can't find a bunch of Hadoop-related classes and packages referenced in the hadoop-consumer code, which I guess means that a few APIs have changed between the two versions of CDH.

I've tried this on the 0.7.0 branch of Kafka (from the Apache git repo) as well as on the 0.7.2 branch, and I get the same result on both (I can't successfully run ./sbt package).

The easiest for me would be to get it to work on Kafka 0.7.0, but I guess I could persuade my people to upgrade to 0.7.2 if necessary (I'd like us to upgrade, but I guess you all know how it is... getting a working system changed is a political hassle). I don't think we'd be willing to move to Kafka 0.8 just yet, so hopefully that won't be necessary.

*TLDR: Is anyone pumping data from Kafka 0.7.x to CDH4.x? And if so, how? Using the example consumer from Kafka's contrib, or another one?*

Perhaps this one <https://github.com/miniway/kafka-hadoop-consumer>? (I'll probably give it a try soon, BTW, so I'll keep you guys posted.) I may also try porting the hadoop-consumer contrib to CDH4.

Finally, I haven't seen anything mentioned about the LinkedIn kafka/avro/hadoop ETL stuff we've been hearing about for a while. I saw the new LinkedIn DataFu stuff, but it seems unrelated. Are there any updates about whether or when the ETL code will get open sourced? We use Avro quite a bit, so in our case the Avro coupling would definitely not be an issue. I don't know what version(s) of Hadoop LinkedIn is running, though, so perhaps their stuff wouldn't work out of the box with CDH4 either.

Any advice would be appreciated!

Thanks :) !

--
Felix
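The classpath tweak to run-class.sh described in the message above could look roughly like the following. This is only a minimal sketch, not the actual change that was made: the helper name `append_hadoop_classpath` is hypothetical, and it assumes the script builds its classpath in a `CLASSPATH` variable. The only real command relied on is `hadoop classpath`, which prints the classpath of the installed Hadoop distribution.

```shell
# Sketch: append the cluster's Hadoop jars to an existing CLASSPATH.
# append_hadoop_classpath is a hypothetical helper, not part of run-class.sh.
append_hadoop_classpath() {
  if command -v hadoop >/dev/null 2>&1; then
    # `hadoop classpath` prints a colon-separated list of jars and config dirs.
    # Only add a ':' separator when CLASSPATH is already non-empty.
    CLASSPATH="${CLASSPATH:+$CLASSPATH:}$(hadoop classpath)"
  else
    echo "hadoop not found on PATH; CLASSPATH left unchanged" >&2
  fi
  export CLASSPATH
}
```

Calling `append_hadoop_classpath` before the `java` invocation makes the cluster's jars visible to the consumer. Note that the JVM loads a class from the first classpath entry that provides it, so an older bundled hadoop-core jar listed ahead of the appended entries would still take precedence.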
-
Re: Is anyone able to consume from Kafka 0.7.x and write into Hadoop CDH 4.x?
Neha Narkhede 2013-01-07, 19:29
> Finally, I haven't seen anything mentioned about the LinkedIn
> kafka/avro/hadoop ETL stuff we've been hearing about for a while.

The LinkedIn ETL kafka/avro/hadoop project is open sourced. See here:
https://github.com/linkedin/camus/wiki/Camus-Overview

Thanks,
Neha
-
Re: Is anyone able to consume from Kafka 0.7.x and write into Hadoop CDH 4.x?
Felix GV 2013-01-07, 19:32
OOOH that's awesome :D !!

I'll take a look at this shiny stuff right away! Thanks a lot :D !!

--
Felix

On Mon, Jan 7, 2013 at 2:28 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
> > Finally, I haven't seen anything mentioned about the LinkedIn
> > kafka/avro/hadoop ETL stuff we've been hearing about for a while.
>
> The LinkedIn ETL kafka/avro/hadoop project is open sourced. See here -
> https://github.com/linkedin/camus/wiki/Camus-Overview
>
> Thanks,
> Neha
-
Re: Is anyone able to consume from Kafka 0.7.x and write into Hadoop CDH 4.x?
Matt Lieber 2013-03-14, 21:34
Just curious, were you able to make Camus work with CDH4 then?

Cheers,
Matt

________________________________

NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
-
Re: Is anyone able to consume from Kafka 0.7.x and write into Hadoop CDH 4.x?
David Arthur 2013-03-15, 03:29
I have used KafkaETLJob to write a job that consumes from Kafka and writes to HDFS. Kafka version 0.7.2 rc5 and CDH 4.1.2.

Is anything in particular not working?

-David

On 3/14/13 5:31 PM, Matt Lieber wrote:
> Just curious, were you able to make Camus work with CDH4 then ?
>
> Cheers,
> Matt
-
Re: Is anyone able to consume from Kafka 0.7.x and write into Hadoop CDH 4.x?
Matthew Rathbone 2013-03-15, 16:20
David, we use a subset of the KafkaETLJob in CDH4 with great success. Just make sure to compile your MapReduce job against CDH4.

On Thu, Mar 14, 2013 at 10:28 PM, David Arthur <[EMAIL PROTECTED]> wrote:
> I have used KafkaETLJob to write a job that consumes from Kafka and writes
> to HDFS. Kafka version 0.7.2 rc5 and CDH 4.1.2.
>
> Is anything in particular not working?
>
> -David

--
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
[EMAIL PROTECTED] | @rathboma <http://twitter.com/rathboma> | 4sq <http://foursquare.com/rathboma>
-
Re: Is anyone able to consume from Kafka 0.7.x and write into Hadoop CDH 4.x?
Craig Lancaster 2013-03-15, 17:44
We're successfully using Camus to move data from Kafka 0.7.x into CDH 4.x. I didn't hit any particular problems getting it to work; I only had to tweak the pom.xml files.

Craig

On Fri, Mar 15, 2013 at 12:20 PM, Matthew Rathbone <[EMAIL PROTECTED]> wrote:
> =david, we use a subset of the KafkaETLJob in cdh4 with great success. Just
> make sure to compile your mapreduce against CDH4