Avro >> mail # dev >> Questions re integrating Avro into Cascading process

Re: Questions re integrating Avro into Cascading process

On Apr 23, 2010, at 12:33pm, Doug Cutting wrote:

> Ken Krugler wrote:
>> 1. I'm assuming there's no compelling reason to read the file  
>> headers - in fact, not sure how you'd even get at the data, much  
>> less how you'd deal with potentially partial/missing data from a  
>> set of Avro files being read as part files.
>
> I'm not sure what you're asking here.

Sorry, I should have been clearer.

I was thinking about the read side of things, when using the Cascading  
Scheme to pull data from Avro files. If these files have metadata in  
their headers, there's no good way to get at it via the Cascading  
interface. And given that a directory will typically contain a set of  
part-xxxxx files, each with its own header, it didn't seem like you  
could do much with the results in any case. So I'm just checking to  
make sure I'm not overlooking something.
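
To make the per-file nature of that metadata concrete, here is a minimal Python sketch of reading the header of an Avro data file directly. It assumes the Avro 1.x object container layout (the magic bytes, a string-to-bytes metadata map, then a 16-byte sync marker). The helper names and the sample header contents are hypothetical, and in real code Avro's own DataFileReader would do this work:

```python
import io

MAGIC = b"Obj\x01"  # magic bytes that open an Avro object container file
SYNC_SIZE = 16      # length of the sync marker that follows the metadata map


def write_long(value):
    """Encode an integer as an Avro long (zigzag, then variable-length)."""
    n = (value << 1) ^ (value >> 63)
    out = bytearray()
    while n & ~0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def read_long(stream):
    """Decode one Avro long (variable-length, then zigzag) from a stream."""
    shift, accum = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated long")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(stream):
    """Read a length-prefixed byte string."""
    return stream.read(read_long(stream))


def write_header(meta, sync):
    """Build a header for illustration only (hypothetical helper)."""
    out = bytearray(MAGIC)
    if meta:
        out += write_long(len(meta))
        for key, value in meta.items():
            raw_key = key.encode("utf-8")
            out += write_long(len(raw_key)) + raw_key
            out += write_long(len(value)) + value
    out += write_long(0)  # a zero block count terminates the map
    out += sync
    return bytes(out)


def read_header(stream):
    """Return (metadata dict, sync marker) from the start of a data file."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:
            # A negative count is followed by the block's size in bytes.
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta, stream.read(SYNC_SIZE)


if __name__ == "__main__":
    # Two in-memory stand-ins for part files; the contents are made up.
    parts = {
        "part-00000": write_header({"avro.schema": b'"string"'}, b"\x01" * 16),
        "part-00001": write_header({}, b"\x02" * 16),
    }
    for name, data in parts.items():
        meta, _ = read_header(io.BytesIO(data))
        print(name, sorted(meta))
```

Each part file yields its own metadata map, so a reader working over a whole directory would have to decide how to reconcile headers that differ or lack a key.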

>> 2. We'd like to not include Avro source in the Cascading scheme  
>> project, but rather just have a dependency on the Avro jar.
>> We have a similar relationship between Bixo and Tika, and what's  
>> worked well is for the Bixo master branch to have a dependency on  
>> the Tika snapshot builds, so we can quickly iterate on both projects.
>> So are there plans to start pushing Avro snapshot builds to the  
>> Apache snapshots repository? I see occasional Avro releases to the  
>> Maven central repo (1.0, 1.2, 1.3.2) but nothing for snapshots.
>
> I'm okay if someone wants to, e.g., configure a nightly Hudson build  
> that pushes out an Avro snapshot jar.  Apache releases should not  
> depend on snapshots, but snapshots are useful for development.
>
> Avro's build.xml already includes a task to post a snapshot jar.  I  
> tested it once, which accounts for the single Avro snapshot that  
> exists.  So it should be simple to configure Hudson to do this.  
> Philip was going to set up Hudson builds for Avro.  Philip?

That would be great, thanks!

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g