Pig >> mail # user >> Pig + Cassandra Example !!


Mohammed Abdelkhalek 2013-03-18, 15:15
Dan DeCapria, CivicScienc... 2013-03-18, 15:44
Mohammed Abdelkhalek 2013-03-18, 15:56
Dan DeCapria, CivicScienc... 2013-03-18, 16:00
Mohammed Abdelkhalek 2013-03-18, 16:20
Dan DeCapria, CivicScienc... 2013-03-18, 16:48
Mohammed Abdelkhalek 2013-03-18, 16:57
Dan DeCapria, CivicScienc... 2013-03-18, 17:24
Dan DeCapria, CivicScienc... 2013-03-18, 17:27
Mohammed Abdelkhalek 2013-03-18, 17:29
Dan DeCapria, CivicScienc... 2013-03-18, 17:36
Re: Pig + Cassandra Example !!
Here's an example:
http://hortonworks.com/blog/pig-as-duct-tape-part-three-tf-idf-topics-with-cassandra-python-streaming-and-flask/
On Mon, Mar 18, 2013 at 10:36 AM, Dan DeCapria, CivicScience <
[EMAIL PROTECTED]> wrote:

> So yes, this is just a copy&paste baseline for what someone could use to go
> to/from Cassandra; the idea here is that you do require the correct
> dependencies in the /lib directory, but also the correct aliasing of
> exports and modifications to the config files on master and slaves.
>  Hopefully this is a good starting point for you.
>
> Good luck!
>
> -Dan
>
> On Mon, Mar 18, 2013 at 1:29 PM, Mohammed Abdelkhalek <
> [EMAIL PROTECTED]> wrote:
>
> > Thank you.
> > I'll try it!
> >
> >
> > 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> >
> > > Also,
> > >
> > > // ruby script modified for cassandra, from amazon
> > >
> > > #!/usr/bin/ruby
> > > require 'hpricot'
> > > require 'tempfile'
> > >
> > > CONFIG_HEADER = "<?xml version=\"1.0\"?>\n<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>"
> > >
> > > def parse_config_file(config_file_path)
> > >   ret = []
> > >   if File.exist?(config_file_path) then
> > >     doc = open(config_file_path) { |f| Hpricot(f) }
> > >     (doc/"configuration"/"property").each do |property|
> > >       val = {:name => (property/"name").inner_html, :value => (property/"value").inner_html }
> > >       if (property/"final").inner_html != "" then
> > >         val[:final] =  (property/"final").inner_html
> > >       end
> > >       ret << val
> > >     end
> > >   else
> > >     puts "#{config_file_path} does not exist, assuming empty configuration"
> > >   end
> > >   return ret
> > > end
> > >
> > > def dump_config_file(file_name, config)
> > >   open(file_name, 'w') do |f|
> > >     f.puts CONFIG_HEADER
> > >     f.puts '<configuration>'
> > >     for entry in config
> > >       f.print "  <property><name>#{entry[:name]}</name><value>#{entry[:value]}</value>"
> > >       if entry[:final] then
> > >         f.print "<final>#{entry[:final]}</final>"
> > >       end
> > >       f.puts '</property>'
> > >     end
> > >     f.puts '</configuration>'
> > >   end
> > > end
> > >
> > > def merge_config(default, overwrite)
> > >   for entry in overwrite
> > >     cells = default.select { |x| x[:name] == entry[:name]}
> > >     if cells.size == 0 then
> > >       puts "'#{entry[:name]}': default does not have key, appending value '#{entry[:value]}'"
> > >       default << entry
> > >     elsif cells.size == 1 then
> > >       puts "'#{entry[:name]}': new value '#{entry[:value]}' overwriting '#{cells[0][:value]}'"
> > >       cells[0].replace(entry)
> > >     else
> > >       raise "'#{entry[:name]}': default has #{cells.size} keys"
> > >     end
> > >   end
> > > end
> > >
> > > def add_cassandra_settings()
> > >   file = "/home/hadoop/conf/mapred-site.xml"
> > >   default = parse_config_file(file)
> > >   merge_config(default,[{:name => "cassandra.thrift.address", :value => "THISIPADDRESS" }])
> > >   merge_config(default,[{:name => "cassandra.input.thrift.address", :value => "THISIPADDRESS" }])
> > >   merge_config(default,[{:name => "cassandra.output.thrift.address", :value => "THISIPADDRESS" }])
> > >   merge_config(default,[{:name => "cassandra.thrift.port", :value => "9160" }])
> > >   merge_config(default,[{:name => "cassandra.input.thrift.port", :value => "9160" }])
> > >   merge_config(default,[{:name => "cassandra.output.thrift.port", :value => "9160" }])
> > >   merge_config(default,[{:name => "cassandra.partitioner.class", :value => "org.apache.cassandra.dht.RandomPartitioner" }])
> > >   merge_config(default,[{:name => "cassandra.input.partitioner.class", :value => "org.apache.cassandra.dht.RandomPartitioner" }])
> > >   merge_config(default,[{:name => "cassandra.output.partitioner.class", :value => "org.apache.cassandra.dht.RandomPartitioner" }])

Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com
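
For anyone who wants to try the merge logic from the quoted script without hpricot or a real mapred-site.xml, here is a rough in-memory sketch. The names merge_properties and render_properties are made up for illustration (they are not in the quoted script), the starting port value 9161 is a placeholder, and the XML output is not escaped, so treat it as a starting point rather than a drop-in replacement.

```ruby
# Hypothetical, in-memory sketch of the merge semantics in the quoted script.
# Each property is a hash of the form { name: ..., value: ... }.

# Merge `overrides` into `defaults` in place and return `defaults`.
def merge_properties(defaults, overrides)
  overrides.each do |entry|
    matches = defaults.select { |prop| prop[:name] == entry[:name] }
    case matches.size
    when 0
      defaults << entry.dup         # key missing: append the new property
    when 1
      matches.first.replace(entry)  # key present once: overwrite in place
    else
      # Duplicate keys in the defaults are ambiguous, so refuse to guess.
      raise ArgumentError, "'#{entry[:name]}': default has #{matches.size} keys"
    end
  end
  defaults
end

# Render the property list as Hadoop-style <configuration> XML.
# Values are interpolated as-is (no XML escaping in this sketch).
def render_properties(props)
  lines = ['<configuration>']
  props.each do |prop|
    lines << "  <property><name>#{prop[:name]}</name><value>#{prop[:value]}</value></property>"
  end
  lines << '</configuration>'
  lines.join("\n")
end

# Placeholder values only; these are not settings from a real cluster.
defaults = [{ name: 'cassandra.thrift.port', value: '9161' }]
overrides = [
  { name: 'cassandra.thrift.port', value: '9160' },
  { name: 'cassandra.partitioner.class',
    value: 'org.apache.cassandra.dht.RandomPartitioner' }
]

merged = merge_properties(defaults, overrides)
puts render_properties(merged)
```

Raising on duplicate keys mirrors the quoted merge_config: if the existing file already lists a property twice, it is not obvious which copy should be overwritten.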