Re: Pig + Cassandra Example !!
So yes, this is just a copy-and-paste baseline that someone could use to go
to/from Cassandra. The idea is that you need not only the correct
dependencies in the /lib directory, but also the correct aliasing of
exports and the matching changes to the config files on the master and the
slaves. Hopefully this is a good starting point for you.
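
For anyone who wants to see what the config change does before touching a real node, here is a minimal sketch of the merge rule from the quoted script, run on in-memory data only. The helper name merge_property, the mapred.job.tracker entry and the 10.0.0.1 address are placeholders I made up for illustration; only the cassandra.* property names come from the script itself.

```ruby
# Sketch of the merge rule used by the quoted script, without any file I/O.
# merge_property is a hypothetical name; the sample values are placeholders.
def merge_property(config, entry)
  matches = config.select { |prop| prop[:name] == entry[:name] }
  case matches.size
  when 0
    config << entry            # key not present yet: append it
  when 1
    matches[0].replace(entry)  # key present once: overwrite it in place
  else
    raise "'#{entry[:name]}': config has #{matches.size} entries"
  end
  config
end

# Stand-in for the parsed contents of mapred-site.xml (assumed values).
config = [
  { :name => "mapred.job.tracker", :value => "master:9001" },
  { :name => "cassandra.thrift.port", :value => "9999" }
]

merge_property(config, { :name => "cassandra.thrift.port", :value => "9160" })
merge_property(config, { :name => "cassandra.thrift.address", :value => "10.0.0.1" })

config.each { |prop| puts "#{prop[:name]} = #{prop[:value]}" }
```

An existing key is overwritten, a missing key is appended, and a key that appears more than once raises an error instead of guessing which copy to keep.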

Good luck!

-Dan

On Mon, Mar 18, 2013 at 1:29 PM, Mohammed Abdelkhalek <
[EMAIL PROTECTED]> wrote:

> Thank you.
> I'll try it!
>
>
> 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
>
> > Also,
> >
> > // ruby script modified for cassandra, from amazon
> >
> > #!/usr/bin/ruby
> > require 'hpricot'
> > require 'tempfile'
> >
> > CONFIG_HEADER = "<?xml version=\"1.0\"?>\n<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>"
> >
> > def parse_config_file(config_file_path)
> >   ret = []
> >   if File.exist?(config_file_path) then
> >     doc = open(config_file_path) { |f| Hpricot(f) }
> >     (doc/"configuration"/"property").each do |property|
> >       val = {:name => (property/"name").inner_html, :value => (property/"value").inner_html }
> >       if (property/"final").inner_html != "" then
> >         val[:final] =  (property/"final").inner_html
> >       end
> >       ret << val
> >     end
> >   else
> >     puts "#{config_file_path} does not exist, assuming empty configuration"
> >   end
> >   return ret
> > end
> >
> > def dump_config_file(file_name, config)
> >   open(file_name, 'w') do |f|
> >     f.puts CONFIG_HEADER
> >     f.puts '<configuration>'
> >     for entry in config
> >       f.print "  <property><name>#{entry[:name]}</name><value>#{entry[:value]}</value>"
> >       if entry[:final] then
> >         f.print "<final>#{entry[:final]}</final>"
> >       end
> >       f.puts '</property>'
> >     end
> >     f.puts '</configuration>'
> >   end
> > end
> >
> > def merge_config(default, overwrite)
> >   for entry in overwrite
> >     cells = default.select { |x| x[:name] == entry[:name]}
> >     if cells.size == 0 then
> >       puts "'#{entry[:name]}': default does not have key, appending value '#{entry[:value]}'"
> >       default << entry
> >     elsif cells.size == 1 then
> >       puts "'#{entry[:name]}': new value '#{entry[:value]}' overwriting '#{cells[0][:value]}'"
> >       cells[0].replace(entry)
> >     else
> >       raise "'#{entry[:name]}': default has #{cells.size} keys"
> >     end
> >   end
> > end
> >
> > def add_cassandra_settings()
> >   file = "/home/hadoop/conf/mapred-site.xml"
> >   default = parse_config_file(file)
> >   merge_config(default,[{:name => "cassandra.thrift.address", :value => "THISIPADDRESS" }])
> >   merge_config(default,[{:name => "cassandra.input.thrift.address", :value => "THISIPADDRESS" }])
> >   merge_config(default,[{:name => "cassandra.output.thrift.address", :value => "THISIPADDRESS" }])
> >   merge_config(default,[{:name => "cassandra.thrift.port", :value => "9160" }])
> >   merge_config(default,[{:name => "cassandra.input.thrift.port", :value => "9160" }])
> >   merge_config(default,[{:name => "cassandra.output.thrift.port", :value => "9160" }])
> >   merge_config(default,[{:name => "cassandra.partitioner.class", :value => "org.apache.cassandra.dht.RandomPartitioner" }])
> >   merge_config(default,[{:name => "cassandra.input.partitioner.class", :value => "org.apache.cassandra.dht.RandomPartitioner" }])
> >   merge_config(default,[{:name => "cassandra.output.partitioner.class", :value => "org.apache.cassandra.dht.RandomPartitioner" }])
> >   dump_config_file(file + ".new", default)
> >   if File.exist?(file) then
> >     File.rename(file, file + ".old")
> >   end
> >   File.rename(file + ".new", file)
> >   puts "Saved #{file} with overwrites. Original saved to #{file}.old"
> > end
> >
> > def warn(msg)
> >   STDERR.puts "#{Time.now.utc} WARN " + msg
> > end
> >
> > add_cassandra_settings()
> >
> >
> > On Mon, Mar 18, 2013 at 1:24 PM, Dan DeCapria, CivicScience <
Dan DeCapria
CivicScience, Inc.
Senior Informatics / DM / ML / BI Specialist