Pig >> mail # user >> Pig + Cassandra Example !!


Mohammed Abdelkhalek 2013-03-18, 15:15
Dan DeCapria, CivicScienc... 2013-03-18, 15:44
Mohammed Abdelkhalek 2013-03-18, 15:56
Dan DeCapria, CivicScienc... 2013-03-18, 16:00
Mohammed Abdelkhalek 2013-03-18, 16:20
Dan DeCapria, CivicScienc... 2013-03-18, 16:48
Mohammed Abdelkhalek 2013-03-18, 16:57
Dan DeCapria, CivicScienc... 2013-03-18, 17:24
Dan DeCapria, CivicScienc... 2013-03-18, 17:27
Mohammed Abdelkhalek 2013-03-18, 17:29
Dan DeCapria, CivicScienc... 2013-03-18, 17:36

Re: Pig + Cassandra Example !!
Here's an example:
http://hortonworks.com/blog/pig-as-duct-tape-part-three-tf-idf-topics-with-cassandra-python-streaming-and-flask/
On Mon, Mar 18, 2013 at 10:36 AM, Dan DeCapria, CivicScience <
[EMAIL PROTECTED]> wrote:

> So yes, this is just a copy-and-paste baseline that someone could use to
> read from and write to Cassandra. The idea is that you need the correct
> dependencies in the /lib directory, and also the correct aliasing of
> exports and the modifications to the config files on the master and slaves.
> Hopefully this is a good starting point for you.
>
> Good luck!
>
> -Dan
>
> On Mon, Mar 18, 2013 at 1:29 PM, Mohammed Abdelkhalek <
> [EMAIL PROTECTED]> wrote:
>
> > Thank you.
> > I'll try it!
> >
> >
> > 2013/3/18 Dan DeCapria, CivicScience <[EMAIL PROTECTED]>
> >
> > > Also,
> > >
> > > // Ruby script modified for Cassandra, from Amazon
> > >
> > > #!/usr/bin/ruby
> > > require 'hpricot'
> > > require 'tempfile'
> > >
> > > CONFIG_HEADER = "<?xml version=\"1.0\"?>\n<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>"
> > >
> > > def parse_config_file(config_file_path)
> > >   ret = []
> > >   if File.exist?(config_file_path) then
> > >     doc = open(config_file_path) { |f| Hpricot(f) }
> > >     (doc/"configuration"/"property").each do |property|
> > >       val = {:name => (property/"name").inner_html, :value => (property/"value").inner_html }
> > >       if (property/"final").inner_html != "" then
> > >         val[:final] =  (property/"final").inner_html
> > >       end
> > >       ret << val
> > >     end
> > >   else
> > >     puts "#{config_file_path} does not exist, assuming empty configuration"
> > >   end
> > >   return ret
> > > end
> > >
> > > def dump_config_file(file_name, config)
> > >   open(file_name, 'w') do |f|
> > >     f.puts CONFIG_HEADER
> > >     f.puts '<configuration>'
> > >     for entry in config
> > >       f.print "  <property><name>#{entry[:name]}</name><value>#{entry[:value]}</value>"
> > >       if entry[:final] then
> > >         f.print "<final>#{entry[:final]}</final>"
> > >       end
> > >       f.puts '</property>'
> > >     end
> > >     f.puts '</configuration>'
> > >   end
> > > end
> > >
> > > def merge_config(default, overwrite)
> > >   for entry in overwrite
> > >     cells = default.select { |x| x[:name] == entry[:name]}
> > >     if cells.size == 0 then
> > >       puts "'#{entry[:name]}': default does not have key, appending value '#{entry[:value]}'"
> > >       default << entry
> > >     elsif cells.size == 1 then
> > >       puts "'#{entry[:name]}': new value '#{entry[:value]}' overwriting '#{cells[0][:value]}'"
> > >       cells[0].replace(entry)
> > >     else
> > >       raise "'#{entry[:name]}': default has #{cells.size} keys"
> > >     end
> > >   end
> > > end
> > >
> > > def add_cassandra_settings()
> > >   file = "/home/hadoop/conf/mapred-site.xml"
> > >   default = parse_config_file(file)
> > >   merge_config(default,[{:name => "cassandra.thrift.address", :value => "THISIPADDRESS" }])
> > >   merge_config(default,[{:name => "cassandra.input.thrift.address", :value => "THISIPADDRESS" }])
> > >   merge_config(default,[{:name => "cassandra.output.thrift.address", :value => "THISIPADDRESS" }])
> > >   merge_config(default,[{:name => "cassandra.thrift.port", :value => "9160" }])
> > >   merge_config(default,[{:name => "cassandra.input.thrift.port", :value => "9160" }])
> > >   merge_config(default,[{:name => "cassandra.output.thrift.port", :value => "9160" }])
> > >   merge_config(default,[{:name => "cassandra.partitioner.class", :value => "org.apache.cassandra.dht.RandomPartitioner" }])
> > >   merge_config(default,[{:name => "cassandra.input.partitioner.class", :value => "org.apache.cassandra.dht.RandomPartitioner" }])
> > >   merge_config(default,[{:name => "cassandra.output.partitioner.class", :value => "org.apache.cassandra.dht.RandomPartitioner" }])
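
For anyone who wants to try the merge behaviour of the quoted script without
Hpricot or a real mapred-site.xml, here is a minimal sketch that works on
in-memory data only. It is an illustration, not the script from the thread:
the name merge_properties, the sample property values and the address
10.0.0.1 are all made up for the example, and unlike the quoted merge_config
it returns the merged list so the result is easy to inspect.

```ruby
# Sketch of the append-or-overwrite merge used by merge_config in the quoted
# script. Each property is a Hash with :name and :value keys, as in the script.
# merge_properties is a hypothetical name, not part of the original.
def merge_properties(defaults, overrides)
  overrides.each do |entry|
    matches = defaults.select { |prop| prop[:name] == entry[:name] }
    case matches.size
    when 0
      defaults << entry          # key absent from defaults: append it
    when 1
      matches[0].replace(entry)  # key present once: overwrite in place
    else
      # duplicate keys in the defaults are treated as an error, as in the script
      raise ArgumentError, "'#{entry[:name]}': default has #{matches.size} keys"
    end
  end
  defaults
end

# Hypothetical sample data standing in for a parsed mapred-site.xml.
site = [
  { name: "mapred.job.tracker", value: "master:9001" },
  { name: "cassandra.thrift.port", value: "9999" }
]

merge_properties(site, [
  { name: "cassandra.thrift.address", value: "10.0.0.1" }, # placeholder address
  { name: "cassandra.thrift.port", value: "9160" }
])

site.each { |prop| puts "#{prop[:name]} = #{prop[:value]}" }
```

Because the overwrite happens through Hash#replace on the matching entry, the
existing list is modified in place, which is why the quoted script can call
merge_config repeatedly on the same parsed configuration.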

Russell Jurney twitter.com/rjurney [EMAIL PROTECTED] datasyndrome.com