Are you using Hive just to convert your text files to sequence files?
If that's the case, then you may want to look at the purpose Hive was built for.
If you only want to modify or process data on a routine basis, without any
kind of analytics functions, Hive is probably not the right tool.
If you want to do data manipulation or enrichment and do not want to code
a lot of MapReduce jobs, you can take a look at Pig scripts.
Basically, what you want to do is generate a UUID for each of your tweets
and then feed them to the Mahout algorithms.
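
To make the UUID step concrete, here is a rough sketch in Python. It assumes
one tweet per line of plain text, which may not match your files, and the
function names (tag_tweets, to_tsv) are made up for illustration. Getting the
id/text pairs into whatever input format Mahout expects is not covered here.

```python
import uuid


def tag_tweets(lines):
    """Yield (tweet_id, text) pairs, one per non-empty input line.

    Assumption: each input line holds exactly one tweet.
    """
    for line in lines:
        text = line.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        # uuid4() is a random UUID; it is not derived from the tweet text
        yield str(uuid.uuid4()), text


def to_tsv(pairs):
    """Format (tweet_id, text) pairs as tab-separated id/text lines."""
    for tweet_id, text in pairs:
        # replace tabs inside the tweet so the id/text split stays unambiguous
        yield tweet_id + "\t" + text.replace("\t", " ")
```

Something like `for row in to_tsv(tag_tweets(open("tweets.txt"))): print(row)`
would then give you one "id<TAB>tweet" line per tweet (tweets.txt is a
placeholder name). The same logic could also run as a streaming mapper if the
data is too large for a single machine.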
Sorry if I understood it wrong or it sounds rude.