Wrapping Hive around existing CSV files means manually naming and typing every column in the table creation statement. I have several CSV tables, and some of them have a ton of columns. I would love a way to create Hive tables that automatically infers the column types by attempting various type conversions or regex matches on the data (say, the first row). Even cooler would be if the first row could be interpreted differently from the rest of the table: as a set of string labels naming the columns, with the types inferred from, say, the *second* row. My CSV files are currently in this format, with the first row naming the columns.
Does this make sense?
Now, I'm sure that Hive doesn't support this yet, and I admit it is a somewhat esoteric desire on my part, but I'm curious how others would suggest approaching it. I'm thinking of writing a separate standalone program that reads the first two rows of a CSV file and prints the column names and types in the correct syntax for a Hive external table creation statement, which I would then copy and paste into Hive. I was just hoping for a simpler solution.
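For concreteness, here is a rough sketch in Python of what that standalone program might look like. It is only an illustration under several assumptions: the function names (infer_hive_type, sanitize, build_create_table), the table name, and the location path are all hypothetical; the file is assumed to be comma-delimited with the labels in row one; and the types are guessed from the second row alone, so a column whose first value looks like an integer but later holds text would be typed wrongly.

```python
import csv
import io
import re


def infer_hive_type(value):
    """Guess a Hive type from a single sample value (a heuristic, not a guarantee)."""
    v = value.strip()
    if re.fullmatch(r"[+-]?\d+", v):
        # Fall back to BIGINT when the value does not fit in a 32-bit signed int.
        return "INT" if -2**31 <= int(v) < 2**31 else "BIGINT"
    if re.fullmatch(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?", v):
        return "DOUBLE"
    if v.lower() in ("true", "false"):
        return "BOOLEAN"
    return "STRING"


def sanitize(name):
    """Turn a header label into a plain lowercase identifier."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_").lower()
    return cleaned or "col"


def build_create_table(csv_file, table_name, location):
    """Read the first two rows and return a CREATE EXTERNAL TABLE statement."""
    reader = csv.reader(csv_file)
    header = next(reader)   # row 1: column labels
    sample = next(reader)   # row 2: values used to guess the types
    columns = [
        "  {} {}".format(sanitize(name), infer_hive_type(value))
        for name, value in zip(header, sample)
    ]
    return (
        "CREATE EXTERNAL TABLE {} (\n".format(table_name)
        + ",\n".join(columns)
        + "\n)\n"
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        + "STORED AS TEXTFILE\n"
        + "LOCATION '{}';".format(location)
    )


# Demo with made-up data; a real run would pass an open file instead.
demo = io.StringIO("id,name,price,big\n1,widget,9.99,3000000000\n")
print(build_create_table(demo, "products", "/data/products"))
```

The printed statement could then be pasted into Hive. Note that this only generates the DDL: the label row is still in the data file and would have to be stripped or skipped separately, and sampling more than one row would make the type guesses more reliable.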
Keith Wiley [EMAIL PROTECTED] keithwiley.com music.keithwiley.com
"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
-- Keith Wiley