Hive reads and writes to HDFS, and by definition HDFS is write once and
immutable after that.
So unlike an RDBMS, there is no concept of updating rows.
However, if you want to delete some records based on a criterion, there was
a smart post about it yesterday: basically, select the inverse and do
an INSERT OVERWRITE on the table.
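
A minimal sketch of that approach in HiveQL, assuming a hypothetical
non-partitioned table page_views with columns user_id, url and view_date
(none of these names come from the post). To "delete" the rows that match a
criterion, select the rows that do not match and overwrite the table with
them. The second statement shows the same pattern standing in for an update.

```sql
-- Hypothetical table and columns, for illustration only.
-- "Delete" all rows dated before 2013-01-01 by keeping the inverse.
INSERT OVERWRITE TABLE page_views
SELECT user_id, url, view_date
FROM page_views
WHERE view_date >= '2013-01-01';

-- "Update": rewrite url for one user and copy every other row unchanged.
INSERT OVERWRITE TABLE page_views
SELECT user_id,
       CASE WHEN user_id = 42 THEN 'http://example.com/new' ELSE url END AS url,
       view_date
FROM page_views;
```

Note that rows whose view_date is NULL are also dropped by the first
statement, because the comparison does not evaluate to true for them; add
"OR view_date IS NULL" to keep them. Both statements rewrite the whole table,
so the cost grows with the table size, not with the number of changed rows.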
INSERT OVERWRITE TABLE will write to the Hive-managed HDFS location of
that table and "replace" all that is there with your latest query results.
INSERT OVERWRITE DIRECTORY will write to an HDFS location of your choice and
"replace" all that is there with your latest query results. You can use this
directory as the LOCATION to which a PARTITION may point later.
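
A sketch of how those two steps might fit together, assuming a hypothetical
source table sales_raw (id, amount, dt), a hypothetical external table
sales_ext, and made-up HDFS paths. The external table is declared without a
ROW FORMAT clause so that it reads text files with Hive's default field
separator.

```sql
-- Hypothetical names and paths, for illustration only.
-- External text table using Hive's default field separator (Ctrl-A).
CREATE EXTERNAL TABLE sales_ext (id INT, amount DOUBLE)
PARTITIONED BY (dt STRING)
LOCATION '/data/sales_ext';

-- Write the query results to an HDFS directory of your choice,
-- replacing whatever was in that directory before.
INSERT OVERWRITE DIRECTORY '/data/exports/sales/dt=2013-05-31'
SELECT id, amount
FROM sales_raw
WHERE dt = '2013-05-31';

-- Point a partition of the external table at that directory.
ALTER TABLE sales_ext ADD PARTITION (dt = '2013-05-31')
LOCATION '/data/exports/sales/dt=2013-05-31';
```

The partition column dt is not written into the files; its value comes from
the partition definition, which is why the SELECT lists only id and amount.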
Note that INSERT OVERWRITE TABLE will follow the field separator of the
destination table that you specified when creating the table.
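
For example, a minimal sketch with a hypothetical destination table sales_csv
that declares a comma as its field separator, and a hypothetical source table
sales_raw:

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE sales_csv (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- The files written under the table's location use ',' between fields,
-- because that is what the destination table definition specifies.
INSERT OVERWRITE TABLE sales_csv
SELECT id, amount
FROM sales_raw;
```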
However, INSERT OVERWRITE DIRECTORY will use Hive's default "Ctrl-A" as the
field separator (I use 0.10.x); perhaps this is changed in 0.11, you need to
In summary, you have to look at updates and deletes very differently from
On 5/31/13 12:27 PM, "Renata Ghisloti Duarte de Souza"
<[EMAIL PROTECTED]> wrote:
>I was wondering about the "update" statement on Hive. Is it something
>Hive needs? Or can "insert overwrite" be always used instead?
>Thank you in advance for the clarification,