An Overview on DDL Commands in Apache Hive
This article was published as a part of the Data Science Blogathon.
Apache Hadoop is the most used open-source framework in the industry to store and process large data efficiently. Hive is built on the top of Hadoop for providing data storage, query and processing capabilities. Apache Hive provides an SQL-like query system for querying large volumes of data stored in various databases and file systems. Hive stores data using tables. Tables in Hive are analogous to RDMS tables. Before starting with Hive DDL commands, we first need to understand the commands are divided into various categories based on the functions performed by the authorities, such as database creation, updating data, access control, etc. DDL commands are used to create and modify the structure of databases and tables. Afterwards, these tables are used to query and process large datasets.
Have you ever wondered what might be the different DDL commands in Hive? That’s what we’ll discuss in this article!
DDL Commands on Databases in Hive
Create a database in Hive:
CREATE DATABASE command is used to create a database with the specified name in Hive.
We can use either DATABASE or SCHEMA for database creation. If the database already exists, use IF NOT EXISTS while creating a new database to avoid the error.
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name [COMMENT database_comment] [LOCATION hdfs_path] [WITH DBPROPERTIES (property_name=property_value, ...)];
Here, COMMENT is a literal used to describe database details. LOCATION specifies the path to the directory where the database is stored. WITH PROPERTIES is used to identify the properties associated with the database, such as the name of the user, type of the database etc.
Example: For creating a database, SCHOOL use the following query
CREATE DATABASE IF NOT EXISTS SCHOOL COMMENT "Store details about students, courses and professors" WITH DBPROPERTIES ('createdBy' = 'Chaitanya');
Show databases in Hive:
SHOW DATABASES command lists all the databases present in the hive.
Example: Showing all the databases present in the hive.
Describe the database in Hive:
DESCRIBE DATABASE command displays the database name in Hive, its comment (if provided), and database location on the file system. To get the database properties, use EXTENDED with the above command.
Syntax: DESCRIBE DATABASE|SCHEMA [EXTENDED] database_name;
Example: To see the above-created database details, use the command
DESCRIBE DATABASE SCHOOL
Example: To see the above-created database details and properties, use the command
DESCRIBE DATABASE EXTENDED SCHOOL
Use database in Hive:
USE command is used to select the specified database for a session on which all subsequent HiveQL statements would be executed.
For example: If you want to use the SCHOOL database for executing subsequent HiveQL statements, use the command,
Drop database in Hive:
DROP DATABASE command is used to delete the database in Hive. By default, the command behaviour is RESTRICT which means that the database is dropped only when empty. Use CASCADE if you want to drop the database with tables.
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
Examples: If you want to delete database SCHOOL use,
DROP IF EXITS SCHOOL
If you want to delete the database SCHOOL with tables use,
DROP IF EXITS SCHOOL CASCADE
Alter database in Hive:
ALTER DATABASE command is used to modify the metadata associated with the existing database in Hive. This command can change database properties, database location, database owner, etc.
The syntax for changing database properties:
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);
Example: If you want to add one property createdFor to the existing SCHOOL database, then use :
ALTER SCHEMA SCHOOL SET DBPROPERTIES (‘createdFor’=’Analytics’);
The syntax for changing database owner:
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;
Example: If you want to change the owner of the SCHOOL database to an admin role, then use :
ALTER SCHEMA SCHOOL SET OWNER ROLE admin
DDL Commands on Tables in Hive
Create a table in Hive:
CREATE TABLE command is used to create a table with the specified name in Hive. If a table with the same name already exists, the table is not made, and the error is thrown. To avoid the error, use IF NOT EXISTS while creating a new table.
CREATE TABLE [IF NOT EXISTS] [db_name.] table_name [(col_name data_type [COMMENT col_comment], ... [COMMENT col_comment])] [COMMENT table_comment] [ROW FORMAT row_format] [STORED AS file_format] [LOCATION hdfs_path];
Here, COMMENT is a literal used to describe table and table columns. ROW FORMAT is used to specify row format for input and output. STORED AS is used to identify the file format for table storage. LOCATION is used to determine the path to the directory where table data is stored.
Example: For creating an internal table STUDENT with columns name, age, course_assigned and email_id, use the following query
CREATE TABLE STUDENT ( name string COMMENT 'This is student name.', age INT COMMENT 'This is student age.', course_assigned string COMMENT 'This is course assigned to the student.', email_id string COMMENT 'This is student email' ) COMMENT 'The table stores student details' ROW FORMAT DELIMITED STORED AS TEXTFILE;
Show tables in Hive:
SHOW TABLES command lists all the tables and views present in the current database.
Example: Showing all the tables present in the default database.
Describe the table in Hive:
DESCRIBE TABLE command displays the lists of columns for the specified table.
Example: To see the lists of columns for the above-created table, STUDENT use,
Drop table in Hive:
DROP TABLE command is used to delete the data and metadata associated with the specified table.
Example: If you want to delete table STUDENT use,
Alter table in Hive:
ALTER TABLE command is used to alter or modify the table structure of the already existing table. This command can be used to rename a table, add columns to the table, etc.
Syntax to rename table:
ALTER TABLE table_name RENAME TO new_table_name;
Example: If you want to rename the STUDENT table to student_data, the below command:
ALTER TABLE student RENAME TO students_data;
The syntax for adding columns to the table:
ALTER TABLE table_name ADD COLUMNS (column_1, column_2, column_3) ;
Example: If you want to add one more column address to the student_data table, then use:
ALTER TABLE students_data ADD COLUMNS (address STRING COMMENT 'This is student address.') ;
The truncate table in Hive:
TRUNCATE TABLE command removes all the rows from the specified table.
TRUNCATE TABLE table_name;
Example: If you want to remove all the rows from table student_data use,
TRUNCATE TABLE student_data;
In this article, we learned about various Hive DDL commands. We learned how to use DDL commands to assign database owners, delete table data, and modify database properties for data storage inside Hive. We also learned how to create databases and tables, select locations to store Hive databases and modify database metadata. Hopefully, you got a good understanding of Hive DDL commands. Please let me know your queries in the comments section below.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.