A Guide to Data Types in MySQL for Data Science Beginners
- Know about the various data types in MySQL
- Understand how to use these data types can be used in different scenarios
Databases are ubiquitous – from social media and sports to e-commerce and movies you watch online, databases are at the core of everything. For managing, retrieving, storing, and manipulating these databases we have Database Management Systems(DBMS).
There are plenty of DBMSs available in the market to manage this much data- MySQL, PostgreSQL, SQLite, Oracle Database, Apache CouchDB, etc. Each DBMS provides its own data types with a little modification than others but the basic idea is the same. Today I’ll cover the data types provided by MySQL DBMS.
The data types in MySQL are divided into three broad categories. I’ll cover each one in detail to give you a clear sense of them so that you can know where and how can you use them. Without further delay let’s get started.
Table of Contents
- What is a Data type?
- Data Types in MySQL
- String/Character Datatype
- Numeric Datatype
- Date and Time Datatype
What is a Data Type?
A data type as the name suggests is the type or category to which the data belongs to. It is an attribute of the data which defines the type of data an object can hold. In SQL the data type defines the type of data to be stored in each column of the table. Each column has its own data type. It is important to specify the data type of all columns so that similar values can be added to it. This means one column can hold only one type of data.
The data type is specified while creating the structure of the table. For example-
CREATE TABLE table_name ( column1 datatype, column2 datatype, column3 datatype, );
For instance, you want to create a Student Table, then this is how you can create it-
CREATE TABLE Student ( Roll_no int, Name char(50), Age int, );
Here we have specified that the column Roll_no and Age will hold only integer values and the column Name will hold character values.
Data Types in MySQL
Just like int and varchar, we saw in the example, MySQL provides many types of data types which have been categorized into 3 broad categories. Let’s look into each one of them.
1 – String/Character Data types in MySQL
|CHAR(size)||Stands for CHARACTER. It can hold a fixed-length string(alphabets, number, or special characters). The size of the required string is set using the size parameter, which can be between 0 and 255.
Your string should not exceed the length defined using the size parameter. Although it can be less than the size. If you do not provide any value then the default size is 1. It will occupy the space in the memory according to the size parameter. For example, Char(50) will straightaway occupy 50 bytes on the memory.
|VARCHAR(size)||Stands for VARIABLE CHARACTER. It can hold a variable-length string. The range of characters can be between 0 and 65,535. You cannot exceed the length of the string defined using the size parameter. Although it can be less than that. Its default value is also 1.
But the memory occupied by the varchar is dependent on the actual size of the string and not on what you have set. For example, if we have set the size to 50, Varchar(50), but our string is “Devdutt” then it will occupy only 7 bytes on the memory(unlike Char).
|BINARY(size)||Similar to CHAR(), but stores fixed-length binary byte strings. It does not contain any character set. The size parameter specifies the column length in bytes. Its range is between 0 and 255. The default size value is 1.|
|VARBINARY(size)||Similar to VARCHAR(), but stores variable-length binary byte strings. It does not contain any character set as well. The size parameter specifies the maximum column length in bytes.|
|BLOB(size)||BLOB is a Binary Large OBject that can hold a variable amount of data. It can store binary data such as images, multimedia, and PDF files. There are four BLOB types- TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. BLOB holds up to 65,535 bytes of data.
It takes 2-byte overhead. It means that BLOB will occupy [the number of Binary values used+2]Bytes in the memory. Each BLOB value is stored using a two-byte length prefix that indicates the number of bytes in the value.
For instance, your data contains 100 binary values then BLOB will occupy 102 bytes in the memory.
|TINYBLOB||It is a type of BLOB that can hold up to 255 bytes of data. It takes 1-Byte overhead.|
|MEDIUMBLOB||It is a type of BLOB that can hold up to 16,777,215 bytes of data. It takes 3-Bytes overhead.|
|LONGBLOB||It is a type of BLOB that can hold up to 4,294,967,295 bytes of data. LONGBLOB can store the maximum data among these four BLOB. It takes 4-Bytes overhead.|
|TEXT(size)||The TEXT is useful for storing long-form text strings such as articles, blogs, etc. It has more features than CHAR and VARCHAR. It can hold from 1 byte to 4 GB of data. Just like BLOB, there are four TEXT types- TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT. TEXT holds up to 65,535 bytes or 64KB of data.
It takes 2-byte overhead. It means that TEXT will occupy [the number of character used+2]Bytes in the memory. For instance, your text contains 100 characters then TEXT will occupy 102 bytes in the memory.
|TINYTEXT||TINYTEXT can store up to 255 characters i.e 255 bytes. It is suitable for storing summaries of articles, short notices, captions, etc.
It takes 1-Byte overhead.
|MEDIUMTEXT||MEDIUMTEXT can store up to 16,777,215 characters i.e 16,777,215 bytes or 64MB of data. It is suitable for larger text strings like books, research papers, and code backup.
It takes 3-Bytes overhead.
|LONGTEXT||LONGTEXT can store the maximum characters among all four, up to 4,294,967,295 characters i,e 4,294,967,295 bytes or 4GB. This is more than enough storage for any long-form text strings. For example, a book that MEDIUMTEXT can’t hold can be stored using LONGTEXT.
LONGTEXT takes 4-Bytes overhead.
|ENUM(val1, val2, val3, …)||ENUM is a string object whose value is chosen from a list of permitted values defined at the time of column creation. You can list up to 65,535 values in an ENUM list. This provides compact storage of the data. If a value is inserted that is not in the list, a blank value will be inserted. The values are sorted in the order you enter them.
This is how you can define ENUM-
CREATE TABLE table_name( column_name ENUM(Val1, Val2, Val3), );
2 – Numeric Data types in MySQL
|INT(size)||INT is used for storing exact numbers. There are five INT types- TINYINT, INT, SMALLINT, MEDIUMINT, and BIGINT(the range of TINYINT is the least and of BIGINT is the most).
The signed range of INT is from -2147483648 to 2147483647 and the unsigned range is from 0 to 4294967295. You can specify signed or unsigned int in the column definition. The size parameter specifies the maximum length of the number which is 255.
EXAMPLE: 4294967294 for unsigned int and -17826537 for signed int.
|TINYINT(size)||The signed range of TINYINT is from -128 to 127 and the unsigned range is from 0 to 255. This means you can enter numbers ranging from 0 to 255 in TINYINT. You cannot store even 256 in this. For that next datatype can be used.
The maximum display length of TINYINT is also 255.
EXAMPLE: 254 for unsigned int and -116 for signed int.
|SMALLINT(size)||The signed range of SMALLINT is from -32768 to 32767 and the unsigned range is from 0 to 65535. The maximum display length of SMALLINT is also 255.
EXAMPLE: 65534 for unsigned int and -32423 for signed int.
|MEDIUMINT(size)||The signed range of MEDIUMINT is from -8388608 to 8388607 and the unsigned range is from 0 to 16777215. The maximum display length of MEDIUMINT is also 255.
EXAMPLE: 16777214 for unsigned int and -8251625 for signed int.
|BIGINT(size)||The signed range of BIGINT is from -9223372036854775808 to 9223372036854775807 and the unsigned range is from 0 to 18446744073709551615. The maximum display length of BIGINT is also 255.
EXAMPLE: 18446744073709551614 for unsigned int and -90837625537882 for signed int.
|FLOAT(p)||FLOAT is used for storing approximate values. MySQL uses the value of p to determine whether to use FLOAT or DOUBLE for the resulting data type. If the value of p is from 0 to 24, the data type becomes FLOAT(). If p is from 25 to 53, the data type becomes DOUBLE()
|FLOAT(size, d)||The length of digits is specified using the size parameter. The number of digits after the decimal point is specified in the d parameter. The float has 32 bit (4 bytes) with 8 places accuracy. The d should be less than or equal to size(d<=size).
For example FLOAT(3,2) can store a number 3.12, here the size of the number is 3 and there are 2 digits after the decimal. If you’ll provide more digits after the decimal then FLOAT will round the digit. For example, if you provide 3.006, then this digit will be stored as 3.01
|DOUBLE(size, d)||The difference between float and double is that DOUBLE is used when you want to achieve high accuracy because double has 64 bit (8 bytes) with 16 places accuracy. The total number of digits is specified using the size parameter. The number of digits after the decimal point is specified in the d parameter. The d should be less than or equal to size(d<=size).
|DECIMAL(size, d)||DECIMAL is used to store exact numeric values in the database. For instance, the DECIMAL data type is used for columns that preserve exact precision e.g., money data in accounting systems.
The total number of digits is specified using the size parameter. The number of digits after the decimal point is specified in the d parameter. But the catch is the maximum number for size is 65 and the maximum number for d is 30. You cannot exceed these values and d should be less than or equal to size(d<=size). The default value for the size is 10 and the default value for d is 0.
|BIT(size)||BIT is used to store bit values i.e a number containing 0’s and 1’s only. The number of bits per value is specified using the size parameter. The size parameter can hold a value from 1 to 64. The default value for size is 1.
|BOOLEAN/BOOL||False values are stored as zero and true values as one. This datatype has only two values. MySQL does not provide a built-in Boolean data type. It uses TINYINT(1) instead which works the same. For convenience, MySQL provides synonyms of TINYINT(1) as BOOLEAN or BOOL, so that you can use them for simplification.
You can use BOOLEAN like this-
CREATE TABLE table_name( column1_name BOOLEAN, column2_name BOOL, column3_name TINYINT(1) );
All three lines of code are the same.
EXAMPLE: true, false
3 – Date and Time Data types in MySQL
|DATE||DATE is one of the five temporal data types provided by MySQL for managing dates. It can store the date in the YYYY-MM-DD format. Dates within the range from ‘1000-01-01’ to ‘9999-12-31’ can be stored. But if you want to store dates apart from this range you can use other non-temporal formats such as INT.
|TIME is used for storing the time of a day. The time can be stored in the hh:mm:ss format only. Also, you can enter time within the range from ‘-838:59:59’ to ‘838:59:59’. It can have fractional seconds part that is up to microseconds precision (6 digits).
|DATETIME||It contains a combination of date and time. The format for storing data and time is YYYY-MM-DD hh:mm: ss. Dates within the range from ‘1000-01-01 00:00:00’ to ‘9999-12-31 23:59:59’ can be stored. You can add DEFAULT and ON UPDATE in the column definition to get automatic initialization and updating to the current date and time
EXAMPLE: 2020-10-24 11:55:40
|TIMESTAMP||TIMESTAMP values are stored as the number of seconds since the Unix epoch (‘1970-01-01 00:00:00’ UTC). The format is similar to the DATETIME as YYYY-MM-DD hh:mm:ss. The supported range for TIMESTAMP is from ‘1970-01-01 00:00:01’ UTC to ‘2038-01-09 03:14:07’ UTC. The difference you can see is that MySQL stores TIMESTAMP in UTC(Universal Time coordinated) value. When you insert a TIMESTAMP value into a table, MySQL converts it from your connection’s time zone to UTC for storing.
Automatic initialization and updating to the current date and time can be specified using DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP in the column definition
EXAMPLE: 2020-10-24 11:55:40
|YEAR||YEAR can be used for storing a year in a four-digit format. Values are allowed in the four-digit format: 1901 to 2155, and 0000. If you input 1- or 2-digit string, MySQL converts values in the ranges from ‘0’ to ’69’ to YEAR values in the ranges 2000 to 2069 and the values in the range of ’70’ to ’99’ to YEAR values in the range from 1970 to 1999.
This table has a combination of all three types of datatypes:
CREATE TABLE Student (Admission_Id Varchar(10), Class Tinyint(2), Class_Roll_No Int, Firstname Varchar(50), Lastname Varchar(50), Age Tinyint, DOB Date, Year_of_Admission Year); INSERT into Student values ('AB121', 10, 1, 'Abhay', 'Singh', 16, '2004-05-11', '04'); INSERT into Student values ('AB152', 10, 2, 'Aryan', 'Verma', 15, '2005-11-13', '2005'); INSERT into Student values ('AB233', 10, 3, 'Ayat', 'Khan', 16, '2004-03-10', '2006'); SELECT * from Student;
I’ve covered some famous and common data types that MySQL provides. There are other advanced datatypes as well which will be covered in another article.
I recommend you go through the following resources to enhance your understanding of SQL-
- 24 Commonly used SQL Functions for Data Analysis tasks
- Difference between SQL Keys (Primary Key, Super Key, Candidate Key, Foreign Key)
- SQL for Beginners and Analysts – Get Started with SQL using Python
I hope now you have a clear understanding of the datatypes in MySQL covered in this article. Thanks for reading!