Ever wondered how to query and analyze raw data? Also, have you ever tried doing this with Athena and QuickSight? This blog post will walk you through the necessary steps to achieve this using Amazon services and tools. Amazon’s perfect combination of storage, transformation, and visualization can help you achieve your data analysis and visualization goal.
The required AWS services and tools are as follows:
Simple Storage Service (S3) is an online store where you can store and retrieve any type of data on the web, regardless of time and place.
With the help of simple SQL, we can analyze and query raw data by using AWS Athena‘s interactive service.
QuickSight is an AWS-based Business Intelligence and visualization tool used to visualize data, perform ad hoc analysis, and derive business insights from our data.
Let’s start by checking the flow of data from left to right. We will first upload our source file to S3, then connect Athena to S3 to query the data, and finally, use QuickSight to visualize the information.
Upload data to S3
Create a new bucket.
Create a new folder in this bucket because Athena needs the folders inside it to access the data.
Upload the source file to the previously created folder.
Create a Table in Athena and Query the Data
There are three ways to access Athena: the AWS Management Console, the Amazon Athena API, or the AWS CLI. For this blog, we will be using the AWS Management Console. Before working with Athena, ensure the Athena region is the same as the S3 bucket region created earlier. Otherwise, you won’t be able to connect and query the data.
AWS Athena has a simple and easy-to-understand interface.
https://aws.amazon.com
In the left part, you can see the databases along with the tables and views that are part of the selected database. The right section is for writing SQL queries, and the result of the query we ran is displayed in the results section. In addition, Athena allows us to save or format our query.
The menu structure is easy to navigate and includes five primary tabs: Query Editor, Saved Queries, History, AWS Glue Data Catalog, and Workgroup: Primary.
So let’s start working with Athena. First, we need to create a table, and there are several options to do this. We will create a table from the S3 bucket. Once we select this option, we will be redirected to the four-step process of creating the table.
Step 1: Name and Location
This step defines the database, table name, and input data set location.
You can select a database from the drop-down list if you already have a database. If not, you can create a new database by selecting Create a new database, and then you need to define its name. Next, we need to enter the name of the table. This name should be unique worldwide. After naming the table, we need to define the source file’s location. This is where the S3 comes into play. We need to define the path to the folder, so we need to specify the bucket’s name and the folder’s name. Note that we cannot provide a file path; we can only provide a folder path.
Source: https://aws.amazon.com
Step 2: Format the data
There are several data formats that we can access using Athena. In this example, we will use a CSV file to select CSV as the data format.
https://aws.amazon.com
Step 3: Columns
The source file does not contain the column names, so we need to specify them in this step. This is something that is required for Athena to know the schema of the data we are working with. This can be achieved in two ways. One way is to simply enter the column name and column type for each column individually, and the other way is to add the columns in bulk as shown in the image below.
https://aws.amazon.com
The second method is more suitable when our dataset has too many columns, and it is tedious to configure each one individually. The format is pretty simple. We need to enter the column name, followed by a space, and followed by the data type in that column. A comma separates column definitions.
Step 4: Partitions
In this step, we can configure partitions. Amazon suggests splitting the data to reduce the amount of data a query needs to scan. This can increase performance and reduce query costs. This is preferable if we have a large data set, but in our example, we have a small data set so it is not required.
Source: https://aws.amazon.com
With this last step, we have completed the table creation process. We created it using the Create Table Wizard, but that’s not the only way to do it. Athena also allows us to create tables using DDL statements, as shown in the figure below.
Now all that remains is to query the table and see if the configuration is correct. To test this, let’s run this simple SQL query:
SELECT DISTINCT region, product Category, COUNT (productCategory) AS Quantity FROM sales WHERE region IN ('Central', 'East', 'West') GROUP BY region, product Category ORDER BY region
After running this query in the results section, we can see the output shown in the image below. Output files are saved automatically for each query run, regardless of whether the query itself has been saved or not.
In addition, by selecting the file icon in the Results section, we can download a file with the query results in CSV format. We can also download it from the Query History tab.
Visualize Data with Amazon QuickSight
We use AWS Quicksight for data visualization purposes. But in addition to AWS Quicksight, you can also use Tableau, Looker, Mode Analytics, and more for advanced reports and visualizations. In our case, we will use QuickSight. It is an AWS-based Business Intelligence and visualization tool used to visualize data, perform ad hoc analysis, and derive business insights from our data. Data is entered as a dataset. AWS uses Superfast Parallel In-Memory Computing (SPICE) to perform data calculations and create graphs quickly. Amazon QuickSight retrieves data from Athena, S3, RDS, RedShift, MySQL, Snowflake, and many more.
You must create an account if you have never worked with QuickSight. Before connecting it to Athena, ensure that QuickSight has the right to access the information. You must enable QuickSight to access Amazon Athena and S3. If QuickSight does not have these rights, we will be unable to analyze and visualize the data we have queried in Athena. Another important thing is also the region. The region we selected for QuickSight must be the same as the region we selected for Athena. Otherwise, QuickSight will not be able to access the data we have in Athena.
Several visualizations or graphic formats are available. We can choose charts like bar charts, pie charts, ring charts, line charts, etc.
https://aws.amazon.com
In the image above, we used a vertical bar chart to visualize the data we previously queried in Athena effectively. The dashboard automatically updates as data is updated or scheduled.
Conclusion
AWS Athena is used to running queries quickly and easily without having to set up and manage any servers or data warehouses. This service is the right choice if you need to analyze large data sets. In this blog, we used S3 to store data, connected Athena to S3 to query the data, and finally, used QuickSight to visualize the information.
AWS-based Business Intelligence and visualization tool used to visualize data, perform ad hoc analysis, and derive business insights from our data.
There are three ways to access Athena: the AWS Management Console, the Amazon Athena API, or the AWS CLI.
The menu structure is easy to navigate and includes five primary tabs: Query Editor, Saved Queries, History, AWS Glue Data Catalog, and Workgroup: Primary.
Amazon suggests splitting the data to reduce the amount of data a query needs to scan. This can increase performance and reduce query costs. This is preferable if we have a large data set, but in our example, we have a small data set, so it is not required.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
A verification link has been sent to your email id
If you have not recieved the link please goto
Sign Up page again
Loading...
Please enter the OTP that is sent to your registered email id
Loading...
Please enter the OTP that is sent to your email id
Loading...
Please enter your registered email id
This email id is not registered with us. Please enter your registered email id.
Don't have an account yet?Register here
Loading...
Please enter the OTP that is sent your registered email id
Loading...
Please create the new password here
We use cookies on Analytics Vidhya websites to deliver our services, analyze web traffic, and improve your experience on the site. By using Analytics Vidhya, you agree to our Privacy Policy and Terms of Use.Accept
Privacy & Cookies Policy
Privacy Overview
This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.