Khoi Nguyen — December 4, 2020
Beginner Data Exploration Data Visualization Excel Machine Learning R Structured Data Supervised Tableau

This article was published as a part of the Data Science Blogathon.

Introduction

There have been many articles online showing how to connect R with Tableau. However, what I observed is that most of them mainly demonstrate how to draw visualizations and plot charts with R-code. 

In this article, I will show how we can run a regression analysis and optimize rent price in R, then paste the value to Excel, which will subsequently be connected to Tableau again for further calculations with other fields in the original dataset.

Such kind of seamless integration among 3 different analysis tools can help data analysts to run statistical research in R, then migrate results in Tableau and visualize in digestible ways for business readers.

 

Introduction About Dataset Used and Business Requirements

The dataset in this example is extracted from the Capstone project within “Excel to MySQL: Analytic Techniques for Business”. This is a table containing information about properties for rent of a company, with information about short-term rent occupancy rate and the average nightly rent price. We also have data about the price at the 10th percentile and 90th percentile of similar properties in the same region.

Raw data can be found in the link below.

https://github.com/nguyenkhoi6394/Excel-to-MySQL-Capstone-Project/blob/main/database.csv

Our business requirement is to find an optimized price for each property so that revenue can be maximized. Since revenue is a function of occupancy rate*nightly rent price*365 (assuming that the property can be rented throughout the year), we need to express occupancy rate as a function of nightly rent price, which can be done by simple linear regression

The next task is to run the R optim function, just like we use Solver in Excel, for each Property, or each row in the dataset.

With optimized price and predicted occupancy rate, we can then calculate the total gross profit for the company and do many other analyses.

Connect R with Tableau

Here is the first look at the dataset after loading it into Tableau. Now I want to add another column showing the optimized price for each property.

r with tableau

We must first connect Tableau with R.

Before we connect R with Tableau, make sure your R console has installed Rserve already.

library("Rserve")

Rserve()

Now, R should print ‘Starting Rserve…’. If you observe this output, then R is reaching out to Tableau for a connection.

2) Open Tableau & click on Help > Settings & Performance > Manage External Service Connections.
3) In the dialog box that opens, choose ‘localhost’ for Server and type ‘6311’ for Port.
4) Then, click on Test Connection

Now, you should get a dialog box stating, ‘Successfully connected to the R serve service’. It means that you are ready to use R with Tableau

Create a calculated field running R code in Tableau

Create a Calculated Field and paste the below code:

SCRIPT_REAL(
"df <- data.frame(.arg1,.arg2,.arg3,.arg4,.arg5)

model <-lm(data=df,.arg1 ~ .arg2)

Create revenue function. 
revenue <- function(data,par) {
par_vs_10th <- par-data$.arg3
normalized_price <-0.1+0.8*par_vs_10th/data$.arg5
fcst_occupancy <-coef(model)['(Intercept)']+coef(model)['.arg2']*normalized_price
fcst_st_revenue <-fcst_occupancy*365*par
fcst_st_revenue
}

Run optim for each row in df. Find the value of "par"-rent price-that can optimize revenue function
for (i in 1:nrow(df)) 
{df[i,'optimized_price'] <-optim(122,revenue,data=df[i,],method='L-BFGS-B', control=list(fnscale=-1),lower=df[i,'.arg3']) }

#return optimized price as output for calculated field
df$optimized_price",

sum([OccupancyRate]),
avg([sample_price_percentile]),
avg([Percentile10Th Price]),
avg([Percentile 90Th Price]),
avg([percentile_90th_vs_10th]),
attr([Ws Property Id]))

R-code must be written under function like SCRIPT_REAL, which returns numeric values. There are other similar R-functions in Tableau such as SCRIPT_BOOL and SCRIPT_INT, depending on the values you want to retrieve.

Before running, we must create a table: df<-data.frame(.arg1,.arg2,…)

.arg1, .arg2… are the fields from data sources in Tableau. They are the bolded words in the code. .arg1 is the Occupancy Rate, .arg2 is the sample_price_percentile.

The R_code will be written within brackets (” “). The final line of code: df$optimized_price will determine the return value for this calculation.

For detailed explanation about running optim and linear regression in R, please refer to the link below:

https://nguyenkhoi6394.github.io/Excel-to-MySQL-Capstone-Project/Using_Optim_for_a_dataframe_in_R.html

This calculation is a table calculation. Make sure that it is computed along the Property ID.

 

r tableau excel

Let’s create a view to see this measure.

 

Now we have optimized price for each property.

However, a problem now occurs. This measure is a table calculation and we can only have one single value per property when observing it in a table. We cannot embed it inside another calculation.

For example, I want to normalize the optimized price into percentile value using below formula:

0.1+0.8*(Optimized Price-Percentile 10th Price)/(percentile 90th_vs_10th)

Tableau will trigger an error, saying that we cannot mix an aggregate measure with non-aggregate value. This is really inconvenient and inflexible, since we may want to leverage an R-coded calculation for many more measures.

tableau property id

In order to mitigate this issue, I have come up with a workaround solution: write the R-optimized values into a csv or excel file, then join this new dataset with the original data source in Tableau for other visualizations or measures creation.

 

Final Integration

Let’s create another calculation field in Tableau, called as Script. This time we will not return a numeric value, but will write outputs to an external CSV file. In my example, I write to CSV for simplicity, but you can also write to an xlsx file if prefer.

 

SCRIPT_REAL(
"df <- data.frame(.arg1,.arg2,.arg3,.arg4,.arg5)
 model <-lm(data=df,.arg1 ~ .arg2)

revenue <- function(data,par){
 par_vs_10th <- par-data$.arg3
 normalized_price <-0.1+0.8*par_vs_10th/data$.arg5
 fcst_occupancy <-coef(model)['(Intercept)']+coef(model)['.arg2']*normalized_price
 fcst_st_revenue <-fcst_occupancy*365*par
 fcst_st_revenue
 }

 for (i in 1:nrow(df)) {df[i,'optimized_price'] <-optim(122,revenue,data=df[i,],method='L-BFGS-B', control=list(fnscale=-1),lower=df[i,'.arg3']) }
df$normalized_optimized_price<-0.1+0.8*(df$optimized_price-df$.arg3)/(df$.arg5)

#Create a new dataframe, replacing .arg2(sample_percentile_price) with the normalized optimized price
new <-data.frame(.arg2=df$normalized_optimized_price)

#Predict the occupancy rate based on optimized price and add as a new column to df
df['Forecast Occupancy']=predict.lm(model, newdata=new)

#Add Property ID to df
df['Ws Property Id']=.arg6

#Write df to a csv file
 write.table(df,'D:/Documents/Business Analytics/4. Visualization/Business Capstone/Blogathon/new.csv',sep=',',row.names=FALSE,quote=FALSE,col.names = TRUE)
 ",
sum([Occupancy Rate]),
avg([sample_price_percentile]),
avg([Percentile 10Th Price]),
avg([Percentile 90Th Price]),
avg([percentile_90th_vs_10th]),
attr([Ws Property Id]))

The next step is to create a new Sheet, called Sheet 2 for instance. Then drag Property ID and Script measure to Detail in Mark card.

You should see a message like the following:

tableau

Just ignore that error message. Open the folder you specified in the Script calculation and you will see a new CSV file has just been created.

open csv in excel
r tableau excel - excel

Our next task is simpler, just connect Tableau workbook with this csv file and blend it with the original data source, based on the foreign key: WS Property ID.

database realtion in tableau

Now in the Data Pane, there is a new dataset readily available for use.

tableau data pane

Since we have Optimized Price and Forecast Occupancy Rate as normal fields, we can use them for further calculations without any issues related to aggregate level as previously.

Suppose that I want to create a measure called Gross Revenue= Optimized Price * Occupancy Rate *365. The calculation is now valid.

Gross Revenue

In the future, in case there are changes to the training data (sample nightly price), or you add more features to the linear model. Simply open Sheet 2 again to reactivate the process and retrieve new outputs.

 

End Notes

The ability to write code in R in a calculation makes Tableau become more flexible than its rival-Power BI in terms of connecting with external data analysis platforms. By combining Tableau, Excel and R, we can utilize the power of many tools simultaneously for our analytical practices.

Do you have any other ideas and use cases regarding using Python and R in Tableau? Feel free to comment in this article.

About the Author

Our Top Authors

Download Analytics Vidhya App for the Latest blog/Article

Leave a Reply Your email address will not be published. Required fields are marked *