SAS is the largest market-share holder for advanced analytics.
If you are going to work in analytics industry, it is impossible to escape from the language SAS. Different softwares of SAS are used in the industry for data handling. Most common of them used in data handling are SAS Enterprise guide and Base SAS. Both the tools have a similar format and usage. The only difference being SAS Enterprise guide is the graphical user interface of Base SAS. SAS Enterprise guide is much easier to use and is modular in nature. Because of the modularity, SAS Enterprise guide is being widely used in the industry. With time Base SAS is losing its importance in the industry, and SAS Enterprise guide is filling in for Base SAS. However, having used both, I now appreciate and prefer using Base SAS to handle data of billions of customers and trillions of transactions.
One of our previous articles (http://www.analyticsvidhya.com/blog/2013/11/4-sas-tricky-analytics-interview/) covers four tricky questions asked in SAS interviews. In this article, I will cover some tricky scenarios in which using base SAS will become far easier than using the SAS Enterprise guide. These questions are tougher and lengthier than those covered in the first part of this article series. These questions are asked widely in companies who have a broad base of analytics and deal with big data (Millions of customers, Billions of transactions, Trillions of dollar value transactions).
Let’s first look into the pros of using both Base SAS and SAS Enterprise guide. This will make us appreciate those tricky scenarios, even more.
SAS Enterprise guide
1. EG is more common in firms with a smaller team of analytics. This is because programs on EG are far more understandable by a person who is new to the firm. Given the high attrition rate in analytics industry, it becomes very essential for such firm to hedge their risk by using the SAS Enterprise guide instead of Base SAS.
2. You develop a certain traditional routine much faster on EG than on Base SAS.
3. It is much easier to comprehend the flow using EG.
1. It is much easier to code on base SAS in case the logic of code is very complex.
2. Base SAS is much faster as compare to EG.
3. It is much easier to modify a code on Base SAS than on EG.
4. Using Macros makes coding in Base SAS much easier than SAS EG
You work in a retail industry. You have recently started a loyalty program for your customers. A study conducted on retail bank, says that the customers with a total purchase of $1,000 in 3rd month (T+ 2th months) are the customers who will finally purchase more than $30,000. You want to focus your loyalty campaign on these customers.
You have 2 data-sets. First has the entire list of customer IDs with their date of first purchase. Second data has the customer ID with their monthly purchases for each year-month. First purchase can possibly be a non-financial transaction which might not be a part of table 2.
You need to identify customers who make more than $1000 purchase in the 3rd month from the first purchase.
Table 1 :
Table 2 :
This question is a classic case when Base SAS clearly beats SAS EG. In this section you will see a simple solution for this case study.
* Creating a macro for each month
%macro fetch_data (next_mon = , third_mon = );
if first_pur < next_mon;
proc sort data = create_list out=list; by customer_id; run;
proc sort data = table_2 out=purchase; by customer_id; run;
merge list(in=a) purchase(in=b);
if yearmonth = third_mon;
append base=final_dataset data = fetch_purchase foce;
%fetch_data (next_mon = ’01Feb2012’d , third_mon = 201204);
%fetch_data (next_mon = ’01Feb2013’d , third_mon = 201304);
%fetch_data (next_mon = ’01Mar2012’d , third_mon = 201205);
%fetch_data (next_mon = ’01Mar2013’d , third_mon = 201305);
*Identifying the customers with purchase above $1000 in 3rd month
if sales ge 1000;
You work for a banking industry. You want to analyze the transaction dataset and want to find the median transaction amount for each customer. This is the amount over which we will want to pay to the customer for stretching. More the dollar value of transactions, the cheaper is the total cost of transactions.You need to make a list of all customer with their floored median transaction amount (if there are 5 transactions, we want the 2nd lowest transaction and not 3rd and if transactions are only 1 then remove the customer from the list).
The only dataset you have is unique on transaction ID. It also has the customer ID and amount of the transaction.
The solution to this problem is tiresome on SAS EG because there is no median function on SQL routines after grouping data. SQL routines are the foundation of data handling in SAS EG.But this becomes quite easy on Base SAS. Let’s see how this can be done easily on Base SAS.
create table work.summarize as
select count(*) as trans_nos, customer_id
group by customer_id;
proc sort data = tables1; by customer_id;run;
proc sort data = summarize; by customer_id;run;
merge table1 (in=a) summarize (in=b);
median_no = floor(trans_nos/2);
proc sort data = add_total_trans; by customer_id amount;run;
by customer_id amount;
if first.customer_id then n =1;
if n = median_no;
n + 1;
The solution in base SAS for this question is not only effective but also time efficient.
Both Base SAS and EG have their own pros and cons. The best recommended strategy is to use both. If you want to make a traditional query, use SAS EG to generate automated code. Now copy this code to make it macronized and generalized using Base SAS. The macro adds a new dimension to the codes which helps you generalize the code and avoid hard entered data.
Have you faced any other SAS problem in analytics interview? Are you facing any specific problem with SAS codes? Do you think this provides a solution to any problem you face? Do you think there are other methods to solve the problems discussed in a more optimized way? Do let us know your thoughts in the comments below.