Tricky Base SAS interview questions : Part-II

Tavish Srivastava 24 Jun, 2022 • 5 min read

tough questions

SAS is the largest market-share holder for advanced analytics.

If you are going to work in analytics industry, it is impossible to escape from the language SAS. Different softwares of SAS are used in the industry for data handling. Most common of them used in data handling are SAS Enterprise guide and Base SAS. Both the tools have a similar format and usage.

The only difference being SAS Enterprise guide is the graphical user interface of Base SAS. SAS Enterprise guide is much easier to use and is modular in nature. Because of the modularity, SAS  Enterprise guide is being widely used in the industry. With time Base SAS is losing its importance in the industry, and SAS Enterprise guide is filling in for Base SAS. However, having used both, I now appreciate and prefer using Base SAS to handle data of billions of customers and trillions of transactions.

One of our previous articles (https://www.analyticsvidhya.com/blog/2013/11/4-sas-tricky-analytics-interview/) covers four tricky questions asked in SAS interviews. In this article, I will cover some tricky scenarios in which using base SAS will become far easier than using the SAS Enterprise guide. These questions are tougher and lengthier than those covered in the first part of this article series. These questions are asked widely in companies who have a broad base of analytics and deal with big data (Millions of customers, Billions of transactions, Trillions of dollar value transactions).

[stextbox id=”section”]Base SAS vs. SAS Enterprise guide[/stextbox]

Let’s first look into the pros of using both Base SAS and SAS Enterprise guide. This will make us appreciate those tricky scenarios, even more.

SAS Enterprise guide

1. EG is more common in firms with a smaller team of analytics. This is because programs on EG are far more understandable by a person who is new to the firm. Given the high attrition rate in analytics industry, it becomes very essential for such firm to hedge their risk by using the SAS Enterprise guide instead of Base SAS.

2. You develop a certain traditional routine much faster on EG than on Base SAS.

3. It is much easier to comprehend the flow using EG.

 Base SAS

1. It is much easier to code on base SAS in case the logic of code is very complex.

2. Base SAS is much faster as compare to EG.

3. It is much easier to modify a code on Base SAS than on EG.

4. Using Macros makes coding in Base SAS much easier than SAS EG

[stextbox id=”section”]Background to case 1[/stextbox]

You work in a retail industry. You have recently started a loyalty program for your customers. A study conducted on retail bank, says that the customers with a total purchase of $1,000 in 3rd month (T+ 2th months) are the customers who will finally purchase more than $30,000. You want to focus your loyalty campaign on these customers.

You have 2 data-sets. First has the entire list of customer IDs with their date of first purchase. Second data has the customer ID with their monthly purchases for each year-month. First purchase can possibly be a non-financial transaction which might not be a part of table 2.

You need to identify customers who make more than $1000 purchase in the 3rd month from the first purchase.

Table 1 :

tab1

Table 2 :

tab2

[stextbox id=”section”]Solution to Case 1[/stextbox]

This question is a classic case when Base SAS clearly beats SAS EG. In this section you will see a simple solution for this case study.

[stextbox id=”grey”]

* Creating a macro for each month

%macro fetch_data (next_mon = , third_mon = );

data create_list;

set table_1;

if first_pur < next_mon;

run;

 

proc sort data =  create_list out=list; by customer_id; run;

 

proc sort data =  table_2 out=purchase; by customer_id; run;

 

data fetch_purchase;

merge list(in=a) purchase(in=b);

if yearmonth = third_mon;

by customer_id;

if a;

run;

 

proc datasets;

append base=final_dataset data =  fetch_purchase foce;

run;

%mend fetch_data;

 

%fetch_data (next_mon = ’01Feb2012’d  , third_mon = 201204);

%fetch_data (next_mon = ’01Feb2013’d  , third_mon = 201304);

%fetch_data (next_mon = ’01Mar2012’d  , third_mon = 201205);

%fetch_data (next_mon = ’01Mar2013’d  , third_mon = 201305);

 

*Identifying the customers with purchase above $1000 in 3rd month

data shortlisted;

set   final_dataset;

if sales ge 1000;

run;

[/stextbox]

[stextbox id=”section”]Background to case 2 [/stextbox]

You work for a banking industry. You want to analyze the transaction dataset and want to find the median transaction amount for each customer. This is the amount over which we will want to pay to the customer for stretching. More the dollar value of transactions, the cheaper is the total cost of transactions.You need to make a list of all customer with their floored median transaction amount (if there are 5 transactions, we want the 2nd lowest transaction and not 3rd and if transactions are only 1 then remove the customer from the list).

The only dataset you have is unique on transaction ID. It also has the customer ID and amount of the transaction.

 Table 1

tab3

[stextbox id=”section”]Solution to Case 2 [/stextbox]

The solution to this problem is tiresome on SAS EG because there is no median function on SQL routines after grouping data. SQL routines are the foundation of data handling in SAS EG.But this becomes quite easy on Base SAS. Let’s see how this can be done easily on Base SAS.

[stextbox id=”grey”]

proc sql;

create table work.summarize as

select count(*) as trans_nos, customer_id

from work.table1

group by customer_id;

quit;

 

proc sort data = tables1; by customer_id;run;

 

proc sort data = summarize; by customer_id;run;

 

data add_total_trans;

merge table1 (in=a) summarize (in=b);

median_no = floor(trans_nos/2);

by customer_id;

drop trans_nos;

run;

 

proc sort data = add_total_trans; by customer_id amount;run;

 

data final_list;

set add_total_trans;

by customer_id amount;

if first.customer_id then n =1;

if n = median_no;

n + 1;

run;

[/stextbox]

The solution in base SAS for this question is not only effective but also time efficient.

[stextbox id=”section”]End Notes[/stextbox]

Both Base SAS and EG have their own pros and cons. The best recommended strategy is to use both. If you want to make a traditional query, use SAS EG to generate automated code. Now copy this code to make it macronized and generalized using Base SAS. The macro adds a new dimension to the codes which helps you generalize the code and avoid hard entered data.

Have you faced any other SAS problem in analytics interview? Are you facing any specific problem with SAS codes?  Do you think this provides a solution to any problem you face? Do you think there are other methods to solve the problems discussed in a more optimized way? Do let us know your thoughts in the comments below.

If you like what you just read & want to continue your analytics learning, subscribe to our emailsfollow us on twitter or like our facebook page.

Tavish Srivastava 24 Jun 2022

Tavish Srivastava, co-founder and Chief Strategy Officer of Analytics Vidhya, is an IIT Madras graduate and a passionate data-science professional with 8+ years of diverse experience in markets including the US, India and Singapore, domains including Digital Acquisitions, Customer Servicing and Customer Management, and industry including Retail Banking, Credit Cards and Insurance. He is fascinated by the idea of artificial intelligence inspired by human intelligence and enjoys every discussion, theory or even movie related to this idea.

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Responses From Readers

Clear

Şarkı Sözü
Şarkı Sözü 13 Apr, 2014

Thank you very nice blog liked

Swati
Swati 13 Apr, 2014

This is great stuff. Thanks Tavish for putting this together. Waiting for few more articles from the same series.

Wasim
Wasim 14 Apr, 2014

Very Impressive!!!

Venkatesh. KR.
Venkatesh. KR. 29 May, 2014

Very informative and crystal clear.

sowmya
sowmya 29 Jun, 2014

hi really nice.. but please post more examples i.e real time scenarios regarding credit cards domain.

Bhavani
Bhavani 01 Jul, 2014

Thank you very much for sharing your experiences and your article also very understandable.

Ravi
Ravi 08 Jul, 2014

Thank You so much....! Really nice...!!!!

Ali
Ali 15 Sep, 2014

Hi Tavish: I am a senior Statistician in my late 40's. For last over 15 years I have been using SAS for time series forecasting purposes only i.e. Time Series Forecasting system using SAS/ETS. Now for my new job scenario I need to use SAS Base for data manipulations. Could you please tell me the most simplest and quickest way to put my hands on and learn SAS Base quickly. Plus for SAS Base certification what would be the most effective fastest approach. Your guidance will very helpful and will be much appreciated. thanks,

Amit Sharma
Amit Sharma 22 Sep, 2014

Hi Ali, Other than mentioned by Tavish, for some free tutorials you can start with the UCLA Site http://www.ats.ucla.edu/stat/sas/ (in my opinion a very good place to start with). You can also read through the Little SAS Book. The of course you have the SAS User guide (Generally available online for free).

Ali
Ali 22 Sep, 2014

Thanks Tavish and Amit. It is very helpful. I just started a free course form SAS training "SAS Programming Essentials-I". I'll sure try the UCLA site recommended by Amit. And after that, if needed for SAS base Certification, I'll try to take the paid course online from SAS training "SAS Programming Essentials-II". Thanks guys for your support and help, Ali

Rohan
Rohan 25 Nov, 2014

Hi Tavish, This is a very useful stuff. But I feel this could have been made fruther compact using intnx to calculate the 3rd month rather than calculating them manually..

Anders Sköllermo
Anders Sköllermo 08 Feb, 2015

Hi! Nice, but you pass thru the large data set several time. Try to find: 1) A one-pass solution using a DATA-step (memory limitations perhaps). or 2) A solution using a SAS PROC. (I have a cold and slight fever now, so I do not dare to write anything specific).

Sandeep Kumar
Sandeep Kumar 13 Feb, 2015

Hi Tavish, I am Sandeep,working as Research Associate in Indian School of Business for 9 months.I have a prior experience of 2 years working in Data Processing & some statistical analysis as well.I worked on Softwares like SAS, SPSS and R. Now, I am looking for a job in Data Analytics field in Hyderabad. Could you please refer me if you know any openings for the same? Thanks in Advance, Sandeep

Neel Biswas
Neel Biswas 28 Jun, 2015

Hi Tavish I read your Both Articles on Interview questions. They were very nice and gave a quick glance to the major concepts of Base SAS but I would like to add some details to your 2nd CASE. Its mentioned in the business scenario that we want the floored median i.e if 5 transactions are done, then we need 2nd transaction. Then wouldn't it be beneficial to add 1 more data step containing only 1 obs per customer i.e only the relevant transaction.We proc sort data= final_list; by customer_list amount; run; data final; set final_set; by customer_id amount; if first.customer_id then tot_amt=0; total_amt+amount; if last.customer_id then output final; run; PLEASE CORRECT ME IF I AM WRONG OR MISSING SOME POINT HERE Thank You. Neel

Neel Biswas
Neel Biswas 28 Jun, 2015

CORRECTION* proc sort data= final_list; by customer_list amount; run; data final; set final_set; by customer_id amount; if first.customer_id then tot_amt=0; tot_amt+amount; *CORRECTED LINE; if last.customer_id then output final; run;

Jitin
Jitin 03 Aug, 2015

Hi These are good examples but in the second case all the Transaction Ids and customer ids are unique. Now the problem here is that we will not get any observation in the final dataset as its mentioned in the question that if transactions are only 1 then remove the customer from the observation from the list. Here we do not ave multiple Customer ids so this will not work. This is what i think. Please correct me if i'am wrong.

Data Visualization
Become a full stack data scientist