**SAS is the largest market-share holder for advanced analytics.**

If you are going to work in analytics industry, it is impossible to escape from the language SAS. Different softwares of SAS are used in the industry for data handling. Most common of them used in data handling are SAS Enterprise guide and Base SAS. Both the tools have a similar format and usage. The only difference being SAS Enterprise guide is the graphical user interface of Base SAS. SAS Enterprise guide is much easier to use and is modular in nature. Because of the modularity, SAS Enterprise guide is being widely used in the industry. With time Base SAS is losing its importance in the industry, and SAS Enterprise guide is filling in for Base SAS. However, having used both, I now appreciate and prefer using Base SAS to handle data of billions of customers and trillions of transactions.

One of our previous articles (http://www.analyticsvidhya.com/blog/2013/11/4-sas-tricky-analytics-interview/) covers four tricky questions asked in SAS interviews. In this article, I will cover some tricky scenarios in which using base SAS will become far easier than using the SAS Enterprise guide. These questions are tougher and lengthier than those covered in the first part of this article series. These questions are asked widely in companies who have a broad base of analytics and deal with big data (Millions of customers, Billions of transactions, Trillions of dollar value transactions).

**Base SAS vs. SAS Enterprise guide**

Let’s first look into the pros of using both Base SAS and SAS Enterprise guide. This will make us appreciate those tricky scenarios, even more.

**SAS Enterprise guide**

1. EG is more common in firms with a smaller team of analytics. This is because programs on EG are far more understandable by a person who is new to the firm. Given the high attrition rate in analytics industry, it becomes very essential for such firm to hedge their risk by using the SAS Enterprise guide instead of Base SAS.

2. You develop a certain traditional routine much faster on EG than on Base SAS.

3. It is much easier to comprehend the flow using EG.

** Base SAS**

1. It is much easier to code on base SAS in case the logic of code is very complex.

2. Base SAS is much faster as compare to EG.

3. It is much easier to modify a code on Base SAS than on EG.

4. Using Macros makes coding in Base SAS much easier than SAS EG

**Background to case 1**

You work in a retail industry. You have recently started a loyalty program for your customers. A study conducted on retail bank, says that the customers with a total purchase of $1,000 in 3rd month (T+ 2th months) are the customers who will finally purchase more than $30,000. You want to focus your loyalty campaign on these customers.

You have 2 data-sets. First has the entire list of customer IDs with their date of first purchase. Second data has the customer ID with their monthly purchases for each year-month. First purchase can possibly be a non-financial transaction which might not be a part of table 2.

You need to identify customers who make more than $1000 purchase in the 3rd month from the first purchase.

**Table 1 : **

**Table 2 :**

**Solution to Case 1**

This question is a classic case when Base SAS clearly beats SAS EG. In this section you will see a simple solution for this case study.

* Creating a macro for each month

%macro fetch_data (next_mon = , third_mon = );

data create_list;

set table_1;

if first_pur < next_mon;

run;

proc sort data = create_list out=list; by customer_id; run;

proc sort data = table_2 out=purchase; by customer_id; run;

data fetch_purchase;

merge list(in=a) purchase(in=b);

if yearmonth = third_mon;

by customer_id;

if a;

run;

proc datasets;

append base=final_dataset data = fetch_purchase foce;

run;

%mend fetch_data;

%fetch_data (next_mon = ’01Feb2012’d , third_mon = 201204);

%fetch_data (next_mon = ’01Feb2013’d , third_mon = 201304);

%fetch_data (next_mon = ’01Mar2012’d , third_mon = 201205);

%fetch_data (next_mon = ’01Mar2013’d , third_mon = 201305);

*Identifying the customers with purchase above $1000 in 3rd month

data shortlisted;

set final_dataset;

if sales ge 1000;

run;

**Background to case 2**

You work for a banking industry. You want to analyze the transaction dataset and want to find the median transaction amount for each customer. This is the amount over which we will want to pay to the customer for stretching. More the dollar value of transactions, the cheaper is the total cost of transactions.You need to make a list of all customer with their floored median transaction amount (if there are 5 transactions, we want the 2nd lowest transaction and not 3rd and if transactions are only 1 then remove the customer from the list).

The only dataset you have is unique on transaction ID. It also has the customer ID and amount of the transaction.

** Table 1**

**Solution to Case 2**

The solution to this problem is tiresome on SAS EG because there is no median function on SQL routines after grouping data. SQL routines are the foundation of data handling in SAS EG.But this becomes quite easy on Base SAS. Let’s see how this can be done easily on Base SAS.

proc sql;

create table work.summarize as

select count(*) as trans_nos, customer_id

from work.table1

group by customer_id;

quit;

proc sort data = tables1; by customer_id;run;

proc sort data = summarize; by customer_id;run;

data add_total_trans;

merge table1 (in=a) summarize (in=b);

median_no = floor(trans_nos/2);

by customer_id;

drop trans_nos;

run;

proc sort data = add_total_trans; by customer_id amount;run;

data final_list;

set add_total_trans;

by customer_id amount;

if first.customer_id then n =1;

if n = median_no;

n + 1;

run;

The solution in base SAS for this question is not only effective but also time efficient.

**End Notes**

Both Base SAS and EG have their own pros and cons. The best recommended strategy is to use both. If you want to make a traditional query, use SAS EG to generate automated code. Now copy this code to make it macronized and generalized using Base SAS. The macro adds a new dimension to the codes which helps you generalize the code and avoid hard entered data.

Have you faced any other SAS problem in analytics interview? Are you facing any specific problem with SAS codes? Do you think this provides a solution to any problem you face? Do you think there are other methods to solve the problems discussed in a more optimized way? Do let us know your thoughts in the comments below.

[…] Tricky Base SAS interview questions : Part-II […]

Thank you very nice blog liked

This is great stuff. Thanks Tavish for putting this together. Waiting for few more articles from the same series.

Very Impressive!!!

Very informative and crystal clear.

hi really nice..

but please post more examples i.e real time scenarios regarding credit cards domain.

Thank you very much for sharing your experiences and your article also very understandable.

Thank You so much….! Really nice…!!!!

Hi Tavish:

I am a senior Statistician in my late 40’s. For last over 15 years I have been using SAS for time series forecasting purposes only i.e. Time Series Forecasting system using SAS/ETS.

Now for my new job scenario I need to use SAS Base for data manipulations. Could you please tell me the most simplest and quickest way to put my hands on and learn SAS Base quickly. Plus for SAS Base certification what would be the most effective fastest approach.

Your guidance will very helpful and will be much appreciated.

thanks,

Ali,

Given that you have such deep experience into SAS, it should not be a challenge to learn base SAS. The tricky part is that learning material on SAS in a structured format is rarely available for free. Alternatively, if you are open to spend money and take up some paid trainings, SAS institute and Jigsaw academy offer such courses.

For Base SAS certification there won’t be any additional preperation required. If you are comfortable with base SAS fundamentals, certification should be cakewalk.

Hope this helps.

Tavish

Hi Ali,

Other than mentioned by Tavish, for some free tutorials you can start with the UCLA Site http://www.ats.ucla.edu/stat/sas/ (in my opinion a very good place to start with). You can also read through the Little SAS Book. The of course you have the SAS User guide (Generally available online for free).

Thanks Tavish and Amit. It is very helpful. I just started a free course form SAS training “SAS Programming Essentials-I”.

I’ll sure try the UCLA site recommended by Amit. And after that, if needed for SAS base Certification, I’ll try to take the paid course online from SAS training “SAS Programming Essentials-II”.

Thanks guys for your support and help, Ali

Hi Tavish,

This is a very useful stuff. But I feel this could have been made fruther compact using intnx to calculate the 3rd month rather than calculating them manually..

[…] c. http://www.analyticsvidhya.com/blog/2014/04/tricky-base-sas-interview-questions-part-ii/ […]

Hi! Nice, but you pass thru the large data set several time. Try to find:

1) A one-pass solution using a DATA-step (memory limitations perhaps).

or

2) A solution using a SAS PROC.

(I have a cold and slight fever now, so I do not dare to write anything specific).

Hi Tavish,

I am Sandeep,working as Research Associate in Indian School of Business for 9 months.I have a prior experience of 2 years working in Data Processing & some statistical analysis as well.I worked on Softwares like SAS, SPSS and R. Now, I am looking for a job in Data Analytics field in Hyderabad.

Could you please refer me if you know any openings for the same?

Thanks in Advance,

Sandeep