Many of us ask 'What is SAS?', but to answer that question we must first understand why SAS was developed. SAS (Statistical Analysis System) was developed to handle the large volumes of data generated each day in a systematic and organized manner. Its primary job is to manage and analyze data so that strategic decisions can be based on the results.
In short, data analysis tools like SAS exist because the abundance of data creates a need for analysis, and SAS meets that need effectively. Let's first discuss what SAS is before moving on to the SAS interview questions.
What is SAS (Statistical Analysis System)?
SAS (Statistical Analysis System) is one of the leading analytics software suites and is developed by SAS Institute. It lets users alter, manage, and retrieve different kinds of data from different sources, and perform statistical analysis on the collected data. SAS software supports a number of operations on data, including data management, statistical analysis, report writing, business modelling, application development, quality improvement, data extraction, and data transformation.
SAS extracts raw data from different sources, cleans the data, and stores or loads it in a database. It organizes the data in tables that help you identify and analyze patterns. The tool can help increase employee productivity and business profits through techniques and procedures such as advanced analytics, multivariate analysis, business intelligence, data management, and predictive analytics. SAS is driven by SAS programmers, who perform a series of operations on SAS data sets in order to generate reliable statistical reports for business decisions. Non-technical users have access to a graphical point-and-click interface, while the SAS language offers more advanced options.
In this article, we will discuss SAS concepts for both freshers and experienced candidates, including reading and manipulating data, reporting, SAS macros, SQL queries, and SAS programming.
SAS Interview Questions for Freshers
1. Why choose SAS over other data analytical tools?
Listed below are a few reasons to choose SAS over other data analysis tools:
- SAS is relatively easy to learn and use compared with other analytics software tools. It has a stable Graphical User Interface (GUI) and offers an easy option (PROC SQL) for users who are already familiar with SQL.
- Every day, data is growing and securing data becomes more complicated. SAS is very capable of storing and organizing large amounts of data smoothly and reliably.
- In the corporate world and large companies, SAS is often used, as it is more professional and easier to use compared to other languages. SAS jobs abound all over the market.
- SAS provides adequate graphical functionality. However, it provides limited customization options.
- Since SAS is licensed software and its updates are released in a controlled environment, all of its features have been thoroughly tested. As a result, there are fewer chances of errors.
- The customer service and technical support provided by SAS are outstanding. In any case, if a user runs into technical difficulties during installation, they will receive immediate assistance from the team.
- With its high level of security in terms of data privacy, SAS is a recognized and trusted name in the enterprise market.
2. What are the essential features of SAS?
SAS has the following essential features:
- SAS offers extensive support for programmatically transforming and analyzing data in comparison to other BI (Business Intelligence) tools.
- Furthermore, SAS is platform-independent software, which means it can run on almost any operating system, including Windows, Mac, and Linux distributions such as Ubuntu.
- It provides very fine control over data manipulation and analysis, which is its USP.
- The SAS package provides a complete data analysis solution, ranging from simple figures to advanced analysis. One of the best features of SAS software is its Inbuilt Library, which contains all the necessary packages for data analysis and reporting.
- The reports can be visualized in the form of graphs that range from simple scatter plots and bar graphs to complex multi-page classification panels.
- Another feature of SAS is its support for multiple data formats. With SAS, you can read data from a variety of file types, formats, and even from files with missing data.
- Since SAS is a 4GL (fourth-generation programming language), its syntax is easy to learn.
3. Write down some capabilities of SAS Framework.
SAS Framework has the following four capabilities:
- Access Data: Data accessibility is a powerful SAS capability. Data can be accessed from different sources, including raw data files, Excel files, Oracle databases, and SAS data sets.
- Manage Data: SAS offers additional capabilities including data management. Data accessed from a variety of sources can thus be managed easily in order to generate useful insights. The process of managing data can include creating variables, validating data, cleaning data, creating subsets, etc. SAS manages the existing data to provide the data that you need.
- Analyze Data: SAS will analyze the data once it has been managed to perform simple evaluations like frequency and averages, along with more complex evaluations like forecasting, regression, etc.
- Present: The analyzed data can be saved and stored as a graphic report, a list, and overall statistics that can be printed or published. They can also be saved into a data file.
4. What is the use of Retain in SAS?
At the start of each iteration of the DATA step, SAS sets to missing, in the program data vector (a logical area of memory), every variable that is assigned through an INPUT statement or an assignment statement. A RETAIN statement overrides this default: it instructs SAS not to reset the named variables to missing when moving from one iteration of the DATA step to the next, so their values are retained.
RETAIN variable1 variable2 ... variablen;
There is no limit to the number of variables you can specify. When you do not specify any variable names, SAS retains the values of every variable created by an INPUT or assignment statement.
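As a rough illustration of the behaviour described above, the following sketch keeps a running total across iterations. The data set name `running` and the variables `amount` and `total` are made up for this example.

```sas
/* Hypothetical data: running total of AMOUNT across observations */
data running;
    retain total 0;           /* TOTAL keeps its value between iterations */
    input amount;
    total = total + amount;   /* without RETAIN, TOTAL would be reset to missing each time */
    datalines;
10
20
30
;
run;

proc print data=running;
run;
```

With RETAIN in place, `total` accumulates to 10, 30, and 60. If the RETAIN statement were removed, `total` would be missing in every observation, because a missing value plus a number is missing.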
5. What is PDV (Program Data Vector)?
The program data vector (PDV) is a logical area of memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values from the input buffer or creates them by executing SAS language statements, and assigns these values to the corresponding variables in the program data vector. The program data vector also includes two automatic variables, _N_ and _ERROR_.
6. State difference between Missover and Truncover in SAS.
- Missover: The INPUT statement does not jump to the next line when the Missover option is used on the INFILE statement. If the INPUT statement cannot read the entire field specified due to the field length, it will set the value to missing. The variables with no values assigned are set to missing when an INPUT statement reaches the end of an input data record.
Example: An external file with variable-length records contains the following records:
1
22
333
4444
55555
The following DATA step creates a SAS data set from these records. The numeric informat 5. is used, and only one input record matches the informatted length of the variable NUM.
data readin;
    infile 'external-file' missover;
    input NUM 5.;
run;

proc print data=readin;
run;
Output:
Obs    NUM
  1      .
  2      .
  3      .
  4      .
  5  55555
Those values that were read from input records that were too short have been set to missing. This problem can be corrected by using the TRUNCOVER option in the INFILE statement:
- Truncover: This option assigns the raw data value to the variable, even if it is shorter than what the INPUT statement expects.
Example: The same external file with variable-length records contains the following records:
1
22
333
4444
55555
The following DATA step creates a SAS data set from these records. The numeric informat 5. is used again.
data readin;
    infile 'external-file' truncover;
    input NUM 5.;
run;

proc print data=readin;
run;
Output:
Obs    NUM
  1      1
  2     22
  3    333
  4   4444
  5  55555
Those values that were read from input records that were too short are not set to missing.
7. What do you mean by the Scan function in SAS and write its usage?
The Scan() function is typically used to extract words from a value marked by delimiters (characters or special signs that separate words in a text string). The SCAN function selects individual words from text or variables containing text and stores them in new variables.
Its general form is SCAN(argument, n, delimiters), where:
- Argument: It specifies the character variable or text to be scanned.
- N: The number n indicates which word to read.
- Delimiters: These are characters values or special signs in a text string.
Consider that we would like to extract the first word from a sentence 'Hello, Welcome to Scaler!'. In this case, the delimiter used is a blank.
data _null_; string="Hello, Welcome to Scaler!"; first_word=scan(string, 1, ' ' ); put first_word =; run;
first_word contains 'Hello,' since it is the first word in the sentence. The comma is kept because the blank is the only delimiter specified. Now, consider that we would like to extract the last word from the same sentence, 'Hello, Welcome to Scaler!'. Again, the delimiter used is a blank.
data _null_; string="Hello, Welcome to Scaler!"; last_word=scan(string, -1, ' ' ); put last_word =; run;
last_word contains 'Scaler!', as it is the last word in the sentence.
8. Consider the following expression stored in the variable address: 9/4 Infantry Marg Mhow CITY, MP, 453441
In the following scenario, what would the scan function return?
In the above program, we have used the scan function to read the 3rd word in the address string. The following output will be returned by the scan function:
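The program and its output are not reproduced here, so the exact call is unknown. As a hedged sketch, the following DATA step shows two plausible calls; both the variable names and the choice of delimiters are assumptions. On an ASCII system the default delimiter list includes the blank, the comma, and the slash.

```sas
/* Sketch only: the original program is not shown, so both calls are assumptions */
data _null_;
    address = "9/4 Infantry Marg Mhow CITY, MP, 453441";
    word_default = scan(address, 3);        /* default delimiters: / , blank and others */
    word_blank   = scan(address, 3, ' ');   /* blank as the only delimiter */
    put word_default= word_blank=;
run;
```

With the default delimiters the words are 9, 4, Infantry, and so on, so the third word is Infantry. With a blank as the only delimiter the words are 9/4, Infantry, Marg, and so on, so the third word is Marg.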
9. Explain what is first and last in SAS?
In a DATA step, the BY statement is used together with the SET statement to process data in groups; the input must be sorted (or indexed) by the BY variables. When BY and SET are used together, SAS automatically creates two temporary variables for each BY variable, FIRST.variable and LAST.variable. SAS uses them to identify the first and last observations of each BY group. Their value is always 1 or 0, depending on the following conditions:
- FIRST.variable = 1 if the observation is the first one in a BY group.
- FIRST.variable = 0 if the observation is not the first one in a BY group.
- LAST.variable = 1 if the observation is the last one in a BY group.
- LAST.variable = 0 if the observation is not the last one in a BY group.
Essentially, SAS stores FIRST.variable and LAST.variable in a program data vector (PDV). As a result, they become available for DATA step processing. However, SAS will not add them to the output data set since they are temporary.
Example: In the following example, ID is a grouping variable containing duplicate entries. When FIRST.variable = 1 and LAST.variable = 1, it means that there is only a single value in the group like ID=4, ID=6 and ID=8 as shown below:
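The original table is not reproduced here. As a stand-in, the following minimal sketch uses made-up data in which IDs 4, 6, and 8 each occur once. The data set names `scores` and `flagged` and the variables `first_id` and `last_id` are hypothetical.

```sas
/* Hypothetical data: ID is the BY variable; IDs 4, 6 and 8 occur only once */
data scores;
    input id score;
    datalines;
1 10
1 20
4 15
5 30
5 35
6 40
8 50
;
run;

proc sort data=scores;
    by id;
run;

data flagged;
    set scores;
    by id;
    first_id = first.id;   /* copy the temporary flags so they appear in the output */
    last_id  = last.id;
run;

proc print data=flagged;
run;
```

Rows where both `first_id` and `last_id` equal 1 belong to groups with a single observation. The copies are needed because FIRST.id and LAST.id themselves are never written to the output data set.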
10. What is the meaning of STOP and OUTPUT statements in SAS?
- STOP Statement: STOP makes SAS immediately stop processing the current DATA step and resume with the statements that follow that step. In other words, STOP ends the whole DATA step, including any DO loops still in progress, not just the current iteration.
Example: As demonstrated in this example, STOP is used to avoid an infinite loop when using a random access method within a DATA step:
data sample; do developerobs=1 to engineeringobs by 10; set master.research point=developerobs nobs=engineeringobs; output; end; stop; run;
- OUTPUT Statement: Output tells SAS to write the current observation immediately to a SAS data set, not at the end of the DATA step. The current observation will be written to all data sets named in the DATA statement if there is no data set name specified in the OUTPUT statement.
Example: Each line of input data can be used to create two or more observations. As given below, for each observation in the data set Scaler, three observations are created in the SAS data set Result.
data Result(drop=time4-time6); set Scaler; time=time4; output; time=time5; output; time=time6; output; run;
11. State the difference between using the drop = data set option in the set statement and data statement.
In SAS, the drop= option is used to exclude variables from processing or from the output data set. This option tells SAS which variables you wish to remove from a data set.
- The drop= option in the set statement can be used if you do not wish to process certain variables or do not want to have them included in the new data set.
- However, if you want to process certain variables but don't want them to be included in the new data set, then choose drop= in the data statement.
The general form is DROP=variable(s), where variable(s) lists one or more variable names. Variables can be listed in any form SAS supports.
Example: Consider the following data set:
DATA outdata;
    INPUT gender $ section $ score1 score2;
    DATALINES;
F A 17 20
F B 25 17
F C 12 15
M D 21 25
;
run;

proc print;
run;
The following DROP= data set option instructs SAS to drop the variables score1 and score2 as the data set is read. Because they never reach the DATA step, the SUM function sees only missing values and totalsum is missing.
data readin;
    set outdata (drop = score1 score2);
    totalsum = sum(score1, score2);
run;
Output:
Gender  Section  score1  score2  totalsum
F       A        .       .       .
F       B        .       .       .
F       C        .       .       .
M       D        .       .       .
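For contrast, here is a minimal sketch of the second case described above, where DROP= sits on the DATA statement instead. It reuses the `outdata` data set from the example; the output name `readin2` is made up.

```sas
/* Same calculation, but DROP= is moved to the DATA statement */
data readin2 (drop = score1 score2);
    set outdata;
    totalsum = sum(score1, score2);   /* score1 and score2 are still available here */
run;

proc print data=readin2;
run;
```

Here score1 and score2 are read and used in the calculation, so totalsum holds the real sums, and the two variables are only left out when the observation is written to `readin2`.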
12. Name different data types that SAS support.
SAS supports two data types, i.e., Character and Numeric. Dates are stored as numeric values (the number of days since January 1, 1960), which is why date functions and arithmetic can be applied to them.
13. What do you mean by the "+" operator and sum function?
In SAS, summation or addition is performed either with the “sum” function or by using the “+” operator. Function "Sum" returns the sum of arguments that are present (non-missing arguments), whereas "+" operator returns a missing value if one or more arguments are not present or missing.
Example: Consider a data set containing three variables a, b, and c.
data variabledata;
    input a b c;
    cards;
1 2 3
34 3 4
. 3 2
53 . 3
54 4 .
45 4 2
;
run;
Each variable has a missing value in one observation, and we wish to compute the sum of the three variables.
data sumofvariables;
    set variabledata;
    x = sum(a, b, c);
    y = a + b + c;
run;
Output:
 x    y
 6    6
41   41
 5    .
56    .
58    .
51   51
The value of y is missing for the 3rd, 4th, and 5th observations in the output.
14. Explain _N_ and _ERROR_ in SAS.
In a SAS Data Step, there are two variables that are automatically created, namely, the _ERROR_ variable and the _N_ variable.
- _N_: This variable counts the number of times the DATA step has iterated. It starts at 1 and increases by 1 each time the DATA step loops past the DATA statement.
- _ERROR_: The value is 0 by default and gives information about any errors that occur during execution. Whenever there is an error, such as an input data error, a math error, or a conversion error, the value is set to 1. This variable can be used to locate errors in data records and to display an error message in the SAS log.
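The two automatic variables can be observed with a small sketch like the one below. The data set `check` and the variables `value`, `iteration`, and `had_error` are made up; the second record deliberately contains invalid numeric data.

```sas
/* Hypothetical data: the second record is not a valid number */
data check;
    input value;
    iteration = _n_;       /* copy: automatic variables are not written to the output */
    had_error = _error_;   /* 1 when the current record caused an input error */
    datalines;
10
abc
30
;
run;

proc print data=check;
run;
```

For the second record SAS writes an invalid-data note to the log, sets `value` to missing, and sets _ERROR_ to 1; _ERROR_ is reset to 0 at the start of the next iteration.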
15. What are different ways to exclude or include specific variables in a dataset?
DROP and KEEP statements can be used to exclude or include specific variables from a data set.
- Drop Statement: This instructs SAS which variables to remove from the data set.
- Keep Statement: The variables in the data set to be retained are specified using this statement.
Example: Consider the following data set:
DATA outdata;
    INPUT gender $ section $ score1 score2;
    DATALINES;
F A 17 20
F B 25 17
F C 12 15
M D 21 25
;
run;

proc print;
run;
The following DROP statement instructs SAS to drop the variables score1 and score2.
data readin;
    set outdata;
    totalsum = sum(score1, score2);
    drop score1 score2;
run;
Output:
Gender  Section  totalsum
F       A        37
F       B        42
F       C        27
M       D        46
The following KEEP statement instructs SAS to keep gender, section, score1, and totalsum in the data set, so score2 is left out.
data readin1;
    set outdata;
    totalsum = sum(score1, score2);
    keep gender section score1 totalsum;
run;
Output:
Gender  Section  score1  totalsum
F       A        17      37
F       B        25      42
F       C        12      27
M       D        21      46
16. What are some common mistakes that people make while writing programs in SAS?
The following are some of the most common programming errors in SAS:
- If a semicolon is missing from a statement, SAS will misinterpret not only that statement but potentially several that follow.
- A number of errors will result from unclosed quotes and unclosed comments because SAS may fail to read the subsequent statements correctly.
- Data and procedure steps have very different functions in SAS, so statements that are valid in one will probably cause errors in the other.
- Data is not sorted before using a statement that requires sorted input, such as a BY statement.
- Submitted programs are not checked for log entries.
- The quotation marks are not matched.
- The dataset option is invalid or the statement option is invalid.
- Debugging techniques are not used.
SAS Interview Questions for Experienced
17. What do you mean by SAS Macros and why to use them?
A macro is a named group of SAS statements that automates a repetitive task. Macros let us avoid repeating sections of code: we write them once and reuse them whenever needed, which also improves readability. Automation makes your work faster because you don't have to write the same lines of code every day. %MACRO and %MEND mark the start and end of a macro definition. A macro is defined once, usually near the beginning of the program, and then called in the body of the program wherever it is needed.
Macro variables hold a value that SAS programs can use over and over again. A macro variable can hold up to 65,534 characters, which makes macro variables one of SAS's most powerful tools. They can be either global or local in scope. A local macro variable (%LOCAL) is defined inside a macro program and can be accessed only there. A global macro variable (%GLOBAL) is typically defined in open code (outside of any macro program) and can be accessed anywhere in the current SAS session.
Syntax: The local variables are declared in the following syntax.
The following template defines a macro that takes comma-separated parameters; the macro statements sit between %MACRO and %MEND. The macro is then called by passing a value for each parameter.
/* Creating a macro program */
%MACRO macro-name (Param1, Param2, ..., Paramn);
    Macro Statements;
%MEND;

/* Calling a macro program */
%macro-name (Value1, Value2, ..., Valuen);
18. Write different ways to create macro variables in SAS Programming?
The following are some ways to create macro variables:
- Call Symput
- PROC SQL INTO clause
- Macro Parameters
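The three approaches listed above can be sketched as follows. This is a minimal illustration using the SASHELP.CLASS sample data set; the macro variable names `first_name` and `n_students` and the macro `show` are hypothetical.

```sas
/* 1. CALL SYMPUT inside a DATA step */
data _null_;
    set sashelp.class;
    if _n_ = 1 then call symput('first_name', strip(name));
run;

/* 2. PROC SQL with an INTO clause */
proc sql noprint;
    select count(*) into :n_students
    from sashelp.class;
quit;

/* 3. Macro parameter: DSN exists only while the macro runs */
%macro show(dsn);
    %put Data set passed in: &dsn;
%mend show;

%put first_name=&first_name n_students=&n_students;
%show(sashelp.class)
```

The first two techniques take their values from data at run time, while a macro parameter receives its value from the caller when the macro is invoked.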
19. Explain how %Let and macro parameters can be used to create macro variables in SAS programming?
%LET: %LET is generally used to create macro variables and assign values to them. You can use it inside or outside a macro.
%LET macro-variable-name = value;
Any number, text or date can be entered in the Value field, depending on what the program requires.
How to use the Macro Variable?
To reference a macro variable, write an ampersand (&) followed by the macro variable name, as shown below:
&macro-variable-name
Macro Parameters: Macros have variables called parameters whose values you set when you invoke the macro. The parameters are added to a macro by naming them in parenthesis in %macro.
%MACRO macro-name (parameter-1= , parameter-2= , ......parameter-n = ); Macro Statements; %MEND;
How to call a Macro?
To call a macro, we use % followed by the macro name and then pass values for its parameters in parentheses.
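Putting the pieces together, the following sketch defines a macro variable with %LET, references it with an ampersand, and passes it to a macro with keyword parameters. The macro name `print_top` and its parameters `dsn` and `obs` are made up for this illustration.

```sas
/* Hypothetical macro: prints the first OBS rows of DSN */
%macro print_top(dsn=, obs=5);
    proc print data=&dsn (obs=&obs);
    run;
%mend print_top;

%let table = sashelp.class;      /* macro variable created with %LET */
%print_top(dsn=&table, obs=3)    /* macro call with keyword parameters */
```

Because `obs` has a default value of 5, it can be left out of the call; `dsn` has no default, so the caller is expected to supply it.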
20. Name some SAS system options that are used to debug SAS macros.
There are a number of SAS System options that users can use to troubleshoot macro problems and issues. Macro-option results are automatically shown in the SAS Log.
- MEMRPT: Displays memory usage statistics in the SAS logs.
- MERROR: SAS will issue a warning if we attempt to invoke a macro that SAS does not recognize. Whenever there is a misspelling or if a macro is not defined, warning messages are displayed.
- MLOGIC: SAS prints details about the macro execution in its log. In short, it traces and displays the macro logic.
- MPRINT: By default, SAS does not show the code generated by a macro in the log. With the MPRINT option, it displays all the SAS statements of the resolved macro code, one statement per line.
- SYMBOLGEN: It prints a message in the LOG file about how a macro variable is resolved. Specifically, a message is printed in the LOG whenever a macro variable is resolved.
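A typical debugging session switches these options on before calling the macro and off again afterwards. The following is a minimal sketch; the macro `demo` is hypothetical.

```sas
options mprint mlogic symbolgen;   /* switch macro debugging on */

%macro demo(dsn);
    proc means data=&dsn;
    run;
%mend demo;

%demo(sashelp.class)

options nomprint nomlogic nosymbolgen;   /* switch it off again */
```

With the options on, the log shows the generated PROC MEANS step (MPRINT), the start and end of macro execution (MLOGIC), and how `&dsn` resolves (SYMBOLGEN).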
21. State the difference between PROC MEANS and PROC SUMMARY.
PROC SUMMARY and PROC MEANS are essentially the same procedure for calculating descriptive statistics such as mean, count, sum, and median. Both can also calculate several other metrics such as percentiles, quartiles, variances, standard deviations, and t-tests. N, MEAN, STD DEV, MIN, and MAX are the default statistics produced by PROC MEANS.
- They differ mainly in the output type they produce by default. Unlike PROC SUMMARY, PROC MEANS by default prints output in the LISTING window or other open destination. When the print option is included in the Proc SUMMARY statement, the results will be printed to the output window.
- When no VAR statement is given, PROC MEANS analyzes all of the numeric variables in the data set. PROC SUMMARY, on the other hand, analyzes only the variables listed in the VAR statement; without a VAR statement it produces just a count of observations.
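The difference in default output can be seen with a short sketch on the SASHELP.CLASS sample data set. The output data set name `class_stats` is made up.

```sas
/* PROC MEANS prints its results by default */
proc means data=sashelp.class;
    var height weight;
run;

/* PROC SUMMARY prints only when the PRINT option is given */
proc summary data=sashelp.class print;
    var height weight;
run;

/* Without PRINT, PROC SUMMARY is typically used to build an output data set */
proc summary data=sashelp.class;
    var height weight;
    output out=class_stats mean= std= / autoname;
run;
```

The first two steps produce the same printed table; the third prints nothing and instead stores the means and standard deviations in `class_stats`.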
22. What do you mean by functions and procedures in SAS?
SAS Procedures: They process data in SAS data sets to create statistics, tables, reports, charts, and plots, as well as to perform other analyses and operations on the data. All types of statistical analysis can be performed using SAS procedures. Execution of a procedure is triggered by the keyword PROC, which starts the step. Here are some SAS PROCs:
- PROC SORT
- PROC MEANS
- PROC SQL
- PROC COMPARE
- PROC REPORT
- PROC FREQ, etc.
SAS Functions: There are many built-in functions in SAS that aid in the analysis and processing of data. You use them in DATA step statements. Different functions take different numbers of arguments. Here is a list of SAS functions:
- COMPRESS(), etc.
23. Identify the error in the following code.
proc mixed data=SASHELP.IRIS plots=all; model petallength= /; class species; run;
Basically, it is a syntax error. In all cases, the MODEL statement must appear after the CLASS statement.
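A corrected version might look like the sketch below. Only the statement order comes from the explanation above; using SPECIES as the fixed effect in the MODEL statement is an assumption made so that the CLASS variable is actually used.

```sas
proc mixed data=sashelp.iris plots=all;
    class species;                 /* CLASS comes first ... */
    model petallength = species;   /* ... then MODEL (the effect SPECIES is an assumption) */
run;
```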
24. Explain what you mean by SYMGET and SYMPUT.
In a DATA step, SYMGET returns the value of a macro variable. Conversely, CALL SYMPUT stores a value from the DATA step in a macro variable.
Syntax of Symput:
CALL SYMPUT(macro-variable, value);
Syntax of SYMGET:
SYMGET(macro-variable)
Example: In the following program, we use CALL SYMPUT to create the macro variable 'avar' and store a value in it, and then use the SYMGET function to read the macro variable's value back into a data set.
* Create a macro variable;
data dataset;
    set sashelp.class;
    if _N_ = 1 then do;
        call symput('avar', name);
    end;
run;

%put &avar;

* Get macro variable value in a dataset;
data needit;
    var1 = symget('avar');
run;
25. What is the importance of the Tranwrd function in SAS?
TRANWRD, when applied to a character string, replaces or eliminates all occurrences of a substring. By using TRANWRD, you can scan for words (or patterns of characters) and replace them with a second word (or pattern of characters).
TRANWRD(source, target, replacement)
- The source is a character constant, variable, or expression you wish to translate.
- The target is an expression, constant, or variable searched in the source.
- Replacement specifies an expression, constant, or variable that will replace target.
name : Mrs. Johny Lever
name=tranwrd(name, "Mrs.", "Ms.");
Result : Ms. Johny Lever
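The fragment above can be run as a complete step, for instance as in this minimal sketch. The LENGTH statement is an addition so that the variable is long enough for either title.

```sas
data _null_;
    length name $ 30;
    name = "Mrs. Johny Lever";
    name = tranwrd(name, "Mrs.", "Ms.");   /* replace every occurrence of "Mrs." */
    put name=;
run;
```

The log line written by PUT shows name=Ms. Johny Lever.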
26. How do you specify the number of iterations and specific conditions within a single do loop?
The code below illustrates how to specify the number of iterations and a specific condition within a single DO loop. The iterative DO statement executes the loop until Sum is greater than or equal to 50000, or until the loop has executed 50 times, whichever comes first. With these values the condition is met first: Sum reaches about 52179 on the seventh iteration.
data Scaler; do i=1 to 50 until (Sum>=50000); Year+1; Sum+5000; Sum+Sum*.10; end; run;
27. Explain the usage of trailing @@.
Occasionally, multiple observations need to be created from a single record of raw data. To tell SAS how to read such a record, you can use the double trailing at-sign (@@, or "double trailing @"). A double trailing @ tells SAS to "hold the line more strongly": it directs SAS not to advance to another input record, but to hold the current input record for the next INPUT statement, even across iterations of the DATA step.
It is important to note that the single trailing @ does not hold an input record for subsequent iterations of the DATA step. A single trailing @ holds the input record only for the current iteration (until processing returns to the top of the DATA step), or until an INPUT statement without a trailing @ is executed.
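As a minimal sketch of the double trailing @, the following step reads several observations from each raw record. The data set `pairs` and its variables are made up.

```sas
/* Hypothetical data: each record holds several id/score pairs */
data pairs;
    input id score @@;   /* hold the record across iterations of the DATA step */
    datalines;
1 90 2 85 3 70
4 60 5 95
;
run;

proc print data=pairs;
run;
```

The two raw records produce five observations. Without @@, each iteration would start on a new record and only the first pair of each line would be read.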
28. Explain different ways to remove duplicate values in SAS.
Below are two ways to delete duplicate values in SAS:
- The use of NODUPRECS in PROC SORT:
The NODUPRECS (or NODUPREC or NODUP) option of PROC SORT identifies observations with identical values for all columns and removes them from the output data set.
proc sort data=SAS-Dataset noduprecs;
    by varname;
run;
- The use of DISTINCT in PROC SQL:
PROC SQL can be used to remove duplicates. The DISTINCT keyword is used in the SELECT clause to eliminate duplicate observations.
proc sql;
    create table New_dataset as
    select distinct *
    from Old_dataset;
quit;
29. What do you mean by NODUP and NODUPKEY options and write difference between them?
PROC SORT in SAS enables the removal of duplicate values from a table primarily by utilizing two options:
NODUP vs NODUPKEY
|NODUP|NODUPKEY|
|NODUP compares every variable in the data set.|NODUPKEY compares only the variables that are listed in the BY statement.|
|NODUP removes duplicate observations where the same values are repeated across all variables.|NODUPKEY removes duplicate observations where the values of the variables listed in the BY statement are the same.|
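The difference can be checked with a small sketch. The data set `visits` and the output names are made up; the option is spelled NODUPRECS here, the full name of NODUP.

```sas
/* Hypothetical data with repeated rows and repeated keys */
data visits;
    input id visit;
    datalines;
1 10
1 10
1 20
2 30
2 30
;
run;

/* Removes rows that are identical in every variable: 3 rows remain */
proc sort data=visits out=no_dup_rows noduprecs;
    by id visit;
run;

/* Keeps one row per value of ID: 2 rows remain */
proc sort data=visits out=no_dup_keys nodupkey;
    by id;
run;
```

Sorting by all variables in the first step places identical rows next to each other, which is what allows them to be detected and removed.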
30. Name the command used for sorting in SAS programs?
The PROC SORT procedure is used to sort data in SAS. It can sort by one or more variables. When the OUT= option is specified, it writes the sorted data to a new data set and leaves the original data set unchanged; without OUT=, the original data set is replaced by the sorted one.
PROC SORT DATA=original OUT=Sorted; BY variable_name; RUN;
- Variable_name represents the column name on which sorting happens.
- Original represents the dataset name to be sorted.
- Sorted represents the dataset name after it is sorted.
31. Explain what is INPUT and INFILE Statement.
In SAS programming, an INFILE statement identifies the external file that contains the data, whereas an INPUT statement describes the variables to be read from it.
Syntax of INFILE:
INFILE file-specification;
Syntax of INPUT:
INPUT varname1 varname2;
Example (Test is assumed to be a fileref assigned earlier with a FILENAME statement):
DATA readin; INFILE Test; INPUT ID Gender Score; RUN;
32. What do you mean by %Include and %Eval?
%Include: If you run a program containing the %INCLUDE statement, the SAS System executes any statements or data lines that you bring into the program. Statements are executed immediately.
%INCLUDE source(s) </<SOURCE2> <S2=length> <option-list> >;
- Source(s) specify the location of the information that you wish to access with the %INCLUDE statement.
- SOURCE2 causes the SAS log to show the source statements being used in your SAS program.
- S2=length specifies the length of the input record.
- Option-list specifies options that can be included in %INCLUDE.
%Eval: Integer arithmetic is used to evaluate arithmetic or logical expressions. %EVAL accepts only integers as operands in arithmetic expressions. Operands with floating-point values cannot be used in %EVAL arithmetic calculations. %SYSEVALF can be used in these cases.
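The difference between the two functions shows up as soon as a fraction is involved. The following is a minimal sketch with made-up macro variable names.

```sas
%let int_sum   = %eval(7 + 3);       /* integer arithmetic: 10 */
%let int_div   = %eval(7 / 2);       /* integer division drops the fraction: 3 */
%let float_div = %sysevalf(7 / 2);   /* floating-point arithmetic: 3.5 */

%put int_sum=&int_sum int_div=&int_div float_div=&float_div;
```

Passing an operand such as 7.5 to %EVAL causes an error, because it accepts only integers; %SYSEVALF is the usual choice in that situation.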
Have you been preparing for a SAS interview and wondering how you can succeed? This useful guide can help you prepare for it. We've compiled a list of the top 30+ SAS interview questions and answers that you're likely to be asked during your interviews. The questions have been specifically designed to familiarize you with the type of questions you might encounter during the interview.
SAS MCQ Questions
Which of the following PROC statements is correct?
Is there a way to limit the variables written to output dataset in DATA STEP?
When there is a missing value in SAS, it should be coded as __.
SAS stands for ___ .
The other name for the Data Preparation stage of Knowledge Discovery Process is ___.
Reports and graphs are typically generated using which of the following steps?
Which of the following is not a SAS system option that is used to debug SAS macros?
Which of the following PROC statements is used to remove duplicate values from a data set?
Which of the following is not a way to create macro variables in SAS Programming?
Which of the following is used to store the value of the data set in a macro variable?