SQL Server Integration Services, in short SSIS, is a Microsoft SQL Server 2005 component that came as an upgrade to the Data Transformation Services (DTS). SSIS is available as Standard and Enterprise editions of the Microsoft SQL Server. SSIS provides an integrated platform for building data workflow applications. It is primarily used as a fast and efficient data warehousing tool that takes care of data extraction, data transformation, and data loading that constitutes ETL (Extract-Transform-Load). It can also be used for automating SQL Server maintenance and updation of multidimensional cube data.
SSIS Interview Questions for Freshers
1. What is SSIS?
SSIS stands for SQL Server Integration Services. It is a part of the Microsoft SQL server and is a well-known integrated platform for performing enterprise-level data migration and transformation activities. The platform is used for solving complex problems like reading, loading, copying, or downloading large files, mining data, extracting relevant information, and managing SQL objects. It supports data sources like relational databases, XML, flat files, etc, and helps to load and send data to multiple destinations.
2. What is an SSIS task?
Tasks are units of work and are present in SSIS packages. We have different task types depending on the type of work that needs to be performed. Below are some of the most commonly used tasks present in SSIS:
- Execute SQLTask: Used for executing SQL statements in relational databases.
- Data Flow Task: Used for reading data from one or more data sources, transforming them, and loading it to various destinations as required.
- Analysis Services Processing Task: Used for processing objects of the tabular model or SSAS (SQL Server Analysis Services) cubes.
- FTP Tasks: Used for sending and receiving files from FTP Servers to our local folders.
- Script Task: Used for writing .Net code as per requirements.
- Send Mail Task: Used for sending emails for notifying users regarding status, progress, or outcomes of tasks.
- Execute Package Task: Used for calling different packages of the same project. It also has the feature of passing the value of variables to the package.
- Execute Process Task: Used for running application as a whole or as a batch script. It can also be used for opening software like Microsoft Word, Excel, etc, and for unzipping compressed files.
- File System Task: Used for performing file system manipulations like move files, rename them, delete them, change the location, etc.
- Custom Task: If all the existing tasks do not meet our requirements, then SSIS provides flexibility to create new custom tasks.
3. What are packages in SSIS?
SSIS packages are organized collections of various connections and elements related to data flow, event handlers, variables, control flow, configurations, or parameters that would be used for assembling and programmatically building graphical design tools. They can also be used for populating data from different sources which can be later used for standardizing and administrative purposes. They are created in BIDS - Business Intelligent Development Studio.
4. What do you understand by SSIS expressions?
SSIS Expressions are used for filtering information based on parameters and conditions to get desired information. They make lives easy by helping to work with conditions, loops, and dynamic connections. The expression could consist of symbols, literals, identifiers, operators, and functions. They are mainly used for updating properties dynamically at runtime. Once we have the expressions ready, an expression evaluator parses this expression for validating the rules of the expression.
The expressions are used for the below 2 cases:
- Here, the variable values are derived using other variables employing expressions.
- For validating any condition, precedence constraints are used.
- Used for setting properties dynamically at runtime by tasks and containers.
- Expressions are used on rows to get conditional split transformations evaluating to true or false.
- Helps to create a new column based on the result of applying expressions to get derived transformations on new columns.
5. Define Manifest file.
The manifest file is the utility having information that is useful for deploying packages using file system wizard and database of SQL server.
6. Differentiate between SSIS and Informatica.
|Maintenance and Administration||Easy||Difficult|
|Ease of Implementation||Easy||Moderate|
|Performance in a medium SQL server environment||Performs well||Performs well|
|Productivity||Moderate||High in large applications implementation|
|Auto Documentation||Not available||Available|
|Product Maturity||Younger||Was present much before to SSIS|
7. What is data transformation in SSIS?
Data transformation is the process of extracting required data from a data source and is the most critical SSIS step. Post extraction, the process aids in managing and transferring the data to a specific file destination. There are several rules implemented by this process for loading the extracted data to the destination target file. Based on this, the transformations are classified as:
8. Define SSIS Catalog. Is it possible to deploy user-defined packages in the catalog?
SSIS catalog is the database where all deployed packages are stored securely. This helps to control and handle the deployed packages effectively. The users can deploy their packages too. All the deployed packages are stored in the catalog as a centralized database.
9. What is SSIS Container?
SSIS containers are defined as a set of logically linked tasks that allows the management of the task scope effectively. Containers allow looping through tasks set until tasks are grouped logically or until specified criteria are met. Nesting of containers is also allowed and the containers are set in the Package designer section in the Control flow tab.
10. In SSIS, what are the variable types that can be created?
SSIS permits the creation of two variable types- global and task-specific variables. The scope of global variables is available for all tasks of a specified process. The task-specific variables have scope specific to the task.
11. Define SSIS Checkpoint.
SSIS checkpoint is a property that provides a point of restart from the failure point. When the checkpoint property is set to true, a checkpoint file is created which consists of all the relevant information required to run the package from the point of failure. If the package is run successfully, the file is deleted and is recreated again when the package runs. This is particularly useful as it avoids re-running the whole project when the execution fails.
12. Define Precedence Constraint.
Precedence Constraint is used for enabling the developers to define a set of tasks logically and define the order of execution. These are connectors used for connecting all the tasks.
13. What are SSIS Connection Managers?
Connection Managers are used for gathering data from various sources and sending it to a destination. It facilitates system connection by including information regarding server, data source, authentication details, database, etc.
14. How are SSIS Packages more advantageous than stored procedures?
SSIS has more advantages when dealing with complex transformation processes involving slow-changing dimensions. They are:
- SSIS is known for managing memory efficiently thereby resulting in much bigger improvements in performance when compared to stored procedures.
- SSIS supports GUI (Graphical User Interface) and helps developers develop complex transformations easily and in a reliable way.]
- SSIS helps in easier interaction with external sources which are particularly useful during data cleansing.
If the ETL (Extract, Transform, Load) process consists majorly of Extract and Load with very little Transform, then Stored Procedures work well because they do not deal with cursors.
15. Define conditional split transformations.
These are the transformations that are similar to the IF-ELSE condition of programming languages which checks for the condition and executes the process if the condition evaluates to true. Here, we can also specify the order of evaluation for the conditions and also provide the default value as the output.
16. What are Process Bytes in SSIS?
Process bytes represent the amount of memory currently used by the Integration services. This memory cannot be shared by other processes.
17. What are the disadvantages of SSIS?
There are some disadvantages while using SSIS. They are as follows:
- If we need to see the report of package execution, then we will have to install new software called Management Studio. SSIS does not have the option to publish to reporting services.
- SSIS utilizes high memory which can cause conflicts with memory utilization of SQL. This causes problems if there are requirements to run multiple packages in parallel.
- SSIS also uses high CPU resources which again results in problems while running multiple packages parallely. Proper CPU allocation needs to be ensured between SQL and SSIS to avoid either of them running slow.
18. How is error handling done in SSIS?
During the process of transferring millions of records from one source to a destination, the probability of errors always exists. It is very important to identify the records that are errored out and ensure the transmission of records is smooth with no errors to the destination. To identify the errors, the errors need to be logged. The components of SSIS - source, transformation, and the destination - all have had support to log errors to output by means of secondary pipe by providing means to define behaviors while errors occur.
The secondary pipe or the error flow can be connected to another transformation or a destination component depending on the requirement. The error log can contain details regarding the error columns, error descriptions, and error codes.
SSIS Interview Questions for Experienced
19. How is the deployment utility created in SSIS?
Deployment is nothing but a process of updating the state of the package from development mode to executable mode. Deployment in SSIS can be done by simply performing the right-click operation on the “Integration Services Project” and clicking on the build option. This would create a “package.dtsx” file inside the bin folder named “projectbin”. The utility helps in deploying packages at SQL server or as an executable file on a specified location.
Following are the steps followed to create a deployment utility:
- Go to Project and right-click on it.
- Double click on the properties.
- Choose the path location for deployment by selecting the box next to “True” under Create Deployment Utility.
- Save changes. Close this window.
- Right-click again on the project and select the build option. This creates a deployment folder in the project’s root within the BIN folder.
- The deployment folder will have a .manifest file.
- Double-clicking on the .manifest file and selecting the deploy option will perform package deployment on the SQL server.
20. Define Data Flow in SSIS.
Data flow is the data transferring process from the source to a specified donation. It defines the way how data or information is sent from one point to other. It consists of 3 main components:
SSIS gives control to the users to perform the data transformation process from source to destination utilizing various built-in transformation components. The components provided by the SSIS helps users in data cleaning and manipulation in the pipeline. For achieving this, the user needs to attach a data flow task to the package control flow which is later executed within the package by creating, ordering, and running the data flow. It is worthy to note that each data flow task has a separate data flow engine instance.
21. Difference between Merge Transformation and Union all transformations.
Merge Transformation: This does the task of merging data from 2 sources to single output. This is useful when we need to break the data flow path to a separate error path and once the error is handled, merge that back to the main data flow. To apply this transformation, the data should be sorted first and the metadata information for the 2 paths should be the same. The sorting can be done by applying sort transformation before the merge or by providing an ORDER BY clause in the source connection.
Union All Transformation: Union all transformation working is similar to merge transformation but the only difference is that this does not require sorted data. Data from different sources are transformed and combined into a single output.
22. Explain the types of SSIS containers.
The following containers are present in SSIS:
- Task Host Containers: Default container where all single tasks are available and are used for background scenes in SSIS. This is not present in the toolbox of Visual Studio and is assigned to the task implicitly. This container is also used to extend event handlers and variables.
Sequence Containers: These handle the flow of package subset and helps to divide packages into smaller pieces.
- These are displayed in the Control Flow tab. We can drag and drop containers from the toolbox in the design pane and then add a set of tasks into the container. Following things can be achieved using sequence containers:
- Group tasks for disabling a portion of the unused package.
- To narrow the variable scope to the container.
- Manage properties of tasks easily by setting container properties.
- Ensure multiple executions of tasks by making use of one method.
- Create transactions for data-related tasks and not package as a whole.
- Create event handlers such as sending mail in case things go wrong.
- The below example shows two tasks that need to be successfully executed before calling the third script task.
- These are displayed in the Control Flow tab. We can drag and drop containers from the toolbox in the design pane and then add a set of tasks into the container. Following things can be achieved using sequence containers:
- For Loop Containers: Helps to create a loop in the execution of packages. SSIS does this by initializing expression optionally and continues evaluation until expression becomes false. In the below example, We have the task name “Wait for file to arrive” which is inside For Loop Container, and the task is executed until the expression is evaluated to false. The Load File task is executed after the loop execution is complete.
- Foreach Containers: This comprises of powerful loop mechanism which enables the loop using object collection. As the collection is looped, the variable is assigned values from the collection which is used by connections or tasks outside or within the container.
23. What does the data profiling task do?
Data profiling is the process of examining information by comprehending its condition, readability, examples, nulls, or numbers. This is performed at the beginning of the advancement cycle for structuring the database blueprint and assessing the quality of the data.
24. How many lookup cache modes are present in SSIS?
There are 3 types of lookup cache modes. These are part of Lookup Transformation in SSIS.
- Full Cache Mode: This mode helps in querying the database before the data-flow task is executed. This constitutes a critical part of the pre-execution step of the data-flow task. In this mode, the entire data copy is copied from the table and stored into the lookup cache in SSIS.
- Partial Cache Mode: This mode helps in querying new rows in the database from different data sources. Here, the matched row data is cached into the SSIS lookup cache. If the lookup cache is full, the data is removed automatically from the cache based on the usage statistics of the existing rows to free up space for newly matched rows.
- No Cache Mode: Here, the data is not cached unless 2 subsequent sources come up with the same matched rows. Database queries are run to get matched data every time from the source.
25. Is it possible to log SSIS execution?
Yes. SSIS supports logging which helps the system in analyzing and writing log entries during any run-time events. It also supports logging of custom messages, which the developer has to manually enable. Since the SSIS deals with a variety of data sources, logging is supported in multiple formats such as logging information to text files, XML files, Windows Log, etc. The scope of the logs is packages. They are maintained at the package level which means that any task that runs from the package can log information to other package logs.
26. Is it possible to schedule packages for a specific time period of a day?
Yes. This can be done by designing SQL Server Agent Job with work steps similar to the SSIS package. This job fetches the dtexec order utility to execute the bundle. This created bundle can be run or requested based on the requirement or can be rescheduled based on the recurring requirement.
27. What is SSIS breakpoint?
An SSIS breakpoint is a property that allows developers to debug and review the variable or data status at any point by pausing the execution of the package in the project. The breakpoints are configured in Business Intelligence Development Studio, generally known as BIDS. The following steps are followed to apply or remove SSIS breakpoints:
- Go to the control flow interface within the BIDS section.
- Right-click on the object that we want to apply or remove the breakpoint.
- Select “Edit Breakpoint” and enable/disable the breakpoint.
28. What are the components that would be used to send data from the access database to the SQL database on the off chance?
We will be using the following three components:
- One OLE DB Source
- Information Conversion Change
- One SQL Server Goal
OLE DB source is used to get information from the Access database source. Information conversion change is used in case of any datatype anomaly due to data contrast between source and destination databases. SQL Server Goal is utilized when we have the bundle run from the same machine and our database server is put on at the off chance. It is used for improving the SQL goal.
29. What is SSIS event logging property?
This property gives us the flexibility to log or not to log the events while running the components - tasks and packages - in SSIS.
30. Explain the importance of config files in SSIS.
The config file is used for providing inputs to the connection manager about the properties used by the packages and tasks at the run-time. This is especially useful when the changes are deployed on multiple locations or servers as we do not have to worry every time package runs about configuration. Depending on the server or location, the config files are automatically picked and used.
We can store config properties in multiple ways. Some of them are:
- XML configuration file: Data can be stored as an XML file.
- Environment variables: The config data can be stores as part of environment variables.
- Registry entry: The config can be stored in the registry.
- Parent package variable: The config can be stored as a variable in a package of a task.
- SQL Server: The config can be stored in a database table on the SQL Server.
31. How do you store SQL passwords? Does the SSIS connection manager of the package store SQL password?
The SSIS connection manager has issues in storing the password. Hence we can follow the below-proposed solutions:
Method 1: Create and use SQL Server Agent Proxy account which uses credential that makes SQL Server Agent capable of the running job as an account having required permissions to run the job or as the account which created the package.
- This method is used for decrypting secrets and key requirements of the user are satisfied.
- There are chances that this method could fail because the secrets are decrypted by taking the current user and current computer information which will inadvertently fail if the project or package is moved to a different computer.
- Method 2: Set the value of Package Protection Level Property to Server Storage. This stores packages on the SQL Server and lets users access this through SQL Server database roles.
- Method 3: Set the value of Package Protection Level property as “Encrypt Sensitive with Password”. The method uses a password for encrypting sensitive information. This can be added to the SQL Server Agent job to run the job effectively.
- Method 4: Package configuration can be used to save sensitive information which can later be stored in a secured folder. When the package is run, the information is obtained from this config file. If we do not want to save the secrets in the package, we can set the property of Protection Level to “DontSaveSensitive”.
- Method 5: Package templates having a good protection level can be created for long-term solutions.
32. How will you add a recordset variable inside Script Task?
Variables are used in script tasks to facilitate data exchange with other objects of the package. Script task makes use of the property called Variables of Dts object for reading and writing to variable objects of the package. To make the variables available to the custom script, we need to add them to read-only or read-write variable lists. The variable names are case-sensitive and care has to be taken while adding the variables to the list. We can use the Value property for reading and writing data to the variables.
33. What will you do if a package runs fine in BIDS but fails while running from the SQL agent job?
A package that runs successfully in BIDS and fails from the SQL agent job can mean that the SQL Agent jobs do not have the necessary permission for some connections of the package. We can either create a proxy account that allows SQL Agent jobs to run the package or elevate the current account permissions.
SSIS is a platform for integration services that provide automated tasks and various transformation components for cleaning, transforming, and validating data throughout the process of data loading from source to destination. In this article, we have seen what are the most commonly asked SSIS interview questions and the answers to them.
References and Resources:
What among the below options is a best practice while working on SSIS?
What is the indication if the SSIS cannot drive 100% CPU load?
What is the key counter for SQL Server and Integration Services?
Which among the below options indicate SSIS package design limitation?
Which of the below elements are included in a base package?
What kind of transformation you would use if there is a need to convey information from one type to other?
What kind of transformation you would use if there is a need to create manipulation of multiple columns into a brand new column?
What kind of transformation you would use if there is a need to add metadata of package and tasks?
What kind of transformation you would use if there is a need to standardise information sources to reduce anomalies by transforming rows into columns?
What kind of transformation you would use if there is a need to perform ranking-supported formal logic and pattern matching?