
SQL Interview Questions

Last Updated: Jan 25, 2026


Are you preparing for your SQL developer interview?

Then you have come to the right place.

This guide will help you to brush up on your SQL skills, regain your confidence and be job-ready!

Here, you will find a collection of real-world interview questions asked at companies like Google, Oracle, Amazon, and Microsoft. Each question comes with a clearly written answer inline, saving you interview preparation time.

It also covers practice problems to help you understand the basic concepts of SQL.

We've divided this article into three sections: SQL Interview Questions, PostgreSQL Interview Questions, and Advanced SQL Interview Questions.

In the end, multiple-choice questions are provided to test your understanding.

SQL Interview Questions

1. What is a Cross-Join?

A cross join is the Cartesian product of the two tables involved in the join: the result contains one row for every combination of rows from the two tables, so the number of rows returned equals the product of the row counts of the two tables. If a WHERE clause is added to a cross join, the query behaves like an INNER JOIN.

SELECT stu.name, sub.subject 
FROM students AS stu
CROSS JOIN subjects AS sub;

2. List the different types of relationships in SQL.

  • One-to-One - A relationship between two tables where each record in one table is associated with at most one record in the other table.
  • One-to-Many & Many-to-One - The most commonly used relationship, where a record in one table is associated with multiple records in the other table (see the sketch after this list).
  • Many-to-Many - This is used in cases when multiple instances on both sides are needed for defining a relationship.
  • Self-Referencing Relationships - This is used when a table needs to define a relationship with itself.
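
As a sketch of how these relationships map to DDL (hypothetical tables; the many-to-many case assumes students and courses tables already exist), a foreign key models one-to-many, and a junction table models many-to-many:

CREATE TABLE departments (   /* the "one" side */
   dept_id INT PRIMARY KEY,
   dept_name VARCHAR(100)
);
CREATE TABLE employees (   /* the "many" side: each employee belongs to one department */
   emp_id INT PRIMARY KEY,
   dept_id INT REFERENCES departments(dept_id)
);
CREATE TABLE enrollments (   /* junction table implementing many-to-many */
   student_id INT REFERENCES students(student_id),
   course_id INT REFERENCES courses(course_id),
   PRIMARY KEY (student_id, course_id)
);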

3. What are Entities and Relationships?

Entity: An entity is a real-world object, either tangible or intangible, that is easily identifiable. For example, in a college database, students, professors, workers, departments, and projects can be referred to as entities. Each entity has some associated properties that give it an identity.

Relationships: Links between entities that have something to do with each other. For example, the employees table in a company's database can be associated with the salary table in the same database.


4. What is Cursor? How to use a Cursor?

A database cursor is a control structure that allows for the traversal of records in a database. Cursors, in addition, facilitate processing after traversal, such as retrieval, addition, and deletion of database records. They can be viewed as a pointer to one row in a set of rows.

Working with SQL Cursor:

  1. DECLARE a cursor after any variable declaration. The cursor declaration must always be associated with a SELECT Statement.
  2. Open cursor to initialize the result set. The OPEN statement must be called before fetching rows from the result set.
  3. FETCH statement to retrieve and move to the next row in the result set.
  4. Call the CLOSE statement to deactivate the cursor.
  5. Finally use the DEALLOCATE statement to delete the cursor definition and release the associated resources.
DECLARE @name VARCHAR(50)   /* Declare all required variables */
DECLARE db_cursor CURSOR FOR   /* Declare the cursor name */
SELECT name
FROM myDB.students
WHERE parent_name IN ('Sara', 'Ansh')
OPEN db_cursor   /* Open the cursor */
FETCH NEXT   /* Fetch data into @name */
FROM db_cursor
INTO @name
CLOSE db_cursor   /* Close the cursor */
DEALLOCATE db_cursor   /* Deallocate the resources */

5. What are UNION, MINUS and INTERSECT commands?

The UNION operator combines and returns the result-set retrieved by two or more SELECT statements, removing duplicate rows (UNION ALL keeps them).
The MINUS operator (called EXCEPT in SQL Server and PostgreSQL) returns the rows from the result-set of the first SELECT query that do not appear in the result-set of the second SELECT query.
The INTERSECT clause combines the result-sets fetched by two SELECT statements and returns only the records common to both.

Certain conditions need to be met before executing either of the above statements in SQL -

  • Each SELECT statement within the clause must have the same number of columns
  • The columns must also have similar data types
  • The columns in each SELECT statement should necessarily have the same order
SELECT name FROM Students   /* Fetch the union of queries */
UNION
SELECT name FROM Contacts;
SELECT name FROM Students   /* Fetch the union of queries with duplicates*/
UNION ALL
SELECT name FROM Contacts;
SELECT name FROM Students   /* Fetch names from students */
MINUS     /* that aren't present in contacts */
SELECT name FROM Contacts;
SELECT name FROM Students   /* Fetch names from students */
INTERSECT    /* that are present in contacts as well */
SELECT name FROM Contacts;


6. What are some common clauses used with SELECT query in SQL?

Some common SQL clauses used in conjunction with a SELECT query are as follows:

  • WHERE clause in SQL is used to filter records that are necessary, based on specific conditions.
  • ORDER BY clause in SQL is used to sort the records based on some field(s) in ascending (ASC) or descending order (DESC).
SELECT *
FROM myDB.students
WHERE graduation_year = 2019
ORDER BY studentID DESC;
  • GROUP BY clause in SQL is used to group records with identical data and can be used in conjunction with some aggregation functions to produce summarized results from the database.
  • HAVING clause in SQL is used to filter records in combination with the GROUP BY clause. It is different from WHERE, since the WHERE clause cannot filter aggregated records.
SELECT COUNT(studentId), country
FROM myDB.students
WHERE country <> 'INDIA'
GROUP BY country
HAVING COUNT(studentID) > 5;

7. What is the SELECT statement?

The SELECT statement in SQL is used to retrieve data from a database. The data returned is stored in a result table, called the result-set.

SELECT * FROM myDB.students;

8. What is a Subquery? What are its types?

A subquery is a query within another query, also known as a nested query or inner query. It is used to restrict or enhance the data queried by the main query, thus restricting or enhancing the output of the main query. For example, here we fetch the contact information for students who have enrolled for the maths subject:

SELECT name, email, mob, address
FROM myDb.contacts
WHERE roll_no IN (
 SELECT roll_no
 FROM myDb.students
 WHERE subject = 'Maths');

There are two types of subquery - Correlated and Non-Correlated.

  • A correlated subquery cannot be executed independently of the main query: it refers to a column of a table listed in the FROM clause of the main query and is evaluated once for every row processed by the outer query (an example follows this list).
  • A non-correlated subquery can be executed as an independent query, and its output is substituted into the main query.
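
For contrast, a minimal sketch of a correlated version of the same kind of lookup (same hypothetical tables as above); the inner query references the outer row, so it runs once per contact:

SELECT name, email, mob, address
FROM myDb.contacts c
WHERE EXISTS (
 SELECT 1
 FROM myDb.students s
 WHERE s.roll_no = c.roll_no
 AND s.subject = 'Maths');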

9. What is a Query?

A query is a request for data or information from a database table or combination of tables. A database query can be either a select query or an action query.

SELECT fname, lname    /* select query */
FROM myDb.students
WHERE student_id = 1;
UPDATE myDB.students    /* action query */
SET fname = 'Captain', lname = 'America'
WHERE student_id = 1;

10. What is Data Integrity?

Data Integrity is the assurance of accuracy and consistency of data over its entire life-cycle and is a critical aspect of the design, implementation, and usage of any system which stores, processes, or retrieves data. It also defines integrity constraints to enforce business rules on the data when it is entered into an application or a database.
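
A brief sketch of integrity constraints enforcing such rules at the database level (hypothetical orders and customers tables):

CREATE TABLE orders (
   order_id INT PRIMARY KEY,   /* entity integrity: unique, non-null identifier */
   customer_id INT NOT NULL REFERENCES customers(customer_id),   /* referential integrity */
   quantity INT CHECK (quantity > 0)   /* domain integrity / business rule */
);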

11. What is the difference between Clustered and Non-clustered index?

The differences can be broken down into three small factors, illustrated by the sketch after this list -

  • Clustered index modifies the way records are stored in a database based on the indexed column. A non-clustered index creates a separate entity within the table which references the original table.
  • Clustered index is used for easy and speedy retrieval of data from the database, whereas, fetching records from the non-clustered index is relatively slower.
  • In SQL, a table can have a single clustered index whereas it can have multiple non-clustered indexes.
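
As a hedged sketch (SQL Server syntax; the index names are hypothetical), the two kinds of index are created explicitly as follows:

CREATE CLUSTERED INDEX ix_students_id   /* only one allowed per table */
ON students (ID);
CREATE NONCLUSTERED INDEX ix_students_name   /* several allowed per table */
ON students (Name);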

12. What is an Index? Explain its different types.

A database index is a data structure that provides a quick lookup of data in a column or columns of a table. It enhances the speed of operations accessing data from a database table at the cost of additional writes and memory to maintain the index data structure.

CREATE INDEX index_name   /* Create Index */
ON table_name (column_1, column_2);
DROP INDEX index_name;   /* Drop Index */

There are different types of indexes that can be created for different purposes:

  • Unique and Non-Unique Index:

Unique indexes are indexes that help maintain data integrity by ensuring that no two rows of data in a table have identical key values. Once a unique index has been defined for a table, uniqueness is enforced whenever keys are added or changed within the index.

CREATE UNIQUE INDEX myIndex
ON students (enroll_no);

Non-unique indexes, on the other hand, are not used to enforce constraints on the tables with which they are associated. Instead, non-unique indexes are used solely to improve query performance by maintaining a sorted order of data values that are used frequently.

  • Clustered and Non-Clustered Index:

Clustered indexes are indexes whose order of the rows in the database corresponds to the order of the rows in the index. This is why only one clustered index can exist in a given table, whereas, multiple non-clustered indexes can exist in the table.

The only difference between clustered and non-clustered indexes is that the database manager attempts to keep the data in the database in the same order as the corresponding keys appear in the clustered index.

Clustering indexes can improve the performance of most query operations because they provide a linear-access path to data stored in the database.

13. What is an Alias in SQL?

An alias is a feature of SQL that is supported by most, if not all, RDBMSs. It is a temporary name assigned to the table or table column for the purpose of a particular SQL query. In addition, aliasing can be employed as an obfuscation technique to secure the real names of database fields. A table alias is also called a correlation name.

An alias is represented explicitly by the AS keyword but in some cases, the same can be performed without it as well. Nevertheless, using the AS keyword is always a good practice.

SELECT A.emp_name AS "Employee",  /* Alias using AS keyword */
B.emp_name AS "Supervisor"
FROM employee A, employee B   /* Table aliases without AS keyword */
WHERE A.emp_sup = B.emp_id;

14. What is a Self-Join?

A self-join is a regular join in which a table is joined to itself based on some relation between its own columns. A self-join uses an INNER JOIN or LEFT JOIN clause, and table aliases are used to assign different names to the table within the query.

SELECT A.emp_id AS "Emp_ID",A.emp_name AS "Employee",
B.emp_id AS "Sup_ID",B.emp_name AS "Supervisor"
FROM employee A, employee B
WHERE A.emp_sup = B.emp_id;

15. What is a Join? List its different types.

The SQL Join clause is used to combine records (rows) from two or more tables in a SQL database based on a related column between the two.

There are four different types of JOINs in SQL:

  • (INNER) JOIN: Retrieves records that have matching values in both tables involved in the join. This is the most widely used type of join.
SELECT *
FROM Table_A A
JOIN Table_B B
ON A.col = B.col;
SELECT *
FROM Table_A A
INNER JOIN Table_B B
ON A.col = B.col;
  • LEFT (OUTER) JOIN: Retrieves all the records/rows from the left and the matched records/rows from the right table.
SELECT *
FROM Table_A A
LEFT JOIN Table_B B
ON A.col = B.col;
  • RIGHT (OUTER) JOIN: Retrieves all the records/rows from the right and the matched records/rows from the left table.
SELECT *
FROM Table_A A
RIGHT JOIN Table_B B
ON A.col = B.col;
  • FULL (OUTER) JOIN: Retrieves all the records where there is a match in either the left or right table.
SELECT *
FROM Table_A A
FULL JOIN Table_B B
ON A.col = B.col;

16. What is a Foreign Key?

A FOREIGN KEY consists of a single field or a collection of fields in a table that refers to the PRIMARY KEY in another table. The foreign key constraint ensures referential integrity in the relation between the two tables.
The table with the foreign key constraint is labeled as the child table, and the table containing the candidate key is labeled as the referenced or parent table.

CREATE TABLE Students (   /* Create table with foreign key - Way 1 */
   ID INT NOT NULL,
   Name VARCHAR(255),
   LibraryID INT,
   PRIMARY KEY (ID),
   FOREIGN KEY (LibraryID) REFERENCES Library(LibraryID)
);

CREATE TABLE Students (   /* Create table with foreign key - Way 2 */
   ID INT NOT NULL PRIMARY KEY,
   Name VARCHAR(255),
   LibraryID INT FOREIGN KEY REFERENCES Library(LibraryID)
);

ALTER TABLE Students   /* Add a new foreign key */
ADD FOREIGN KEY (LibraryID)
REFERENCES Library (LibraryID);

17. What is a UNIQUE constraint?

A UNIQUE constraint ensures that all values in a column are different. This provides uniqueness for the column(s) and helps identify each row uniquely. Unlike a primary key, multiple UNIQUE constraints can be defined per table. The code syntax for UNIQUE is quite similar to that of PRIMARY KEY.

CREATE TABLE Students (   /* Create table with a single field as unique */
   ID INT NOT NULL UNIQUE,
   Name VARCHAR(255)
);

CREATE TABLE Students (   /* Create table with multiple fields as unique */
   ID INT NOT NULL,
   LastName VARCHAR(255),
   FirstName VARCHAR(255) NOT NULL,
   CONSTRAINT UC_Student
   UNIQUE (ID, FirstName)
);

ALTER TABLE Students   /* Set a column as unique */
ADD UNIQUE (ID);
ALTER TABLE Students   /* Set multiple columns as unique */
ADD CONSTRAINT UC_Student   /* Naming a unique constraint */
UNIQUE (ID, FirstName);

18. What is a Primary Key?

The PRIMARY KEY constraint uniquely identifies each row in a table. It must contain UNIQUE values and has an implicit NOT NULL constraint.
A table in SQL is strictly restricted to have one and only one primary key, which is comprised of single or multiple fields (columns).

CREATE TABLE Students (   /* Create table with a single field as primary key */
   ID INT NOT NULL,
   Name VARCHAR(255),
   PRIMARY KEY (ID)
);

CREATE TABLE Students (   /* Create table with multiple fields as primary key */
   ID INT NOT NULL,
   LastName VARCHAR(255),
   FirstName VARCHAR(255) NOT NULL,
   CONSTRAINT PK_Student
   PRIMARY KEY (ID, FirstName)
);

ALTER TABLE Students   /* Set a column as primary key */
ADD PRIMARY KEY (ID);
ALTER TABLE Students   /* Set multiple columns as primary key */
ADD CONSTRAINT PK_Student   /* Naming a primary key */
PRIMARY KEY (ID, FirstName);

19. What are Constraints in SQL?

Constraints are used to specify rules for the data in a table. They can be applied to single or multiple fields in an SQL table when the table is created, or afterwards using the ALTER TABLE command. The constraints are:

  • NOT NULL - Restricts NULL value from being inserted into a column.
  • CHECK - Verifies that all values in a field satisfy a condition.
  • DEFAULT - Automatically assigns a default value if no value has been specified for the field.
  • UNIQUE - Ensures unique values to be inserted into the field.
  • INDEX - Indexes a field providing faster retrieval of records.
  • PRIMARY KEY - Uniquely identifies each record in a table.
  • FOREIGN KEY - Ensures referential integrity for a record in another table.

20. What are Tables and Fields?

A table is an organized collection of data stored in the form of rows and columns. Columns can be categorized as vertical and rows as horizontal. The columns in a table are called fields while the rows can be referred to as records.

21. What is the difference between SQL and MySQL?

SQL is a standard language for retrieving and manipulating relational databases. MySQL, by contrast, is a relational database management system, like SQL Server, Oracle, or IBM DB2, that is used to manage SQL databases.

22. What is SQL?

SQL stands for Structured Query Language. It is the standard language for relational database management systems. It is especially useful in handling organized data comprised of entities (variables) and relations between different entities of the data.

23. What is RDBMS? How is it different from DBMS?

RDBMS stands for Relational Database Management System. The key difference here, compared to DBMS, is that RDBMS stores data in the form of a collection of tables, and relations can be defined between the common fields of these tables. Most modern database management systems like MySQL, Microsoft SQL Server, Oracle, IBM DB2, and Amazon Redshift are based on RDBMS.

24. What is DBMS?

DBMS stands for Database Management System. A DBMS is system software responsible for the creation, retrieval, updating, and management of a database. It ensures that our data is consistent, organized, and easily accessible by serving as an interface between the database and its end-users or application software.

25. What is Collation? What are the different types of Collation Sensitivity?

Collation refers to a set of rules that determine how data is sorted and compared. Rules defining the correct character sequence are used to sort the character data. Collation incorporates options for case sensitivity, accent marks, kana character types, and character width. Below are the different types of collation sensitivity (a short example follows the list):

  • Case sensitivity: A and a are treated differently.
  • Accent sensitivity: a and á are treated differently.
  • Kana sensitivity: Japanese kana characters Hiragana and Katakana are treated differently.
  • Width sensitivity: Same character represented in single-byte (half-width) and double-byte (full-width) are treated differently.
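
A small sketch in MySQL (the collation names are examples); forcing a binary collation makes an otherwise case-insensitive comparison case-sensitive:

SELECT 'a' = 'A';   /* 1 under a case-insensitive collation such as utf8mb4_0900_ai_ci */
SELECT 'a' = 'A' COLLATE utf8mb4_bin;   /* 0: the binary collation treats A and a differently */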

26. What is Pattern Matching in SQL?

SQL pattern matching lets you search data for a pattern when you do not know the exact word or phrase you are looking for. This kind of SQL query uses wildcards to match a string pattern rather than an exact word. The LIKE operator is used in conjunction with SQL wildcards to fetch the required information.

  • Using the % wildcard to perform a simple search

The % wildcard matches zero or more characters of any type and can be used to define wildcards both before and after the pattern. Search for students whose first name begins with the letter K:

SELECT *
FROM students
WHERE first_name LIKE 'K%'
  • Omitting the patterns using the NOT keyword

Use the NOT keyword to select records that don't match the pattern. This query returns all students whose first name does not begin with K.

SELECT *
FROM students
WHERE first_name NOT LIKE 'K%'
  • Matching a pattern anywhere using the % wildcard twice

Search for students who have a K anywhere in their first name.

SELECT *
FROM students
WHERE first_name LIKE '%K%'
  • Using the _ wildcard to match pattern at a specific position

The _ wildcard matches exactly one character of any type. It can be used in conjunction with % wildcard. This query fetches all students with letter K at the third position in their first name.

SELECT *
FROM students
WHERE first_name LIKE '__K%'
  • Matching patterns for a specific length

Because the _ wildcard matches exactly one character, it can be used to limit the length and position of the matched results. For example -

SELECT *   /* Matches first names with three or more letters */
FROM students
WHERE first_name LIKE '___%'

SELECT *   /* Matches first names with exactly four characters */
FROM students
WHERE first_name LIKE '____'

27. How to create empty tables with the same structure as another table?

Creating an empty table with the same structure as another can be done by selecting the records of one table into a new table using the INTO operator, with a WHERE clause that is false for all records. SQL prepares the new table with a duplicate structure, but since no records satisfy the WHERE clause, nothing is inserted into the new table.

SELECT * INTO Students_copy
FROM Students WHERE 1 = 2;
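
Note that SELECT ... INTO is SQL Server syntax. In MySQL and PostgreSQL, the same trick can be written with CREATE TABLE ... AS (a sketch using the same tables):

CREATE TABLE Students_copy AS   /* copies the column structure */
SELECT * FROM Students WHERE 1 = 2;   /* false predicate: no rows copied */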

28. What is a Recursive Stored Procedure?

A stored procedure that calls itself until a boundary condition is reached is called a recursive stored procedure. Recursion lets programmers reuse the same set of code as and when required. Some SQL dialects limit the recursion depth to prevent an infinite chain of procedure calls from causing a stack overflow, which slows down the system and may lead to crashes.

DELIMITER $$     /* Set a new delimiter => $$ */
CREATE PROCEDURE calctotal( /* Create the procedure */
   IN number INT,   /* Set input and output variables */
   OUT total INT
) BEGIN
DECLARE score INT DEFAULT NULL;   /* Set the default value => "score" */
SELECT awards FROM achievements   /* Update "score" via SELECT query */
WHERE id = number INTO score;
IF score IS NULL THEN SET total = 0;   /* Termination condition */
ELSE
CALL calctotal(number+1, total);   /* Recursive call */
SET total = total + score;   /* Action after recursion */
END IF;
END $$     /* End of procedure */
DELIMITER ;     /* Reset the delimiter */

29. What is a Stored Procedure?

A stored procedure is a subroutine available to applications that access a relational database management system (RDBMS). Such procedures are stored in the database data dictionary. They provide a layer of security and functionality: users who cannot access the data directly can be granted access via stored procedures. Their main disadvantage is that they can be executed only in the database and occupy additional memory on the database server.

DELIMITER $$
CREATE PROCEDURE FetchAllStudents()
BEGIN
SELECT *  FROM myDB.students;
END $$
DELIMITER ;

30. What is Database?

A database is an organized collection of data, stored and retrieved digitally from a remote or local computer system. Databases can be vast and complex, and such databases are developed using fixed design and modeling approaches.

31. What are the differences between OLTP and OLAP?

OLTP (Online Transaction Processing) is a class of software applications capable of supporting transaction-oriented programs. An important attribute of an OLTP system is its ability to maintain concurrency. OLTP systems often follow a decentralized architecture to avoid single points of failure. These systems are generally designed for a large audience of end-users who conduct short transactions. Queries involved in such databases are generally simple, need fast response times, and return relatively few records. The number of transactions per second acts as an effective measure for such systems.

OLAP (Online Analytical Processing) is a class of software programs characterized by a relatively low frequency of online transactions. Queries are often complex and involve many aggregations. For OLAP systems, the effectiveness measure relies highly on response time. Such systems are widely used for data mining or maintaining aggregated, historical data, usually in multi-dimensional schemas.

32. What is OLTP?

OLTP (Online Transaction Processing) is a class of software applications capable of supporting transaction-oriented programs. An essential attribute of an OLTP system is its ability to maintain concurrency. To avoid single points of failure, OLTP systems are often decentralized. These systems are usually designed for a large number of users who conduct short transactions. Database queries are usually simple, require sub-second response times, and return relatively few records.

33. What is User-defined function? What are its various types?

The user-defined functions in SQL are like functions in any other programming language that accept parameters, perform complex calculations, and return a value. They are written to use the logic repetitively whenever required. There are two types of SQL user-defined functions:

  • Scalar Function: A user-defined scalar function returns a single scalar value (see the sketch after this list).
  • Table-Valued Functions: User-defined table-valued functions return a table as output.
    • Inline: returns a table data type based on a single SELECT statement.
    • Multi-statement: returns a tabular result-set but, unlike inline, multiple SELECT statements can be used inside the function body.
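
As a hedged sketch in SQL Server syntax (the function, table, and column names are hypothetical), a scalar function and an inline table-valued function can be defined as follows:

CREATE FUNCTION dbo.Square (@x INT)   /* Scalar UDF: returns a single value */
RETURNS INT
AS BEGIN
   RETURN @x * @x;
END;

CREATE FUNCTION dbo.StudentsByYear (@year INT)   /* Inline table-valued UDF */
RETURNS TABLE
AS RETURN (
   SELECT studentID, name
   FROM students
   WHERE graduation_year = @year
);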

34. What are Aggregate and Scalar functions?

An aggregate function performs operations on a collection of values to return a single scalar value. Aggregate functions are often used with the GROUP BY and HAVING clauses of the SELECT statement. Following are the widely used SQL aggregate functions:

  • AVG() - Calculates the mean of a collection of values.
  • COUNT() - Counts the total number of records in a specific table or view.
  • MIN() - Calculates the minimum of a collection of values.
  • MAX() - Calculates the maximum of a collection of values.
  • SUM() - Calculates the sum of a collection of values.
  • FIRST() - Fetches the first element in a collection of values.
  • LAST() - Fetches the last element in a collection of values.

Note: All aggregate functions described above ignore NULL values, except COUNT(*), which counts rows regardless of NULLs (COUNT(column) still ignores NULL values in that column).

A scalar function returns a single value based on the input value. Following are the widely used SQL scalar functions (a combined usage sketch follows the list):

  • LEN() - Calculates the total length of the given field (column).
  • UCASE() - Converts a collection of string values to uppercase characters.
  • LCASE() - Converts a collection of string values to lowercase characters.
  • MID() - Extracts substrings from a collection of string values in a table.
  • CONCAT() - Concatenates two or more strings.
  • RAND() - Generates a random number between 0 and 1.
  • ROUND() - Calculates the round-off integer value for a numeric field (or decimal point values).
  • NOW() - Returns the current date & time.
  • FORMAT() - Sets the format to display a collection of values.
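
A combined usage sketch against the running students table (the age column is hypothetical; UCASE and ROUND follow the MySQL-style names listed above):

SELECT country,
       COUNT(studentID) AS num_students,   /* aggregate: one value per group */
       AVG(age) AS avg_age
FROM students
GROUP BY country;

SELECT UCASE(name) AS name_upper,   /* scalar: one value per row */
       ROUND(age, 0) AS rounded_age
FROM students;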

35. What is the difference between DELETE and TRUNCATE statements?

The TRUNCATE command is used to delete all the rows from a table and deallocate the space occupied by the table.
The DELETE command deletes only the rows that satisfy the condition in the WHERE clause, or all the rows in the table if no condition is specified, but it does not deallocate the space occupied by the table.

36. What is the difference between DROP and TRUNCATE statements?

If a table is dropped, all things associated with the tables are dropped as well. This includes - the relationships defined on the table with other tables, the integrity checks and constraints, access privileges and other grants that the table has. To create and use the table again in its original form, all these relations, checks, constraints, privileges and relationships need to be redefined. However, if a table is truncated, none of the above problems exist and the table retains its original structure.

37. What are the TRUNCATE, DELETE and DROP statements?

The DELETE statement is used to delete rows from a table.

DELETE FROM Candidates
WHERE CandidateId > 1000;

The TRUNCATE command is used to delete all the rows from the table and free the space containing the table.

TRUNCATE TABLE Candidates;

The DROP command is used to remove an object from the database. If you drop a table, all the rows in the table are deleted and the table structure is removed from the database.

DROP TABLE Candidates;

38. What are the various forms of Normalization?

Normal Forms are used to eliminate or reduce redundancy in database tables. The different forms are as follows:

  • First Normal Form:
    A relation is in first normal form if every attribute in that relation is a single-valued attribute. If a relation contains a composite or multi-valued attribute, it violates the first normal form. Let's consider the following students table. Each student in the table has a name, an address, and the books they issued from the public library -

Students Table

Student | Address | Books Issued | Salutation
------- | ------- | ------------ | ----------
Sara | Amanora Park Town 94 | Until the Day I Die (Emily Carpenter), Inception (Christopher Nolan) | Ms.
Ansh | 62nd Sector A-10 | The Alchemist (Paulo Coelho), Inferno (Dan Brown) | Mr.
Sara | 24th Street Park Avenue | Beautiful Bad (Annie Ward), Woman 99 (Greer Macallister) | Mrs.
Ansh | Windsor Street 777 | Dracula (Bram Stoker) | Mr.

As we can observe, the Books Issued field has more than one value per record, and to convert it into 1NF, this has to be resolved into separate individual records for each book issued. Check the following table in 1NF form -

Students Table (1st Normal Form)

Student | Address | Books Issued | Salutation
------- | ------- | ------------ | ----------
Sara | Amanora Park Town 94 | Until the Day I Die (Emily Carpenter) | Ms.
Sara | Amanora Park Town 94 | Inception (Christopher Nolan) | Ms.
Ansh | 62nd Sector A-10 | The Alchemist (Paulo Coelho) | Mr.
Ansh | 62nd Sector A-10 | Inferno (Dan Brown) | Mr.
Sara | 24th Street Park Avenue | Beautiful Bad (Annie Ward) | Mrs.
Sara | 24th Street Park Avenue | Woman 99 (Greer Macallister) | Mrs.
Ansh | Windsor Street 777 | Dracula (Bram Stoker) | Mr.
  • Second Normal Form:

A relation is in second normal form if it satisfies the conditions for the first normal form and does not contain any partial dependency. A relation in 2NF has no partial dependency, i.e., it has no non-prime attribute that depends on any proper subset of any candidate key of the table. Often, specifying a single column Primary Key is the solution to the problem. Examples -

Example 1 - Consider the above example. As we can observe, the Students Table in the 1NF form has a candidate key in the form of [Student, Address] that can uniquely identify all records in the table. The field Books Issued (non-prime attribute) depends partially on the Student field. Hence, the table is not in 2NF. To convert it into the 2nd Normal Form, we will partition the tables into two while specifying a new Primary Key attribute to identify the individual records in the Students table. The Foreign Key constraint will be set on the other table to ensure referential integrity.

Students Table (2nd Normal Form)

Student_ID | Student | Address | Salutation
---------- | ------- | ------- | ----------
1 | Sara | Amanora Park Town 94 | Ms.
2 | Ansh | 62nd Sector A-10 | Mr.
3 | Sara | 24th Street Park Avenue | Mrs.
4 | Ansh | Windsor Street 777 | Mr.

Books Table (2nd Normal Form)

Student_ID | Book Issued
---------- | -----------
1 | Until the Day I Die (Emily Carpenter)
1 | Inception (Christopher Nolan)
2 | The Alchemist (Paulo Coelho)
2 | Inferno (Dan Brown)
3 | Beautiful Bad (Annie Ward)
3 | Woman 99 (Greer Macallister)
4 | Dracula (Bram Stoker)

Example 2 - Consider the following dependencies in relation R(W,X,Y,Z):

 WX -> Y    [W and X together determine Y] 
 XY -> Z    [X and Y together determine Z] 

Here, WX is the only candidate key and there is no partial dependency, i.e., any proper subset of WX doesn’t determine any non-prime attribute in the relation.

  • Third Normal Form

A relation is said to be in the third normal form, if it satisfies the conditions for the second normal form and there is no transitive dependency between the non-prime attributes, i.e., all non-prime attributes are determined only by the candidate keys of the relation and not by any other non-prime attribute.

Example 1 - Consider the Students Table in the above example. As we can observe, the Students Table in the 2NF form has a single candidate key Student_ID (primary key) that can uniquely identify all records in the table. The field Salutation (non-prime attribute), however, depends on the Student Field rather than the candidate key. Hence, the table is not in 3NF. To convert it into the 3rd Normal Form, we will once again partition the tables into two while specifying a new Foreign Key constraint to identify the salutations for individual records in the Students table. The Primary Key constraint for the same will be set on the Salutations table to identify each record uniquely.

Students Table (3rd Normal Form)

Student_ID  Student  Address  Salutation_ID
1 Sara Amanora Park Town 94  1
2 Ansh 62nd Sector A-10  2
3 Sara 24th Street Park Avenue  3
4 Ansh Windsor Street 777  1

Books Table (3rd Normal Form)

Student_ID | Book Issued
---------- | -----------
1 | Until the Day I Die (Emily Carpenter)
1 | Inception (Christopher Nolan)
2 | The Alchemist (Paulo Coelho)
2 | Inferno (Dan Brown)
3 | Beautiful Bad (Annie Ward)
3 | Woman 99 (Greer Macallister)
4 | Dracula (Bram Stoker)

Salutations Table (3rd Normal Form)

Salutation_ID | Salutation
------------- | ----------
1 | Ms.
2 | Mr.
3 | Mrs.

Example 2 - Consider the following dependencies in relation R(P,Q,R,S,T):

 P -> QR    [P determines Q and R]
 RS -> T    [R and S together determine T]
 Q -> S
 T -> P     [T determines P]

The candidate keys of the above relation are {P, RS, QR, T}. Since every attribute is part of some candidate key (there is no non-prime attribute), the relation is in 3NF.

  • Boyce-Codd Normal Form

A relation is in Boyce-Codd Normal Form if it satisfies the conditions for third normal form and, for every functional dependency, the left-hand side is a super key. In other words, a relation in BCNF has only non-trivial functional dependencies of the form X -> Y where X is a super key. For example, in the above example, Student_ID serves as the sole unique identifier for the Students Table and Salutation_ID for the Salutations Table, so these tables are in BCNF. The same cannot be said for the Books Table, as there can be several books with common book names and the same Student_ID.

39. What is Denormalization?

Denormalization is the inverse of normalization, where the normalized schema is converted into a schema that contains redundant information. Performance is improved by using redundancy while keeping the redundant data consistent. Denormalization is applied when an over-normalized structure produces too much overhead in the query processor.

40. What is Normalization?

Normalization represents the way of organizing structured data in the database efficiently. It includes the creation of tables, establishing relationships between them, and defining rules for those relationships. Inconsistency and redundancy can be kept in check based on these rules, hence, adding flexibility to the database.

41. What is a View?

A view in SQL is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.
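
A minimal sketch, reusing the students table from earlier examples:

CREATE VIEW graduating_students AS   /* define the view */
SELECT name, graduation_year
FROM myDB.students
WHERE graduation_year = 2019;

SELECT * FROM graduating_students;   /* query it like a table */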

PostgreSQL Interview Questions

1. What is the main disadvantage of deleting data from an existing table using the DROP TABLE command?

The DROP TABLE command deletes all the data in the table and removes the table structure along with it. If our requirement is only to remove the data, we would then need to recreate the table to store data in it again. In such cases, it is advised to use the TRUNCATE command instead.

2. Differentiate between commit and checkpoint.

The commit action ensures that the data consistency of the transaction is maintained, and it ends the current transaction. A commit writes a new record describing the COMMIT to the transaction log. A checkpoint, on the other hand, writes all changes committed up to a given SCN (System Change Number) to disk; that SCN is recorded in the datafile headers and control files.

3. What are parallel queries in PostgreSQL?

Parallel query support is a feature provided in PostgreSQL for devising query plans capable of exploiting multiple CPUs to execute queries faster.
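
A small sketch (the setting is a real PostgreSQL parameter; big_table is hypothetical): when the planner chooses a parallel plan, EXPLAIN shows a Gather node with parallel workers.

SET max_parallel_workers_per_gather = 4;   /* allow up to 4 workers per Gather node */
EXPLAIN SELECT count(*) FROM big_table;    /* the plan may show Gather -> Parallel Seq Scan */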

4. Does PostgreSQL support full text search?

Full-text search is the method of searching a single document or a collection of documents stored in a full-text database. It is best supported by dedicated search systems like Solr or Elasticsearch; PostgreSQL also provides the feature natively, though in a more basic form.
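
A minimal sketch of PostgreSQL's built-in full-text primitives:

SELECT to_tsvector('english', 'The quick brown fox jumps')
       @@ to_tsquery('english', 'fox');   /* true: the document matches the query */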

5. How will you take backup of the database in PostgreSQL?

We can achieve this by using the pg_dump tool for dumping all object contents in the database into a single file. The steps are as follows:

Step 1: Navigate to the bin folder of the PostgreSQL installation path.

C:\>cd C:\Program Files\PostgreSQL\10.0\bin

Step 2: Execute the pg_dump program to dump the data to a .tar file as shown below:

pg_dump -U postgres -W -F t sample_data > C:\Users\admin\pgbackup\sample_data.tar

The database dump will be stored in the sample_data.tar file on the location specified.

6. How do you perform case-insensitive searches using regular expressions in PostgreSQL?

To perform case-insensitive matching with a regular expression, we can use the POSIX ~* operator (the case-insensitive variant of the ~ match operator). For example:

SELECT 'interviewbit' ~* '.*INTervIewBit.*';   /* returns true */

7. What can you tell about WAL (Write Ahead Logging)?

Write-Ahead Logging (WAL) is a feature that increases database reliability by logging changes before any changes are made to the database. If a crash occurs, the log provides enough information to pinpoint how far the completed work had progressed and gives a starting point from which to resume.


8. What are the transaction isolation levels defined by the SQL standard?

SQL standards state that the following three phenomena should be prevented when transactions execute concurrently, and define four levels of transaction isolation to deal with them.

  • Dirty reads: If a transaction reads data written by a concurrent uncommitted transaction, these reads are called dirty reads.
  • Phantom reads: This occurs when two identical queries, executed separately, return different sets of rows. For example, transaction A retrieves a set of rows matching some search criteria; a concurrent transaction B then inserts new rows matching the same criteria. If transaction A repeats the query, it now sees the additional "phantom" rows, so the results differ.
  • Non-repeatable reads: This occurs when a transaction reads the same row multiple times and gets different values each time due to concurrency: another transaction updates the data between reads, and the current transaction fetches the updated values.

To tackle these, there are 4 standard isolation levels defined by SQL standards. They are as follows:

  • Read Uncommitted – The lowest isolation level. Transactions are not isolated and can read data not yet committed by other transactions, resulting in dirty reads.
  • Read Committed – Ensures that any data read was committed at the instant it is read, so dirty reads are avoided. This level takes read/write locks on the current rows, preventing other transactions from reading, writing, updating, or deleting a row while the current transaction is operating on it.
  • Repeatable Read – Holds read and write locks on all rows the transaction operates on. Non-repeatable reads are avoided because other transactions cannot write, update, or delete those rows.
  • Serializable – The highest of all isolation levels. Guarantees serializable execution: concurrent transactions are guaranteed to appear to have executed serially.

The following table clearly explains which type of unwanted reads the levels avoid:

Isolation level | Dirty reads | Phantom reads | Non-repeatable reads
--------------- | ----------- | ------------- | --------------------
Read Uncommitted | Might occur | Might occur | Might occur
Read Committed | Won't occur | Might occur | Might occur
Repeatable Read | Won't occur | Might occur | Won't occur
Serializable | Won't occur | Won't occur | Won't occur

9. What do you understand by command enable-debug?

The --enable-debug configure flag enables the compilation of all libraries and applications with debugging symbols. Enabling it slows the system down and generally increases the size of the binary files as well, so it is not recommended in production environments. It is most commonly used by developers to debug their scripts and spot issues.

10. What do you understand by multi-version concurrency control?

MVCC (Multi-Version Concurrency Control) is used to avoid unnecessary database locks when two or more requests try to access or modify the same data at the same time. Each transaction works against a snapshot of the data, so readers do not block writers, and delays for users accessing the database are avoided.


11. Can you explain the architecture of PostgreSQL?

  • The architecture of PostgreSQL follows the client-server model.
  • The server side comprises a background process manager, a query processor, utilities, and shared memory space, which work together to build a PostgreSQL instance that has access to the data. The client application connects to this instance and requests data processing from its services. The client can be either a GUI (Graphical User Interface) or a web application; the most commonly used client for PostgreSQL is pgAdmin.

12. What are ACID properties? Is PostgreSQL compliant with ACID?

ACID stands for Atomicity, Consistency, Isolation, Durability. They are database transaction properties which are used for guaranteeing data validity in case of errors and failures.

  • Atomicity: Ensures that a transaction completes in an all-or-nothing way.
  • Consistency: Ensures that updates made to the database are valid and follow the defined rules and restrictions.
  • Isolation: Ensures that concurrent transactions do not interfere with one another; the intermediate state of a transaction is invisible to other transactions.
  • Durability: Ensures that committed transactions are stored permanently in the database.

PostgreSQL is compliant with ACID properties.
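
A small sketch of atomicity in practice (hypothetical accounts table): both updates take effect together, or neither does.

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;   /* ROLLBACK; here would undo both updates */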

13. How can you delete a database in PostgreSQL?

This can be done by using the DROP DATABASE command as shown in the syntax below:

DROP DATABASE database_name;

If the database has been deleted successfully, then the following message would be shown:

DROP DATABASE

14. How can you get a list of all databases in PostgreSQL?

This can be done with the \l meta-command in psql: a backslash followed by the lower-case letter L.
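
The same list can also be fetched with a query against the system catalog:

SELECT datname FROM pg_database;   /* one row per database */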

15. What are string constants in PostgreSQL?

They are character sequences bounded by single quotes, used when inserting or updating character data in the database.
There are also special string constants that are quoted with dollar signs. Syntax: $tag$<string_constant>$tag$. The tag in the constant is optional; when no tag is specified, the constant is called a double-dollar string literal.
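
A short sketch of both forms:

SELECT 'InterviewBit''s guide';   /* ordinary string constant; embedded quote is doubled */
SELECT $$InterviewBit's guide$$;   /* double-dollar string literal: no escaping needed */
SELECT $tag$InterviewBit's guide$tag$;   /* tagged dollar quoting */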

16. Define sequence.

A sequence is a schema-bound, user-defined object that generates a sequence of integers. It is most commonly used to generate values for identity columns in a table. We can create a sequence using the CREATE SEQUENCE statement as shown below:

CREATE SEQUENCE serial_num START 100;

To get the next number, 101, from the sequence, we use the nextval() function as shown below:

SELECT nextval('serial_num');

We can also use this sequence while inserting new records using the INSERT command:

INSERT INTO ib_table_name VALUES (nextval('serial_num'), 'interviewbit');

17. What is the capacity of a table in PostgreSQL?

The maximum size of a table in PostgreSQL is 32 TB.

18. What is the importance of the TRUNCATE statement?

The TRUNCATE TABLE name_of_table statement removes all data from the table quickly and efficiently.
The TRUNCATE statement can also be used to reset the values of identity columns along with the data cleanup, as shown below:

TRUNCATE TABLE name_of_table 
RESTART IDENTITY;

We can also use the statement for removing data from multiple tables all at once by mentioning the table names separated by comma as shown below:

TRUNCATE TABLE 
   table_1, 
   table_2,
   table_3;

19. Define tokens in PostgreSQL?

A token in PostgreSQL is a keyword, identifier, quoted identifier, literal (constant), or special character symbol with a distinct syntactic role. Tokens may or may not be separated by a space, newline, or tab. If the tokens are keywords, they are usually commands with useful meanings. Tokens are known as the building blocks of any PostgreSQL code.

20. What are partitioned tables called in PostgreSQL?

Partitioned tables are logical structures that are used for dividing large tables into smaller structures that are called partitions. This approach is used for effectively increasing the query performance while dealing with large database tables. To create a partition, a key called partition key which is usually a table column or an expression, and a partitioning method needs to be defined. There are three types of inbuilt partitioning methods provided by Postgres:

  • Range Partitioning: This method partitions based on a range of values. It is most commonly used on date fields to get monthly, weekly, or yearly data. For values at the edge of a range, note that the lower bound is inclusive and the upper bound is exclusive: if the range of partition 1 is 10-20 and the range of partition 2 is 20-30, then the value 20 belongs to the second partition and not the first.
  • List Partitioning: This method is used to partition based on a list of known values. Most commonly used when we have a key with a categorical value. For example, getting sales data based on regions divided as countries, cities, or states.
  • Hash Partitioning: This method utilizes a hash function upon the partition key. This is done when there are no specific requirements for data division and is used to access data individually. For example, you want to access data based on a specific product, then using hash partition would result in the dataset that we require.

The type of partition key and the partitioning method used determine the performance and manageability of the partitioned table (a DDL sketch follows).
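
A minimal sketch of declarative range partitioning (hypothetical sales table); note the inclusive lower / exclusive upper bounds:

CREATE TABLE sales (
   sale_id INT,
   sale_date DATE
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_2023 PARTITION OF sales
   FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');   /* 2024-01-01 itself falls in the next partition */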

21. How can we start, restart and stop the PostgreSQL server?

  • To start the PostgreSQL server, we run:
service postgresql start
  • Once the server is successfully started, we get the below message:
Starting PostgreSQL: ok
  • To restart the PostgreSQL server, we run:
service postgresql restart

Once the server is successfully restarted, we get the message:

Restarting PostgreSQL: server stopped
ok
  • To stop the server, we run the command:
service postgresql stop

Once stopped successfully, we get the message:

Stopping PostgreSQL: server stopped
ok

22. What is the command used for creating a database in PostgreSQL?

The first step in using PostgreSQL is to create a database. This is done using the createdb command as shown below:

createdb db_name

After running the above command, if the database creation was successful, then the below message is shown:

CREATE DATABASE

23. How will you change the datatype of a column?

This can be done by using the ALTER TABLE statement as shown below:

Syntax:

ALTER TABLE tname
ALTER COLUMN col_name [SET DATA] TYPE new_data_type;
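
For instance, a hypothetical widening of an integer column:

ALTER TABLE students
ALTER COLUMN enroll_no TYPE BIGINT;   /* PostgreSQL syntax */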

24. How do you define Indexes in PostgreSQL?

Indexes are built-in structures in PostgreSQL that queries use to search a table more efficiently. Consider a table with thousands of records and a query, like the one below, that only a few rows satisfy. Without an index, the engine must check the condition on every single row, which is inefficient for a system dealing with huge data. With an index on the searched column, the engine can identify the matching rows by walking through only a few levels of the index structure. This is called indexing.

SELECT * FROM some_table WHERE table_col = 120;
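
Creating such an index is a one-line statement (same hypothetical table and column):

CREATE INDEX idx_some_table_col ON some_table (table_col);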

25. What is PostgreSQL?

PostgreSQL was first called Postgres and was developed by a team led by computer science professor Michael Stonebraker, starting in 1986. It was developed to help developers build enterprise-level applications, upholding data integrity and making systems fault-tolerant. PostgreSQL is an enterprise-grade, flexible, robust, open-source, object-relational DBMS that supports flexible workloads and handles concurrent users. It is consistently supported by the global developer community, and its fault-tolerant nature has earned it widespread popularity among developers.

Advanced SQL Interview Questions

1. How do you find the Top N rows per group?

To find the Top N rows per group, a window function is used to rank rows within each group, followed by filtering based on that rank. This approach avoids grouping the data and allows row-level details to be retained.

The common method is to use ROW_NUMBER, RANK, or DENSE_RANK with PARTITION BY, and then filter the ranked results in an outer query.

Example: Finding the top 3 highest-paid employees in each department.

SELECT *
FROM (
    SELECT
        department_id,
        employee_name,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY department_id
            ORDER BY salary DESC
        ) AS rn
    FROM employees
) ranked
WHERE rn <= 3;

In this example:

  • Rows are partitioned by department_id
  • Employees are ranked by salary within each department
  • Only the top 3 rows per department are selected

If ties need to be handled differently, RANK or DENSE_RANK can be used instead of ROW_NUMBER. The choice depends on whether duplicate ranks should be included in the top N or the output limited to exactly N rows.

2. What is the difference between ROW_NUMBER, RANK, and DENSE_RANK?

ROW_NUMBER, RANK, and DENSE_RANK are window functions used to assign ranking values to rows within a partition, based on a specified ordering. The key difference between them lies in how they handle ties.

  • ROW_NUMBER assigns a unique sequential number to each row, even if two rows have the same values. Tied rows are given different numbers based on their order.
  • RANK assigns the same rank to tied rows, but skips the next rank values. This results in gaps in the ranking sequence.
  • DENSE_RANK also assigns the same rank to tied rows, but does not skip any ranks. The ranking remains continuous.

Example: Ranking employees by salary within a department.

SELECT
    department_id,
    employee_name,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num,
    RANK()       OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank_val,
    DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank_val
FROM employees;

If two employees have the same salary:

  • ROW_NUMBER will assign different numbers
  • RANK will assign the same rank and skip the next rank
  • DENSE_RANK will assign the same rank without skipping

Choosing the right function depends on whether gaps in ranking are acceptable and whether unique row ordering is required.

3. What is a window function?

A window function performs a calculation across a set of rows related to the current row, without collapsing the result into a single output row. Unlike aggregate functions used with GROUP BY, window functions do not reduce the number of rows returned.

Window functions are defined using the OVER clause, which specifies how rows are grouped and ordered for the calculation.

Window functions come up frequently in interviews for analytics and data-focused roles.

The OVER clause commonly includes:

  • PARTITION BY - divides the result set into partitions (similar to GROUP BY)
  • ORDER BY - defines the order of rows within each partition

Example: Calculating a running total of sales per department.

SELECT
    department_id,
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY department_id
        ORDER BY order_date
    ) AS running_total
FROM orders;

In this example:

  • Rows are grouped by department_id
  • Within each department, rows are ordered by order_date
  • The running total is calculated for each row without grouping the results

Window functions are commonly used for ranking, running totals, moving averages, and comparisons across rows. They are especially useful in analytical queries where row-level detail must be preserved.

4. What is a recursive CTE?

A recursive CTE is a type of Common Table Expression that repeatedly executes itself until a specified termination condition is met. It is mainly used to query hierarchical or tree-structured data, such as organizational charts, category trees, or parent–child relationships.

A recursive CTE consists of two parts:

  • Anchor query: returns the base result set
  • Recursive query: references the CTE itself and builds on the previous result

The recursion stops automatically when no new rows are produced.

Example: Finding all employees in a reporting hierarchy starting from a given manager.

 WITH RECURSIVE emp_hierarchy AS (
    -- Anchor member
    SELECT employee_id, manager_id, name
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive member
    SELECT e.employee_id, e.manager_id, e.name
    FROM employees e
    JOIN emp_hierarchy h
      ON e.manager_id = h.employee_id
)
SELECT *
FROM emp_hierarchy;

In this example, the anchor query selects top-level managers. The recursive part repeatedly finds employees who report to those managers, building the hierarchy level by level until no more rows are found.

Recursive CTEs are preferred over iterative or self-join approaches because they are easier to write, more readable, and better suited for hierarchical data traversal.

5. What is the difference between a CTE and a subquery?

A subquery is a query nested inside another SQL query, whereas a CTE (Common Table Expression) is a named temporary result set defined using the WITH clause and referenced later in the main query.

The main difference between the two is in their readability, reusability, and scope.

Subqueries are useful for simple, one-time operations, but they can become difficult to read and maintain when deeply nested. Each subquery exists only at the point where it is written and cannot be reused elsewhere in the same query.

CTEs, on the other hand, allow complex logic to be written once and referenced multiple times within the same query. By giving intermediate results a name, CTEs make queries easier to read, debug, and modify. This is especially helpful in queries involving multiple joins, aggregations, or step-by-step transformations.

Scope-wise, both CTEs and subqueries exist only for the duration of the query in which they are defined. However, a CTE’s scope is clearer and more flexible within that query, as it can be referenced multiple times, while a subquery is limited to its immediate context.

In usage, subqueries are preferred for small, straightforward conditions, while CTEs are better suited for complex queries where readability and logical separation are important.
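
As an illustration, the same filter can be written both ways. This is a minimal sketch, assuming a hypothetical orders table with one row per order:

-- Subquery version: the counting logic is inlined and cannot be reused
SELECT user_id
FROM (
    SELECT user_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY user_id
) t
WHERE t.order_count > 10;

-- CTE version: the same intermediate result, named and reusable
WITH order_counts AS (
    SELECT user_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY user_id
)
SELECT user_id
FROM order_counts
WHERE order_count > 10;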

Many of these topics are also asked as SQL queries interview questions, where candidates are expected to write or reason through queries instead of explaining concepts verbally.

6. What is a CTE, and when should you use it?

A CTE (Common Table Expression) is a temporary result set that can be referenced within a SELECT, INSERT, UPDATE, or DELETE statement. It is defined using the WITH keyword and exists only for the duration of the query in which it is used.

CTEs are mainly used to improve the structure and readability of the query, especially when dealing with complex queries that involve multiple steps, aggregations, or intermediate results. Instead of nesting subqueries, a CTE allows breaking the logic into smaller, more understandable parts.

CTEs are commonly used when:

  • A query requires the same subquery logic multiple times
  • Complex joins, or aggregations, need to be organized clearly
  • Recursive queries are required (for hierarchical data)

Example: Using a CTE to find employees with salaries above the department average.


WITH dept_avg AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT e.employee_id, e.salary
FROM employees e
JOIN dept_avg d
  ON e.department_id = d.department_id
WHERE e.salary > d.avg_salary;

In this example, the CTE dept_avg computes the average salary per department, which is then reused in the main query to filter employees. This approach is clearer and easier to maintain than writing the same logic as a nested subquery.

7. How do you find the latest record per user?

To find the latest record per user, a window function is used to rank records for each user based on a timestamp or date column. The most common approach is to apply ROW_NUMBER with descending order and then filter for the first row.

This method ensures that exactly one latest record is selected per user, even when multiple records exist.

Example: Fetching the most recent order for each user.


SELECT *
FROM (
   SELECT
       user_id,
       order_id,
       order_date,
       ROW_NUMBER() OVER (
           PARTITION BY user_id
           ORDER BY order_date DESC
       ) AS rn
   FROM orders
) latest_orders
WHERE rn = 1;

In this query:

  • Records are grouped by user_id
  • Orders are ordered by order_date in descending order
  • The most recent record per user is selected by filtering rn = 1

This approach is preferred over aggregate-based methods because it preserves complete row details and handles ties in a controlled manner.

8. What are transaction isolation levels?

Transaction isolation levels define how and when the changes made by one transaction become visible to other concurrent transactions. They control the balance between data consistency and performance in a database system.

SQL defines four standard isolation levels, each preventing certain types of anomalies.

Common anomalies:

  • Dirty read: Reading data that has been modified but not yet committed
  • Non-repeatable read: Reading the same row twice and getting different values
  • Phantom read: Re-running a query and seeing new rows that were not visible earlier

Isolation levels and what they prevent:

  • READ UNCOMMITTED - Allows dirty reads, non-repeatable reads, and phantom reads. It offers the highest concurrency but the lowest data consistency.

  • READ COMMITTED - Prevents dirty reads but allows non-repeatable reads and phantom reads. This is the default level in many databases.

  • REPEATABLE READ - Prevents dirty reads and non-repeatable reads, but phantom reads may still occur depending on the database implementation.

  • SERIALIZABLE - Prevents all anomalies by making transactions behave as if they were executed one after another. This provides the strongest consistency but can reduce concurrency.

Choosing the right isolation level depends on the use case. Systems that prioritize accuracy use higher isolation, while high-throughput systems may accept some anomalies for better performance.
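
As a minimal sketch, an isolation level can be set per transaction. PostgreSQL-style syntax is shown; the exact statement and default level vary by database, and the accounts table here is hypothetical:

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Reads within this transaction see a consistent snapshot:
-- re-reading the same row returns the same value even if
-- another session commits a change in the meantime.
SELECT balance FROM accounts WHERE account_id = 1;

COMMIT;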

9. What is a deadlock and how do you avoid it?

A deadlock occurs when two or more transactions block each other indefinitely because each transaction is holding a lock that the other needs. As a result, none of the transactions can proceed, and the database must intervene to resolve the situation.

Deadlocks commonly happen when:

  • Transactions acquire locks on the same resources in different orders
  • Long-running transactions hold locks for extended periods
  • Multiple rows or tables are locked within a single transaction

Example scenario: Transaction A locks Table X and waits for Table Y, while Transaction B locks Table Y and waits for Table X. Neither transaction can continue, resulting in a deadlock.

Ways to avoid deadlocks:

  • Use consistent lock ordering: Ensure all transactions acquire locks in the same order
  • Keep transactions short: Commit or roll back as early as possible
  • Avoid unnecessary locks by accessing only required rows
  • Use appropriate isolation levels to reduce lock contention
  • Handle deadlock retries gracefully, as most databases automatically roll back one transaction

Databases detect deadlocks automatically and resolve them by terminating one of the transactions. Writing queries with predictable lock behavior and consistent ordering significantly reduces the likelihood of deadlocks.
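
For example, consistent lock ordering can be enforced by convention: every transaction that touches both tables updates them in the same order. A sketch, assuming hypothetical accounts and ledgers tables:

-- Convention: always update accounts before ledgers.
-- If every transaction follows this order, no lock cycle can form.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE ledgers  SET amount  = amount  + 100 WHERE ledger_id  = 7;
COMMIT;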

While syntax may vary slightly, most of these concepts apply equally to SQL Server interview questions and Oracle SQL interview questions, making them useful across database platforms.

10. What causes double counting in joins and how can it be prevented?

Double counting in joins usually occurs when tables are joined in a many-to-many relationship without proper aggregation or filtering. In such cases, rows from one table match multiple rows in another, causing values to be repeated and summed more than once.

Concepts like joins, anti-joins, and many-to-many relationships are frequently tested through SQL joins interview questions.

This is commonly seen when:

  • Fact tables are joined directly without aggregation
  • Dimension tables contain multiple matching rows
  • Join conditions are incomplete or incorrect

Example: If an orders table is joined with an order_items table, each order appears once for every item it contains. Summing order-level values after this join can lead to inflated results.

SELECT
    SUM(o.total_amount)
FROM orders o
JOIN order_items i
  ON o.order_id = i.order_id;

Here, total_amount is repeated for each item, causing double counting.

Ways to prevent double counting:

  • Aggregate before joining, especially in many-to-many relationships
  • Use DISTINCT carefully when appropriate
  • Join at the correct grain, ensuring both tables represent data at compatible levels
  • Use subqueries or CTEs to reduce data before the join

Corrected approach (aggregate first):

SELECT
    SUM(order_total)
FROM (
    SELECT
        order_id,
        MAX(total_amount) AS order_total
    FROM orders
    GROUP BY order_id
) o;
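
If order-level and item-level values must be combined in one result, another option is to aggregate order_items up to the order grain before joining. This is a sketch, assuming a hypothetical item_amount column:

SELECT
    SUM(o.total_amount) AS order_revenue,
    SUM(i.items_total)  AS items_revenue
FROM orders o
JOIN (
    SELECT order_id, SUM(item_amount) AS items_total
    FROM order_items
    GROUP BY order_id
) i
  ON i.order_id = o.order_id;

-- The derived table has one row per order, so the join is one-to-one
-- and total_amount is no longer repeated per item.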

Preventing double counting requires understanding the data grain of each table and ensuring joins are designed to preserve it. This is especially important in reporting and analytics, where inflated totals can easily go unnoticed.

11. How do you handle slowly changing dimensions (SCD Type 2)?

Slowly Changing Dimension (SCD) Type 2 is a technique used to preserve the full history of changes in dimension tables. Instead of updating existing records, a new row is inserted whenever a tracked attribute changes, while the old record is marked as inactive.

This approach is commonly used in data warehouses to track historical changes in attributes such as customer address, job title, or product category.

An SCD Type 2 table typically includes:

  • A surrogate key (unique row identifier)
  • Effective start and end dates
  • A flag to indicate the current active record

Example: Dimension table structure for tracking customer history.

customer_dim
-------------
customer_sk
customer_id
address
start_date
end_date
is_current

When a customer’s address changes:

  1. The existing active record is updated to set end_date and mark it as inactive
  2. A new record is inserted with the updated address and marked as current

-- Expire existing record
UPDATE customer_dim
SET end_date = CURRENT_DATE,
    is_current = 'N'
WHERE customer_id = 101
  AND is_current = 'Y';


-- Insert new record
INSERT INTO customer_dim (
    customer_id, address, start_date, end_date, is_current
)
VALUES (
    101, 'New Address', CURRENT_DATE, NULL, 'Y'
);


 

This method ensures that historical data is preserved and reports can accurately reflect changes over time. SCD Type 2 is widely used when auditability and time-based analysis are required.

12. What is MERGE/UPSERT and when should it be used?

MERGE (also known as UPSERT) is an operation that allows inserting new rows and updating existing rows in a single statement. It compares records from a source dataset with a target table and decides whether to insert or update based on a matching condition.

MERGE/UPSERT is especially useful in incremental data loads, where only new or changed records need to be applied instead of reloading the entire table.

This approach helps:

  • Avoid duplicate records
  • Reduce data processing time
  • Keep target tables in sync with source data

Example: Updating existing customer records and inserting new ones.

MERGE INTO customers t
USING staging_customers s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
    UPDATE SET
        t.name = s.name,
        t.email = s.email
WHEN NOT MATCHED THEN
    INSERT (customer_id, name, email)
    VALUES (s.customer_id, s.name, s.email);

In this example:

  • Existing customers are updated
  • New customers are inserted
  • Both actions happen in a single statement

MERGE/UPSERT is commonly used in ETL pipelines, data warehouses, and sync jobs where data arrives incrementally and needs to be applied efficiently.

13. What is partitioning and why does it help?

Partitioning is a database technique where a large table is divided into smaller, more manageable pieces called partitions, based on the values of one or more columns. Each partition holds a subset of the data, but together they represent the full table.

Partitioning helps improve performance through a mechanism called partition pruning. When a query includes a filter on the partition key, the database can skip irrelevant partitions and scan only the required ones. This reduces the amount of data read and speeds up query execution.

Partitioning is commonly done using:

  • Range partitioning (e.g., by date)
  • List partitioning (e.g., by region or category)
  • Hash partitioning (for even data distribution)

Example: Partitioning an orders table by order date.

CREATE TABLE orders (
    order_id INT,
    order_date DATE,
    amount DECIMAL
)
PARTITION BY RANGE (order_date);

When a query filters by a specific date range:

SELECT *
FROM orders
WHERE order_date >= '2024-01-01'
  AND order_date <  '2024-02-01';

Only the partitions covering January 2024 are scanned, while others are ignored.
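
Note that with PostgreSQL-style declarative partitioning, the individual partitions must also be created before rows can be inserted. A minimal sketch:

CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_2024_02 PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');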

Partitioning is especially useful for large tables, time-based data, and analytical queries where filtering on the partition key is common.

14. What is a materialized view and when should it be used?

A materialized view is a database object that stores the precomputed result of a query physically on disk. Unlike a regular view, which runs the underlying query every time it is accessed, a materialized view returns stored results, making read operations much faster.

Materialized views are useful when queries involve heavy aggregations, joins, or transformations and are executed frequently on relatively static data.

The main trade-off when using materialized views is data freshness versus performance. Since the data is stored, it must be refreshed to reflect changes in the underlying tables.

Materialized views are typically used when:

  • Query performance is critical
  • The underlying data changes less frequently
  • Slightly stale data is acceptable

Example: Creating a materialized view for daily sales totals.

CREATE MATERIALIZED VIEW daily_sales AS
SELECT
    order_date,
    SUM(amount) AS total_sales
FROM orders
GROUP BY order_date;

The materialized view can be refreshed based on requirements:

REFRESH MATERIALIZED VIEW daily_sales;

Choosing between on-demand or scheduled refresh depends on how up-to-date the data needs to be. Materialized views are commonly used in reporting and analytics systems to reduce query load and improve performance.

15. What is a covering index?

A covering index is an index that contains all the columns required by a query, so the database engine can retrieve the result directly from the index without accessing the underlying table.

When a covering index is used, the database avoids additional table (or heap) lookups, which significantly improves query performance, especially for read-heavy workloads.

This works because indexes store not only the indexed columns but also pointers to the data. If every column referenced in the SELECT, WHERE, and JOIN clauses is already present in the index, the table itself does not need to be read.

Example: Query fetching order details filtered by status and date.

SELECT order_id, order_date
FROM orders
WHERE status = 'completed'
  AND order_date >= '2024-01-01';


A covering index for this query would be:


CREATE INDEX idx_orders_covering
ON orders (status, order_date, order_id);

In this case:

  • status and order_date support filtering
  • order_id is included to satisfy the SELECT clause

Because all required columns are present in the index, the query can be answered entirely using the index, avoiding table access. Covering indexes are especially useful for frequently executed queries that return a limited set of columns.

16. What are composite indexes and how do you choose column order?

A composite index is an index created on multiple columns of a table. It is used to speed up queries that filter or sort data based on more than one column.

The order of columns in a composite index matters because the database can efficiently use the index only from the leftmost prefix of the index definition.

A common rule for choosing column order is:

  • Place equality conditions first
  • Place range conditions (such as <, >, BETWEEN) after them

This allows the index to narrow down rows as much as possible before applying range filtering.

Example: Query filtering by status (equality) and date (range).


SELECT *
FROM orders
WHERE status = 'completed'
  AND order_date >= '2024-01-01';

An effective composite index for this query would be:

CREATE INDEX idx_orders_status_date
ON orders (status, order_date);

Here:

  • status uses an equality condition and is placed first
  • order_date uses a range condition and is placed second

If the column order were reversed, the index would be less effective for this query. Choosing the correct column order in composite indexes is essential for optimal query performance.

17. How do you optimize a slow query?

Optimizing a slow query starts with understanding where time and resources are being spent. The most effective optimizations usually focus on reducing the number of rows processed early and ensuring that indexes are used correctly.

A common first step is to analyze the query using an execution plan (EXPLAIN) to identify full table scans, inefficient joins, or missing indexes.

Key optimization techniques include:

  • Filtering rows as early as possible using selective WHERE conditions
  • Creating appropriate indexes on columns used in filters, joins, and sorting
  • Avoiding non-sargable conditions, such as functions on indexed columns
  • Selecting only required columns instead of using SELECT *
  • Reordering joins so that smaller or filtered datasets are processed first

Example: Filtering rows early to reduce data scanned.


SELECT order_id, order_date, amount
FROM orders
WHERE order_date >= '2024-01-01'
  AND status = 'completed';

If order_date and status are indexed appropriately, this query limits the number of rows before further processing, improving performance.

In practice, query optimization is an iterative process that combines execution plan analysis, proper indexing, and query rewriting to minimize unnecessary data access.

18. What is an execution plan (EXPLAIN)?

An execution plan describes how a database engine executes a SQL query. It shows the sequence of operations the optimizer chooses to retrieve data, such as table scans, index lookups, joins, and filtering steps. The EXPLAIN command is used to view this plan before or during query execution.

Execution plans help identify whether a query is using an index scan or a full table scan, which has a direct impact on performance.

  • Index scan: The database uses an index to quickly locate matching rows. This is generally faster and preferred for selective queries.

  • Table scan (or sequential scan): The database scans every row in the table. This can be expensive for large tables and usually indicates missing or unused indexes.

Example: Viewing the execution plan for a query.


EXPLAIN
SELECT *
FROM orders
WHERE order_date >= '2024-01-01';


 The output of EXPLAIN shows details such as:

  • Which indexes are used (if any)
  • The estimated number of rows processed
  • The cost of each operation

By analyzing an execution plan, inefficient scans can be detected early, allowing queries to be optimized through better indexing or query rewriting.

Questions related to indexing, execution plans, and query optimization often appear as SQL performance tuning interview questions, testing how efficiently a candidate can work with large datasets.

19. What is sargability in SQL?

Sargability refers to whether a SQL query can efficiently use an index to filter rows. A query is considered sargable if the database engine can apply the search condition directly to an indexed column, rather than evaluating the condition row by row.

Non-sargable conditions usually occur when functions, calculations, or transformations are applied to indexed columns in the WHERE clause. This prevents the query optimizer from using the index effectively, leading to full table scans.

Example: A non-sargable condition that prevents index usage.


SELECT *
FROM orders
WHERE YEAR(order_date) = 2024;

In this case, the function YEAR() is applied to the indexed column order_date, making the condition non-sargable.

A sargable alternative rewrites the condition to avoid applying functions on the column:


SELECT *
FROM orders
WHERE order_date >= '2024-01-01'
  AND order_date <  '2025-01-01';

This version allows the database to use the index on order_date efficiently.

Writing sargable queries is important for performance, especially when working with large tables and indexed columns.

20. What is the difference between NOT IN and NOT EXISTS?

Both NOT IN and NOT EXISTS are used to filter rows that do not have matching values in a subquery. However, they behave differently when NULL values are involved, which is a common source of errors.

The key difference lies in how each handles NULLs.

  • NOT IN returns no rows if the subquery contains even a single NULL value. This happens because comparisons with NULL result in an unknown condition, causing the entire filter to fail.

  • NOT EXISTS does not have this issue. It checks for the existence of matching rows and safely handles NULLs, making it more reliable in most cases.
Example: Finding users who have not placed any orders. 

-- Using NOT IN (can fail if subquery returns NULL)
SELECT user_id
FROM users
WHERE user_id NOT IN (
    SELECT user_id
    FROM orders
);


If orders.user_id contains a NULL, this query may return no results.


-- Using NOT EXISTS (NULL-safe)
SELECT u.user_id
FROM users u
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.user_id = u.user_id
);

In practice, NOT EXISTS is generally preferred for anti-join conditions because it is NULL-safe and more predictable, especially when working with real-world data.

SQL Scenario-Based Interview Questions

1. How do you find orphan rows (orders without users)?

Orphan rows occur when records in one table reference entries that do not exist in another table. A common example is orders that do not have a matching user record. Identifying such rows is important for data quality checks and referential integrity validation.

The standard way to find orphan rows is to use an anti-join, which returns rows from one table that have no matching rows in another table.

Example: Finding orders that do not have a corresponding user.

SELECT o.order_id, o.user_id
FROM orders o
LEFT JOIN users u
  ON o.user_id = u.user_id
WHERE u.user_id IS NULL;

In this query:

  • A left join keeps all orders
  • Rows where no matching user exists result in NULL values on the user side
  • Filtering on u.user_id IS NULL identifies orphan orders

An alternative approach is to use NOT EXISTS, which is often preferred for clarity and NULL safety:

SELECT o.order_id, o.user_id
FROM orders o
WHERE NOT EXISTS (
    SELECT 1
    FROM users u
    WHERE u.user_id = o.user_id
);

Both methods correctly identify orphan rows. Anti-joins like these are commonly used in data audits, cleanup tasks, and pipeline validations to ensure data consistency.

2. How do you compute LTV and bucket users?

Lifetime Value (LTV) represents the total revenue generated by a user over their entire lifetime. To compute LTV, all purchases made by a user are summed up. Once LTV is calculated, users can be grouped into buckets (for example, high, medium, low value) to support segmentation and analysis.

A common way to bucket users is by using the NTILE window function, which divides users into equal-sized groups based on their LTV.

Example: Computing user LTV and dividing users into 4 value buckets.

WITH user_ltv AS (
    SELECT
        user_id,
        SUM(amount) AS ltv
    FROM orders
    GROUP BY user_id
)
SELECT
    user_id,
    ltv,
    NTILE(4) OVER (ORDER BY ltv DESC) AS ltv_bucket
FROM user_ltv;

In this query:

  • LTV is calculated as the total spend per user
  • Users are ordered by LTV in descending order
  • NTILE(4) splits users into four equal buckets

The four buckets can be interpreted as value tiers, from highest-value (bucket 1) to lowest-value (bucket 4) user segments. This method is widely used in customer analytics, marketing targeting, and revenue analysis.

3. How do you find churned customers (no purchase in the last 60 days)?

Churned customers are typically defined as users who were active in the past but have not made a purchase within a recent time window, such as the last 60 days. The key is to identify each customer’s most recent purchase date and compare it with the current date.

This is usually done by aggregating purchase data per customer and filtering based on the maximum purchase date.

Example: Finding customers who have not purchased in the last 60 days.

SELECT
    customer_id
FROM orders
GROUP BY customer_id
HAVING MAX(order_date) < CURRENT_DATE - INTERVAL '60 days';

In this query:

  • MAX(order_date) identifies the most recent purchase per customer
  • Customers whose latest purchase is older than 60 days are marked as churned

If customers with no purchases at all also need to be included, a left join with the customer table can be used instead of grouping only on orders.

This approach is simple, efficient, and commonly used in retention and churn analysis.
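
A sketch of that left-join variant, assuming a customers table with one row per customer:

SELECT c.customer_id
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id
 AND o.order_date >= CURRENT_DATE - INTERVAL '60 days'
WHERE o.order_id IS NULL;

Placing the date filter in the ON clause keeps customers with no recent orders in the result, and the IS NULL check turns the left join into an anti-join.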

4. How do you detect anomalies compared to a trailing 7-day average?

Anomaly detection using a trailing average involves comparing the current day’s metric against the average value from the previous 7 days. A data point is typically considered anomalous if it deviates significantly from this rolling baseline.

The usual approach is to calculate a 7-day rolling average using a window function and then compare the current value with that average. This helps smooth out short-term fluctuations and highlights unusual spikes or drops.

Example: Detecting daily revenue anomalies using a trailing 7-day average.

SELECT
    order_date,
    daily_revenue,
    AVG(daily_revenue) OVER (
        ORDER BY order_date
        ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
    ) AS trailing_7_day_avg,
    CASE
        WHEN daily_revenue > 
             AVG(daily_revenue) OVER (
                 ORDER BY order_date
                 ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
             ) * 1.3
        THEN 'anomaly'
        ELSE 'normal'
    END AS status
FROM daily_sales;

In this query:

  • The rolling average is calculated using the previous 7 days only
  • The current day is excluded to avoid bias
  • A threshold (for example, 30% above average) is used to flag anomalies

Thresholds can be adjusted based on business needs. This method is widely used in monitoring systems to detect unusual behavior while accounting for natural variation in the data.

5. How do you find revenue excluding refunds or chargebacks?

To calculate revenue excluding refunds or chargebacks, the idea is to compute net revenue by subtracting refunded or reversed amounts from the original sales. This ensures revenue reflects only the money actually retained by the business.

The most common approach is to separate sales and refund transactions, then aggregate them together using conditional logic. Each transaction type is handled differently while summing amounts.

Example: Calculating net revenue from sales and refunds.

SELECT
    SUM(
        CASE
            WHEN transaction_type = 'sale' THEN amount
            WHEN transaction_type IN ('refund', 'chargeback') THEN -amount
            ELSE 0
        END
    ) AS net_revenue
FROM transactions;

In this query:

  • Sales increase revenue
  • Refunds and chargebacks reduce revenue
  • Net revenue is computed in a single aggregation

If refunds are stored in a separate table, a similar result can be achieved by aggregating sales and refunds separately and then subtracting refund totals.
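
A sketch of that separate-table variant (the sales and refunds table names are assumptions for illustration):

SELECT
    (SELECT COALESCE(SUM(amount), 0) FROM sales)
  - (SELECT COALESCE(SUM(amount), 0) FROM refunds) AS net_revenue;

COALESCE guards against a NULL total when one of the tables is empty.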

This approach ensures that revenue reports remain accurate and are not inflated by transactions that were later reversed.

6. How do you compute A/B test conversion uplift?

A/B test conversion uplift measures how much better one variant performs compared to another. The key is to separate users by their assigned variant and then calculate conversion rates independently for each group.

To compute uplift correctly, users must be counted only once, based on their original assignment, and conversions should be measured after the assignment event.

Example: Calculating conversion rates for control and treatment groups.

WITH assignments AS (
    SELECT
        user_id,
        variant
    FROM ab_assignments
),
conversions AS (
    SELECT DISTINCT user_id
    FROM events
    WHERE event_type = 'purchase'
)
SELECT
    a.variant,
    COUNT(DISTINCT a.user_id) AS users,
    COUNT(DISTINCT c.user_id) AS converted_users,
    COUNT(DISTINCT c.user_id) * 1.0 / COUNT(DISTINCT a.user_id) AS conversion_rate
FROM assignments a
LEFT JOIN conversions c
  ON a.user_id = c.user_id
GROUP BY a.variant;

Once conversion rates are computed for each variant, uplift is calculated as the difference between the treatment and control conversion rates.
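
A sketch of that uplift calculation, assuming the previous query's output (variant, conversion_rate) is available as variant_stats:

SELECT
    MAX(CASE WHEN variant = 'treatment' THEN conversion_rate END)
  - MAX(CASE WHEN variant = 'control'   THEN conversion_rate END)
    AS absolute_uplift
FROM variant_stats;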

The assignment-and-conversion logic above ensures that:

  • Users are grouped strictly by their assigned variant
  • Conversions are attributed correctly
  • Each user is counted only once

A/B test analysis often extends this by adding statistical significance checks, but the core SQL logic focuses on correct assignment and conversion counting.

7. How do you deduplicate rows and keep the latest record?

Deduplication is commonly required when multiple records exist for the same entity, and only the most recent record should be retained. The standard approach is to rank records within each duplicate group using a timestamp and then filter out all but the latest one.

This is typically done using the ROW_NUMBER window function, ordered by the timestamp in descending order. Rows with rank 1 represent the latest record per group.

Example: Keeping the latest record per user based on updated_at.
 

SELECT *
FROM (
    SELECT
        user_id,
        email,
        updated_at,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY updated_at DESC
        ) AS rn
    FROM user_updates
) ranked
WHERE rn = 1;

In this query:

  • Records are grouped by user_id
  • The most recent record is ranked first using descending order
  • Filtering on rn = 1 removes older duplicates

This method is reliable, easy to read, and works well even when multiple updates exist for the same entity. It is preferred over DISTINCT when the definition of “latest” depends on time or versioning.

8. How do you sessionize events using a 30-minute inactivity window?

Sessionization groups user events into sessions based on periods of activity separated by inactivity. A common rule is to start a new session if the time gap between two consecutive events exceeds 30 minutes.

The typical approach uses the LAG window function to compare each event’s timestamp with the previous event for the same user. When the gap is greater than 30 minutes, a session break is identified. A cumulative sum is then used to assign session numbers.

Example: Creating sessions per user with a 30-minute inactivity threshold.

WITH ordered_events AS (
    SELECT
        user_id,
        event_time,
        LAG(event_time) OVER (
            PARTITION BY user_id
            ORDER BY event_time
        ) AS prev_event_time
    FROM events
),
session_flags AS (
    SELECT
        user_id,
        event_time,
        CASE
            WHEN prev_event_time IS NULL
              OR event_time - prev_event_time > INTERVAL '30 minutes'
            THEN 1
            ELSE 0
        END AS is_new_session
    FROM ordered_events
)
SELECT
    user_id,
    event_time,
    SUM(is_new_session) OVER (
        PARTITION BY user_id
        ORDER BY event_time
        ROWS UNBOUNDED PRECEDING
    ) AS session_id
FROM session_flags;

This logic works as follows:

  • LAG gets the previous event time for each user
  • A new session is flagged when inactivity exceeds 30 minutes
  • A cumulative sum assigns a unique session ID per user

Sessionization like this is commonly used in analytics to compute session counts, session duration, and user engagement metrics.

9. How do you compute D1/D7 retention cohorts?

D1 and D7 retention measure how many users return to the product 1 day or 7 days after their first activity. The main idea behind cohort analysis is to group users based on their signup or first activity date and then check whether they are active again after a fixed number of days.

To compute retention, users are first assigned to a cohort based on the date they joined. Their subsequent activity is then joined back to this cohort to see if they were active on Day 1 or Day 7 relative to the cohort date.

Example: Calculating D1 and D7 retention based on user activity.

WITH cohorts AS (
    SELECT
        user_id,
        DATE(MIN(event_time)) AS cohort_date
    FROM events
    GROUP BY user_id
),
activity AS (
    SELECT
        user_id,
        DATE(event_time) AS activity_date
    FROM events
)
SELECT
    c.cohort_date,
    COUNT(DISTINCT c.user_id) AS cohort_size,
    COUNT(DISTINCT CASE
        WHEN a.activity_date = c.cohort_date + INTERVAL '1 day'
        THEN c.user_id
    END) AS d1_retained,
    COUNT(DISTINCT CASE
        WHEN a.activity_date = c.cohort_date + INTERVAL '7 day'
        THEN c.user_id
    END) AS d7_retained
FROM cohorts c
LEFT JOIN activity a
  ON c.user_id = a.user_id
GROUP BY c.cohort_date
ORDER BY c.cohort_date;

This approach ensures that:

  • Users are grouped by their first activity date
  • Retention is measured relative to the cohort date
  • Each user is counted only once per retention window

Retention rates are then calculated by dividing D1 or D7 retained users by the total cohort size.
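
As a sketch, assuming the previous query's output is available as cohort_retention, the rates follow directly:

SELECT
    cohort_date,
    d1_retained * 1.0 / cohort_size AS d1_retention_rate,
    d7_retained * 1.0 / cohort_size AS d7_retention_rate
FROM cohort_retention;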

10. How do you calculate funnel conversion (visit -> signup -> purchase)?

Funnel conversion measures how many users move through a defined sequence of events, such as visit -> signup -> purchase. The key challenge is to ensure that each user is counted only once at each stage and that events occur in the correct order.

A common approach is to track distinct users at every stage of the funnel and use timestamps to enforce the sequence. This prevents users who sign up before visiting, or purchase without signing up, from being counted incorrectly.

Typically, the funnel is calculated within a fixed time window to keep the analysis meaningful.

Example: Calculating how many users progress through each funnel stage.

WITH funnel_events AS (
    SELECT
        user_id,
        MIN(CASE WHEN event_type = 'visit' THEN event_time END)    AS visit_time,
        MIN(CASE WHEN event_type = 'signup' THEN event_time END)   AS signup_time,
        MIN(CASE WHEN event_type = 'purchase' THEN event_time END) AS purchase_time
    FROM events
    GROUP BY user_id
)
SELECT
    COUNT(DISTINCT user_id) AS visits,
    COUNT(DISTINCT CASE 
        WHEN signup_time IS NOT NULL 
         AND signup_time > visit_time 
        THEN user_id 
    END) AS signups,
    COUNT(DISTINCT CASE 
        WHEN purchase_time IS NOT NULL 
         AND purchase_time > signup_time 
        THEN user_id 
    END) AS purchases
FROM funnel_events;

This query ensures that:

  • Each user is counted once per stage
  • Events follow the correct order
  • Only valid funnel progressions are included

Funnel conversion rates can then be calculated by dividing signups by visits, and purchases by signups.
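
As a sketch, assuming the previous query's output is available as funnel_counts, the stage-to-stage rates are:

SELECT
    signups   * 1.0 / NULLIF(visits,  0) AS visit_to_signup_rate,
    purchases * 1.0 / NULLIF(signups, 0) AS signup_to_purchase_rate
FROM funnel_counts;

NULLIF avoids a division-by-zero error when an earlier stage has no users.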


SQL MCQ

1. An SQL query to delete a table from the database and memory while keeping the structure of the table intact?
2. What is a pre-requisite for creating a database in PostgreSQL?
3. Which of the following is known as a virtual table in SQL?
4. What is the main advantage of a clustered index over a non-clustered index?
5. SQL query used to fetch unique values from a field?
6. Which statement is used to update data in the database?
7. Which statement is false for the ORDER BY statement?
8. What statement is used for adding data to PostgreSQL?
9. Which normal form has neither composite values nor partial dependencies?
10. What does SQL stand for?
11. Which statement is true for a PRIMARY KEY constraint?
12. What is the order of results shown by default if the ASC or DESC parameter is not specified with the ORDER BY command?
13. What allows us to define how various tables are related to each other formally in a database?
14. What is the name of the component that requests data from the PostgreSQL server?
15. What languages are supported by PostgreSQL?
16. What command is used for restoring a PostgreSQL backup created using pg_dump?
17. Query to select all records with "bar" in their name?
18. Which command is used to tell PostgreSQL to make all changes made to the database permanent?
19. Which statement is false for a FOREIGN KEY constraint?
20. What is a Query?
