Friday, March 1, 2024
Home Interview Questions 15 Tricky SQL Queries Solved

# 15 Tricky SQL Queries Solved

0 comment

Working with SQL queries can be incredibly tricky, especially if you’re new to the language. Even experienced SQL users can be stumped by complex queries. However, with a few tips and tricks, you can quickly become an SQL expert.

In this step-by-step guide, you’ll learn how to solve tricky SQL queries and become an SQL wizard.

You’ll learn the best practices for crafting optimized queries, how to debug queries that don’t work, and how to use common features like joins and subqueries.

With these tools in hand, you’ll be able to tackle any SQL query with confidence. Whether you’re a beginner or a seasoned SQL coder, this guide will help you become a master of SQL queries.

## Query 1

### 1. Classes More Than 5 Students @ LeetCode

There is a table `courses` with columns: student and class. Please list out all classes which have more than or equal to 5 students. For example, the table:

`+---------+------------+| student | class      |+---------+------------+| A       | Math       || B       | English    || C       | Math       || D       | Biology    || E       | Math       || F       | Computer   || G       | Math       || H       | Math       || I       | Math       |+---------+------------+`

Should output:

`+---------+| class   |+---------+| Math    |+---------+`

Note:
The students should not be counted duplicate in each course.

### Walking Through My Thinking

Even for absolute beginners, this is an easy question: the underlying idea is to filter classes with 5 or more students and do something like this:

### Incorrect Code

`SELECT class FROM coursesGROUP BY classHAVING COUNT(student)>=5`

You will bump into an error message if you submit the above code at LeetCode.

The tricky part is that there may be duplicates, and the last line of code, the HAVING COUNT() statement, fails to consider such a scenario. Instead, we should use the DISTINCT method inside the chunk to specify the output.

In fact, the question has an additional note that specifies what to do with duplicates.

A piece of advice is to ask for clarification questions if you are doing a SQL interview for DS positions. Smooth communication in any interview goes a long way.

### Solution

```# Write your MySQL query statement below
SELECT class
FROM courses
GROUP BY class
HAVING COUNT(DISTINCT student)>=5```

## Query 2

We are given a table consisting of two columns, Name, and Profession. We need to query all the names immediately followed by the first letter in the profession column enclosed in parenthesis.

My Solution

``````SELECT
CONCAT(Name, ’(‘, SUBSTR(Profession, 1, 1), ’)’)
FROM table;``````

Since we need to combine the name and profession we can use`CONCAT`. We also need to have only one letter inside the parenthesis. Hence we will use `SUBSTR`and pass the column name, start index, end index. Since we need only the first letter we will pass 1,1(start index is inclusive and the end index is not inclusive)

## Query 3

### 3. Game Play Analysis II @ LeetCode

Table: `Activity`

`+--------------+---------+| Column Name  | Type    |+--------------+---------+| player_id    | int     || device_id    | int     || event_date   | date    || games_played | int     |+--------------+---------+(player_id, event_date) is the primary key of this table.This table shows the activity of players of some game.Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.`

Write a SQL query that reports the device that is first logged in for each player.

The query result format is in the following example:

`Activity table:+-----------+-----------+------------+--------------+| player_id | device_id | event_date | games_played |+-----------+-----------+------------+--------------+| 1         | 2         | 2016-03-01 | 5            || 1         | 2         | 2016-05-02 | 6            || 2         | 3         | 2017-06-25 | 1            || 3         | 1         | 2016-03-02 | 0            || 3         | 4         | 2018-07-03 | 5            |+-----------+-----------+------------+--------------+Result table:+-----------+-----------+| player_id | device_id |+-----------+-----------+| 1         | 2         || 2         | 3         || 3         | 1         |+-----------+-----------+`

### Walking Through My Thinking

OMG, this question appears to be so easy but deceivingly difficult.

Here is why.

The result table contains two columns: player_id and device_id. We can use a WITH() clause to filter out the first log-in date, which can be obtained using MIN() command for each player and then write an outer query to find these two columns.

In code, we can do something as follow:

### Incorrect code:

`WITH first_log AS ( SELECT player_id, device_id, MIN(event_date) FROM Activity GROUP BY player_id)SELECT player_id, device_idFROM first_log`

Again, correct result if run it, but an error message if submit. Try it here: LeetCode.

But why?

Before reading on, I’d suggest a short break and take a look at the code:

why doesn’t it work?

The answer is each player has his/her own first log-in date, and we have to filter cases using these two columns together: player_id and event_date. In other words, each player has his/her unique minimal date.

### Solution

`# Write your MySQL query statement below SELECT player_id, device_id  FROM Activity WHERE (player_id, event_date) IN (         select player_id, MIN(event_date)      from Activity      group by player_id  )`

Note: don’t forget to include a pair of parenthesis.

## Query 4

Tina was asked to compute the average salary of all employees from the EMPLOYEES table she created but realized that the zero key in her keyboard is not working after the result showed a very less average. She wants our help in finding out the difference between miscalculated average and actual average.

We must write a query finding the error( Actual AVG — Calculated AVG).

My Solution

``````SELECT
AVG(Salary) - AVG(REPLACE(Salary, 0, ’’))
FROM table;``````

A point to note here is that we have only one table that consists of actual salary values. To create the error scenario we use `REPLACE` to replace 0’s. We will pass the column name, value to be replaced, and the value with which we will replace the `REPLACE` method. Then we find the difference in averages using the aggregate function `AVG`.

## Query 5

We are given a table, which is a Binary Search Tree consisting of two columns Node and Parent. We must write a query that returns the node type ordered by the value of nodes in ascending order. There are 3 types.

1. Root — if the node is a root
2. Leaf — if the node is a leaf
3. Inner — if the node is neither root nor leaf.

My Solution

Upon initial analysis, we can conclude that if a given node N has its corresponding P-value as NULL it is the root. And for a given Node N if it exists in the P column it is not an inner node. Based on this idea let us write a query.

``````SELECT CASE
WHEN P IS NULL THEN CONCAT(N, ' Root')
WHEN N IN (SELECT DISTINCT P from BST) THEN CONCAT(N, ' Inner')
ELSE CONCAT(N, ' Leaf')
END
FROM BST
ORDER BY N asc;``````

We can use `CASE` which acts as a switch function. As I mentioned if P is null for a given node N then N is the root. Hence we used`CONCAT` for combining the node value and label. Similarly, if a given node N is in column P it is an inner node. To get all nodes from column P we wrote a subquery which returns all the distinct nodes in column P. Since we were asked to order the output by node values in ascending order we used the `ORDER BY` Clause.

## Query 6

We are given a transaction table that consists of transaction_id, user_id, transaction_date, product_id, and quantity. We need to query the number of users who purchased products on multiple days(Note that a given user can purchase multiple products on a single day).

My Solution

To solve this query, we cannot directly count the occurrence of user_id’s and if it is more than one return that user_id because a given user can have more than one transaction on a single day. Hence if a given user_id has more than one distinct date associated with it means he purchased products on multiple days. Following the same approach, I wrote a query. (Inner query)

``````SELECT COUNT(user_id)
FROM
(
SELECT user_id
FROM orders
GROUP BY user_id
HAVING COUNT(DISTINCT DATE(date)) > 1
) t1``````

Since the question asked for the number of user_ids and not the user_id’s itself we use `COUNT` in the outer query.

## Query 7

We are given a subscription table which consists of subscription start and end date for each user. We need to write a query that returns true/false for each user based on the overlapping of dates with other users. For instance, If user1’s subscription period overlaps with any other user the query must return True for user1.

My Solution

Upon initial analysis, we understand that we must compare every subscription against every other one. Let us consider start and end dates of userA as `startA` and `endA`, similarly for userB,`startB` and `endB`.

If `startA``endB` and `endA``startB` then we can say the two date ranges overlap. Let us take two examples. Let us compare U1 AND U3 first.

`startA` = 2020–01–01
`endA` = 2020–01–31
`startB` = 2020–01–16
`endB` = 2020–01–26

Here we can see `startA`(2020–01–01) is less than `endB`(2020–01–26) and similarly, endA(2020–01–31) is greater than `startB`(2020–01–16) and hence can conclude that the dates overlap. Similarly, if you compare U1 and U4 the above condition fails and will return false.

We must also ensure that a user is not compared to his own subscription. We also want to run a left join on itself to match a user with each other user that satisfies our condition. We will create two replicas s1 and s2 of the same table now.

``````SELECT *
FROM subscriptions AS s1
LEFT JOIN subscriptions AS s2
ON s1.user_id != s2.user_id
AND s1.start_date <= s2.end_date
AND s1.end_date >= s2.start_date``````

Given the conditional join, a user_id from s2 should exist for each user_id in s1 on the condition where there exists an overlap between the dates.

Output

We can see there exists another user for each user in case the dates overlap. For user1 there are 2 rows indicating that he matches with 2 users. For user 4 the corresponding id is null indicating that he does not match with any other user.

Wrapping it all together now, we can group by the s1.user_id field and just check if any value exists true for a user where s2.user_id IS NOT NULL.

Final query

``````SELECT
s1.user_id
, (CASE WHEN s2.user_id IS NOT NULL THEN 1 ELSE 0 END) AS overlap
FROM subscriptions AS s1
LEFT JOIN subscriptions AS s2
ON s1.user_id != s2.user_id
AND s1.start_date <= s2.end_date
AND s1.end_date >= s2.start_date
GROUP BY s1.user_id``````

We used the `CASE` clause to label 1 and 0 depending on the s2.user_id value for a given user. The final output looks like this –

## Query 8

### 8. Sellers With No Sales @ LeetCode

Table: `Customer`

`+---------------+---------+| Column Name   | Type    |+---------------+---------+| customer_id   | int     || customer_name | varchar |+---------------+---------+customer_id is the primary key for this table.Each row of this table contains the information of each customer in the WebStore.`

Table: `Orders`

`+---------------+---------+| Column Name   | Type    |+---------------+---------+| order_id      | int     || sale_date     | date    || order_cost    | int     || customer_id   | int     || seller_id     | int     |+---------------+---------+order_id is the primary key for this table.Each row of this table contains all orders made in the webstore.sale_date is the date when the transaction was made between the customer (customer_id) and the seller (seller_id).`

Table: `Seller`

`+---------------+---------+| Column Name   | Type    |+---------------+---------+| seller_id     | int     || seller_name   | varchar |+---------------+---------+seller_id is the primary key for this table.Each row of this table contains the information of each seller.`

Write an SQL query to report the names of all sellers who did not make any sales in 2020. Return the result table ordered by `seller_name` in ascending order. The query result format is in the following example.

`Customer table:+--------------+---------------+| customer_id  | customer_name |+--------------+---------------+| 101          | Alice         || 102          | Bob           || 103          | Charlie       |+--------------+---------------+Orders table:+-------------+------------+--------------+-------------+-------------+| order_id    | sale_date  | order_cost   | customer_id | seller_id   |+-------------+------------+--------------+-------------+-------------+| 1           | 2020-03-01 | 1500         | 101         | 1           || 2           | 2020-05-25 | 2400         | 102         | 2           || 3           | 2019-05-25 | 800          | 101         | 3           || 4           | 2020-09-13 | 1000         | 103         | 2           || 5           | 2019-02-11 | 700          | 101         | 2           |+-------------+------------+--------------+-------------+-------------+Seller table:+-------------+-------------+| seller_id   | seller_name |+-------------+-------------+| 1           | Daniel      || 2           | Elizabeth   || 3           | Frank       |+-------------+-------------+Result table:+-------------+| seller_name |+-------------+| Frank       |+-------------+Daniel made 1 sale in March 2020.Elizabeth made 2 sales in 2020 and 1 sale in 2019.Frank made 1 sale in 2019 but no sales in 2020.`

### Walking Through My Thinking

First off, this is a rather simple question. I mean it this time. What turns me down and justifies its inclusion here is its lengthy description.

What to do with these three tables?

Where should I start?

There is too much information, and too much information is no better than no information.

My coping strategy is to break the question down into pieces and ask two big questions:

What does the question ask for?

What information is available so far?

The first step would be to take a look at the above Result table, if provided, and, if not, ask the interviewer what would be like if things work out as expected. Again, good communication is King!

Then, let’s check what available information. As for this query, the Result table only contains one column (seller_name), and it only appears in the Orders table, which also contains sale_date and order_id information. Either of these two additional variables would be sufficient to identify the year of each order.

Now, we have a good understanding of the question and have collected all the necessary information. There is no need to check the first table (Customer).

The final step is to write up the query sequence.

The question asks for sellers with no sales in 2020. If we use a WHERE clause to filter the rows for sales not in 2020, then we would eliminate cases that have sales both in 2020 and beyond, like in Elizabeth‘s case.

The following code does not work:

### Incorrect Code

`SELECT seller_nameFROM Seller JOIN OrdersUSING(seller_id)WHERE LEFT(sale_date,4) != 2020`

It outputs two cases: Frank (no sale in 2020) and Elizabeth (sales both in 2020 and 2019).

For a “no-sales-in-2020” question, we can find out the seller names that have sales in 2020, as a first step, and then use an outer query to filter them out in the following step.

### Solution

`SELECT seller_nameFROM Seller WHERE seller_id NOT IN ( SELECT DISTINCT seller_id  FROM Orders WHERE LEFT(sale_date, 4) = ‘2020’)ORDER BY seller_name`

## Query 9

### 9. Second Highest Salary @ LeetCode

Write a SQL query to get the second highest salary from the `Employee` table.

`+----+--------+| Id | Salary |+----+--------+| 1  | 100    || 2  | 200    || 3  | 300    |+----+--------+`

For example, given the above Employee table, the query should return `200` as the second highest salary. If there is no second highest salary, then the query should return `null`.

`+---------------------+| SecondHighestSalary |+---------------------+| 200                 |+---------------------+`

### Walking Through My Thinking

Actually, this is a straightforward question. The reason for its inclusion is because I want to introduce two methods: IFNULL() and OFFSET.

The question asks for the second highest salary, which reminds us of using an OFFSET method to skip the first row. Also, it returns null if there is no such value. In so doing, we can use IFNULL() function.

As a side note, the function has the following form: IFNULL(statement, NULL). If there is no null value, run the statement as usual; if there is a null value, return NULL or any other value as specified.

### Solution

```SELECT IFNULL( (SELECT DISTINCT Salary
FROM Employee
ORDER BY Salary DESC
LIMIT 1
OFFSET 1), null) AS SecondHighestSalary```

## Query 10

We are given two tables.
Accounts table — Each row of this table contains the account number and name of each user in the bank.
Transactions table — Each row of this table contains the transactions of all accounts. Amount is positive if the user received money and negative if they transferred money. All the accounts start with an initial balance of zero.

We need to write an SQL query to return the name and balance of all users having a balance greater than 10,000.
Balance = Total received money-Total transferred money

### Solution

Firstly we need to join the two tables. Since we need to find the balance of each account we can use the `GROUP BY` clause on the `Account ID`column. We can find the sum of amount column grouped by `Account ID` and filter rows having sum>10000.

`SELECT    a.name AS name,    sum(b.amount) AS balanceFROM Users AS a JOIN Transactions AS bon a.account = b.accountGROUP BY    a.accountHAVING balance > 10000;`

## Query 11

Given a table of students and their GRE test scores, write a query to return the two students with the closest test scores and their score difference. If there exists more than one pair, sort their names in ascending order and then return the first resulting pair.

### Solution

This requires some creative thinking in SQL. Since there is only one table with two columns we need to self-reference different creations of the same table. We can solve these kinds of problems by visualizing two tables having the same values.

Let us have two same copies of the above scores table named s1 and s2. Since we need to compare every student with every other student. We can perform inner join by setting:

`scores AS s1 INNER JOIN scores AS s2 ON s1.ID != s2.ID`

This way we are comparing every student with every other student and are also avoiding comparing a student to his/herself. However, if we run the final query using the above join condition, observe the output.

Since we are comparing every student in s1 with every other student in s2, duplication is happening. And if there exist a million rows our query will be inefficient. To optimize the query let us add a condition s1.id > s2.id to ensure that comparison happens only once.

Therefore all we need to do now is subtract each score from each other score and order this difference in ascending order using the ORDER BY clause. Since we need the closest score, we will print only the first row using the LIMIT clause.

`SELECT     s1.name AS s1_name    , s2.name AS s2_name    , ABS(s1.score - s2.score) AS score_diffFROM scores AS s1INNER JOIN scores AS s2    ON s1.id != s2.id        AND s1.id > s2.idORDER BY score_diff ASC, s1_name ASCLIMIT 1`

## Query 12

Given the above two tables employees and departments, select the top department which consists of the highest percentage of employees who earn over 1500 in salary and have at least 2 employees.

### Solution

We can break down the given question into separate clauses.
1) Top department.
2) % of employees making over 1500 in salary.
3) Department must have at least 2 employees.

Before solving the above clauses let us have a combined representation of the two tables where each employee is associated with their department names. Since both tables have a common column `department_id` we can perform a join using

`employees AS e INNER JOIN departments AS dON e.department_id = d.id`

Now that we have this full representation, we can start filtering, aggregating to get our output. We know that we need to calculate the total number of employees that are making over 1500 by each department. We can use the GROUP BY clause on dept_name since we need one row for each department.

To find the percentage of people earning over 1500 we can use —
(Number of people making over 1500) / (Total number of people in that department)

Finally, we need to find the count the number of Employee IDs to filter departments having more than 2 employees.

`SELECT        d.Dept_Name AS d_name,       SUM(CASE WHEN e.salary > 1500 THEN 1 ELSE 0 END)/COUNT(DISTINCT e.id) AS pct_above_1500,      COUNT(DISTINCT e.id) AS number_of_employeesFROM employees e JOIN departments d ON e.dept_id = d.dept_idGROUP BY d_nameHAVING number_of_employees >= 2ORDER BY pct_above_1500 DESCLIMIT 1`

We use the SUM and CASE clause to find the number of employees with a salary greater than 1500.

## Query 13

In the above table, we are given scores of IDs. W need to write a SQL Query to output the rank of these scores. Incase, the scores are tied, the same rank must be assigned to both the scores.

### Solution

Firstly, let us segregate the distinct scores and create another copy of the input table.

For each score in the S: Scores Table, we want to find out the number of Scores in the S2:Scores table that are greater than or equal. We can group by ID and compare the scores.

This is nothing but the rank of that particular score!

`SELECT S.Score, COUNT(S2.Score) AS Rank FROM Scores S,(SELECT DISTINCT Score FROM Scores) AS S2WHERE S.Score<=S2.ScoreGROUP BY S.Id ORDER BY S.Score DESC;`

## Query 14

The above table consists of Visitor ID, Visitor Date, and number of Visitors on that particular date. No two same dates exist. Write an SQL query to display three or more consecutive ID’s, having visitors more than or equal to 100 sorted by their visiting date. IDs will be consecutive and unique.
You may argue that ID 2 and 3 also have more than 100 visitors and are not displayed in the output. The main condition is to display 3 or more IDs with visitors more than 100 here. Hence, 5,6,7,8 are displayed.

### Solution

I found this question to be tricky as the title of this article suggests 🙂 Similar to Query 2 we need to create a self join on the stadium table. We can compare IDs and compute our results. A given row s1 can be the first, middle, or last one in the 3 consecutive rows.

• s1 in the beginning: (s1.id + 1 = s2.id AND s1.id + 2 = s3.id)
• s1 in the middle: (s1.id — 1 = s2.id AND s1.id + 1 = s3.id)
• s1 in the end: (s1.id — 2 = s2.id AND s1.id — 1 = s3.id)
```SELECT s1.* FROM stadium AS s1, stadium AS s2, stadium as s3
WHERE
((s1.id + 1 = s2.id
AND s1.id + 2 = s3.id)
OR
(s1.id - 1 = s2.id
AND s1.id + 1 = s3.id)
OR
(s1.id - 2 = s2.id
AND s1.id - 1 = s3.id)
)
AND s1.people>=100
AND s2.people>=100
AND s3.people>=100GROUP BY s1.id
ORDER BY s1.visit_date```

## Conclusion

Practice is the key to mastering SQL. This article explains 15 tricky questions and how they can be solved. There are a variety of ways to write a SQL query. This is one of SQL’s specialties. I hope you learned something new today!

You Might Also Like

#### You may also like

Explore the dynamic world of technology with DataFlareUp. Gain valuable insights, follow expert tutorials, and stay updated with the latest news in the ever-evolving tech industry.