How to Use Join and String_agg in Microsoft SQL Server
In this article, we’ll look at how to use join on more than two tables and aggregate the result using the function STRING_AGG()
in Microsoft SQL Server.
If you don’t know about Microsoft SQL Server, I’ll briefly explain to you what it is.
😃
What is Microsoft SQL Server?
Microsoft SQL Server is a Relational Database Management System that revolutionized how businesses handle data. It helps to store and manage data.
If you’re familiar with other relational database management systems such as MySQL, or PostgreSQL then picking up Microsoft SQL Server is easy.
I’m running on the default instance of SQL server.
Now, let’s consider a problem.
Problem
Let’s say we have three tables namely Employee
, Project
and EmployeeProject
. The below image is the relational database design:
The problem is to get all the employee details and their corresponding projects.
Things to consider: not all the employees from table Employee
mapped with table EmployeeProject
and not all the projects in table Project
mapped with table EmployeeProject
.
Our main goal is to retrieve all the employee details from table Employee
whether they are mapped with EmployeeProject
or not.
We can try to solve this problem by using joins. As you can see, we have to join three tables in order to solve this problem. First, we need to join tables Employee
and EmployeeProject
. Then with the resulted table, we’ll join Project
.
Let’s go through some scenarios to solve this problem.
Scenario 1
Let’s go with INNER JOIN
everywhere!
SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId
FROM Employee AS e INNER JOIN EmployeeProject AS ep
ON e.Id = ep.EmployeeId
This will give us:
Let’s think of the above resulted table as Employee-EmployeeProject
. It contains all the employee details as well as their corresponding project ids.
With the help of Employee-EmployeeProject
, we’ll be able to access Project
table. Let’s do that:
SELECT abc.FirstName, abc.LastName, abc.City, abc.Designation, p.Name AS Project FROM Project AS p
INNER JOIN
(SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId
FROM Employee AS e INNER JOIN EmployeeProject AS ep
ON e.Id = ep.EmployeeId) AS abc
ON p.Id = abc.ProjectId
Output:
Great! 😃Now we are able to retrieve employee details as well as their corresponding projects but our main goal is missing (get all the employee details) as we’re missing Sophia Ashley’s details.
Scenario 1 worked well but we missed our goal. 😆
Scenario 2
Let’s get all the details from employees whether they are mapped with EmployeeProject
or not (our goal) by using LEFT JOIN
with Employee
table:
SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId
This query will give us:
As you can see from the above figure, we are able to retrieve Sophia Ashley’s details since we’re using LEFT JOIN
on Employee
table with EmployeeProject
table.
Let’s think of the above resulted table as Employee-EmployeeProject
. It contains all the employee details as well as their corresponding project ids and NULL
for when it doesn’t contain any ProjectId
value.
Similar to scenario 1, now we can access project names since we know ProjectId
. Remember, our goal is to retrieve all the employee details whether they have a project or not.
To ensure that, we’ll need to retrieve all the values from Employee-EmployeeProject
when joining with Project
table:
SELECT abc.FirstName, abc.LastName, abc.City, abc.Designation, p.Name AS Project FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId
Great work! We achieved our goal 😃
This is good stuff. But it would good if we’re able to group these rows and return one row per each employee. This is our new wish! 😉
This leads us to a question on how we can group these results? We can group these rows by using GROUP BY
.
So, we’ll GROUP BY
resulted in rows by FirstName
:
SELECT abc.FirstName, abc.LastName, abc.City, abc.Designation, p.Name AS Project FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName
Output is:
What happened?
It says column LastName
is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. This error is applicable to all the remaining columns in the select list except FirstName
.
When we try to select values of FirstName
and group byFirstName
, it means that we’re going to group all the rows based on FirstName
only and select the FirstName
column. For example, let’s select only FirstName
and group by FirstName
:
SELECT abc.FirstName FROM Project AS p RIGHT JOIN
(SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName
As you can see in Figure 8, we’ve grouped all the rows by FirstName
. Here, there’s no ambiguity.
Now, let’s select FirstName
,LastName
and group all the rows by FirstName
:
SELECT abc.FirstName, abc.LastName FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName
If we check out what are the values of LastName
in Figure 6, we can notice that we have two employees who have the same FirstName
but different LastName
: James Johnson and James Smith.
So, when we try to group all the rows by FirstName
and select the values of FirstName
and LastName
, it leads us to an ambiguity state.
Imagine that MSSQL asks us, “You’re selecting FirstName
,LastName
and trying to group all the rows by FirstName
. But the FirstName
James has two different LastName
Johnson and Smith. When selecting James’s last name, what’s his LastName
should be? Johnson? Smith? or both?” It’s the ambiguity of MSSQL on which one to select.
To solve this FirstName and LastName issue, we can either (option 1) group all the rows by both FirstName
and LastName
or (option 2) enclose LastName
into an aggregate function to select only one value.
Option 1:
SELECT abc.FirstName, abc.LastName FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName, abc.LastName
Option 2:
SELECT abc.FirstName, MAX(abc.LastName) AS LastName FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName
In the above FirstName and LastName issue, although both options work, option 1 makes sense than option 2.
For more detailed information on ambiguity, check out this one!
NOTE: when you have a GROUP BY
query, the select list must be part of either grouping criteria or appear in aggregate functions such as SUM
, MAX
, COUNT
and so on and so forth.
Again back to our wish, we’ll try to GROUP BY
all the rows by all columns:
SELECT abc.FirstName, abc.LastName, abc.City, abc.Designation, p.Name AS Project FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName, abc.LastName, abc.City, abc.Designation, p.Name
We’ve successfully grouped all the rows but couldn’t able to retrieve one row per each employee as each row is distinct from others if we consider all columns, hence grouping them by all columns won’t work.
According to our wish, we need records for Emma Cooper, James Johnson, James Smith, Maria Garcia and Sophia Ashley (five rows). GROUP BY
FirstName
, LastName
, City
and Designation
will give us these five rows but what about Project
? We can't GROUP BY
it (if we do that then the result would be similar to figure 12), but we can use an aggregate (add together) function to aggregate Project
.
Actually, we can use STRING_AGG()
MSSQL aggregate function to return one row per each for an employee by concatenating Name
column in Project
table and GROUP BY
remaining columns:
SELECT abc.FirstName, abc.LastName, abc.Designation, STRING_AGG (p.Name, ',') WITHIN GROUP (ORDER BY p.Name) AS Project FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName, abc.LastName, abc.City, abc.Designation
Yay! We’ve done it. 😃 😃
The problem we discussed in this article helped us to understand some of the Microsoft SQL Server concepts.
Now, we have a basic understanding of how to use join and STRING_AGG
in Microsoft SQL Server.
Please feel free to let me know if you have any suggestions or questions.
Thank you 😇
Happy Coding ❤️