How to Use Join and String_agg in Microsoft SQL Server

Thanoshan MV
8 min readMar 6, 2021

--

Photo by MI PHAM on Unsplash

In this article, we’ll look at how to use join on more than two tables and aggregate the result using the function STRING_AGG() in Microsoft SQL Server.

If you don’t know about Microsoft SQL Server, I’ll briefly explain to you what it is.

😃

What is Microsoft SQL Server?

Microsoft SQL Server is a Relational Database Management System that revolutionized how businesses handle data. It helps to store and manage data.

If you’re familiar with other relational database management systems such as MySQL, or PostgreSQL then picking up Microsoft SQL Server is easy.

I’m running on the default instance of SQL server.

Now, let’s consider a problem.

Problem

Let’s say we have three tables namely Employee, Projectand EmployeeProject. The below image is the relational database design:

Figure 1: Relational Database Design for the problem

The problem is to get all the employee details and their corresponding projects.

Figure 2: Tables from left to right: Employee, EmployeeProject, and Project

Things to consider: not all the employees from table Employeemapped with table EmployeeProject and not all the projects in table Project mapped with table EmployeeProject .

Our main goal is to retrieve all the employee details from table Employee whether they are mapped with EmployeeProject or not.

We can try to solve this problem by using joins. As you can see, we have to join three tables in order to solve this problem. First, we need to join tables Employeeand EmployeeProject . Then with the resulted table, we’ll join Project .

Let’s go through some scenarios to solve this problem.

Scenario 1

Let’s go with INNER JOIN everywhere!

SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId 
FROM Employee AS e INNER JOIN EmployeeProject AS ep
ON e.Id = ep.EmployeeId

This will give us:

Figure 3

Let’s think of the above resulted table as Employee-EmployeeProject . It contains all the employee details as well as their corresponding project ids.

With the help of Employee-EmployeeProject, we’ll be able to access Project table. Let’s do that:

SELECT abc.FirstName, abc.LastName, abc.City, abc.Designation, p.Name AS Project FROM Project AS p 
INNER JOIN
(SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId
FROM Employee AS e INNER JOIN EmployeeProject AS ep
ON e.Id = ep.EmployeeId) AS abc
ON p.Id = abc.ProjectId

Output:

Figure 4

Great! 😃Now we are able to retrieve employee details as well as their corresponding projects but our main goal is missing (get all the employee details) as we’re missing Sophia Ashley’s details.

Scenario 1 worked well but we missed our goal. 😆

Scenario 2

Let’s get all the details from employees whether they are mapped with EmployeeProject or not (our goal) by using LEFT JOIN with Employee table:

SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId

This query will give us:

Figure 5

As you can see from the above figure, we are able to retrieve Sophia Ashley’s details since we’re using LEFT JOIN on Employee table with EmployeeProject table.

Let’s think of the above resulted table as Employee-EmployeeProject . It contains all the employee details as well as their corresponding project ids and NULL for when it doesn’t contain any ProjectId value.

Similar to scenario 1, now we can access project names since we know ProjectId . Remember, our goal is to retrieve all the employee details whether they have a project or not.

To ensure that, we’ll need to retrieve all the values from Employee-EmployeeProject when joining with Project table:

SELECT abc.FirstName, abc.LastName, abc.City, abc.Designation, p.Name AS Project FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId
Figure 6

Great work! We achieved our goal 😃

This is good stuff. But it would good if we’re able to group these rows and return one row per each employee. This is our new wish! 😉

This leads us to a question on how we can group these results? We can group these rows by using GROUP BY .

So, we’ll GROUP BY resulted in rows by FirstName :

SELECT abc.FirstName, abc.LastName, abc.City, abc.Designation, p.Name AS Project FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName

Output is:

Figure 7

What happened?

It says column LastName is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. This error is applicable to all the remaining columns in the select list except FirstName .

When we try to select values of FirstName and group byFirstName, it means that we’re going to group all the rows based on FirstName only and select the FirstNamecolumn. For example, let’s select only FirstName and group by FirstName :

SELECT abc.FirstName FROM Project AS p RIGHT JOIN 
(SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName
Figure 8

As you can see in Figure 8, we’ve grouped all the rows by FirstName . Here, there’s no ambiguity.

Now, let’s select FirstName ,LastName and group all the rows by FirstName :

SELECT abc.FirstName, abc.LastName FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName
Figure 9

If we check out what are the values of LastName in Figure 6, we can notice that we have two employees who have the same FirstName but different LastName : James Johnson and James Smith.

So, when we try to group all the rows by FirstName and select the values of FirstName and LastName, it leads us to an ambiguity state.

Imagine that MSSQL asks us, “You’re selecting FirstName ,LastName and trying to group all the rows by FirstName . But the FirstName James has two different LastName Johnson and Smith. When selecting James’s last name, what’s his LastName should be? Johnson? Smith? or both?” It’s the ambiguity of MSSQL on which one to select.

To solve this FirstName and LastName issue, we can either (option 1) group all the rows by both FirstName and LastName or (option 2) enclose LastName into an aggregate function to select only one value.

Option 1:

SELECT abc.FirstName, abc.LastName FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName, abc.LastName
Figure 10

Option 2:

SELECT abc.FirstName, MAX(abc.LastName) AS LastName FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName
Figure 11

In the above FirstName and LastName issue, although both options work, option 1 makes sense than option 2.

For more detailed information on ambiguity, check out this one!

NOTE: when you have a GROUP BY query, the select list must be part of either grouping criteria or appear in aggregate functions such as SUM , MAX , COUNT and so on and so forth.

Again back to our wish, we’ll try to GROUP BY all the rows by all columns:

SELECT abc.FirstName, abc.LastName, abc.City, abc.Designation, p.Name AS Project FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName, abc.LastName, abc.City, abc.Designation, p.Name
Figure 12

We’ve successfully grouped all the rows but couldn’t able to retrieve one row per each employee as each row is distinct from others if we consider all columns, hence grouping them by all columns won’t work.

According to our wish, we need records for Emma Cooper, James Johnson, James Smith, Maria Garcia and Sophia Ashley (five rows). GROUP BY FirstName , LastName , City and Designation will give us these five rows but what about Project ? We can't GROUP BY it (if we do that then the result would be similar to figure 12), but we can use an aggregate (add together) function to aggregate Project .

Actually, we can use STRING_AGG() MSSQL aggregate function to return one row per each for an employee by concatenating Name column in Project table and GROUP BY remaining columns:

SELECT abc.FirstName, abc.LastName, abc.Designation, STRING_AGG (p.Name, ',') WITHIN GROUP (ORDER BY p.Name) AS Project FROM Project AS p RIGHT JOIN (SELECT e.Id, e.FirstName, e.LastName, e.Designation, e.City, ep.ProjectId FROM Employee AS e LEFT JOIN EmployeeProject AS ep ON e.Id = ep.EmployeeId) AS abc ON p.Id = abc.ProjectId GROUP BY abc.FirstName, abc.LastName, abc.City, abc.Designation
Figure 13

Yay! We’ve done it. 😃 😃

The problem we discussed in this article helped us to understand some of the Microsoft SQL Server concepts.

Now, we have a basic understanding of how to use join and STRING_AGGin Microsoft SQL Server.

Please feel free to let me know if you have any suggestions or questions.

Thank you 😇

Happy Coding ❤️

To Explore Further

  1. STRING_AGG (Transact-SQL) — Microsoft Docs
  2. Aggregate Functions — Microsoft Docs
  3. An overview of the STRING_AGGfunction in SQL — SQLShack
  4. SQL | GROUP BY — GeeksforGeeks

--

--

Thanoshan MV
Thanoshan MV

Written by Thanoshan MV

Hi! I'm a Software Engineer with a passion for research in AI, software engineering, and open-source development. Web - https://thanoshanmv.github.io/