An introduction to SQL Joins
A join in SQL is used to combine columns from one or more tables. It can be used to access information from multiple tables that have overlapping datasets or just merge information from two tables. To join two tables we need a column of intersection and this is done by using values common to each of the tables. The values must be unique in order to avoid messy events, for example more than one column can be common to two tables but only one will be unique and this should be used.
SQL has different types of joins and they are as follows:
- Inner Join
- Left Join
- Right Join
- Full Outer Join
- Cross Join
- Self Join
To explore these join types further, I have created two tables:
a. Customers Table: contains basic customer information.
b. Transactions Table: contains transaction information for each customer.
INNER JOIN
This is a type of SQL join that returns all matching rows between tables based on a common column. The common column is the field by which tables are joined. An inner join can be represented in a venn diagram as shown below;
Where C represents Customers and T represents Transactions.
Using our sample tables above, a query to get all customers with transactions based on the customer_id
field is shown below:
SELECT *
FROM Customers
INNER JOIN Transactions
ON CUSTOMER.customer_id = Transactions.customer_id
LEFT JOIN
A left join returns all records from the left table including only matching records on the right table. For records left with no matching record on the right table, null values are returned. The venn diagram for a left join is shown below:
For example, to get all customer records including transactions for customers who have made at least one transaction, we can use the query below:
SELECT *
FROM Customers c
LEFT JOIN Transactions t
ON c.customer_id = t.customer_id;
Customers who have not made at least one transaction will have null returned for columns in the transactions
table.
RIGHT JOIN
A right join returns all rows from the table on the right including matching rows from the table on the left. Records on the right without matching records on the left will have null values for columns on the left table. The venn diagram for a right join is shown below;
To return all transaction records including only customers with transactions, we can use the query below:
SELECT *
FROM Customer c
RIGHT JOIN Transactions t
ON c.customer_id = t.customer_id;
FULL OUTER JOIN
This returns records from both the left and the right table, and where there are no matching records, the missing side returns nulls.
The query below can be used to return all customer and all transactions records irrespective of whether there is a match on both sides:
SELECT *
FROM Customer
FULL OUTER JOIN Transactions;
SELF JOIN
A self join is used to join a table to itself as if there were two tables. It is mostly used when a table needs to reference data in itself. To form a self join, you specify the table name twice with difference table aliases and provide the join predicate after the ON
keyword.
SELECT columns_list
FROM table1 t1
INNER JOIN table1 t2
ON t1.column_name = t2.column_name;
An example of where a self join will be useful is when we need to query an Employees record, to get employee info including their manger. A self join is needed here since the manager is also an employee.
CROSS JOIN
A cross join is a cartesian product of two tables where each record in the first table is crossed with all records in the second table. For example if the first table contains 5 records and the second table contains 4 records, there will be 20 records in the cross join result.
SELECT Column_name
FROM Table_1 t1
CROSS JOIN Table_2 t2;
Explicit Join vs Implicit Join
When using the JOIN clause in SQL, you can decide to write an Explicit Join or an Implicit Join.
Explicit Join: In an explicit join, the join table is explicitly specified with the JOIN
command and ON
is used to specify the column on which the tables will be joined.
SELECT column_names
FROM table_1
JOIN table_2
ON table_1.common_key = table_2.common_key;
Implicit Join: Here, the table to be joined or the join conditions are not explicitly specified. A combination of the FROM
and WHERE
clause is used to specify the tables where records will be pulled from and the conditions used to filter the records.
SELECT column_name
FROM table_1, table_2, table_3
WHERE table_1.common_key = table_2.common_key
AND table_2.common_key = table_3.common_key;
Finally, it is easy to confuse a UNION
with a JOIN
so I will differentiate between them below.
A JOIN
allows a user to create a table as a result of a given condition between two tables.
A UNION
allows two datasets that are similar to return a dataset that contains all the information from the two datasets. If you union two tables, it returns all the information in both tables without duplicates. It is only duplicated when you use union all. Using the customers record table and the transactions table, an SQL union will return the combination of all the columns in both tables into a single result without duplicating rows that are common to both tables.
For example:
SELECT *
FROM table_1
UNION
SELECT *
FROM table_2;
Thank you for reading.