Correlated subquery

In a SQL database query, a correlated subquery (also known as a synchronized subquery) is a subquery (a query nested inside another query) that uses values from the outer query. Because the subquery may be evaluated once for each row processed by the outer query, it can be slow.

Here is an example for a typical correlated subquery. In this example, the objective is to find all employees whose salary is above average for their department.

 SELECT employee_number, name
   FROM employees emp
   WHERE salary > (
     SELECT AVG(salary)
       FROM employees
       WHERE department = emp.department);

In the above query the outer query is

 SELECT employee_number, name
   FROM employees emp
   WHERE salary > ...

and the inner query (the correlated subquery) is

 SELECT AVG(salary)
   FROM employees
   WHERE department = emp.department

In the above nested query the inner query has to be re-executed for each employee. (A sufficiently smart implementation may cache the inner query's result on a department-by-department basis, but even in the best case the inner query must be executed once per department. See "Optimizing correlated subqueries" below.)

Correlated subqueries may appear elsewhere besides the WHERE clause; for example, this query uses a correlated subquery in the SELECT clause to print the entire list of employees alongside the average salary for each employee's department. Again, because the subquery is correlated with a column of the outer query, it must be re-executed for each row of the result.

 SELECT
   employee_number,
   name,
   (SELECT AVG(salary) 
      FROM employees
      WHERE department = emp.department) AS department_average
   FROM employees emp

Optimizing correlated subqueries

The effect of correlated subqueries can in some cases be obtained using joins. For example, the queries above (which use inefficient correlated subqueries) may be rewritten as follows.

 -- This subquery is not correlated with the outer query, and is therefore
 -- executed only once, regardless of the number of employees.
 SELECT employees.employee_number, employees.name
   FROM employees INNER JOIN
     (SELECT department, AVG(salary) AS department_average
       FROM employees
       GROUP BY department) temp ON employees.department = temp.department
   WHERE employees.salary > temp.department_average;

If the inner query is used in multiple queries, the inner query can be stored as a view, and then join the view:

 CREATE VIEW dept_avg AS
   SELECT department, AVG(salary) AS department_average
   FROM employees
   GROUP BY department;

 -- List employees making more than their department average.
 SELECT employees.employee_number, employees.name
   FROM employees INNER JOIN dept_avg ON employees.department = dept_avg.department
   WHERE employees.salary > dept_avg.department_average;

 -- List employees alongside their respective department averages.
 SELECT employees.employee_number, employees.name, dept_avg.department_average
   FROM employees INNER JOIN dept_avg ON employees.department = dept_avg.department;

 DROP VIEW dept_avg;

You could also build and reference a temp table instead of a view.

Another way to accomplish this, which would have the same performance as the "view" solution, is to use a CTE (Common Table Expression) as follows. This has the advantage of having the entire operation in one query in case that is a requirement. Note that some versions of SQL, typically older ones, do not support the "With...CTE" operation.

WITH
   SELECT department, AVG(salary) AS department_average
   FROM employees
   GROUP BY department
AS dept_avg_CTE  -- arbitrary name, does not need "CTE"

 -- List employees making more than their department average.
 SELECT employees.employee_number, employees.name
   FROM employees INNER JOIN dept_avg_CTE ON employees.department = dept_avg_CTE.department
   WHERE employees.salary > dept_avg_CTE.department_average;

Database implementations such as Oracle can automatically unnest a correlated subquery if the cost based optimizer deems this to yield a better execution plan.

gollark: Idea: eliminate palaiologistry.

gollark: Idea: compute BB(6).

gollark: Essentially, I always win.

gollark: In the unbounded whatever case, I also win.

gollark: You JUST SAID you would test it on TIO.

External links

Correlated subquery with examples

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.