[SQL]: Aggregate Functions

SQL

09/26/2019


COUNT

SQL
SELECT COUNT(*) FROM students;
  • Output:
TEXT
+----------+
| COUNT(*) |
+----------+
| 15 |
+----------+
SQL
SELECT COUNT(fname),
COUNT( DISTINCT lname ) AS uniqueLnames,
COUNT( DISTINCT lname, fname ) AS uniqueFullnames
FROM students;
  • Output:
TEXT
+--------------+--------------+-----------------+
| COUNT(fname) | uniqueLnames | uniqueFullnames |
+--------------+--------------+-----------------+
| 15 | 12 | 14 |
+--------------+--------------+-----------------+
SQL
SELECT COUNT(lname) - COUNT( DISTINCT(lname)) AS duplicateLnames
FROM students;
  • Output
TEXT
+-----------------+
| duplicateLnames |
+-----------------+
| 3 |
+-----------------+
SQL
SELECT COUNT(*) FROM students
WHERE major LIKE "%science%";
  • Show how many majors contain "science"
  • Output:
TEXT
+----------+
| COUNT(*) |
+----------+
| 3 |
+----------+

GROUP BY

GROUP BY summarizes identical data into single rows. For example, consider an imaginary query and output below. GROUP BY by default can order the list in ascending order, depending on SQL mode.

SQL
SELECT major, lname FROM students_temp;
TEXT
+---------------------+-------+
| major | lname |
+---------------------+-------+
| Computer engineering| Smith |
| Computer science | Min |
| Economics | Min |
| Accounting | Smith |
| Communication Art | Kim |
+---------------------+-------+

Result of GROUP BY lname

SQL
SELECT major, lname FROM students_temp
GROUP BY lname;
TEXT
+---------------------+-------+
| major | lname |
+---------------------+-------+
| Computer engineering| Smith |
| Computer science | Min |
| Communication Art | Kim |
+---------------------+-------+

What happens behind the scene

Though it isn't shown in above output, SQL groups by lname behind the scene

TEXT
| Computer engineering| Smith |
| Accounting | Smith |
TEXT
| Computer science | Min |
| Economics | Min |
TEXT
| Communication Art | Kim |

Since the data is grouped we can do things like how many majors each person has

SQL
SELECT lname, COUNT(*) FROM students_temp
GROUP BY lname;
TEXT
+-------+----------+
| lname | COUNT(*) |
+-------+----------+
| Smith | 2 |
| Min | 2 |
| Kim | 1 |
+-------+----------+

Group BY fullname

SQL
SELECT fname, lname, COUNT(*) FROM students
GROUP BY fname, lname;
TEXT
+----------+----------+----------+
| fname | lname | COUNT(*) |
+----------+----------+----------+
| Angelina | Randall | 1 |
| Brad | Johnson | 1 |
| Brent | Lang | 1 |
| Dan | Joyce | 1 |
| Dave | Smith | 1 |
| Diego | Miller | 1 |
| Ellis | Min | 2 |
| Jerimiah | Tucker | 1 |
| John | Johnson | 1 |
| Lamont | Marks | 1 |
| Lena | Shah | 1 |
| Mary | Fletcher | 1 |
| Sam | Smith | 1 |
| Tenzin | Pema | 1 |
+----------+----------+----------+

GROUP BY with CONCAT

SQL
SELECT CONCAT(major, ' has ', COUNT(*), ' student(s)')
FROM students
GROUP BY major;
TEXT
+-------------------------------------------------+
| CONCAT(major, ' has ', COUNT(*), ' student(s)') |
+-------------------------------------------------+
| Accounting has 1 student(s) |
| Art has 1 student(s) |
| Biochemistry has 1 student(s) |
| Biology has 2 student(s) |
| Business has 1 student(s) |
| Chemistry has 1 student(s) |
| Computer Science has 2 student(s) |
| Economics has 1 student(s) |
| Food Science has 1 student(s) |
| Japanese 10% has 1 student(s) |
| Linguistics has 1 student(s) |
| no_major has 1 student(s) |
| Physics has 1 student(s) |
+-------------------------------------------------+

MIN / MAX

SQL
SELECT MIN(credit)
FROM students;
TEXT
+-------------+
| MIN(credit) |
+-------------+
| 6 |
+-------------+

Shows the lowest credit student has

What if you want to show the name with MIN credit?

  1. Method 1 (WHERE)
SQL
SELECT CONCAT(fname, ' ', lname) AS name, MIN(credit)
FROM students
WHERE credit = (SELECT MIN(credit) FROM students);
  1. Method 2 (ORDER BY)
SQL
SELECT CONCAT(fname, ' ', lname) AS name, credit
FROM students
ORDER BY credit LIMIT 1;
  1. WRONG
SQL
SELECT fname, MIN(credit)
FROM students;
  • 1st solution is inefficient because it's running two queires
  • Method 2 is preferred
  • Correct output:
TEXT
+--------------+--------+
| name | credit |
+--------------+--------+
| Diego Miller | 6 |
+--------------+--------+
  • Incorrect output (3.)
TEXT
+-------+-------------+
| fname | MIN(credit) |
+-------+-------------+
| Ellis | 6 |
+-------+-------------+
  • This creates a wrong output because MIN(credit) executes separate from fname

MIN/MAX & GROUP BY

Ex) Show the student's full name along with their lower credit amount

SQL
SELECT CONCAT(fname,' ', lname) as fullname, MIN(credit)
FROM students
GROUP BY fname, lname;
TEXT
+------------------+-------------+
| fullname | MIN(credit) |
+------------------+-------------+
| ... | ... |
| Ellis Min | 13 |
| ... | ... |
+------------------+-------------+

We have two Ellis Min in our data. The above query selects the minimum credit 13, instead of 5555. The other students aren't duplicate so they don't matter much for this problem

In other words, Ellis Min behind the scene is grouped by like below

TEXT
| Ellis Min | 5555 |
| Ellis Min | 13 |

Then MIN(credit) will pick the lowest amount between two

SUM

SQL
SELECT SUM(credit)
FROM students;
TEXT
+-------------+
| SUM(credit) |
+-------------+
| 8427 |
+-------------+

Adds all credits of all students

SUM & GROUP BY

Ex) Sum all credits each student have

SQL
SELECT CONCAT(fname, ' ', lname) as name, SUM(credit)
FROM students
GROUP BY fname, lname;
TEXT
+------------------+-------------+
| name | SUM(credit) |
+------------------+-------------+
| ... | ... |
| Ellis Min | 5568 |
| ... | ... |
+------------------+-------------+

Ellis Min now has 5568 credits (5555 + 13)

AVG

SQL
SELECT AVG(age)
FROM students;
TEXT
+----------+
| avg(age) |
+----------+
| 21.6000 |
+----------+

Find the average of students' age

ROUND

Rounds a number to a specified number of decimal places

SQL
SELECT ROUND(AVG(age))
FROM students;
TEXT
+-----------------+
| ROUND(AVG(age)) |
+-----------------+
| 22 |
+-----------------+

No rounding.

SQL
SELECT ROUND(AVG(age), 2)
FROM students;

Round to 2 decimal place.

SQL
SELECT ROUND(AVG(age), -1)
FROM students;
TEXT
+---------------------+
| ROUND(AVG(age), -1) |
+---------------------+
| 20 |
+---------------------+

Round to -1 decimal places.

AVG & GROUP BY

Try find out average of age by the same major & round it up to 1 decimal

SQL
SELECT major, ROUND(AVG(age), 1) AS "avg age by major"
FROM students
GROUP BY major;
TEXT
+------------------+------------------+
| major | avg age by major |
+------------------+------------------+
| Accounting | 22.0 |
| Art | 26.0 |
| Biochemistry | 18.0 |
| Biology | 22.0 |
| Business | 20.0 |
| Chemistry | 23.0 |
| Computer Science | 21.5 |
| Economics | 18.0 |
| Food Science | 31.0 |
| Japanese 10% | 16.0 |
| Linguistics | 19.0 |
| no_major | 27.0 |
| Physics | 17.0 |
+------------------+------------------+

Duplicate majors are grouped together and shows the relevant average age


WRITTEN BY

Keeping a record