SQL Group By

The group by statement is generally used in conjunction with the various aggregate functions in order to produce results that are grouped by one or more columns. This is a very powerful way to report results for groups of results that share some common trait – in the example below the shared trait we are using is generally customer name.

The data used for these examples is shown below. All examples were run against MySQL 5.5 but the SQL syntax used should work on pretty much all databases.

Id Customer OrderTotal Date
1 Alice 100 2012-02-25
2 Alice 125 2012-02-25
3 Alice 80 2012-01-15
4 Bob 50 2012-02-25
5 Bob 80 2012-01-01
6 Carol 100 2012-02-10
7 Dave 25 2012-02-25
8 Dave 150 2012-02-05

Getting a Sum Total

First off lets use the SUM aggregate function to get a total for all the orders placed at our fictional store:

SELECT SUM(OrderTotal) FROM orders;

This gives us a grand total of £710 which is interesting but it would be better if we could see a total for each customer as we would then know who was spending the most at our store.

Customer Totals

Your first guess regarding how to create a per-customer total might be this:

SELECT Customer, SUM(OrderTotal) FROM orders;

Unfortunately though that query won’t return the results you expect. Under MySQL it returns:

Customer SUM(OrderTotal)
Alice 710

other databases may return different results and some may even claim the query is incorrect – I believe MS Access does this. The problem is that the SUM aggregate function only returns a single value but you have asked for all customers. Faced with the dilemma of one part of the query returning multiple rows and the other part returning a single row MySQL has decided that it will present you with the total and arbitrarily pick the first returned customer.

The correct query is actually:

SELECT Customer, SUM(OrderTotal) FROM orders GROUP BY Customer;

which gives the results:

Customer SUM(OrderTotal)
Alice 305
Bob 130
Carol 100
Dave 175

The GROUP BY statement causes the databases to first group the rows by customer before then generating a grand total for each group of rows. As we can see Alice is by far our best customer.

Other Groups

We now know who our best customer is but perhaps we should also find out which is our best day for sales. This can easily be achieved with another grouping like this:

SELECT Date, SUM(OrderTotal) FROM orders GROUP BY Date;

Which gives the following results:

Date SUM(OrderTotal)
2012-01-01 80
2012-01-15 80
2012-02-05 150
2012-02-10 100
2012-02-25 300

We can see from this table that the 25th of Feb 2012 was a very good day for sales.

Multiple Groups

Now imagine that we wanted to boost sales on a particular day and that to do that we decide to reward the best customer on that day. Effectively what we need to do is combine the two results above which we do by using two group by clauses like this:

SELECT Customer, Date, SUM(OrderTotal) FROM orders GROUP BY Customer, Date;

which gives this table:

Customer Date SUM(OrderTotal)
Alice 2012-01-15 80
Alice 2012-02-25 225
Bob 2012-01-01 80
Bob 2012-02-25 50
Carol 2012-02-10 100
Dave 2012-02-05 150
Dave 2012-02-25 25

We can see from this table that Alice is the best customer on 2012-02-25 and that Dave was a good customer on  2012-02-05, etc. This table isn’t quite a readable as the simpler groupings but it does produce the results we want. Note how Alice’s purchases on 2012-02-25 have been summed together.

It’s worth pointing out that you can also group by columns that aren’t included in the output. For example, in the query above you could group by date but omit the date column in the output. Although that functionality isn’t useful here it can be when developing a large and complex query.

If we wanted to tidy up the results a little to make them easier to read we could sort by date and order total like this:

SELECT Customer, Date, SUM(OrderTotal) 
FROM orders GROUP BY Customer, Date 
ORDER BY Date DESC, SUM(OrderTotal) DESC;

It is now worth reading about the having clause.