The group by statement is generally used in conjunction with the various aggregate functions in order to produce results that are grouped by one or more columns. This is a very powerful way to report results for groups of results that share some common trait – in the example below the shared trait we are using is generally customer name.
The data used for these examples is shown below. All examples were run against MySQL 5.5 but the SQL syntax used should work on pretty much all databases.
Getting a Sum Total
First off lets use the SUM aggregate function to get a total for all the orders placed at our fictional store:
SELECT SUM(OrderTotal) FROM orders;
This gives us a grand total of £710 which is interesting but it would be better if we could see a total for each customer as we would then know who was spending the most at our store.
Your first guess regarding how to create a per-customer total might be this:
SELECT Customer, SUM(OrderTotal) FROM orders;
Unfortunately though that query won’t return the results you expect. Under MySQL it returns:
other databases may return different results and some may even claim the query is incorrect – I believe MS Access does this. The problem is that the SUM aggregate function only returns a single value but you have asked for all customers. Faced with the dilemma of one part of the query returning multiple rows and the other part returning a single row MySQL has decided that it will present you with the total and arbitrarily pick the first returned customer.
The correct query is actually:
SELECT Customer, SUM(OrderTotal) FROM orders GROUP BY Customer;
which gives the results:
The GROUP BY statement causes the databases to first group the rows by customer before then generating a grand total for each group of rows. As we can see Alice is by far our best customer.
We now know who our best customer is but perhaps we should also find out which is our best day for sales. This can easily be achieved with another grouping like this:
SELECT Date, SUM(OrderTotal) FROM orders GROUP BY Date;
Which gives the following results:
We can see from this table that the 25th of Feb 2012 was a very good day for sales.
Now imagine that we wanted to boost sales on a particular day and that to do that we decide to reward the best customer on that day. Effectively what we need to do is combine the two results above which we do by using two group by clauses like this:
SELECT Customer, Date, SUM(OrderTotal) FROM orders GROUP BY Customer, Date;
which gives this table:
We can see from this table that Alice is the best customer on 2012-02-25 and that Dave was a good customer on 2012-02-05, etc. This table isn’t quite a readable as the simpler groupings but it does produce the results we want. Note how Alice’s purchases on 2012-02-25 have been summed together.
It’s worth pointing out that you can also group by columns that aren’t included in the output. For example, in the query above you could group by date but omit the date column in the output. Although that functionality isn’t useful here it can be when developing a large and complex query.
If we wanted to tidy up the results a little to make them easier to read we could sort by date and order total like this:
SELECT Customer, Date, SUM(OrderTotal) FROM orders GROUP BY Customer, Date ORDER BY Date DESC, SUM(OrderTotal) DESC;
It is now worth reading about the having clause.