O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Listen Print Subscribe to Databases Subscribe to Newsletters
aboutSQL

Group By

02/23/2001

This week, we continue our warp-speed introduction to SQL by looking at the GROUP BY clause. In the previous article we talked about the ORDER BY clause for sorting query results and covered aggregate functions. This week, we begin to put those two ideas together.

GROUP BY basics

The GROUP BY clause is typically used to combine database records with identical values in a specified field into a single record, usually for the purposes of calculating some sort of aggregate function. The syntax is remarkably similar to the ORDER BY clause.

SELECT ... GROUP BY column_name

While the syntax is simple, the reality is more complex. Each of the database fields involved in the SELECT statement must either be operated on by an aggregate function or otherwise reduced to a single value for all members of the records creating a group.

Also in aboutSQL:

SQL Data Types

Working With Tables

Data Definition Language

Introducing SQL Sets

SQL Subqueries

Counting all of the employees of a large corporation using GROUP BY on the office_location field would work and so would simple returning the values of office_location after grouping on that field.

Selecting the names of those individual employees using GROUP BY would not work because the names for everyone in a single office_location group do not reduce to a single value. There are ways to combine queries to create that sort of a result, but we aren't quite there yet. To add another layer of complexity, you can group on more than one column, just like with the ORDER BY clause.

Another problem with GROUP BY is the behavior of null values. When a column of data is grouped, null values count just like any other value -- and thus all of the "NULLs" are put in one group for the purposes of aggregate functions. This is actually a remarkably sensible way for the underlying SQL algorithms to work and should seem intuitive, but it is important to remember that you'll get one more aggregate value than you expect if the database contains null values unless you somehow filter out those null values (which we'll talk about next week).

GROUP BY in action

Let's revisit a query I talked about a couple weeks ago -- generating a sales report broken down by a particular employee:

SELECT SUM(SalesAmount) WHERE EmployeeID=3

which, given the following table of data

EmployeeID SalesAmount SalesDate
1 150 1/1/01
1 2500 1/4/01
2 25 1/3/01
3 45 1/7/01
3 1000 1/8/01
3 2010 1/8/01

would return a value of $3,055. This sort of query could be useful in a drill-down or detail report, but it is much more likely that a sales manager would ask for the total sales for all of their employees. We could write a separate SQL statement for each, but it should now be obvious to you that the GROUP BY clause would be much more useful for this scenario. Given the same data set, we can use this SQL statement:

SELECT EmployeeID, SUM(SalesAmount) GROUP BY EmployeeID

which would return the result set

EmployeeID SUM(SalesAmount)
1 2650
2 25
3 3055

Note that I selected both the aggregated sum of the SalesAmount field for each employee, but also the EmployeeID field. This is allowed since the EmployeeID is identical for each set of records in a group. We couldn't SELECT the SalesDate field since the records in each group consist of different values for the SalesDate field. Of course if we grouped the records by SalesDate to create a daily report.

SELECT SalesDate, SUM(SalesAmount) GROUP BY SalesDate

SUM(SalesAmount) SalesDate
150 1/1/01
2500 1/4/01
25 1/3/01
45 1/7/01
3010 1/8/01

we can SELECT the SalesDate field which is identical for each of the records in a group, but now we cannot SELECT the EmployeeID since each day's sales will typically be made up of more than one employee's sales.

Next steps

Next we, we'll combine aggregate functions and the GROUP BY clause along with searching using the HAVING clause to complete our whirlwind introduction to data-processing with SQL. After we reach that point, we'll switch gears completely back to database fundamentals and start talking about relationships, keys, and joining tables together. Until then, feel free to contact me with comments and questions.

John Paul Ashenfelter is president and CTO of TransitionPoint.com, a technology development and analysis group focusing on web database applications.


Read more aboutSQL columns.




Tagged Articles

Be the first to post this article to del.icio.us

Sponsored Resources

  • Inside Lightroom

Related to this Article

Understanding Oracle Clinical Understanding Oracle Clinical
by Joan M. Johnson
May 2007
$9.99 USD

Inside SQLite Inside SQLite
by Sibsankar Haldar
April 2007
$9.99 USD

Advertisement
O'Reilly Media

©2009, O'Reilly Media, Inc.
(707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
About O'Reilly
Academic Solutions
Authors
Contacts
Customer Service
Jobs
Newsletters
O'Reilly Labs
Press Room
Privacy Policy
RSS Feeds
Terms of Service
User Groups
Writing for O'Reilly
Content Archive
Business Technology
Computer Technology
Google
Microsoft
Mobile
Network
Operating System
Digital Photography
Programming
Software
Web
Web Design
More O'Reilly Sites
O'Reilly Radar
Ignite
Tools of Change for Publishing
Digital Media
Inside iPhone
O'Reilly FYI
makezine.com
craftzine.com
hackszine.com
perl.com
xml.com

Partner Sites
InsideRIA
java.net
O'Reilly Insights on Forbes.com