Structured query language (SQL) is the lingua franca of relational database management systems. Visual C++ 4.0 and Microsoft Access both use SQL exclusively to process queries against desktop, client-server, and mainframe databases. Access includes a graphical query by example (QBE) tool, the query design mode window, to write Access SQL statements for you. You can develop quite sophisticated applications using Access without even looking at an SQL statement in Access's SQL window. Visual C++ 4.0 doesn't include a graphical QBE tool, so until some enterprising third-party developer creates a Query OLE Control, you'll need to learn Access SQL in order to create Visual C++ 4.0 applications that interact in any substantial way with databases.
NOTE
Microsoft Access without a version number refers to Access 2.0 and Access for Windows 95, version 7.0. There are no significant differences between Access SQL in these two versions of Access. Access 2.0 added a substantial number of reserved words to the SQL vocabulary of Access 1.x. SQL statements for adding tables to Access databases, plus adding fields and indexes to Access tables, are discussed in Chapter 8, "Running Crosstab and Action Queries." Microsoft calls the Visual C++ 4.0 dialect of SQL Microsoft Jet Database Engine SQL. This book uses the term Access SQL because the dialect originated in Access 1.0.
The first part of this chapter introduces you to the standardized version of SQL specified by the American National Standards Institute (ANSI), a standard known as X3.135-1992 and called SQL-92 in this book. (When you see the terms SQL-89 and SQL-92, the reference is to ANSI SQL, not the Access variety.) ANSI SQL-92 has been accepted by the International Organization for Standardization (ISO), an independent standards body headquartered in Geneva, and the International Electrotechnical Commission (IEC) as ISO/IEC 9075:1992, "Database Language SQL." A separate ANSI standard, X3.168-1989, defines "Database Language Embedded SQL." Thus, SQL-92 is a thoroughly standardized language, much more so than xBase, for which no independent standards yet exist. Today's client-server RDBMSs support SQL-89 and many of SQL-92's new SQL reserved words; many RDBMSs also add their own reserved words to create proprietary SQL dialects. A knowledge of ANSI SQL is required to use SQL pass-through techniques with the Jet 3.0 database engine and to employ Remote Data Objects (RDO) and Remote Data Control (RDC). SQL pass-through is described in Chapter 20, "Creating Front Ends for Client-Server Databases."
The second part of this chapter, beginning with the section "Comparing the Access SQL Dialect and ODBC," discusses the differences between SQL-92 and Access SQL. If you're fluent in the ANSI versions of SQL, either SQL-89 or SQL-92, you'll probably want to skip to the latter part of this chapter, which deals with the flavor of SQL used by the Jet 3.0 database engine. Chapter 7, "Using the Open Database Connectivity API," describes how the Jet 3.0 database engine translates Access SQL into the format used by ODBC drivers. Although this chapter describes the general SQL syntax for queries that modify data (called action queries by Access and in this book) and the crosstab queries of Access SQL, examples of the use of these types of queries are described in Chapter 8.
Reviewing the Foundations of SQL
Dr. E. F. Codd's relational database model of 1970, discussed in the preceding chapter, was a theoretical description of how relational databases are designed, not how they're used. You need a database application language to create tables and specify the fields that the tables contain, establish relations between tables, and manipulate the data in the database. The first such language defined at the IBM San Jose laboratory, by Donald Chamberlin and Raymond Boyce working from Dr. Codd's model, was Structured English Query Language (SEQUEL), which was designed for use with a prototype relational database that IBM called System R. The second version of SEQUEL was called SEQUEL/2. SEQUEL/2 was later renamed SQL. Technically, SQL is the name of an IBM data manipulation language, not an abbreviation for "structured query language." As you'll see later in this chapter, there are significant differences between IBM's SQL used for its DB2 mainframe and DB2/2 OS/2 databases, and ANSI SQL-92.
The sections that follow describe the differences between SQL and the procedural languages commonly used for computer programming, and how applications use SQL with desktop, client-server, and mainframe databases.
Elements of SQL Statements
This book has made extensive use of the term query without defining what it means. Because Visual C++ 4.0 uses SQL to process all queries, this book defines query as an expression in any dialect of SQL that defines an operation to be performed by a database management system. A query usually contains at least the following three elements:
A verb, such as SELECT, that determines the type of operation
A predicate object that specifies one or more field names of one or more table object(s), such as * to specify all of the fields of a table
A prepositional clause that determines the object(s) in the database on which the verb acts, such as FROM TableName
The simplest SQL query that you can construct is SELECT * FROM TableName, which returns the entire contents of TableName as the query result set. Queries are classified in this book as select queries, which return data (query result sets), or action queries, which modify the data contained in a database without returning any data.
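The distinction between select queries and action queries can be sketched in a few lines. The example below is only an illustration: it uses Python's standard sqlite3 module and an in-memory SQLite database as a stand-in for the Jet engine (an assumption made so the code runs anywhere), and the Authors table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
# SQLite stands in for the Jet engine here (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Authors (Au_ID INTEGER PRIMARY KEY, Author TEXT)")
conn.executemany(
    "INSERT INTO Authors (Au_ID, Author) VALUES (?, ?)",
    [(1, "Adams, Pat"), (2, "Baker, Lee"), (3, "Chen, Sam")],
)

# Select query: a verb (SELECT), a field list (*), and a FROM clause.
select_rows = conn.execute("SELECT * FROM Authors").fetchall()

# Action query: modifies data and returns no rows.
cursor = conn.execute(
    "UPDATE Authors SET Author = 'Baker, Leigh' WHERE Au_ID = 2"
)
action_rows = cursor.fetchall()   # empty: an action query has no result set
rows_changed = cursor.rowcount    # how many rows the UPDATE modified

print(len(select_rows), action_rows, rows_changed)
```

The select query hands back a result set whose size depends on the table's contents; the action query reports only how many rows it changed.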
IBM's original version of SQL, implemented as SEQUEL, had relatively few reserved words and simple syntax. Over the years, new reserved words have been added to the language by publishers of database management software. Many of the reserved words in proprietary versions of SQL have found their way into the ANSI SQL standards. Vendors of SQL RDBMSs that claim adherence to the ANSI standards have the option of adding their own reserved words to the language, as long as the added reserved words don't conflict with the usage of the ANSI-specified reserved words. Transact-SQL, the language used by the Microsoft and Sybase versions of SQL Server (the two products share a common origin), has many more reserved words than conventional ANSI SQL. Transact-SQL even includes reserved words that allow conditional execution and loops within SQL statements. (The CASE, NULLIF, and COALESCE reserved words of SQL-92 are rather primitive for conditional execution purposes.) Access SQL includes the TRANSFORM and PIVOT statements needed to create crosstab queries, a very useful construct that is missing from ANSI SQL. You can produce the result of a TRANSFORM and PIVOT query using ANSI SQL, but constructing such an ANSI SQL statement would be quite difficult.
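To give a feel for why a crosstab is awkward without TRANSFORM and PIVOT, here is a minimal sketch of one common workaround: one SUM(CASE ...) expression per output column. It is not the statement Access generates; it runs against an in-memory SQLite database through Python's sqlite3 module (an assumed stand-in for a SQL-92 RDBMS), and the Sales table and its data are hypothetical.

```python
import sqlite3

# Hypothetical sales data; SQLite is an assumed stand-in for an ANSI SQL DBM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Qtr INTEGER, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [("Chai", 1, 100.0), ("Chai", 2, 50.0), ("Chai", 1, 25.0),
     ("Tofu", 3, 80.0), ("Tofu", 4, 20.0)],
)

# Each crosstab column must be spelled out by hand as a conditional sum.
crosstab_sql = """
SELECT Product,
       SUM(CASE WHEN Qtr = 1 THEN Amount ELSE 0 END) AS Qtr1,
       SUM(CASE WHEN Qtr = 2 THEN Amount ELSE 0 END) AS Qtr2,
       SUM(CASE WHEN Qtr = 3 THEN Amount ELSE 0 END) AS Qtr3,
       SUM(CASE WHEN Qtr = 4 THEN Amount ELSE 0 END) AS Qtr4
FROM Sales
GROUP BY Product
ORDER BY Product
"""
crosstab = conn.execute(crosstab_sql).fetchall()

for row in crosstab:
    print(row)
```

The column headings are fixed when you write the statement, so every new pivot value means editing the SQL. That is the tedium TRANSFORM and PIVOT remove.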
A further discussion of the details of the syntax of SQL statements appears after the following sections, which describe the basic characteristics of the SQL language and tell you how to combine SQL and conventional 3GL source code statements.
Differences Between SQL and Procedural Computer Languages
All the dialects of SQL are fourth-generation languages (4GLs). The term fourth-generation derives from the following descriptions of the generations in the development of languages to control the operation of computers:
First-generation languages required that you program in the binary language of the computer's hardware, called object or machine code. (The computer is the object in this case.) As an example, in the early days of mini- and microcomputers, you started (booted) the computer by setting a series of switches that sent instructions directly to the computer's CPU. Once you booted the computer, you could load binary-coded instructions with a paper tape reader. 1GLs represent programming the hard way. The first computer operating systems (OS) were written directly in machine code and loaded from paper tape or punched cards.
Second-generation languages greatly improved the programming process by using assembly language to eliminate the necessity of setting individual bits of CPU instructions. Assembly language lets you use simple alphabetic codes (called mnemonic codes because they're easier to remember than binary instructions) and octal or hexadecimal values to substitute for one or more CPU instructions in the more arcane object code. Once you've written an assembly language program, you assemble the source code into object code instructions that the CPU can execute. Microsoft's MASM is a popular assembler for Intel 80x86 CPUs. Assembly language remains widely used today when speed or direct access to the computer hardware is needed.
Third-generation languages, typified by the early versions of FORTRAN (Formula Translator) and BASIC (Beginners' All-purpose Symbolic Instruction Code), let programmers substitute simple statements, usually in a structured version of English, for assembly code. 3GLs are called procedural languages because the statements you write in a 3GL are procedures that the computer executes in the sequence you specify in your program's source code. Theoretically, procedural languages should be independent of the type of CPU for which you compile your source code. Few 3GL languages actually achieve the goal of being fully platform-independent; most, such as Microsoft Visual Basic and Visual C++, are designed for 80x86 CPUs. (You can, however, run Visual C++ applications on Digital Equipment's Alpha workstations and workstations that use the MIPS RISC (Reduced Instruction Set Computer) CPU using Windows NT as the operating system. In this case, the operating system, Windows NT, handles the translation of object code to differing CPUs.)
Fourth-generation languages are often called nonprocedural languages. The source code you write in 4GLs tells the computer the ultimate result you want, not how to achieve it. SQL is generally considered to be a 4GL because, for example, your SQL query statements specify the data you want the database manager (DBM) to send you, rather than instructions that tell the DBM how to accomplish this feat. Whether SQL is a true 4GL is subject to controversy, because the SQL statements you write are actually executed by a 3GL or, in some cases, a 2GL that deals directly with the data stored in the database file(s) and is responsible for sending your application the data in a format that the application can understand.
Regardless of the controversy over whether generic SQL is a 4GL, you need to be aware of some other differences between SQL and conventional 3GLs. The most important of these differences are as follows:
SQL is a set-oriented language, whereas most 3GLs can be called array-oriented languages. SQL returns sets of data in a logical tabular format. The query-return sets are dependent on the data in the database, and you probably won't be able to predict the number of rows (data set members) that a query will return. The number of members of the data set may vary each time you execute a query and also may vary almost instantaneously in a multiuser environment. 3GLs can handle only a fixed number of tabular data elements at a time, specified by the dimensions that you assign to a two-dimensional array variable. Thus, the application needs to know how many columns and rows are contained in the result set of an SQL query so that the application can handle the data with row-by-row, column-by-column methods. Visual C++ 4.0's CRecordset object handles this transformation for you automatically.
SQL is a weakly typed language, whereas most 3GLs are strongly typed. You don't need to specify field data types in SQL statements; SQL queries return whatever data types have been assigned to the fields that constitute the columns of the query return set. Most compiled 3GLs are strongly typed; COBOL, C, C++, Pascal, Modula-2, and Ada are examples. Strongly typed languages require that you declare the names and data types of all your variables before you assign values to the variables. If the data type of a query column doesn't correspond to the data type you defined for the receiving variable, an error (sometimes called an impedance mismatch error) occurs. Visual C++ is a compiled language and is strongly typed.
Consider yourself fortunate that you're using Visual C++ 4.0 to process SQL statements. You don't need to worry about how many rows a query will return or what data types occur in the query result set's columns. The CRecordset object receiving the data handles all of these details for you. With Visual C++ 4's incremental compile and incremental link, you don't need to recompile and link your entire Visual C++ application each time you change a query statement; just change the statement and rebuild your application. Visual C++ compiles and links only the functions that have been changed. The process is really quite fast.
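The set-oriented nature of a query result can be illustrated outside Visual C++. The sketch below uses Python's sqlite3 module with an in-memory SQLite database as an assumed stand-in for the database engine, and a hypothetical Publishers table. It shows an application discovering the column names, the row count, and the value types only after the query has run, which is the bookkeeping a recordset object does on your behalf. Because Python is dynamically typed, it does not reproduce the type-mismatch error described above.

```python
import sqlite3

def run_query(conn, sql):
    """Return (column_names, rows) for a SELECT whose shape isn't known in advance."""
    cursor = conn.execute(sql)
    # cursor.description holds one entry per result column; item 0 is the name.
    column_names = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    return column_names, rows

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publishers (PubID INTEGER, Name TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO Publishers VALUES (?, ?, ?)",
    [(1, "Alpha Press", "NY"), (2, "Beta Books", "CA"), (3, "Gamma House", "NY")],
)

columns, rows = run_query(
    conn, "SELECT PubID, Name FROM Publishers WHERE State = 'NY'"
)
# The data types arrive with the data; the caller did not declare them.
value_types = [type(value).__name__ for value in rows[0]]

print(columns, len(rows), value_types)
```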
Types of ANSI SQL
The current ANSI SQL standards recognize four different methods of executing SQL statements. The method you use depends on your application programming environment, as described in the following list:
Interactive SQL lets you enter SQL statements at a command line prompt, similar to dBASE's dot prompt. As mentioned in Chapter 1, "Positioning Visual C++ in the Desktop Database Market," the use of the interactive dBASE command LIST is quite similar to the SELECT statement in interactive SQL. Mainframe and client-server RDBMSs also provide interactive SQL capability; Microsoft SQL Server provides the isql application for this purpose. Using interactive SQL is also called direct invocation. Interactive SQL is called a bulk process; if you enter a query at the SQL prompt, the result of your query appears on-screen. DBMs offer a variety of methods of providing a scrollable display of interactive query result sets.
Embedded SQL lets you execute SQL statements by preceding the SQL statement with a keyword, such as EXEC SQL in C. Typically, you declare variables that you intend to use to receive data from an SQL query between EXEC SQL BEGIN DECLARE SECTION and EXEC SQL END DECLARE SECTION statements. You need a precompiler that is specific to the language and to the RDBMS to be used. The advantage of embedded SQL is that you assign attribute classes to a single variable in a one-step process. The disadvantage is that you have to deal with query-return sets on a row-by-row basis rather than the bulk process of interactive SQL.
Module SQL lets you compile SQL statements separately from your 3GL source code and then link the compiled object modules into your executable program. SQL modules are similar to Visual C++ 4.0 code modules. The modules include declarations of variables and temporary tables to contain query result sets, and you can pass argument values from your 3GL to parameters of procedures declared in SQL modules. The stored procedures that execute precompiled queries on database servers have many characteristics in common with module SQL.
Dynamic SQL lets you create SQL statements whose contents you can't predict when you write the statement. (The preceding SQL types are classified as static SQL.) As an example of dynamic SQL, suppose you want to design a Visual C++ application that can process queries against a variety of databases. Dynamic SQL lets you send queries to the database in the form of strings. For example, you can send a query to the database and obtain detailed information from the database catalog that describes the tables and fields of tables in the database. Once you know the structure of the database, you or the user of your application can construct a custom query that adds the correct field names to the query. Visual C++'s implementation of Access SQL resembles a combination of dynamic and static SQL, although the Access database engine handles the details of reading the catalog information for you automatically when your application creates a Recordset object from the database. Chapter 6, "Understanding the Access Database Engine and DAO," describes the methods you use to extract catalog information contained in Visual C++ collections.
Technically, static SQL and dynamic SQL are called methods of binding SQL statements to database application programs. Binding refers to how you combine or attach SQL statements to your source or object code, how you pass values to SQL statements, and how you process query result sets. A third method of binding SQL statements is the call-level interface (CLI). The Microsoft Open Database Connectivity (ODBC) API uses the CLI developed by the SQL Access Group (SAG), a consortium of RDBMS publishers and users. A CLI accepts SQL statements from your application in the form of strings and passes the statements directly to the server for execution. The server notifies the CLI when the data is available and then returns the data to your application. Details of the ODBC CLI are given in Chapter 7.
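The dynamic SQL idea, reading the catalog and then assembling a query string at run time, can be sketched as follows. This is an illustration under stated assumptions, not the Jet or ODBC mechanism: it uses Python's sqlite3 module, SQLite's own catalog facilities (the sqlite_master table and PRAGMA table_info), and hypothetical helper names (quote_identifier, list_tables, build_select).

```python
import sqlite3

def quote_identifier(name):
    """Double-quote an identifier so names containing spaces are legal."""
    return '"' + name.replace('"', '""') + '"'

def list_tables(conn):
    """Read table names from SQLite's catalog table (SQLite-specific)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]

def build_select(conn, table):
    """Build a SELECT statement as a string from catalog information."""
    # PRAGMA table_info returns one row per field; item 1 is the field name.
    info = conn.execute(
        "PRAGMA table_info(" + quote_identifier(table) + ")"
    ).fetchall()
    fields = [row[1] for row in info]
    if not fields:
        raise ValueError("unknown table: " + table)
    field_list = ", ".join(quote_identifier(field) for field in fields)
    return "SELECT " + field_list + " FROM " + quote_identifier(table)

# Hypothetical database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Authors (Au_ID INTEGER, Author TEXT)")
conn.execute('CREATE TABLE "Order Details" (OrderID INTEGER, Quantity INTEGER)')
conn.execute('INSERT INTO "Order Details" VALUES (1, 5)')

tables = list_tables(conn)
sql = build_select(conn, "Order Details")
result = conn.execute(sql).fetchall()

print(tables)
print(sql)
```

The statement text does not exist until the program runs, so nothing can be precompiled or checked ahead of time. That is the trade-off between dynamic and static SQL.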
If you're a COBOL coder or a C/C++ programmer who is accustomed to writing embedded SQL statements, you'll need to adjust to Visual C++'s automatic creation of virtual tables when you execute a SELECT query, rather than executing CURSOR-related FETCH statements to obtain the query result rows one-by-one.
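The contrast between cursor-style row fetching and receiving the whole result set at once can be sketched as follows. Python's sqlite3 module and a hypothetical Titles table are assumed stand-ins; the point is only the shape of the two loops, not the behavior of any particular embedded SQL precompiler.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Titles (Title TEXT)")
conn.executemany(
    "INSERT INTO Titles VALUES (?)",
    [("Crosstabs",), ("Joins in Depth",), ("SQL Basics",)],
)
query = "SELECT Title FROM Titles ORDER BY Title"

# Cursor style: ask for one row per call until no rows remain,
# much as an embedded SQL program issues repeated FETCH statements.
cursor = conn.execute(query)
one_by_one = []
while True:
    row = cursor.fetchone()
    if row is None:
        break
    one_by_one.append(row[0])

# Bulk style: receive the complete result set as a single object.
all_at_once = [row[0] for row in conn.execute(query).fetchall()]

print(one_by_one == all_at_once)
```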
Writing ANSI SQL Statements
ANSI SQL statements have a very flexible format. Unlike all dialects of BASIC, which separate statements with newline pairs (a carriage return and a line feed), and C, C++, and Pascal, which use semicolons as statement terminators, SQL doesn't require you to separate the elements that constitute a complete SQL statement with newline pairs, semicolons, or even a space in most cases. (SQL ignores most white space, which comprises newline pairs, tabs, and extra spaces.) Thus, you can use white space to format your SQL statements to make them more readable. The examples of SQL statements in this book place groups of related identifiers and SQL reserved words on separate lines and use indentation to identify continued lines. Here's an example of an Access SQL crosstab query statement that uses this formatting convention:
TRANSFORM Sum(CLng([Order Details].UnitPrice*Quantity*
      (1 - Discount)*100)/100) AS ProductAmount
SELECT Products.ProductName, Orders.CustomerID
FROM Orders, Products, [Order Details],
   Orders INNER JOIN [Order Details] ON Orders.OrderID =
      [Order Details].OrderID,
   Products INNER JOIN [Order Details] ON Products.ProductID =
      [Order Details].ProductID
WHERE Year(OrderDate)=1994
GROUP BY Products.ProductName, Orders.CustomerID
ORDER BY Products.ProductName
PIVOT "Qtr " & DatePart("q",OrderDate) In("Qtr 1",
      "Qtr 2","Qtr 3","Qtr 4")
NOTE
The square brackets surrounding the [Order Details] table name are specific to Access SQL and are used to group table or field names that contain spaces or other punctuation that is illegal in the naming rules for tables and fields of SQL RDBMSs. Access SQL also uses the double quotation mark (") to replace the single quotation mark (or apostrophe) ('), which acts as the string identifier character in most implementations of SQL. The preceding example of the SQL statement for a crosstab query is based on the tables in Access 95's Northwind.MDB sample database. Many field names in Access 2.0's NWIND.MDB contain spaces; the spaces are removed from the field names in Northwind.MDB.
The sections that follow describe how you categorize SQL statements and how the formal grammar of SQL is represented. They also provide examples of writing a variety of select queries in ANSI SQL.
Categories of SQL Statements
ANSI SQL is divided into the following six basic categories of statements, presented here in the order of most frequent use:
Data-query language (DQL) statements, also called data retrieval statements, obtain data from tables and determine how that data is presented to your application. The SELECT reserved word is the most commonly used verb in DQL (and in all of SQL). Other commonly used DQL reserved words are WHERE, ORDER BY, GROUP BY, and HAVING; these DQL reserved words often are used in conjunction with other categories of SQL statements.
Data-manipulation language (DML) statements include the INSERT, UPDATE, and DELETE verbs, which append, modify, and delete rows in tables, respectively. DML verbs are used to construct action queries. Some books place DQL statements in the DML category.
Transaction-processing language (TPL) statements are used when you need to make sure that all the rows of tables affected by a DML statement are updated at once. TPL statements include BEGIN TRANSACTION, COMMIT, and ROLLBACK.
Data-control language (DCL) statements determine access of individual users and groups of users to objects in the database through permissions that you GRANT or REVOKE. Some RDBMSs let you GRANT permissions to or REVOKE permissions from individual columns of tables.
Data-definition language (DDL) statements let you create new tables in a database (CREATE TABLE), add indexes to tables (CREATE INDEX), establish constraints on field values (NOT NULL, CHECK, and CONSTRAINT), define relations between tables (PRIMARY KEY, FOREIGN KEY, and REFERENCES), and delete tables and indexes (DROP TABLE and DROP INDEX). DDL also includes many reserved words that relate to obtaining data from the database catalog. This book classifies DDL queries as action queries because DDL queries don't return records.
Cursor-control language (CCL) statements, such as DECLARE CURSOR, FETCH...INTO, and UPDATE...WHERE CURRENT OF, operate on individual rows of one or more tables.
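Several of these categories can be seen working together in one short sketch. It assumes Python's sqlite3 module and an in-memory SQLite database as a stand-in DBM (SQLite has no DCL statements, so GRANT and REVOKE are not shown), and the Accounts table is hypothetical. The second UPDATE deliberately violates a CHECK constraint so that the ROLLBACK path is exercised.

```python
import sqlite3

# isolation_level=None leaves transaction control to the explicit
# BEGIN/COMMIT/ROLLBACK statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: create a table with constraints, then add an index.
conn.execute(
    "CREATE TABLE Accounts ("
    " AcctID INTEGER PRIMARY KEY,"
    " Balance INTEGER NOT NULL CHECK (Balance >= 0))"
)
conn.execute("CREATE INDEX idxBalance ON Accounts (Balance)")

# DML: append rows.
conn.execute("INSERT INTO Accounts VALUES (1, 100)")
conn.execute("INSERT INTO Accounts VALUES (2, 50)")

# TPL: both updates succeed together or neither is kept.
conn.execute("BEGIN TRANSACTION")
try:
    conn.execute("UPDATE Accounts SET Balance = Balance + 200 WHERE AcctID = 2")
    # This update would make the balance negative and fails the CHECK.
    conn.execute("UPDATE Accounts SET Balance = Balance - 200 WHERE AcctID = 1")
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")

# DQL: retrieve the rows to confirm that the first update was undone.
balances = conn.execute(
    "SELECT AcctID, Balance FROM Accounts ORDER BY AcctID"
).fetchall()

print(balances)
```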
It's not obligatory that a publisher of a DBM who claims to conform to ANSI SQL support all of the reserved words in the SQL-92 standard. In fact, it's probably safe to state that, at the time this book was written, no commercial RDBMS implemented all the SQL-92 keywords for interactive SQL. The Jet 3.0 database engine, for example, doesn't support any DCL reserved words. You use the Data Access Object's programmatic security objects with Visual C++ reserved words and keywords instead. The Jet 3.0 engine doesn't need to support CCL statements, because neither Visual C++ 4.0 nor Access manipulates cursors per se. Visual C++ 4.0's Data control creates the equivalent of a scrollable (bidirectionally movable) cursor. The Remote Data Object supports the scrollable cursors of Microsoft SQL Server 6.0.
This book uses the terminology defined by Appendix C of the Programmer's Reference for the Microsoft ODBC Software Development Kit (SDK) to define the following levels of SQL grammatical compliance:
Minimum: The statements (grammar) that barely qualify a DBM as an SQL DBM but not an RDBMS. A DBM that provides only the minimum grammar is not salable in today's market.
Core: Comprising minimum grammar plus basic DDL and DCL commands, additional DML functions, data types other than CHAR, SQL aggregate functions such as SUM() and AVG(), and a wider variety of allowable expressions to select records. Most desktop DBMs, to which SQL has been added, support core SQL grammar and little more.
Extended: Comprising minimum and core grammar, plus DML outer joins, more complex expressions in DML statements, all ANSI SQL data types (as well as long varchar and long varbinary), batch SQL statements, and procedure calls. Extended SQL grammar has two levels of conformance: 1 and 2.
The Formal Grammar of SQL
The formal grammar of SQL is represented in the Backus Naur Form (BNF), which is used to specify the formal grammar of many computer programming languages. Here is the full BNF form of the verb that specifies the operation that a query is to perform on a database:
To use BNF representation, you locate the class (<action> in the preceding example) where the reserved word is included. Members of the class are separated by the vertical bar (|) character. Optional parameters of reserved words and elements are enclosed in square brackets ([]). Element names, such as <privilegecolumnlist>, are enclosed in angle brackets (<>), and elements that must be grouped, such as a comma preceding a second <columnname>, are enclosed in curly braces ({}). You then search the list of elements to find the allowable composition of an element. In this example, the <privilegecolumnlist> is composed of the <columnnamelist>. Then check to see if <columnnamelist> has a composition (in this case, one or more <columnname> elements). This process is tedious, especially when the elements aren't arranged in alphabetical order.
Microsoft uses a simplified form of BNF to describe the grammar supported by the present version of the ODBC API. The Access SQL syntax rules eliminate the use of the ::= characters to indicate the allowable substitution of values for an element. Instead, they substitute a tabular format, as shown in Table 5.1. Ellipses (...) in the table indicate that you have to search for the element; the element is not contiguous with the preceding element of the table.
Table 5.1. The partial syntax of the Access SQL SELECT statement.
The DISTINCTROW qualifier and the querydef-name element are specific to Access SQL. DISTINCTROW is discussed in the section "Theta Joins and the DISTINCTROW Keyword." Chapter 6 describes the Access QueryDef object.
After you've looked up all the allowable forms of the elements in the table, you might have forgotten the key word whose syntax you set out to determine. The modified Backus Naur form used by Microsoft is unquestionably easier to use than full BNF.
The Practical Grammar of a Simple SQL SELECT Statement
Here is a more practical representation of the syntax of a typical ANSI SQL statement, substituting underscores for hyphens:
SELECT [ALL|DISTINCT] select_list
FROM table_names
[WHERE {search_criteria|join_criteria}
[{AND|OR} search_criteria]]
[ORDER BY {field_list} [ASC|DESC]]
The following list explains the use of each SQL reserved word in the preceding statement:
SELECT specifies that the query is to return data from the database rather than modify the data in the database. The select_list element contains the names of the fields of the table that are to appear in the query. Multiple fields appear in a comma-separated list. The asterisk (*) specifies that data from all fields of a table is returned. If more than one table is involved (joined) in the query, you use the table_name.field_name syntax, in which the period (.) separates the name of the table from the name of the field.
The ALL qualifier specifies that you want the query to return all rows, regardless of duplicate values; DISTINCT returns only nonduplicate rows. These qualifiers matter most in queries that involve joins, although DISTINCT also removes duplicate rows from the result of a single-table query. The penalty for using DISTINCT is that the query will take longer to process.
FROM begins a clause that specifies the names of the tables that contain the fields you include in your select_list. If more than one table is involved in select_list, table_names consists of comma-separated table names.
WHERE begins a clause that serves two purposes in ANSI SQL: specifying the fields on which tables are joined, and limiting the records returned to records with field values that meet a particular criterion or set of criteria. The WHERE clause must include an operator and two operands, the first of which must be a field name. (The field name doesn't need to appear in the select_list, but the table_name that includes field_name must be included in the table_names list.)
SQL operators include LIKE, IS {NULL|NOT NULL}, and IN, as well as the comparison operators <, <=, =, >=, >, and <>. If you use the equal operator (=) and specify table_name.field_name values for both operands, you create an equi-join (also called an inner join) between the two tables on the specified fields. You can create left and right joins by using the special operators *= and =*, respectively, if your DBM supports outer joins. (Both left and right joins are called outer joins.) Types of joins are discussed in the section "Joining Tables."
NOTE
If you use more than one table in your query, make sure that you create a join between the tables with a WHERE Table1.field_name = Table2.field_name clause. If you omit the criterion that creates the join, your query will return the Cartesian product of the two tables. A Cartesian product pairs every row of one table with every row of the other. This results in an extremely large query-return set and, if the tables have a large number of records, it can cause your computer to run out of memory to hold the query result set. (The term Cartesian is derived from the name of a famous French mathematician, René Descartes.)
ORDER BY defines a clause that determines the sort order of the records returned by the SELECT statement. You specify the field(s) on which you want to sort the query result set in the field_list. You can specify a descending sort with the DESC qualifier; ascending (ASC) is the default. As in other lists, if you have more than one field name, you use a comma-separated list. You use the table_name.field_name specifier if you have joined tables.
Depending on the dialect of SQL your database uses and the method of transmitting the SQL statement to the DBM, you might need to terminate the SQL statement with a semicolon. (Access SQL no longer requires the semicolon; statements you send directly to the server through the ODBC driver don't use terminating semicolons.)
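The clauses described above can be tried out with a small sketch. It assumes Python's sqlite3 module and an in-memory SQLite database as a stand-in DBM, with hypothetical Publishers and Titles tables. The first query omits the join criterion and so returns the Cartesian product. The second adds the WHERE join and a descending sort. The third shows DISTINCT removing duplicate rows.

```python
import sqlite3

# Hypothetical tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publishers (PubID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Titles (Title TEXT, PubID INTEGER)")
conn.executemany(
    "INSERT INTO Publishers VALUES (?, ?)",
    [(1, "Alpha Press"), (2, "Beta Books")],
)
conn.executemany(
    "INSERT INTO Titles VALUES (?, ?)",
    [("SQL Basics", 1), ("Joins in Depth", 1), ("Crosstabs", 2)],
)

# No join criterion: every publisher is paired with every title (2 x 3 rows).
cartesian = conn.execute(
    "SELECT Publishers.Name, Titles.Title FROM Publishers, Titles"
).fetchall()

# WHERE supplies the join criterion; ORDER BY ... DESC sorts the result.
joined = conn.execute("""
    SELECT Publishers.Name, Titles.Title
    FROM Publishers, Titles
    WHERE Publishers.PubID = Titles.PubID
    ORDER BY Titles.Title DESC
""").fetchall()

# DISTINCT collapses the duplicate publisher names the join produces.
distinct_names = conn.execute("""
    SELECT DISTINCT Publishers.Name
    FROM Publishers, Titles
    WHERE Publishers.PubID = Titles.PubID
    ORDER BY Publishers.Name
""").fetchall()

print(len(cartesian), len(joined), len(distinct_names))
```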
Using the MS Query Application to Explore Queries
The MS Query application that accompanies Visual C++ version 1.5 (\MSVC15\MSQUERY) is an excellent tool for creating SQL statements. Visual C++ 2.x and 4 don't include MS Query; however, because Visual C++ 1.5 is included with later versions of Visual C++, you can install MS Query from the Visual C++ 1.5 files. Also, when you purchase Microsoft Office, you receive a 32-bit version of Microsoft Query. It can be found on the Microsoft Office Pro CD in the \OS\MSAPPS\MSQUERY folder.
NOTE
BIBLIO.MDB is included on the CD that comes with this book in the CHAPTR05 folder. BIBLIO is an Access sample database that is included with Visual Basic but not with Visual C++. If you have Visual Basic, you can use either its copy of BIBLIO or the copy in the CHAPTR05 folder on the CD.
NOTE
MS Query as found on the Visual C++ 1.5x CD (a 16-bit application) works only with the Access 2 version of BIBLIO. It might not work correctly with the second version, called BIBLIO 95, which is an Access 7 version of the database. MSQRY32 (the 32-bit version of MS Query, which is on the Microsoft Office CD) will work with the Access 7 version of BIBLIO. The 32-bit version of MS Query is a bit more reliable and should be used if possible.
MS Query falls into the category of ad hoc query generators. You can use MS Query to test some simple SQL statements by following these steps:
Start MS Query and choose File | New Query.
MS Query displays the Select Data Source dialog box. Select the BIBLIO datasource. If you haven't previously opened BIBLIO using MS Query, click the Other button to add BIBLIO to MS Query's list of datasources.
MS Query displays the Add Tables dialog box. Select the Authors table and click the Add button. Then click the Close button.
MS Query displays its Query 1 MDI child window. Click the SQL button on the toolbar.
Enter SELECT * FROM Authors in the SQL Statement window as a simple query to check whether MS Query works, as shown in Figure 5.1.
Access adds a semicolon statement terminator to the SQL statements it generates. The MS Query application doesn't need a semicolon at the end of the SQL statement. Adding a semicolon will disable MS Query's graphical query representation, but the query will still work as expected.
Click the OK button in the SQL dialog box. The query result set appears in the child window.
Click the SQL toolbar button. The query, reformatted to fully qualify all names, appears in the SQL dialog box. The query now reads SELECT Authors.Au_ID, Authors.Author FROM Authors Authors. Figure 5.2 shows a portion of the query result set and the reformatted SQL query.
A typical result of this type of query, which returns 46 rows in .0547 seconds, is approximately 840 rows/second. A 486DX2/66 with local bus video and 16M of RAM was used for these tests. These rates represent quite acceptable performance for a Windows database front end.
Reopen the SQL dialog box and clear the current SQL query edit box of the SQL Statement window. Enter SELECT * FROM Publishers WHERE State = 'NY' in the SQL Statement window and then click the OK button. The results of this query appear in Figure 5.3.
In this case, the query-data return rate and the display rate are about 24 rows per second. The query-data return rate was reduced because there are more columns in the Publishers table (eight) than in the Authors table (two). However, if the query-data return rate is inversely proportional to the number of columns, the rate should be 840 * 2 / 8, or 210 rows per second. The extrapolated grid display rate, 170 * 2 / 8, is 42.5 rows per second, which is closer to the 24 rows per second rate of the prior example and can be accounted for by the greater average length of the data in the fields. Part of the difference between 24 and 210 rows per second for the query-data return rate is because the Access database engine must load the data from the table on the fixed disk into a temporary buffer. If you run the query again, you'll find that the rate increases to 8 / 0.0625, or 128 rows per second. The remainder of the difference in the query-data return rate is because the Access database engine must test each value of the State field for 'NY'.
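The rate arithmetic quoted in the preceding steps can be rechecked with a few lines of Python. This is only a sketch that replays the figures given in the text; the variable names are illustrative and nothing here measures a real database.

```python
# Figures quoted in the text (timed on the author's 486DX2/66).
authors_rows, authors_secs = 46, 0.0547
authors_rate = authors_rows / authors_secs            # about 841 rows/second

# Naive extrapolation: rate inversely proportional to the column count.
authors_cols, publishers_cols = 2, 8
expected_rate = 840 * authors_cols / publishers_cols  # 210 rows/second

# Second (cached) run of the Publishers query: 8 rows in 0.0625 seconds.
cached_rate = 8 / 0.0625                              # 128 rows/second

print(round(authors_rate), expected_rate, cached_rate)
```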
Open the SQL dialog box again and add ORDER BY Zip to the end of your SQL statement. Figure 5.4 shows the query and its result.
The data return rate will now have dropped to about 68 rows per second. The decrease in speed can be attributed to the sort operation that you added to the query. The data-return rates and data-display rates you achieve will depend on the speed of the computer you use. As a rule, each clause you add to a query will decrease the data-return rate because of the additional data-manipulation operations that are required.
Replace the * in the SELECT statement, which returns all fields, with PubID, 'Company Name', City so that only three of the fields appear in the SnapShot window. The result, shown in Figure 5.5, demonstrates that you don't have to include the fields that you use for the WHERE and ORDER BY clauses in the field_names list of your SELECT statement.
The single quotes (') surrounding Company Name are necessary when a field name or table name contains a space. Only Access databases permit spaces and punctuation other than the underscore (_) in field names. Using spaces in field and table names, or in the names of any other database objects, is not considered good database-programming practice. Spaces in database field names and table names appear in this book only when such names are included in sample databases created by others.
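If you don't have MS Query at hand, the following minimal sketch reproduces the last step with Python's built-in sqlite3 module. SQLite is only a stand-in for the Access database engine here, the three publisher rows are invented, and SQLite quotes a name that contains a space with double quotation marks, so treat the quoting character as engine-specific.

```python
import sqlite3

# In-memory SQLite database standing in for BIBLIO.MDB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Publishers '
    '(PubID INTEGER, "Company Name" TEXT, City TEXT, State TEXT, Zip TEXT)'
)
conn.executemany(
    "INSERT INTO Publishers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Alpha Press", "Albany", "NY", "12207"),    # invented rows
        (2, "Beta Books", "Boston", "MA", "02108"),
        (3, "Gamma House", "New York", "NY", "10011"),
    ],
)

# State and Zip drive the filter and the sort but are not in the SELECT list.
rows = conn.execute(
    'SELECT PubID, "Company Name", City FROM Publishers '
    "WHERE State = 'NY' ORDER BY Zip"
).fetchall()
print(rows)
```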
NOTE
The MS Query toolbar provides a number of buttons that let you search for records in the table, filter the records so that only selected records appear, and sort the records on selected fields. A filter is the equivalent of adding a WHERE field_name where_expression clause to your SQL statement. The sort buttons add an ORDER BY field_names clause.
Microsoft designed the MS Query application to demonstrate the features of SQL and ODBC that pertain to manipulating and displaying data contained in the tables of databases. MS Query is a rich source of SQL examples. It also contains useful examples of user interface design techniques for database applications and MDI child forms.
SQL Operators and Expressions
As I mentioned earlier in this chapter, SQL provides the basic comparison operators (<, <=, =, >=, >, and <>). SQL also has a set of operators that are used in conjunction with values of fields of the text data type (LIKE and IN) and that deal with NULL values in fields (IS NULL and IS NOT NULL). The Access database engine also supports the use of many string and numeric functions in SQL statements to calculate column values of query return sets. (Few of these functions are included in ANSI SQL.)
NOTE
Access supports the use of user-defined functions (UDFs) in SQL statements to calculate column values in queries. Visual C++ and ODBC support only native functions that are reserved words, such as Val(). Functions other than SQL aggregate functions are called implementation-specific in ANSI SQL. Implementation-specific means that the supplier of the DBM is free to add functions to the supplier's implementation of ANSI SQL.
The majority of the operators you use in SQL statements are dyadic. Dyadic operators require two operands. (All arithmetic operators and BETWEEN are dyadic.) Operators such as LIKE, IN, IS NULL, and IS NOT NULL are monadic. Monadic operators require only one operand. All expressions that you create with comparison operators return True or False, not a value. The sections that follow describe the use of the common dyadic and monadic operators of ANSI SQL.
Dyadic Arithmetic Operators and Functions
The use of arithmetic operators with SQL doesn't differ greatly from their use in Visual C++ or other computer languages. The following is a list of the points you need to remember about arithmetic operators and functions used in SQL statements (especially in WHERE clauses):
The = and <> comparison operators are used for both text and numeric field data types. The angle-brace pair "not-equal" symbol (<>) is equivalent to the != combination that many SQL implementations accept to represent "not equal." (The equals sign isn't used as an assignment operator in SQL.)
The arithmetic comparison operators (<, <=, >=, and >) are intended primarily for use with operands that have numeric field data types. If you use the preceding comparison operators with values of the text field data type, the numeric ANSI values of each character of the two fields are compared in left-to-right sequence.
The remaining arithmetic operators (+, -, *, /, and ^ or **, the implementation-specific exponentiation operator) aren't comparison operators. These operators apply only to calculated columns of query result sets, the subject of the next section.
To compare the values of text fields that represent numbers, such as the zip code field of the Publishers table of BIBLIO.MDB, you can use the Val() function in a WHERE clause to process the text values as the numeric equivalent when you use the Access database engine. An example of this usage is SELECT * FROM Publishers WHERE Val(Zip) > 12000.
NOTE
If you attempt to execute the preceding SQL statement in MS Query (but not MSQRY32), you might receive an error message (usually with no text), or sometimes MS Query will simply GPF. The error is caused by Null values in the Zip Code data cells of several publishers in the table. Most expressions don't accept Null argument values. Thus, you need to add an IS NOT NULL criterion to your WHERE clause. If you use the form WHERE (Zip > '12000') AND (Zip IS NOT NULL), you get the same error message, because the order in which the criteria are processed is the sequence in which the criteria appear in your SQL statement. Using WHERE Zip IS NOT NULL AND Zip > '12000' solves the problem. The syntax of the NULL predicates is explained in the section "Monadic Text Operators, Null Value Predicates, and Functions."
The BETWEEN predicate in ANSI SQL and the Between operator in Access SQL are used with numeric or date-time field data types. The syntax is field_name BETWEEN Value1 AND Value2. This syntax is equivalent to the expression field_name >= Value1 AND field_name <= Value2. Access SQL requires you to surround date-time values with number signs (#), as in DateField Between #1-1-93# And #12-31-93#. You can negate the BETWEEN predicate by preceding BETWEEN with NOT.
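The following sketch checks that BETWEEN includes both end points and that NOT BETWEEN returns the complement (ignoring Null values). It assumes Python's sqlite3 module as a stand-in engine and an invented Titles table with a numeric YearPub field; Access date literals in number signs aren't used because SQLite doesn't support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Titles (Title TEXT, YearPub INTEGER)")
conn.executemany(
    "INSERT INTO Titles VALUES (?, ?)",
    [("A", 1990), ("B", 1993), ("C", 1995), ("D", 1997), ("E", None)],
)

def titles(where_clause):
    """Run SELECT Title ... WHERE <where_clause> and return the titles."""
    sql = "SELECT Title FROM Titles WHERE " + where_clause + " ORDER BY Title"
    return [row[0] for row in conn.execute(sql)]

between = titles("YearPub BETWEEN 1993 AND 1995")
explicit = titles("YearPub >= 1993 AND YearPub <= 1995")
negated = titles("YearPub NOT BETWEEN 1993 AND 1995")
print(between, explicit, negated)
```

The row with a Null year appears in neither result, which is the usual behavior of comparisons against Null.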
NOTE
Where Access SQL uses syntax that isn't specified by ANSI SQL, such as the use of number signs (#) to indicate date-time field data types, or where examples of complete statements are given in Access SQL, the SQL reserved words that are also keywords or reserved words appear in the upper-and-lowercase convention.
Calculated Query Columns
Using Access, you can create calculated columns in query return sets by defining fields that use SQL arithmetic operators and functions that are supported by the Access database engine or your client-server RDBMS. Ordinarily, calculated columns are derived from fields of numeric field data types. BIBLIO.MDB uses a numeric data type (the auto-incrementing long integer Counter field) for ID fields, so you can use the PubID field or Val(Zip) expression as the basis for the calculated field. Enter SELECT DISTINCTROW Publishers.Name, Val([Zip])*3 AS Zip_Times_3, Publishers.State FROM Publishers in Access's SQL query window. The query result set appears as shown in Figure 5.6.
The AS qualifier designates an alias for the column name, column_alias. If you don't supply the AS column_alias qualifier, the column name is empty when you use the Access database engine. Access provides a default AS Expr1 column alias for calculated columns; the column_alias that appears when you use ODBC to connect to databases is implementation-specific. IBM's DB2 and DB2/2, for example, don't support aliasing of column names with the AS qualifier. ODBC drivers for DB2 and DB2/2 may assign the field name from which the calculated column value is derived, or apply an arbitrary name, such as Col_1.
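As a rough illustration of a calculated column and its alias, the sketch below runs a similar query with Python's sqlite3 module. SQLite has no Val() function, so CAST(Zip AS INTEGER) is substituted for it; that substitution and the two sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publishers (Name TEXT, Zip TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO Publishers VALUES (?, ?, ?)",
    [("Alpha Press", "12207", "NY"), ("Beta Books", "02108", "MA")],
)

# CAST(... AS INTEGER) plays the role of Access's Val() here (assumption).
cur = conn.execute(
    "SELECT Name, CAST(Zip AS INTEGER) * 3 AS Zip_Times_3, State "
    "FROM Publishers ORDER BY Name"
)
column_names = [d[0] for d in cur.description]  # the alias becomes the column name
rows = cur.fetchall()
print(column_names, rows)
```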
NOTE
If you must include spaces in the column_alias, make sure that you enclose the column_alias in square brackets for the Access database engine and in single quotation marks for RDBMSs that support spaces in column_alias fields. (Although you might see column names such as Col 1 when you execute queries against DB2 or other mainframe databases in an emulated 3270 terminal session, these column_alias values are generated by the local query tool running on your PC, not by DB2.) If you use single or double quotation marks with the Access database engine, these quotation marks appear in the column headers.
Monadic Text Operators, Null Value Predicates, and Functions
One of the most useful operators for the WHERE criterion of fields of the text field data type is ANSI SQL's LIKE predicate, called the Like operator in Access SQL. (The terms predicate and operator are used interchangeably in this context.) The LIKE predicate lets you search for one or more characters you specify at any location in the text. Table 5.2 shows the syntax of the ANSI SQL LIKE predicate and the Access SQL Like operator used in the WHERE clause of an SQL statement.
Table 5.2. Forms of the ANSI SQL LIKE and Access SQL Like predicates.
ANSI SQL | Access SQL | Description | What It Returns
---------|------------|-------------|----------------
LIKE '%am%' | Like "*am*" | Matches any text that contains the characters. | ram, rams, damsel, amnesty
LIKE 'John%' | Like "John*" | Matches any text beginning with the characters. | Johnson, Johnsson
LIKE '%son' | Like "*son" | Matches any text ending with the characters. | Johnson, Anderson
LIKE 'Glen_' | Like "Glen?" | Matches the text and any single trailing character. | Glenn, Glens
LIKE '_am' | Like "?am" | Matches the text and any single preceding character. | dam, Pam, ram
LIKE '_am%' | Like "?am*" | Matches the text with one preceding character and any trailing characters. | dams, Pam, Ramses
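You can try the ANSI wildcard forms of Table 5.2 with the short sketch below. It assumes Python's sqlite3 module, which uses the ANSI % and _ wildcards (not the Access * and ? characters) and, by default, compares ASCII letters without regard to case. The word list is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Words (Word TEXT)")
words = ["ram", "rams", "damsel", "amnesty", "Johnson", "Johnsson",
         "Anderson", "Glenn", "Glens", "Glen", "dam", "Pam"]
conn.executemany("INSERT INTO Words VALUES (?)", [(w,) for w in words])

def matches(pattern):
    """Return the set of words that satisfy Word LIKE pattern."""
    cur = conn.execute("SELECT Word FROM Words WHERE Word LIKE ?", (pattern,))
    return {row[0] for row in cur}

for pattern in ["%am%", "John%", "%son", "Glen_", "_am", "_am%"]:
    print(pattern, sorted(matches(pattern)))
```

Note that 'Glen_' doesn't match Glen, because the underscore requires exactly one trailing character.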
The IS NULL and IS NOT NULL predicates (Is Null and Is Not Null operators in Access SQL) test whether a value has been entered in a field. IS NULL returns False and IS NOT NULL returns True if a value, including an empty string "" or 0, is present in the field.
The SQL-92 POSITION() function returns the position of characters in a text field using the syntax POSITION(characters IN field_name). The equivalent Access SQL function is InStr(field_name, characters). If characters are not found in field_name, both functions return 0.
The SQL-92 SUBSTRING() function returns a set of characters with SUBSTRING(field_name FROM start_position FOR number_of_characters). This function is quite useful for selecting and parsing text fields.
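The sketch below exercises the Null predicates and the two string functions. It assumes Python's sqlite3 module, whose instr() and substr() functions are used here as stand-ins for POSITION()/InStr() and SUBSTRING()/Mid(); the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publishers (Name TEXT, Zip TEXT)")
conn.executemany(
    "INSERT INTO Publishers VALUES (?, ?)",
    [("Alpha Press", "12207"), ("Beta Books", None), ("Gamma House", "")],
)

# An empty string counts as a value; only the Null Zip satisfies IS NULL.
missing = [r[0] for r in conn.execute(
    "SELECT Name FROM Publishers WHERE Zip IS NULL")]
present = [r[0] for r in conn.execute(
    "SELECT Name FROM Publishers WHERE Zip IS NOT NULL ORDER BY Name")]

# instr() returns a 1-based position, or 0 when the characters aren't found.
found = conn.execute("SELECT instr('Alpha Press', 'Press')").fetchone()[0]
not_found = conn.execute("SELECT instr('Beta Books', 'Press')").fetchone()[0]

# substr(text, start, length) extracts characters starting at position 1.
prefix = conn.execute("SELECT substr('Alpha Press', 1, 5)").fetchone()[0]
print(missing, present, found, not_found, prefix)
```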
Joining Tables
As I mentioned earlier in this chapter, you can join two tables by using table_name.field_name operands with a comparison operator in the WHERE clause of an SQL statement. You can join additional tables by combining two sets of join statements with the AND operator. SQL-86 and SQL-89 supported only WHERE joins. You can create equi-joins, natural equi-joins, left and right equi-joins, not-equal joins, and self-joins with the WHERE clause. Joins that are created with the equals (=) operator use the prefix equi.
SQL-92 added the JOIN reserved words, plus the CROSS, NATURAL, INNER, OUTER, FULL, LEFT, and RIGHT qualifiers, to describe a variety of JOINs. At the time this book was written, few RDBMSs supported the JOIN statement. (Microsoft SQL Server 4.2, for example, doesn't include the JOIN statement in Transact-SQL.) Access SQL supports INNER, LEFT, and RIGHT JOINs with SQL-92 syntax using the ON predicate. Access SQL doesn't support the USING clause or the CROSS, NATURAL, or FULL qualifiers for JOINs.
A CROSS JOIN returns the Cartesian product of two tables. The term CROSS is derived from cross-product, a synonym for Cartesian product. You can emulate a CROSS JOIN by leaving out the join components of the WHERE clause of a SELECT statement that includes a table name from more than one table. Figure 5.7 shows Access 95 displaying the first few rows of the 294-row Cartesian product created when you enter SELECT Publishers.Name, Authors.Author FROM Publishers, Authors in the SQL Statement window. There are seven Publishers records and 42 Authors records; thus, the query returns 294 rows (7 * 42 = 294). It is highly unlikely that you would want to create a CROSS JOIN in a commercial database application.
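To confirm the row count of a Cartesian product, the sketch below builds two tables with 7 and 42 placeholder rows and counts the rows returned when no join condition is given. It assumes Python's sqlite3 module as the engine; the generated names are placeholders, not BIBLIO data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publishers (Name TEXT)")
conn.execute("CREATE TABLE Authors (Author TEXT)")
conn.executemany("INSERT INTO Publishers VALUES (?)",
                 [(f"Publisher {i}",) for i in range(7)])
conn.executemany("INSERT INTO Authors VALUES (?)",
                 [(f"Author {i}",) for i in range(42)])

# No WHERE clause: every publisher is paired with every author.
rows = conn.execute(
    "SELECT Publishers.Name, Authors.Author FROM Publishers, Authors"
).fetchall()
print(len(rows))
```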
The common types of joins that you can create with SQL-89 and Access SQL are described in the following sections.
NOTE
All joins except the CROSS JOIN or Cartesian product require that the field data types of the two fields be identical or that you use a function (where supported by the RDBMS) to convert dissimilar field data types to a common type.
Conventional Inner or Equi-Joins
The most common type of join is the equi-join or INNER JOIN. You create an equi-join with a WHERE clause using the following generalized statement:
SELECT Table1.field_name, ... Table2.field_name ...
FROM Table1, Table2
WHERE Table1.field_name = Table2.field_name
The SQL-92 JOIN syntax to achieve the same result is as follows:
SELECT Table1.field_name, ... Table2.field_name ...
FROM Table1 INNER JOIN Table2
ON Table1.field_name = Table2.field_name
A single-column equi-join between the PubID field of the Publishers table and the PubID field of the Titles table of the BIBLIO.MDB database appears as follows:
SELECT Publishers.Name, Titles.ISBN, Titles.Title
FROM Publishers INNER JOIN Titles
ON Publishers.PubID = Titles.PubID;
When you execute this query with Access 95, the Publishers and Titles tables are joined by the PubID columns of both tables. Figure 5.8 shows the result of this join.
The INNER qualifier is optional in SQL-92 but is required in Access SQL. If you omit the INNER qualifier when you use the Access database engine, you receive the message Syntax error in FROM clause when you attempt to execute the query.
NOTE
Natural equi-joins create joins automatically between identically named fields of two tables. Natural equi-joins eliminate the necessity of including the ON predicate in the JOIN statement. Access SQL doesn't support the NATURAL JOIN statement.
The Access SQL statements that you create in the graphical QBE design mode of Access generate an expanded JOIN syntax. Access separates the JOIN statement from a complete FROM clause with a comma and repeats the table names in a separate, fully defined join statement. Using the Access SQL syntax shown in the following example gives the same result as the preceding ANSI SQL-92 example:
SELECT DISTINCTROW Publishers.Name, Titles.ISBN, Titles.Title
FROM Publishers, Titles,
Publishers INNER JOIN Titles
ON Publishers.PubID = Titles.PubID
The purpose of the DISTINCTROW statement in Access SQL is discussed in the section "Comparing the Access SQL Dialect and ODBC" later in this chapter.
Here is the equivalent of the two preceding syntax examples, using the WHERE clause to create the join:
SELECT Publishers.Name, Titles.ISBN, Titles.Title
FROM Publishers, Titles
WHERE Publishers.PubID = Titles.PubID
There is no difference between using the INNER JOIN and the WHERE clause to create an equi-join.
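The sketch below runs the INNER JOIN form and the WHERE form against the same data and compares the results. It assumes Python's sqlite3 module as a stand-in engine and a handful of invented rows; one title deliberately refers to a publisher that doesn't exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publishers (PubID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Titles (ISBN TEXT, Title TEXT, PubID INTEGER)")
conn.executemany("INSERT INTO Publishers VALUES (?, ?)",
                 [(1, "Alpha Press"), (2, "Beta Books"), (3, "Gamma House")])
conn.executemany("INSERT INTO Titles VALUES (?, ?, ?)",
                 [("111", "SQL Basics", 1), ("222", "More SQL", 1),
                  ("333", "C++ Data", 2), ("444", "Unmatched Title", 9)])

join_rows = conn.execute(
    "SELECT Publishers.Name, Titles.ISBN, Titles.Title "
    "FROM Publishers INNER JOIN Titles "
    "ON Publishers.PubID = Titles.PubID"
).fetchall()

where_rows = conn.execute(
    "SELECT Publishers.Name, Titles.ISBN, Titles.Title "
    "FROM Publishers, Titles "
    "WHERE Publishers.PubID = Titles.PubID"
).fetchall()

# Row order isn't guaranteed without ORDER BY, so compare sorted lists.
print(sorted(join_rows) == sorted(where_rows), len(join_rows))
```

Gamma House (no titles) and the unmatched title are absent from both results, as expected for an equi-join.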
NOTE
Equi-joins return only rows in which the values of the joined fields match. Records of either table that don't have matching values in the other table don't appear in the query result set returned by an equi-join. If there is no match between any of the records, no rows are returned. A query without rows is called a null set.
Multiple Equi-Joins
You can create multiple equi-joins to link several tables by pairs of fields with common data values. For example, you can link the Publishers, Titles, and Authors tables of BIBLIO.MDB with the following SQL-92 statement:
SELECT Publishers.Name, Titles.Title, Titles.Au_ID, Authors.Author
FROM Publishers INNER JOIN Titles
ON Publishers.PubID = Titles.PubID
INNER JOIN Authors
ON Titles.Au_ID = Authors.Au_ID
You need to include the Titles.Au_ID field in the query because the second join is based on the result set returned by the first join.
Access SQL, however, requires that you explicitly define each INNER JOIN with the following syntax:
SELECT DISTINCTROW Publishers.Name, Titles.Title, Titles.Au_ID,
Authors.Author
FROM Publishers, Titles, Authors,
Publishers INNER JOIN Titles
ON Publishers.PubID = Titles.PubID,
Titles INNER JOIN Authors
ON Titles.Au_ID = Authors.Au_ID
The query result set from the preceding Access SQL query appears in Figure 5.9.
Here is the equivalent of the preceding example using the WHERE clause:
SELECT Publishers.Name, Titles.Title, Titles.Au_ID,
Authors.Author
FROM Publishers, Titles, Authors
WHERE Publishers.PubID = Titles.PubID AND
Titles.Au_ID = Authors.Au_ID
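A three-table join can be tried with the sketch below, which chains two INNER JOINs and compares the result with the WHERE form. It assumes Python's sqlite3 module and invented rows; the chained JOIN syntax shown is the SQL-92 style, not the expanded Access form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publishers (PubID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Authors (Au_ID INTEGER, Author TEXT)")
conn.execute("CREATE TABLE Titles (Title TEXT, PubID INTEGER, Au_ID INTEGER)")
conn.executemany("INSERT INTO Publishers VALUES (?, ?)",
                 [(1, "Alpha Press"), (2, "Beta Books")])
conn.executemany("INSERT INTO Authors VALUES (?, ?)",
                 [(10, "Adams, Pat"), (11, "Baker, Lee")])
conn.executemany("INSERT INTO Titles VALUES (?, ?, ?)",
                 [("SQL Basics", 1, 10), ("C++ Data", 2, 11),
                  ("More SQL", 1, 11)])

join_rows = conn.execute(
    "SELECT Publishers.Name, Titles.Title, Authors.Author "
    "FROM Publishers INNER JOIN Titles "
    "ON Publishers.PubID = Titles.PubID "
    "INNER JOIN Authors "
    "ON Titles.Au_ID = Authors.Au_ID "
    "ORDER BY Titles.Title"
).fetchall()

where_rows = conn.execute(
    "SELECT Publishers.Name, Titles.Title, Authors.Author "
    "FROM Publishers, Titles, Authors "
    "WHERE Publishers.PubID = Titles.PubID AND "
    "Titles.Au_ID = Authors.Au_ID "
    "ORDER BY Titles.Title"
).fetchall()
print(join_rows)
```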
NOTE
As a rule, using the WHERE clause to specify equi-joins results in simpler query statements than specifying INNER JOINs. When you need to create OUTER JOINs, the subject of the next section, you might want to use INNER JOIN statements to maintain consistency in Access SQL statements.
OUTER JOINs
INNER JOINs (equi-joins) return only rows with matching field values. OUTER JOINs return all the rows of one table and only those rows in the other table that have matching values. There are two types of OUTER JOINs:
LEFT OUTER JOINs return all rows of the table or result set to the left of the LEFT OUTER JOIN statement and only the rows of the table to the right of the statement that have matching field values. In WHERE clauses, LEFT OUTER JOINs are specified with the *= operator.
RIGHT OUTER JOINs return all rows of the table or result set to the right of the RIGHT OUTER JOIN statement and only the rows of the table to the left of the statement that have matching field values. WHERE clauses specify RIGHT OUTER JOINs with the =* operator.
It is a convention that joins are created in one-to-many form; that is, the primary table that represents the "one" side of the relation appears to the left of the JOIN expression or the operator of the WHERE clause, and the related table of the "many" side appears to the right of the expression or operator. You use LEFT OUTER JOINs to display all of the records of the primary table, regardless of matching records in the related table. RIGHT OUTER JOINs are useful for finding orphan records. Orphan records are records in related tables that have no related records in the primary tables. They are created when you violate referential integrity rules.
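The orphan-record search can be sketched as follows. The example assumes Python's sqlite3 module, and because older SQLite versions don't accept RIGHT JOIN, it writes the related table (Titles) on the left of a LEFT JOIN, which returns the same rows as a RIGHT OUTER JOIN with Publishers on the left. The data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publishers (PubID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Titles (Title TEXT, PubID INTEGER)")
conn.executemany("INSERT INTO Publishers VALUES (?, ?)",
                 [(1, "Alpha Press"), (2, "Beta Books")])
conn.executemany("INSERT INTO Titles VALUES (?, ?)",
                 [("SQL Basics", 1), ("Lost Manuscript", 9)])

# Keep every title; titles without a publisher get Null publisher fields,
# and the IS NULL test then isolates those orphan rows.
orphans = conn.execute(
    "SELECT Titles.Title, Titles.PubID "
    "FROM Titles LEFT JOIN Publishers "
    "ON Titles.PubID = Publishers.PubID "
    "WHERE Publishers.PubID IS NULL"
).fetchall()
print(orphans)
```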
The SQL-92 syntax for a statement that returns all Publishers records, regardless of matching values in the Titles table, and all Titles records, whether or not authors for individual titles are identified, is as follows:
SELECT Publishers.Name, Titles.Title, Titles.Au_ID, Authors.Author
FROM Publishers LEFT OUTER JOIN Titles
ON Publishers.PubID = Titles.PubID
LEFT OUTER JOIN Authors
ON Titles.Au_ID = Authors.Au_ID
The equivalent joins using the WHERE clause are created by the following query:
SELECT Publishers.Name, Titles.Title, Titles.Au_ID,
Authors.Author
FROM Publishers, Titles, Authors
WHERE Publishers.PubID *= Titles.PubID AND
Titles.Au_ID *= Authors.Au_ID
Access SQL requires you to use the special syntax described in the preceding section, and it doesn't permit you to add the OUTER reserved word in the JOIN statement. Here is the Access SQL equivalent of the previous query example:
SELECT DISTINCTROW Publishers.Name, Titles.Title, Titles.Au_ID,
Authors.Author
FROM Publishers, Titles, Authors,
Publishers LEFT JOIN Titles
ON Publishers.PubID = Titles.PubID,
Titles LEFT JOIN Authors
ON Titles.Au_ID = Authors.Au_ID
Figure 5.10 shows the result of running the preceding query against the BIBLIO.MDB database.
Access SQL doesn't support the *= and =* operators in WHERE clauses. You need to use the LEFT JOIN or RIGHT JOIN reserved words to create outer joins when you use the Access database engine. This restriction doesn't apply to SQL pass-through queries that you execute on servers that support *= and =* operators, such as Microsoft and Sybase SQL Server.
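The effect of a LEFT JOIN is easy to see with a publisher that has no titles. The sketch below assumes Python's sqlite3 module as a stand-in engine and invented rows; Null field values come back to Python as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publishers (PubID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Titles (Title TEXT, PubID INTEGER)")
conn.executemany("INSERT INTO Publishers VALUES (?, ?)",
                 [(1, "Alpha Press"), (2, "Beta Books"), (3, "Gamma House")])
conn.executemany("INSERT INTO Titles VALUES (?, ?)",
                 [("SQL Basics", 1), ("More SQL", 1), ("C++ Data", 2)])

# Every publisher appears; Gamma House has no titles, so Title is Null.
rows = conn.execute(
    "SELECT Publishers.Name, Titles.Title "
    "FROM Publishers LEFT JOIN Titles "
    "ON Publishers.PubID = Titles.PubID "
    "ORDER BY Publishers.Name, Titles.Title"
).fetchall()
print(rows)
```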
Theta Joins and the DISTINCTROW Keyword
You can create joins using comparison operators other than =, *=, and =*. Joins that are not equi-joins are called theta joins. The most common form of theta join is the not-equal (theta) join, which uses the WHERE table_name.field_name <> table_name.field_name syntax. The BIBLIO.MDB database doesn't contain tables with fields that lend themselves to demonstrating not-equal joins. However, if you have a copy of Access's NorthWind.MDB sample database, you can execute an Access SQL query to find records in the Orders table that have a Ship Address value that differs from the Address value in the Customers table by employing the following query:
SELECT DISTINCTROW Customers.[Company Name], Customers.Address,
Orders.[Ship Address]
FROM Customers, Orders,
Customers INNER JOIN Orders
ON Customers.[Customer ID] = Orders.[Customer ID]
WHERE ((Orders.[Ship Address]<>[Customers].[Address]))
The preceding query results in the query return set shown in Figure 5.11.
If you execute the same query without Access SQL's DISTINCTROW qualifier, you get the same result. However, if you substitute the ANSI SQL DISTINCT qualifier for Access SQL's DISTINCTROW, the result is distinctly different, as shown in Figure 5.12.
The query result set shown in Figure 5.12 is created by the following statement, which is the same in Access SQL and ANSI SQL, disregarding the unconventional field names enclosed in square brackets:
SELECT DISTINCT Customers.[Company Name], Customers.Address,
Orders.[Ship Address]
FROM Customers, Orders
WHERE Customers.[Customer ID] = Orders.[Customer ID]
AND Orders.[Ship Address] <> Customers.Address
The DISTINCT qualifier specifies that only rows that have differing values in the fields specified in the SELECT statement should be returned by the query. Access SQL's DISTINCTROW qualifier causes the return set to include each row in which any of the values of all of the fields in the two tables (not just the fields specified to be displayed by the SELECT statement) differ.
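The effect of DISTINCT on a not-equal join can be sketched as follows. The example assumes Python's sqlite3 module, which has no DISTINCTROW qualifier, so it compares the plain query with the DISTINCT query only. The field names are written without spaces and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER, CompanyName TEXT, Address TEXT)")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, ShipAddress TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                 [(1, "Acme", "1 Main St"), (2, "Bolt", "2 Oak Ave")])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(10, 1, "9 Dock Rd"), (11, 1, "9 Dock Rd"),
                  (12, 1, "1 Main St"), (13, 2, "2 Oak Ave")])

base = (
    "Customers.CompanyName, Customers.Address, Orders.ShipAddress "
    "FROM Customers, Orders "
    "WHERE Customers.CustomerID = Orders.CustomerID "
    "AND Orders.ShipAddress <> Customers.Address"
)
all_rows = conn.execute("SELECT " + base).fetchall()
distinct_rows = conn.execute("SELECT DISTINCT " + base).fetchall()
print(len(all_rows), distinct_rows)
```

Orders 10 and 11 produce identical displayed values, so DISTINCT collapses them into one row.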
Self-Joins and Composite Columns
A self-join is a join created between two fields of the same table having similar field data types. The first field is usually the primary key field, and the second field of the join ordinarily is a foreign key field that relates to the primary key field, although this isn't a requirement for a self-join. (This may be a requirement to make the result of the self-join meaningful, however.)
When you create a self-join, the DBM creates a copy of the original table and then joins the copy to the original table. No tables in BIBLIO.MDB offer fields on which you can create a meaningful self-join. The Employees table of NorthWind.MDB, however, includes the Reports To field, which specifies the Employee ID of an employee's supervisor. Here is the Access SQL statement to create a self-join on the Employees table to display the name of an employee's supervisor:
SELECT Employees.[Employee ID] AS EmpID,
Employees.[Last Name] & ", " & Employees.[First Name]
AS Employee,
Employees.[Reports To]
AS SupID,
EmpCopy.[Last Name] & ", " & EmpCopy.[First Name]
AS Supervisor
FROM Employees, Employees
AS EmpCopy,
Employees INNER JOIN EmpCopy
ON Employees.[Reports To] = EmpCopy.[Employee ID]
You create a temporary copy of the table, named EmpCopy, with the FROM... Employees AS EmpCopy clause. Each of the query's field names is aliased with an AS qualifier. The Employee and Supervisor columns are composite columns whose values are created by combining a last name, a comma, and a space with the first name. The query result set from the preceding SQL statement appears in Figure 5.13.
ANSI SQL doesn't provide a SELF INNER JOIN, but you can create the equivalent by using the ANSI version of the preceding statement. You can substitute a WHERE Employees.[Reports To] = EmpCopy.[Employee ID] clause for the INNER JOIN...ON statement.
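A self-join along these lines can be sketched with Python's sqlite3 module, used here as a stand-in engine. The Employees rows are invented, the field names are written without spaces, and SQLite's || operator replaces the Access & operator for building the composite columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees "
    "(EmployeeID INTEGER, LastName TEXT, FirstName TEXT, ReportsTo INTEGER)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)",
                 [(1, "Lee", "Ann", 2),
                  (2, "Ortiz", "Sam", None),   # top of the hierarchy
                  (3, "Khan", "Ravi", 2)])

# The same table appears twice under two aliases: Emp and Sup.
rows = conn.execute(
    "SELECT Emp.EmployeeID AS EmpID, "
    "Emp.LastName || ', ' || Emp.FirstName AS Employee, "
    "Sup.LastName || ', ' || Sup.FirstName AS Supervisor "
    "FROM Employees AS Emp INNER JOIN Employees AS Sup "
    "ON Emp.ReportsTo = Sup.EmployeeID "
    "ORDER BY Emp.EmployeeID"
).fetchall()
print(rows)
```

The employee whose ReportsTo value is Null doesn't appear, because an inner join returns only matching rows.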
NOTE
Self-joins are relatively uncommon, because a table that is normalized to fourth normal form wouldn't include an equivalent of the Reports To field. A separate table would relate the Employee ID values of employees and supervisors. However, creating a separate table to contain information that can be held in a single table without ambiguity is generally considered over-normalization. This is the primary reason that most developers stop normalizing tables at the third normal form.
SQL Aggregate Functions and the GROUP BY and HAVING Clauses
ANSI SQL includes set functions (called SQL aggregate functions in this book), which act on sets of records. The standard SQL-92 aggregate functions are described in the following list. The field_name argument of the functions can be the name of a field (with a table_name. specifier, if required) or the all-fields specifier, the asterisk (*).
COUNT(field_name) returns the number of rows that contain NOT NULL values of field_name. COUNT(*) returns the number of rows in the table or query without regard for NULL values in fields.
MAX(field_name) returns the largest value of field_name in the set.
MIN(field_name) returns the smallest value of field_name in the set.
SUM(field_name) returns the total value of field_name in the set.
AVG(field_name) returns the arithmetic average (mean) value of field_name in the set.
The SQL aggregate functions can act on persistent tables or virtual tables, such as query result sets. Here is the basic syntax of queries that use the SQL aggregate functions:
SELECT FUNCTION(field_name|*) [AS column_alias]
This example returns a single record with the value of the SQL aggregate function you choose. You can test the SQL aggregate functions with BIBLIO.MDB using the following query:
SELECT COUNT(*) AS Count,
SUM(PubID) AS Total,
AVG(PubID) AS Average,
MIN(PubID) AS Minimum,
MAX(PubID) AS Maximum
FROM Publishers
Figure 5.14 shows the result of the preceding aggregate query.
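The sketch below runs the five aggregate functions and also contrasts COUNT(*) with COUNT(field_name) when the field contains Null values. It assumes Python's sqlite3 module as a stand-in engine and a four-row Publishers table invented for the example, with a hypothetical Fax field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Publishers (PubID INTEGER, Fax TEXT)")
conn.executemany("INSERT INTO Publishers VALUES (?, ?)",
                 [(1, "555-0101"), (2, None), (3, "555-0103"), (4, None)])

# COUNT(*) counts rows; COUNT(Fax) skips rows in which Fax is Null.
result = conn.execute(
    "SELECT COUNT(*) AS Count_All, "
    "COUNT(Fax) AS Count_Fax, "
    "SUM(PubID) AS Total, "
    "AVG(PubID) AS Average, "
    "MIN(PubID) AS Minimum, "
    "MAX(PubID) AS Maximum "
    "FROM Publishers"
).fetchone()
print(result)
```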
Databases with significant content usually have tables that contain fields representing the classification of objects. The BIBLIO.MDB database doesn't have such a classification, but the Products table of NorthWind.MDB classifies an eclectic assortment of exotic foodstuffs into eight different categories. You use the GROUP BY clause when you want to obtain values of the SQL aggregate functions for each class of an object. The GROUP BY clause creates a virtual table called, not surprisingly, a grouped table.
The following Access SQL query counts the number of items in each of the eight food categories included in the Category ID field and then calculates an average and two totals for each of the categories:
SELECT [Category ID] AS Category,
COUNT(*) AS Items,
Format(AVG([Unit Price]), "$#,##0.00") AS Avg_UP,
SUM([Units in Stock]) AS Sum_Stock,
SUM([Units on Order]) AS Sum_Ordered
FROM Products
GROUP BY [Category ID]
NOTE
The preceding query uses the Access SQL Format() function to format the values returned for average unit price (Avg_UP) in conventional monetary format. This feature is not found in ANSI SQL.
The result of the preceding query appears in Figure 5.15.
You might want to restrict group (category) membership using particular criteria. You might think that you could use a WHERE clause to establish the criteria, but WHERE clauses apply to the entire table. The HAVING clause acts like a WHERE clause for groups. Therefore, if you want to limit the applicability of SQL aggregate functions to a particular set or group, you would add the HAVING clause and the IN() operator, as in the following Access SQL example, which returns only rows for the BEVR and COND categories:
SELECT [Category ID] AS Category,
COUNT(*) AS Items,
Format(AVG([Unit Price]), "$#,##0.00") AS Avg_UP,
SUM([Units in Stock]) AS Sum_Stock,
SUM([Units on Order]) AS Sum_Ordered
FROM Products
GROUP BY [Category ID]
HAVING [Category ID] IN('BEVR', 'COND')
The result of the preceding query appears in Figure 5.16.
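The GROUP BY and HAVING behavior described above can be sketched as follows. The example assumes Python's sqlite3 module as a stand-in engine, omits the Access-only Format() function, writes the field names without spaces, and uses four invented product rows in three categories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products "
    "(CategoryID TEXT, UnitPrice REAL, UnitsInStock INTEGER, UnitsOnOrder INTEGER)")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?)",
                 [("BEVR", 10.0, 5, 0), ("BEVR", 20.0, 15, 10),
                  ("COND", 4.0, 7, 0), ("DAIR", 8.0, 3, 20)])

select_list = (
    "SELECT CategoryID AS Category, COUNT(*) AS Items, "
    "AVG(UnitPrice) AS Avg_UP, SUM(UnitsInStock) AS Sum_Stock, "
    "SUM(UnitsOnOrder) AS Sum_Ordered "
    "FROM Products GROUP BY CategoryID "
)
# One row per category.
grouped = conn.execute(select_list + "ORDER BY CategoryID").fetchall()
# HAVING filters the groups after they have been formed.
having = conn.execute(
    select_list + "HAVING CategoryID IN ('BEVR', 'COND') ORDER BY CategoryID"
).fetchall()
print(grouped)
print(having)
```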
Comparing the Access SQL Dialect and ODBC
The preceding sections of this chapter have outlined many of the syntactical differences between Access SQL and ANSI SQL-92 (plus earlier versions of ANSI SQL, such as SQL-86 and SQL-89). Here are some of the more important differences between the present implementation of Access SQL (based on the Access definition of Access SQL) and SQL syntax supported by ODBC:
Access SQL doesn't support ANSI SQL Data Definition Language (DDL) statements. You modify the Tables, Fields, and Indexes collections with Visual C++ code to create or modify database objects. You can use the Microsoft ODBC Desktop Database Drivers kit to provide limited (ODBC Extended Level 1) DDL capability.
Access SQL doesn't support ANSI SQL Data Control Language (DCL), and Visual C++ doesn't offer an alternative method of granting and revoking user permissions for database objects. The Microsoft ODBC Desktop Database Drivers kit provides limited (ODBC Extended Level 1) DCL capability.
Access SQL doesn't support subqueries. To create the equivalent of a subquery, you need to execute a second query against a Dynaset object created by a query.
The following sections summarize the differences between the keywords of Access SQL and the reserved words of ANSI SQL, as well as how Access SQL deals with the data types defined by ANSI SQL.
ANSI SQL Reserved Words and Access SQL Keywords
ANSI SQL reserved words, by tradition, are set in uppercase type. Reserved words in ANSI SQL may not be used as names of objects, such as tables or fields, or as names of parameters or variables used in SQL statements. This book refers to elements of Access SQL syntax as keywords because, with the exception of some Access SQL functions, Access SQL keywords aren't reserved words in Visual C++. Table 5.3 lists the ANSI SQL reserved words.
Table 5.3. ANSI SQL reserved words that correspond to Access SQL keywords.
ALL | DELETE | HAVING | JOIN | SELECT
AND | DESC | IN | ON | SET
AS | DISTINCT | INNER | OPTION | UPDATE
ASC | FROM | INSERT | OR | WHERE
BY | GROUP | INTO | ORDER | WITH
This list shows the ANSI set functions that are identical to Access SQL aggregate functions:
COUNT()
SUM()
AVG()
MIN()
MAX()
Table 5.4 lists the commonly used ANSI SQL reserved words (including functions) and symbols that don't have a directly equivalent Access SQL reserved word or symbol. This table doesn't include many of the new reserved words added to SQL-89 by SQL-92, because these reserved words hadn't yet been implemented in the versions of client-server RDBMS that had been released as commercial products at the time this book was written.
Table 5.4. Common ANSI SQL reserved words that don't have a direct equivalent in Access SQL.
Reserved Word | Category | Substitute
ALL | DQL | Applies only to subqueries.
ALTER TABLE | DDL | Use Fields collection.
ANY | DQL | Applies only to subqueries.
AUTHORIZATION | DCL | Access SQL doesn't support DCL.
BEGIN | TPL | Visual C++ MFC BeginTrans() member function.
CHECK | DDL | Access SQL doesn't support DDL.
CLOSE | DCL | Access SQL doesn't support DCL.
COMMIT | TPL | Visual C++ MFC CommitTrans() member function.
CREATE INDEX | DDL | Use Indexes collection.
CREATE TABLE | DDL | Use Tables collection.
CREATE VIEW | DDL | Equivalent to a Snapshot object.
CURRENT | CCL | Scrollable cursors are built into Dynaset and Snapshot objects.
CURSOR | CCL | Scrollable cursors are built into Dynaset and Snapshot objects.
DECLARE | CCL | Scrollable cursors are built into Dynaset and Snapshot objects.
DROP INDEX | DDL | Use Indexes collection.
DROP TABLE | DDL | Use Tables collection.
DROP VIEW | DDL | Use Close method on Snapshot object.
FETCH | CCL | Field name of a Dynaset or Snapshot object.
FOREIGN KEY | DDL | Access SQL doesn't support DDL.
GRANT | DCL | Access SQL doesn't support DCL.
IN subquery | DQL | Use a query against a query Dynaset instead of a subquery.
POSITION() | DQL | Use InStr().
PRIMARY KEY | DDL | Access SQL doesn't support DDL.
PRIVILEGES | DCL | Access SQL doesn't support DCL.
REFERENCES | DDL | Access SQL doesn't support DDL.
REVOKE | DCL | Access SQL doesn't support DCL.
ROLLBACK | TPL | Visual C++ MFC Rollback() member function.
SUBSTRING() | DQL | Use the Mid() function.
UNION | DQL | UNIONs currently aren't supported by Access SQL.
UNIQUE | DDL | Access SQL doesn't support DDL.
WORK | TPL | Not required by Visual C++ CDatabase transaction functions.
*= | DQL | Use LEFT JOIN.
=* | DQL | Use RIGHT JOIN.
!= (not equal) | DQL | Use <> for not equal.
: (variable prefix) | DQL | Use the PARAMETERS statement (if needed).
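Several of the substitutions in Table 5.4 are mechanical enough to apply in code before a statement is handed to the Access database engine. The following is a minimal sketch in standard C++, not part of MFC or the Jet engine: the helper names AccessMid(), AccessInStr(), and ReplaceNotEqual() are hypothetical, and the sketch assumes that string literals in the statement are delimited by single or double quotes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers (assumptions for illustration only) that build
// Access SQL text for three of the substitutions listed in Table 5.4.

// ANSI:   SUBSTRING(expr FROM start FOR length)
// Access: Mid(expr, start, length)
std::string AccessMid(const std::string& expr, int start, int length)
{
    assert(start >= 1 && length >= 0);  // Mid() counts positions from 1
    return "Mid(" + expr + ", " + std::to_string(start) + ", " +
           std::to_string(length) + ")";
}

// ANSI:   POSITION(sought IN searched)
// Access: InStr(searched, sought)   (note the reversed argument order)
std::string AccessInStr(const std::string& searched, const std::string& sought)
{
    return "InStr(" + searched + ", " + sought + ")";
}

// Replaces the != operator with <> everywhere except inside quoted literals.
std::string ReplaceNotEqual(const std::string& sql)
{
    std::string out;
    char quote = '\0';  // active quote character, or '\0' outside a literal
    for (std::size_t i = 0; i < sql.size(); ++i) {
        const char c = sql[i];
        if (quote != '\0') {
            if (c == quote) quote = '\0';  // literal ends here
            out += c;
        } else if (c == '\'' || c == '"') {
            quote = c;                     // literal starts here
            out += c;
        } else if (c == '!' && i + 1 < sql.size() && sql[i + 1] == '=') {
            out += "<>";
            ++i;                           // skip the '=' just consumed
        } else {
            out += c;
        }
    }
    return out;
}
```

For example, AccessMid("LastName", 1, 3) produces the text Mid(LastName, 1, 3), and ReplaceNotEqual() leaves a literal such as 'a!=b' untouched while rewriting the operator outside it. Outer joins written with *= or =* need a structural rewrite to LEFT JOIN or RIGHT JOIN and aren't attempted here.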
Table 5.5 lists Access SQL keywords that aren't reserved words in ANSI SQL. Many of the Access SQL keywords describe data types that you specify by using the DB_ constants. Data type conversion to and from ANSI SQL is discussed shortly.
Table 5.5. Access SQL keywords and symbols that aren't reserved words or symbols in ANSI SQL.
Access SQL | ANSI SQL | Category | Description
BINARY | No equivalent | DDL | Presently, not an official Access data type (used for SID field in SYSTEM.MDA).
BOOLEAN | No equivalent | DDL | Logical field data type (0 or -1 values only).
BYTE | No equivalent | DDL | Asc()/Chr() data type; 1-byte integer (tinyint of SQL Server).
CURRENCY | No equivalent | DDL | Currency data type.
DATETIME | No equivalent | DDL | Date/time field data type (Variant subtype 7).
DISTINCTROW | No equivalent | DQL | Creates an updatable Dynaset object.
DOUBLE | REAL | DDL | Double-precision floating-point number.
IN predicate with crosstab queries | No equivalent | DQL | Defines fixed-column headers for crosstab queries.
LONG | INT[EGER] | DDL | Long integer data type.
LONGBINARY | No equivalent | DDL | OLE Object field data type.
LONGTEXT | No equivalent | DDL | Memo field data type.
(WITH) OWNERACCESS (OPTION) | No equivalent | DQL | Runs queries with object owner's permissions.
PARAMETERS | No equivalent | DQL | User- or program-entered query parameters. Should be avoided in Visual C++ code.
PIVOT | No equivalent | DQL | Use in crosstab queries.
SHORT | SMALLINT | DDL | Integer data type; 2 bytes.
SINGLE | FLOAT | DDL | Single-precision real number.
TEXT | VARCHAR[ACTER] | DDL | Text data type.
TRANSFORM | No equivalent | DQL | Specifies a crosstab query.
? (LIKE wildcard) | _ (wildcard) | DQL | Single character with Like.
* (LIKE wildcard) | % (wildcard) | DQL | Zero or more characters.
# (LIKE wildcard) | No equivalent | DQL | Single digit, 0 through 9.
# (date specifier) | No equivalent | DQL | Encloses date/time values.
<> (not equal) | != | DQL | Access uses ! as a separator.
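The wildcard rows of Table 5.5 imply a simple character-by-character translation when an ANSI LIKE pattern has to run against the Access database engine. The sketch below is one way to do it in standard C++; the function name AnsiLikeToAccess() is hypothetical. It assumes the ANSI pattern has no ESCAPE clause, and it relies on a feature of the Access Like operator that Table 5.5 doesn't list: a character enclosed in square brackets matches itself, which is how a literal *, ?, #, or [ is protected.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration, not a library call):
// converts an ANSI LIKE pattern to the wildcards used by Access SQL's Like.
std::string AnsiLikeToAccess(const std::string& pattern)
{
    std::string out;
    for (char c : pattern) {
        switch (c) {
        case '%':            // ANSI: zero or more characters
            out += '*';
            break;
        case '_':            // ANSI: exactly one character
            out += '?';
            break;
        case '*':            // literal in ANSI, special in Access:
        case '?':            // enclose in brackets so that it
        case '#':            // matches only itself
        case '[':
            out += '[';
            out += c;
            out += ']';
            break;
        default:             // ordinary character, copied unchanged
            out += c;
            break;
        }
    }
    return out;
}
```

With this sketch, the ANSI pattern Sm_th% becomes Sm?th*, and Item #% becomes Item [#]* so that the number sign isn't treated as a digit wildcard.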
Access SQL provides the four SQL statistical aggregate functions listed in Table 5.6 that are not included in ANSI SQL. These Access SQL statistical aggregate functions are set in upper- and lowercase type in the Microsoft documentation but are set in uppercase type in this book.
Table 5.7 lists the Access SQL keywords that often appear in upper- and lowercase in the Microsoft documentation rather than in the all-uppercase format used for ANSI SQL reserved words and in this book.
Table 5.7. Typesetting conventions for Access SQL keywords and ANSI SQL reserved words.
Access SQL | ANSI SQL and This Book
And | AND
Avg() | AVG()
Between | BETWEEN
Count() | COUNT()
Is | IS
Like | LIKE
Max() | MAX()
Min() | MIN()
Not | NOT
Null | NULL
Or | OR
Sum() | SUM()
Data Type Conversion Between ANSI SQL and Access SQL
Table 5.8 lists the data types specified by ANSI SQL-92 and the equivalent data types of Access SQL when equivalent data types exist. Categories of ANSI SQL data types precede the SQL-92 data type identifier.
Table 5.8. Data type conversion to and from ANSI SQL and Access SQL.
ANSI SQL-92 | Access SQL | C Data Type | Comments
Exact Numeric | Number | |
INTEGER | Long (integer) | long int | 4 bytes
SMALLINT | Integer | short int | 2 bytes
NUMERIC[(p[, s])] | Not supported | | p = precision, s = scale
DECIMAL[(p[, s])] | Not supported | | p = precision, s = scale
Approximate Numeric | Number | |
REAL | Double (precision) | double | 8 bytes
DOUBLE PRECISION | Not supported | | 16 bytes
FLOAT | Single (precision) | float | 4 bytes
Character (Text) | Text | |
CHARACTER[(n)] | String | char * | Text fields are variable-length.
CHARACTER VARYING | String | char * |
Bit Strings | None supported | |
BIT[(n)] | Not supported | | Binary fields are variable-length.
BIT VARYING | Not supported | | Used by Microsoft.
Datetimes | | |
DATE | Not supported | | 10 bytes
TIME | Not supported | | 8 bytes (plus fraction)
TIMESTAMP | Date/Time | COleDateTime | 19 bytes
TIME WITH TIME ZONE | Not supported | | 14 bytes
TIMESTAMP WITH TIME ZONE | Not supported | | 25 bytes
Intervals (Datetimes) | None supported | |
Many of the data types described in the Access SQL column of Table 5.8 as not being supported are converted by ODBC drivers to standard ODBC data types that are compatible with Access SQL data types. When you use attached database files, data types are converted by the Access database engine's ISAM driver for dBASE, FoxPro, Paradox, and Btrieve files. Data type conversion by ODBC and ISAM drivers is one of the subjects of the next chapter.
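If an application needs the Table 5.8 correspondences at run time, for instance to report which columns of an ANSI table definition have no direct Access SQL counterpart, a lookup table is usually enough. The following standard C++ sketch encodes only the rows of Table 5.8 that list an Access SQL type; the function name AccessTypeFor() is hypothetical, and the sketch assumes that the type name is passed in uppercase with any (n) or (p, s) qualifier already removed.

```cpp
#include <map>
#include <string>

// Hypothetical helper (an assumption for illustration): returns the Access SQL
// data type that Table 5.8 pairs with an ANSI SQL-92 type name, or
// "Not supported" when the table lists no equivalent.
std::string AccessTypeFor(const std::string& ansiType)
{
    static const std::map<std::string, std::string> kTypes = {
        {"INTEGER",           "Long"},       // 4 bytes, long int
        {"SMALLINT",          "Integer"},    // 2 bytes, short int
        {"REAL",              "Double"},     // 8 bytes, double
        {"FLOAT",             "Single"},     // 4 bytes, float
        {"CHARACTER",         "String"},     // variable-length text
        {"CHARACTER VARYING", "String"},
        {"TIMESTAMP",         "Date/Time"},  // COleDateTime in MFC
    };

    const std::map<std::string, std::string>::const_iterator it =
        kTypes.find(ansiType);
    if (it == kTypes.end()) {
        return "Not supported";
    }
    return it->second;
}
```

AccessTypeFor("SMALLINT") returns Integer, whereas AccessTypeFor("DECIMAL") returns Not supported. Types in the second group may still be usable through the ODBC and ISAM driver conversions mentioned above; the sketch covers only the direct equivalents.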
Summary
It's impossible to fully describe all the reserved words and syntax of Structured Query Language in a single chapter, especially when the chapter also must compare a particular dialect of SQL, Access SQL, to a "standard" version of the language. This is particularly true when the standard language is new, as is the case for SQL-92, and when no RDBMSs support more than a fraction of the reserved words added to SQL-89 by the new standard. For a full exposition of SQL-92, you need a reference guide, such as Jim Melton and Alan R. Simon's Understanding the New SQL: A Complete Guide (see the section "A Visual C++ and Database Bibliography" in this book's Introduction).
This chapter introduced newcomers to SQL, first to the ANSI variety and then to the Access dialect. To use ANSI SQL (as implemented by the Microsoft ODBC API), you must use the SQL pass-through technique, which lets you process queries on the back-end server of a client-server database. In order to use the Access database engine to process queries, you also need to know the Access dialect of SQL. There are many examples of both ANSI SQL and Access queries in this book, so you've just started down the path to fluency in using SQL with Visual C++ database applications. The next chapter delves into the innards of the Access database engine and its relationship to the ODBC API. Chapter 7 shows you how Visual C++ applications interface with ODBC drivers. The last chapter in Part II expands your SQL vocabulary to Access SQL's crosstab query syntax and shows you how to write SQL statements that modify the data in database tables.