EDUCATION
   
 
Education :- Computer :- SQL TUTORIAL :- Common SQL Questions & Advanced Topics

  1. Why can't I just ask for the first three rows in a table? --Because in relational databases, rows are inserted in no particular order, that is, the system inserts them in an arbitrary order; so, you can only request rows using valid SQL features, like ORDER BY, etc.

  2. What is this DDL and DML I hear about? --DDL (Data Definition Language) refers to (in SQL) the Create Table statement...DML (Data Manipulation Language) refers to the Select, Update, Insert, and Delete statements.  Also, QML, referring to Select statements, stands for Query Manipulation Language.

  3. Aren't database tables just files? --Well, DBMS's store data in files declared by system managers before new tables are created (on large systems), but the system stores the data in a special format, and may spread data from one table over several files. In the database world, a set of files created for a database is called a tablespace. In general, on small systems, everything about a database (definitions and all table data) is kept in one file.

  4. (Related question) Aren't database tables just like spreadsheets? --No, for two reasons. First, spreadsheets can have data in a cell, but a cell is more than just a row-column-intersection. Depending on your spreadsheet software, a cell might also contain formulas and formatting, which database tables cannot have (currently). Secondly, spreadsheet cells are often dependent on the data in other cells. In databases, "cells" are independent, except that columns are logically related (hopefully; together a row of columns describe an entity), and, other than primary key and foreign key constraints, each row in a table is independent from one another.

  5. How do I import a text file of data into a database? --Well, you can't do it directly...you must use a utility, such as Oracle's SQL*Loader, or write a program to load the data into the database. A program to do this would simply go through each record of a text file, break it up into columns, and do an Insert into the database.

  6. What web sites and computer books would you recommend for more information about SQL and databases? --First, look at the sites at the bottom of this page. I would especially suggest the following: Ask the SQL Pro (self-explanatory), DB Ingredients (more theorical topics), DBMS Lab/Links (comprehensive academic DBMS link listing), Access on the Web (about web access of Access databases), Tutorial Page (listing of other tutorials), and miniSQL (more information about the best known free DBMS).

  7. What is a schema? --A schema is a logical set of tables, such as the Antiques database above...usually, it is thought of as simply "the database", but a database can hold more than one schema. For example, a star schema is a set of tables where one large, central table holds all of the important information, and is linked, via foreign keys, to dimension tables which hold detail information, and can be used in a join to create detailed reports.

  8. I understand that Oracle offers a special keyword, Decode, that allows for some "if-then" logic. How does that work? -- Technically, Decode allows for conditional output based on the value of a column or function. The syntax looks like this (from the Oracle: Complete Reference series):

    Select ...DECODE (Value, If1, Then1, [If 2, Then 2, ...,] Else) ...From ...;

    The Value is the name of a column, or a function (conceivably based on a column or columns), and for each If included in the statement, the corresponding Then clause is the output if the condition is true. If none of the conditions are true, then the Else value is output. Let's look at an example:

    Select Distinct City,
    DECODE (City, 'Cincinnati', 'Queen City', 'New York', 'Big Apple', 'Chicago',
    'City of Broad Shoulders', City) AS Nickname
    From Cities;

    The output might look like this:

    City         Nickname
    ------------ ------------------------------
    Boston       Boston
    Cincinnati   Queen City
    Cleveland    Cleveland
    New York     Big Apple

    'City' in the first argument denotes the column name used for the test. The second, fourth, etc. arguments are the individual equality tests (taken in the orden given) against each value in the City column. The third, fifth, etc. arguments are the corresponding outputs if the corresponding test is true. The final parameter is the default output if none of the tests are true; in this case, just print out the column value.

    TIP: If you want nothing to be output for a given condition, such as the default "Else" value, enter the value Null for that value, such as:

    Select Distinct City,
    DECODE (City, 'Cincinnati', 'Queen City', 'New York', 'Big Apple', 'Chicago',
    'City of Broad Shoulders', Null) AS Nickname
    From Cities;

    If the City column value is not one of the ones mentioned, nothing is outputted, rather than the city name itself.

    City         Nickname
    ------------ ----------
    Boston
    Cincinnati   Queen City
    Cleveland
    New York     Big Apple
     

  9. You mentioned Referential Integrity before, but what does that have to do with this concept I've heard about, Cascading Updates and Deletes? --This is a difficult topic to talk about, because it's covered differently in different DBMS's.

    For example, Microsoft SQL Server (7.0 & below) requires that you write "triggers" (see the Yahoo SQL Club link to find links that discuss this topic--I may include that topic in a future version of this page) to implement this. (A quick definition, though; a Trigger is a SQL statement stored in the database that allows you to perform a given query [usually an "Action" Query--Delete, Insert, Update] automatically, when a specified event occurs in the database, such as a column update, but anyway...) Microsoft Access (believe it or not) will perform this if you define it in the Relationships screen, but it will still burden you with a prompt. Oracle does this automatically, if you specify a special "Constraint" (see reference at bottom for definition, not syntax) on the keyed column.

    So, I'll just discuss the concept. First, see the discussion above on Primary and Foreign keys.

    Concept: If a row from the primary key column is deleted/updated, if "Cascading" is activated, the value of the foreign key in those other tables will be deleted (the whole row)/updated.

    The reverse, a foreign key deletion/update causing a primary key value to be deleted/changed, may or may not occur: the constraint or trigger may not be defined, a "one-to-many" relationship may exist, the update might be to another existing primary key value, or the DBMS itself may or may not have rules governing this. As usual, see your DBMS's documentation.

    For example, if you set up the AntiqueOwners table to have a Primary Key, OwnerID, and you set up the database to delete rows on the Foreign Key, SellerID, in the Antiques table, on a primary key deletion, then if you deleted the AntiqueOwners row with OwnerID of '01', then the rows in Antiques, with the Item values, Bed, Cabinet, and Jewelry Box ('01' sold them), will all be deleted. Of course, assuming the proper DB definition, if you just updated '01' to another value, those Seller ID values would be updated to that new value too.

  10. Show me an example of an outer join. --Well, from the questions I receive, this is an extremely common example, and I'll show you both the Oracle and Access queries...

    Think of the following Employee table (the employees are given numbers, for simplicity):

    Name Department
    1 10
    2 10
    3 20
    4 30
    5 30

    Now think of a department table:

    Department
    10
    20
    30
    40

    Now suppose you want to join the tables, seeing all of the employees and all of the departments together...you'll have to use an outer join which includes a null employee to go with Dept. 40.

    In the book, "Oracle 7: the Complete Reference", about outer joins, "think of the (+), which must immediately follow the join column of the table, as saying add an extra (null) row anytime there's no match". So, in Oracle, try this query (the + goes on Employee, which adds the null row on no match):

    Select E.Name, D.Department
    From Department D, Employee E
    Where E.Department(+) = D.Department;

    This is a left (outer) join, in Access:

    SELECT DISTINCTROW Employee.Name, Department.Department
    FROM Department LEFT JOIN Employee ON Department.Department = Employee.Department;

    And you get this result:

    Name Department
    1 10
    2 10
    3 20
    4 30
    5 30
      40


  11. What are some general tips you would give to make my SQL queries and databases better and faster (optimized)?
    • You should try, if you can, to avoid expressions in Selects, such as SELECT ColumnA + ColumnB, etc. The query optimizer of the database, the portion of the DBMS that determines the best way to get the required data out of the database itself, handles expressions in such a way that would normally require more time to retrieve the data than if columns were normally selected, and the expression itself handled programmatically.
    • Minimize the number of columns included in a Group By clause.
    • If you are using a join, try to have the columns joined on (from both tables) indexed.
    • When in doubt, index.
    • Unless doing multiple counts or a complex query, use COUNT(*) (the number of rows generated by the query) rather than COUNT(Column_Name).

  12. What is normalization? --Normalization is a technique of database design that suggests that certain criteria be used when constructing a table layout (deciding what columns each table will have, and creating the key structure), where the idea is to eliminate redundancy of non-key data across tables. Normalization is usually referred to in terms of forms, and I will introduce only the first three, even though it is somewhat common to use other, more advanced forms (fourth, fifth, Boyce-Codd; see documentation).

    First Normal Form refers to moving data into separate tables where the data in each table is of a similar type, and by giving each table a primary key.

    Putting data in Second Normal Form involves removing to other tables data that is only dependent of a part of the key. For example, if I had left the names of the Antique Owners in the items table, that would not be in Second Normal Form because that data would be redundant; the name would be repeated for each item owned; as such, the names were placed in their own table. The names themselves don't have anything to do with the items, only the identities of the buyers and sellers.

    Third Normal Form involves getting rid of anything in the tables that doesn't depend solely on the primary key. Only include information that is dependent on the key, and move off data to other tables that are independent of the primary key, and create a primary key for the new tables.

    There is some redundancy to each form, and if data is in 3NF (shorthand for 3rd normal form), it is already in 1NF and 2NF. In terms of data design then, arrange data so that any non-primary key columns are dependent only on the whole primary key. If you take a look at the sample database, you will see that the way then to navigate through the database is through joins using common key columns.

    Two other important points in database design are using good, consistent, logical, full-word names for the tables and columns, and the use of full words in the database itself. On the last point, my database is lacking, as I use numeric codes for identification. It is usually best, if possible, to come up with keys that are, by themselves, self-explanatory; for example, a better key would be the first four letters of the last name and first initial of the owner, like JONEB for Bill Jones (or for tiebreaking purposes, add numbers to the end to differentiate two or more people with similar names, so you could try JONEB1, JONEB2, etc.).

  13. What is the difference between a single-row query and a multiple-row query and why is it important to know the difference? --First, to cover the obvious, a single-row query is a query that returns one row as its result, and a multiple-row query is a query that returns more than one row as its result. Whether a query returns one row or more than one row is entirely dependent on the design (or schema) of the tables of the database. As query-writer, you must be aware of the schema, be sure to include enough conditions, and structure your SQL statement properly, so that you will get the desired result (either one row or multiple rows). For example, if you wanted to be sure that a query of the AntiqueOwners table returned only one row, consider an equal condition of the primary key column, OwnerID.

    Three reasons immediately come to mind as to why this is important. First, getting multiple rows when you were expecting only one, or vice-versa, may mean that the query is erroneous, that the database is incomplete, or simply, you learned something new about your data. Second, if you are using an update or delete statement, you had better be sure that the statement that you write performs the operation on the desired row (or rows)...or else, you might be deleting or updating more rows than you intend. Third, any queries written in Embedded SQL must be carefully thought out as to the number of rows returned. If you write a single-row query, only one SQL statement may need to be performed to complete the programming logic required. If your query, on the other hand, returns multiple rows, you will have to use the Fetch statement, and quite probably, some sort of looping structure in your program will be required to iterate processing on each returned row of the query.

  14. Tell me about a simple approach to relational database design. -- First, create a list of important things (entities) and include those things you may not initially believe is important. Second, draw a line between any two entities that have any connection whatsoever; except that no two entities can connect without a 'rule'; e.g.: families have children, employees work for a department. Therefore put the 'connection' in a diamond, the 'entities' in squares. Third, your picture should now have many squares (entities) connected to other entities through diamonds (a square enclosing an entity, with a line to a diamond describing the relationship, and then another line to the other entity). Fourth, put descriptors on each square and each diamond, such as customer -- airline -- trip. Fifth, give each diamond and square any attributes it may have (a person has a name, an invoice has a number), but some relationships have none (a parent just owns a child). Sixth, everything on your page that has attributes is now a table, whenever two entities have a relationship where the relationship has no attributes, there is merely a foreign key between the tables. Seventh, in general you want to make tables not repeat data. So, if a customer has a name and several addresses, you can see that for every address of a customer, there will be repeated the customer's first name, last name, etc. So, record Name in one table, and put all his addresses in another. Eighth, each row (record) should be unique from every other one; Mr. Freedman suggests a 'auto-increment number' primary key, where a new, unique number is generated for each new inserted row. Ninth, a key is any way to uniquely identify a row in a table...first and last name together are good as a 'composite' key. That's the technique.

  15. What are relationships? --Another design question...the term "relationships" (often termed "relation") usually refers to the relationships among primary and foreign keys between tables. This concept is important because when the tables of a relational database are designed, these relationships must be defined because they determine which columns are or are not primary or foreign keys. You may have heard of an Entity-Relationship Diagram, which is a graphical view of tables in a database schema, with lines connecting related columns across tables. See the sample diagram at the end of this section or some of the sites below in regard to this topic, as there are many different ways of drawing E-R diagrams. But first, let's look at each kind of relationship...

    A One-to-one relationship means that you have a primary key column that is related to a foreign key column, and that for every primary key value, there is one foreign key value. For example, in the first example, the EmployeeAddressTable, we add an EmployeeIDNo column. Then, the EmployeeAddressTable is related to the EmployeeStatisticsTable (second example table) by means of that EmployeeIDNo. Specifically, each employee in the EmployeeAddressTable has statistics (one row of data) in the EmployeeStatisticsTable. Even though this is a contrived example, this is a "1-1" relationship. Also notice the "has" in bold...when expressing a relationship, it is important to describe the relationship with a verb.

    The other two kinds of relationships may or may not use logical primary key and foreign key constraints...it is strictly a call of the designer. The first of these is the one-to-many relationship ("1-M"). This means that for every column value in one table, there is one or more related values in another table. Key constraints may be added to the design, or possibly just the use of some sort of identifier column may be used to establish the relationship. An example would be that for every OwnerID in the AntiqueOwners table, there are one or more (zero is permissible too) Items bought in the Antiques table (verb: buy).

    Finally, the many-to-many relationship ("M-M") does not involve keys generally, and usually involves idenifying columns. The unusual occurence of a "M-M" means that one column in one table is related to another column in another table, and for every value of one of these two columns, there are one or more related values in the corresponding column in the other table (and vice-versa), or more a common possibility, two tables have a 1-M relationship to each other (two relationships, one 1-M going each way). A [bad] example of the more common situation would be if you had a job assignment database, where one table held one row for each employee and a job assignment, and another table held one row for each job with one of the assigned employees. Here, you would have multiple rows for each employee in the first table, one for each job assignment, and multiple rows for each job in the second table, one for each employee assigned to the project. These tables have a M-M: each employee in the first table has many job assignments from the second table, and each job has many employees assigned to it from the first table. This is the tip of the iceberg on this topic...see the links below for more information and see the diagram below for a simplified example of an E-R diagram.

    Sample Simplified Entity-Relationship Diagram

  16. What are some important nonstandard SQL features (extremely common question)? --Well, see the next section...