2022-02-24

IntermediateSQL

Structured Query Language

User only need to specify what data they want rather than the strategy on how to query the data

Three part of SQL

Data Manipulate Language(DML)
Data Definition Language(DDL)
Data Control Language(DCL)

SQL is based on bags(duplicate) rather then sets(no-duplicate)

Basic Syntax

SELECT column1,column2,... FROM Relation WHERE predicate1,...;
# it's OK to use lowercase key word
# However, use capital for key words is convention
select column1,column2,... from Relation where predicate1,...;

Aggregation+Group by

Aggregate: return a single value from a bag of tuples

AVG(col);
MIN(col);
MAX(col);
SUM(col);
COUNT(col);

Group by : Project tuples into subset and calculate aggregate against each subset

GROUP BY

group by

String/Date/Time Operation

In mysql string is case-insensitive while in sql-92 it is case sensitive

Like

# match any substring : %
# match exactly one character : _
SELECT * FROM student AS s WHERE s.login LIKE '%@cs';
SELECT * FROM enrolled AS e WHERE e.cid LIKE '15-7_1';

String functions

Each DBMS have its own implementation

String concatenation

1 2	'a' \|\| 'b' # in sql-92 CONCAT('a','b') # in MySQL

Date Operation

differs widely between different DBMS

Output Control + Redirection

Redirect the query result to another new table

INTO

1	SELECT DISTINCT cid INTO newtable FROM enrolled;

INSERT

1 2	INSERT INTO existTable (SELECT DISTINCT cid FROM enrolled); # query result must generate the same columns as the to be inserted table

Order by

1
2
3

# order the results by a specified order
SELECT sid,grade FROM enrolled WHERE cid='15-721' ORDER BY grade ;
SELECT sid,grade FROM enrolled WHERE cid='15-721' ORDER BY grade [DESC | ASC];

Limit

limit the number returned by querying

1 2	SELECT * FROM student LIMIT 5; # get the first 5 students SELECT * FROM student LIMIT 5 OFFSET 2; # get student 2~7

Nested Queries

IN/ANY/ALL/EXISTS

1
2
3

SELECT name FROM student WHERE sid IN(SELECT sid FROM enrolled);
SELECT name FROM student WHERE sid ALL(SELECT sid FROM enrolled);
# IN (first query)

Common Table Expression

create temporary table in large query

1	WITH sname AS (SELECT 1) SELECT * FROM sname;

recursion can be implemented via common table expression

Window Function

performs a “sliding” calculation across a set of tuples are related.

SQL from CS50

# CRUD
# Create
CREATE , INSERT
# Retrieve
SELECT
# Update
UPDATE
# Delete
DELETE

1	CREATE TABLE tablename (columnname TYPE, ...);

DISTINCT()
AVG()
MAX()
UPPER()
...

1	SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...;

1	SELECT LOWER(TRIM(title)),COUNT(LOWER(TRIM(title))) FROM shows GROUP BY LOWER(TRIM(title)) ORDER BY COUNT(TRIM(title)) DESC LIMIT 10;

1	INSERT INTO tablename (columnname , ...) VALUES(valuee,...);

1	UPDATE tablename SET column1 = value1, column2 = value2, ... WHERE ...;

1	DELETE FROM table_name WHERE condition;

Main Datatype

BLOB # binary stuff
INTEGER # int
NUMERIC
REAL # float
TEXT

INDEX

you can build a B-tree in memory via INDEX in SQL

USER `?` Placeholder to avoid SQLI attack

Comment in SQL

1	-- some comment after `--`

Connect with Python

import sqlite3 as sql
open('new.db','wb').close()
con = sql.connect('new.db')
cur = con.cursor()
cur.execute("CREATE TABLE shows (id INTEGER , title TEXT , PRIMARY KEY(id))")
cur.execute("CREATE TABLE genres (show_id INTEGER , genre TEXT , FOREIGN KEY(show_id) REFERENCES shows(id))")

import csv

with open('favorites.csv','rt') as file:
    reader = csv.DictReader(file)
    cot = 1
    for row in reader:
        title = row['title'].strip().lower()
        cmd = 'INSERT INTO shows (id , title) VALUES({} , "{}")'.format(cot , title)
        cur.execute(cmd)
        for genre in row['genres'].split():
            cmd = 'INSERT INTO genres (show_id , genre) VALUES( {} , "{}")'.format(cot , genre)
            cur.execute(cmd)
        cot += 1
con.commit() # Don't forget to commit !
con.close()
print("OK")

Homework 1

link

The homework contains 10 questions in total and is graded out of 100 points. For each question, you will need to construct a SQL query that fetches the desired data from the SQLite DBMS. It will likely take you approximately 6-8 hours to complete the questions.

$ mkdir placeholder
$ cd placeholder
$ touch q1_sample.sql\
  q2_string_function.sql\
  q3_northamerican.sql\
  q4_delaypercent.sql\
  q5_aggregates.sql\
  q6_discontinued.sql\
  q7_order_lags.sql\
  q8_total_cost_quartiles.sql\
  q9_youngblood.sql\
  q10_christmas.sql
$ cd ..

1	$ wget https://15445.courses.cs.cmu.edu/fall2021/files/northwind-cmudb2021.db.gz

schema

Q1

List all Category Names ordered alphabetically.

1	SELECT CategoryName FROM Category ORDER BY CategoryName;

Q2

Get all unique ShipNames from the Order table that contain a hyphen '-'.

Details: In addition, get all the characters preceding the (first) hyphen. Return ship names alphabetically. Your first row should look like Bottom-Dollar Markets|Bottom

1	SELECT DISTINCT ShipName,SUBSTR(ShipName , 1, INSTR(ShipName , '-') - 1) FROM 'Order' WHERE ShipName LIKE '%-%' ORDER BY ShipName ASC;

1
2

INSTR() -- Find a substring in a string and returns an integer indicating the position of the first occurrence of the substring.
SUBSTR() -- Extract and returns a substring with a predefined length starting at a specified position in a source string

Q3

Details: You should print the Order Id, ShipCountry, and another column that is either 'NorthAmerica' or 'OtherPlace' depending on the Ship Country.
Order by the primary key (Id) ascending and return 20 rows starting from Order Id 15445 Your output should look like 15445|France|OtherPlace or 15454|Canada|NorthAmerica

SELECT Id,ShipCountry,
CASE
	WHEN ShipCountry IN ('USA', 'Mexico','Canada')
	THEN 'NorthAmerica'
	ELSE 'OtherPalce'
END
    FROM 'Order' WHERE Id >= 15445 ORDER BY ID ASC LIMIT 20 ;

Q4

For each Shipper, find the percentage of orders which are late.

Details: An order is considered late if ShippedDate > RequiredDate. Print the following format, order by descending precentage, rounded to the nearest hundredths, like United Package|23.44

This one is too complex for me at the time

Nested selection
Table Joining

-- First get order count
SELECT ShipVia, COUNT(ShipVia) FROM Shipper AS S INNER JOIN 'Order' AS O ON S.Id=O.ShipVia GROUP BY O.ShipVia;

-- Then get later count
SELECT ShipVia,COUNT(ShipVia) FROM Shipper AS S INNER JOIN 'Order' AS O ON S.Id=O.ShipVia WHERE O.ShippedDate > O.RequiredDate GROUP BY O.ShipVia;

-- Finally calculate the percentage

SELECT  ROUND(late*100.0/cnt,2) AS percent FROM

(SELECT ShipVia, COUNT(ShipVia) AS cnt FROM Shipper AS S INNER JOIN 'Order' AS O ON S.Id=O.ShipVia GROUP BY O.ShipVia) AS cntT

INNER JOIN

(SELECT ShipVia,COUNT(ShipVia) AS late FROM Shipper AS S INNER JOIN 'Order' AS O ON S.Id=O.ShipVia WHERE O.ShippedDate > O.RequiredDate GROUP BY O.ShipVia) AS lateT

ON cntT.ShipVia=lateT.ShipVia;

-- Add the CompanyName and reorder table

SELECT  CompanyName , ROUND(late*100.0/cnt,2) AS percent FROM

(SELECT ShipVia, COUNT(ShipVia) AS cnt FROM Shipper AS S INNER JOIN 'Order' AS O ON S.Id=O.ShipVia GROUP BY O.ShipVia) AS cntT

INNER JOIN

(SELECT ShipVia,COUNT(ShipVia) AS late FROM Shipper AS S INNER JOIN 'Order' AS O ON S.Id=O.ShipVia WHERE O.ShippedDate > O.RequiredDate GROUP BY O.ShipVia) AS lateT

ON cntT.ShipVia=lateT.ShipVia

INNER JOIN 'Shipper' AS S ON S.Id=cntT.ShipVia
ORDER BY percent DESC;

Q5

Details: Get the number of products, average unit price (rounded to 2 decimal places), minimum unit price, maximum unit price, and total units on order for categories containing greater than 10 products.
Order by Category Id. Your output should look like Beverages|12|37.98|4.5|263.5|60

You often use the HAVING clause with the GROUP BY clause. The GROUP BY clause groups a set of rows into a set of summary rows or groups. Then the HAVING clause filters groups based on a specified condition.

SELECT
	column_1, 
        column_2,
	aggregate_function (column_3)
FROM
	table
GROUP BY
	column_1,
        column_2
HAVING
	search_condition;

-- First, get categoryId whose size >= 10
SELECT CategoryId , COUNT(*) AS C FROM Product GROUP BY CategoryId HAVING C>=10;

-- Then, join it with Category Table
SELECT CategoryName , C FROM (SELECT CategoryId , COUNT(*) AS C FROM Product GROUP BY CategoryId HAVING C>=10) AS P INNER JOIN Category ON P.CategoryId=Category.Id;