Selecting column(s) in a relational database table
Introduction
Relational algebra is a theoretical language that operates on one or more relations and generates another relation without changing the original relations. The output of one operation can be input to another process.
According to Codd (1972), there are eight operations consisting of:
Five basic operations
▸ projection
▸ selection
▸ Cartesian product
▸ union
▸ set difference
Three additional operations are derived from the basic operations
▸ join
▸ intersection
▸ division.
Relational algebra is the basis for queries. Query is a noun in English that semantically means a question or a request for information about something.
A query is a question or a request for information about something. (https://www.merriam-webster.com/dictionary/query)
As a term in the database, query means requesting data retrieved from the database. Query on a relational database using the SELECT statement.
A query is a request for data or information from a database table or combination of tables. This data may be generated as results returned by Structured Query Language (SQL).
(https://www.techopedia.com/definition/5736/query)
Projection Operation
Projection is used to select columns or attributes from a relation. See Table 1. The projection operation produces a relation containing the attribute values a₁, a₂, …, aₙ without duplicating tuples. The relational algebraic expression uses the Π operator.
| Relational Algebra | Query |
|---|---|
| Πa1, a2, … , an (Relation) | SELECT a1, a2, … , an FROM Relation |
Here’s an example of a student table with six columns. Using the projection operation, we can select columns according to our needs. First, we create the StudentInfo database containing one Student table. The SQL statements are written using SQL Server.
CREATE DATABASE StudentInfo
GO
USE University
GO
CREATE TABLE Student (
Id Char(7) PRIMARY KEY,
Name VarChar(16),
Gender Char(1),
DOB Date,
BloodType VarChar(2),
GPA Decimal(3,2),
CHECK (Gender IN('F','M')),
CHECK (BloodType IN('A','B','AB','O')),
CHECK (GPA BETWEEN 0.0 AND 4.0)
)
GO
INSERT INTO Student VALUES
('1221234','Rajabasa', 'M','2003-03-25','A', 2.34),
('1120235','Merapi', 'M','2003-02-17','O', 3.55),
('1120236','Mawar Merah', 'F','2004-04-01',NULL,2.10),
('1222237','Semeru', 'M','2004-12-31','A', NULL),
('1322238','Melati Putih','F',NULL, 'B', 1.23),
('1321239','Kutilang', 'M','2004-03-17','O', 3.21),
('1221210','Cempaka', 'F','2003-03-07','O', 0.75),
('1219254','Kilau Intan', 'F',NULL, NULL, 2.00),
('1119300','Nur Cahaya', 'F','2002-04-16','B', NULL),
('1120335','Jayagiri', 'F',NULL, 'O', 3.50);
GO
Queries SELECT * FROM Student return:
Id Name Gender DOB BloodType GPA
------- ---------------- ------ ---------- --------- ----
1119300 Nur Cahaya F 2002-04-16 B NULL
1120235 Merapi M 2003-02-17 O 3.55
1120236 Mawar Merah F 2004-04-01 NULL 2.10
1120335 Jayagiri F NULL O 3.50
1219254 Kilau Intan F NULL NULL 2.00
1221210 Cempaka F 2003-03-07 O 0.75
1221234 Rajabasa M 2003-03-25 A 2.34
1222237 Semeru M 2004-12-31 A NULL
1321239 Kutilang M 2004-03-17 O 3.21
1322238 Melati Putih F NULL B 1.23
(10 rows affected)
(1) List of student blood types
| Relational Algebra | Query |
|---|---|
| ΠBloodType (Student) | SELECT BloodType FROM Student |
When executed, the query above produces:
BloodType
---------
B
O
NULL
O
NULL
O
A
A
O
B
(10 rows affected)
It turned out that the results did not match those in the book; there was still data duplication. DBMS doesn’t implement it in the same way as described in theory. So do Oracle, MySQL, Access, and PostgreSQL. The user must explicitly request data using the DISTINCT clause to eliminate data duplication.
SELECT DISTINCT BloodType
FROM Student
The results are in line with the theory.
BloodType
---------
NULL
A
B
O
(4 rows affected)
(2) Number of students for each blood type
Add the Count(*) aggregation function along with the GROUP BY clause.
SELECT BloodType,
Count(*) AS Total
FROM Student
GROUP BY BloodType
The clause DISTINCT is no longer required. Here are the results.
BloodType Total
--------- -----
NULL 2
A 2
B 2
O 4
(4 rows affected)
(3) List of gender and blood type
| Relational Algebra | Query |
|---|---|
| ΠGender, BloodType (Student) | SELECT Gender, BloodType FROM Student |
The query returns:
Gender BloodType
------ ---------
F NULL
F B
F O
M A
M O
(5 rows affected)
The resulting columns’ order corresponds to the projection statement’s order. The order of the columns at the time of defining the table has no effect.
(4) Number of students for each blood type and gender
Add the Count(*) aggregation function along with the GROUP BY clause.
SELECT Gender, BloodType,
Count(*) AS Total
FROM Students
GROUP BY Gender, BloodType
When executed, the result is as follows.
Gender BloodType Total
------ --------- -----
F NULL 2
M A 2
F B 2
F O 2
M O 2
(5 rows affected)
Summary
Projection is a column-selecting operation without tuple duplication. The reality is that the DBMS does not implement it in the same way as described in theory. To get a unique tuple result, the user must explicitly request it using the DISTINCT clause.
Hopefully beneficial. Your feedback will be precious.
References
Codd, E.F. (March 1972). “Relational Completeness of Data Base Sublanguages” in Computer Sciences. San Jose, California: IBM Research Laboratory.
Connolly, Thomas M., & Begg, Carolyn E. (2015). Database Systems: A Practical Approach to Design, Implementation, and Management. 6th Edition. Essex, England: Pearson Education.
Coronel, C., Steven, M., Crockett, K., & Blewett, C. (2020). Database Principles: Fundamentals of Design, Implementation, and Management. 3rd Edition. Hampshire, United Kingdom: Cengage Learning.
