Relational Algebra and Query #1: Projection

Selecting column(s) in a relational database table

Photo by Mauricio Artieda on Unsplash

Introduction

Relational algebra is a theoretical language that operates on one or more relations and generates another relation without changing the original relations. The output of one operation can be input to another process.

According to Codd (1972), there are eight operations consisting of:

Five basic operations
▸ projection
▸ selection
▸ Cartesian product
▸ union
▸ set difference

Three additional operations are derived from the basic operations
▸ join
▸ intersection
▸ division.

Relational algebra is the basis for queries. Query is a noun in English that semantically means a question or a request for information about something.

A query is a question or a request for information about something. (https://www.merriam-webster.com/dictionary/query)

As a term in the database, query means requesting data retrieved from the database. Query on a relational database using the SELECT statement.

A query is a request for data or information from a database table or combination of tables. This data may be generated as results returned by Structured Query Language (SQL).
(https://www.techopedia.com/definition/5736/query)

Projection Operation

Projection is used to select columns or attributes from a relation. See Table 1. The projection operation produces a relation containing the attribute values ​​a₁, a₂, …, aₙ without duplicating tuples. The relational algebraic expression uses the Π operator.

Relational AlgebraQuery
Πa1, a2, … , an (Relation)SELECT a1, a2, … , an
FROM Relation
Table 1: Projection Operation

Here’s an example of a student table with six columns. Using the projection operation, we can select columns according to our needs. First, we create the StudentInfo database containing one Student table. The SQL statements are written using SQL Server.

CREATE DATABASE StudentInfo
GO
USE University
GO
CREATE TABLE Student (
Id Char(7) PRIMARY KEY,
Name VarChar(16),
Gender Char(1),
DOB Date,
BloodType VarChar(2),
GPA Decimal(3,2),
CHECK (Gender IN('F','M')),
CHECK (BloodType IN('A','B','AB','O')),
CHECK (GPA BETWEEN 0.0 AND 4.0)
)
GO
INSERT INTO Student VALUES
('1221234','Rajabasa', 'M','2003-03-25','A', 2.34),
('1120235','Merapi', 'M','2003-02-17','O', 3.55),
('1120236','Mawar Merah', 'F','2004-04-01',NULL,2.10),
('1222237','Semeru', 'M','2004-12-31','A', NULL),
('1322238','Melati Putih','F',NULL, 'B', 1.23),
('1321239','Kutilang', 'M','2004-03-17','O', 3.21),
('1221210','Cempaka', 'F','2003-03-07','O', 0.75),
('1219254','Kilau Intan', 'F',NULL, NULL, 2.00),
('1119300','Nur Cahaya', 'F','2002-04-16','B', NULL),
('1120335','Jayagiri', 'F',NULL, 'O', 3.50);
GO

Queries SELECT * FROM Student return:

Id      Name             Gender DOB        BloodType GPA
------- ---------------- ------ ---------- --------- ----
1119300 Nur Cahaya F 2002-04-16 B NULL
1120235 Merapi M 2003-02-17 O 3.55
1120236 Mawar Merah F 2004-04-01 NULL 2.10
1120335 Jayagiri F NULL O 3.50
1219254 Kilau Intan F NULL NULL 2.00
1221210 Cempaka F 2003-03-07 O 0.75
1221234 Rajabasa M 2003-03-25 A 2.34
1222237 Semeru M 2004-12-31 A NULL
1321239 Kutilang M 2004-03-17 O 3.21
1322238 Melati Putih F NULL B 1.23

(10 rows affected)

(1) List of student blood types

Relational AlgebraQuery
ΠBloodType (Student)SELECT BloodType
FROM Student
Table 2: Projection of Student Blood Types

When executed, the query above produces:

BloodType
---------
B
O
NULL
O
NULL
O
A
A
O
B

(10 rows affected)

It turned out that the results did not match those in the book; there was still data duplication. DBMS doesn’t implement it in the same way as described in theory. So do Oracle, MySQL, Access, and PostgreSQL. The user must explicitly request data using the DISTINCT clause to eliminate data duplication.

SELECT  DISTINCT BloodType
FROM Student

The results are in line with the theory.

BloodType
---------
NULL
A
B
O

(4 rows affected)

(2) Number of students for each blood type

Add the Count(*) aggregation function along with the GROUP BY clause.

SELECT BloodType, 
Count(*) AS Total
FROM Student
GROUP BY BloodType

The clause DISTINCT is no longer required. Here are the results.

BloodType Total
--------- -----
NULL 2
A 2
B 2
O 4

(4 rows affected)

(3) List of gender and blood type

Relational AlgebraQuery
ΠGender, BloodType (Student)SELECT Gender, BloodType
FROM Student
Table 3: Gender and BloodType Projection

The query returns:

Gender BloodType
------ ---------
F NULL
F B
F O
M A
M O

(5 rows affected)

The resulting columns’ order corresponds to the projection statement’s order. The order of the columns at the time of defining the table has no effect.

(4) Number of students for each blood type and gender

Add the Count(*) aggregation function along with the GROUP BY clause.

SELECT Gender, BloodType, 
Count(*) AS Total
FROM Students
GROUP BY Gender, BloodType

When executed, the result is as follows.

Gender BloodType Total
------ --------- -----
F NULL 2
M A 2
F B 2
F O 2
M O 2

(5 rows affected)

Summary

Projection is a column-selecting operation without tuple duplication. The reality is that the DBMS does not implement it in the same way as described in theory. To get a unique tuple result, the user must explicitly request it using the DISTINCT clause.

Hopefully beneficial. Your feedback will be precious.


References

Codd, E.F. (March 1972). “Relational Completeness of Data Base Sublanguages” in Computer Sciences. San Jose, California: IBM Research Laboratory.

Connolly, Thomas M., & Begg, Carolyn E. (2015). Database Systems: A Prac­tical Approach to Design, Implementation, and Management. 6th Edition. Essex, England: Pearson Education.

Coronel, C., Steven, M., Crockett, K., & Blewett, C. (2020). Database Principles: Fundamentals of Design, Implementation, and Management. 3rd Edition. Hampshire, United Kingdom: Cengage Learning.

Leave a Reply

Your email address will not be published. Required fields are marked *