Relational Algebra and Query #1: Projection - M. Ramadhan

Selecting column(s) in a relational database table

Table of Contents hide

Introduction

Relational algebra is a theoretical language that operates on one or more relations and generates another relation without changing the original relations. The output of one operation can be input to another process.

According to Codd (1972), there are eight operations consisting of:

Five basic operations
▸ projection
▸ selection
▸ Cartesian product
▸ union
▸ set difference

Three additional operations are derived from the basic operations
▸ join
▸ intersection
▸ division.

Relational algebra is the basis for queries. Query is a noun in English that semantically means a question or a request for information about something.

A query is a question or a request for information about something. (https://www.merriam-webster.com/dictionary/query)

As a term in the database, query means requesting data retrieved from the database. Query on a relational database using the SELECT statement.

A query is a request for data or information from a database table or combination of tables. This data may be generated as results returned by Structured Query Language (SQL).
(https://www.techopedia.com/definition/5736/query)

Projection Operation

Projection is used to select columns or attributes from a relation. See Table 1. The projection operation produces a relation containing the attribute values a₁, a₂, …, aₙ without duplicating tuples. The relational algebraic expression uses the Π operator.

Relational Algebra	Query
Π_{a1, a2, … , an} (Relation)	SELECT a1, a2, … , an FROM Relation

Table 1: Projection Operation

Here’s an example of a student table with six columns. Using the projection operation, we can select columns according to our needs. First, we create the StudentInfo database containing one Student table. The SQL statements are written using SQL Server.

CREATE DATABASE StudentInfo
GO
USE University
GO
CREATE TABLE Student (
   Id        Char(7) PRIMARY KEY,
   Name      VarChar(16),
   Gender    Char(1),
   DOB       Date,
   BloodType VarChar(2),
   GPA       Decimal(3,2),
   CHECK (Gender IN('F','M')),
   CHECK (BloodType IN('A','B','AB','O')),
   CHECK (GPA BETWEEN 0.0 AND 4.0)
)
GO
INSERT INTO Student VALUES
('1221234','Rajabasa',    'M','2003-03-25','A', 2.34),
('1120235','Merapi',      'M','2003-02-17','O', 3.55),
('1120236','Mawar Merah', 'F','2004-04-01',NULL,2.10),
('1222237','Semeru',      'M','2004-12-31','A', NULL),
('1322238','Melati Putih','F',NULL,        'B', 1.23),
('1321239','Kutilang',    'M','2004-03-17','O', 3.21),
('1221210','Cempaka',     'F','2003-03-07','O', 0.75),
('1219254','Kilau Intan', 'F',NULL,       NULL, 2.00),
('1119300','Nur Cahaya',  'F','2002-04-16','B', NULL),
('1120335','Jayagiri',    'F',NULL,        'O', 3.50);
GO

Queries SELECT * FROM Student return:

Id      Name             Gender DOB        BloodType GPA
------- ---------------- ------ ---------- --------- ----
1119300 Nur Cahaya       F      2002-04-16 B         NULL
1120235 Merapi           M      2003-02-17 O         3.55
1120236 Mawar Merah      F      2004-04-01 NULL      2.10
1120335 Jayagiri         F      NULL       O         3.50
1219254 Kilau Intan      F      NULL       NULL      2.00
1221210 Cempaka          F      2003-03-07 O         0.75
1221234 Rajabasa         M      2003-03-25 A         2.34
1222237 Semeru           M      2004-12-31 A         NULL
1321239 Kutilang         M      2004-03-17 O         3.21
1322238 Melati Putih     F      NULL       B         1.23

(10 rows affected)

(1) List of student blood types

Relational Algebra	Query
Π_BloodType (Student)	SELECT BloodType FROM Student

Table 2: Projection of Student Blood Types

When executed, the query above produces:

BloodType
---------
B
O
NULL
O
NULL
O
A
A
O
B

(10 rows affected)

It turned out that the results did not match those in the book; there was still data duplication. DBMS doesn’t implement it in the same way as described in theory. So do Oracle, MySQL, Access, and PostgreSQL. The user must explicitly request data using the DISTINCT clause to eliminate data duplication.

SELECT  DISTINCT BloodType
FROM Student

The results are in line with the theory.

BloodType
---------
NULL
A
B
O

(4 rows affected)

(2) Number of students for each blood type

Add the Count(*) aggregation function along with the GROUP BY clause.

SELECT BloodType, 
       Count(*) AS Total 
FROM   Student 
GROUP  BY BloodType

The clause DISTINCT is no longer required. Here are the results.

BloodType Total
--------- -----
NULL      2
A         2
B         2
O         4

(4 rows affected)

(3) List of gender and blood type

Relational Algebra	Query
Π_{Gender, BloodType} (Student)	SELECT Gender, BloodType FROM Student

Table 3: Gender and BloodType Projection

The query returns:

Gender BloodType
------ ---------
F      NULL
F      B
F      O
M      A
M      O

(5 rows affected)

The resulting columns’ order corresponds to the projection statement’s order. The order of the columns at the time of defining the table has no effect.

(4) Number of students for each blood type and gender

Add the Count(*) aggregation function along with the GROUP BY clause.

SELECT Gender, BloodType, 
       Count(*) AS Total 
FROM   Students 
GROUP BY Gender, BloodType

When executed, the result is as follows.

Gender BloodType Total
------ --------- -----
F      NULL      2
M      A         2
F      B         2
F      O         2
M      O         2

(5 rows affected)

Summary

Projection is a column-selecting operation without tuple duplication. The reality is that the DBMS does not implement it in the same way as described in theory. To get a unique tuple result, the user must explicitly request it using the DISTINCT clause.

Hopefully beneficial. Your feedback will be precious.

References

Codd, E.F. (March 1972). “Relational Completeness of Data Base Sublanguages” in Computer Sciences. San Jose, California: IBM Research Laboratory.

Connolly, Thomas M., & Begg, Carolyn E. (2015). Database Systems: A Practical Approach to Design, Implementation, and Management. 6th Edition. Essex, England: Pearson Education.

Coronel, C., Steven, M., Crockett, K., & Blewett, C. (2020). Database Principles: Fundamentals of Design, Implementation, and Management. 3rd Edition. Hampshire, United Kingdom: Cengage Learning.

Introduction

Projection Operation

Summary

References

Leave a Reply Cancel reply