Under which cases you will probably need a database.
You just finished the typing of an essay that you need to deliver in one of your university courses. You decided to save it in your personal folder, which results in a document stored in the file system of your personal computer. In that scenario the file is often accessible by any user who has access to your personal environment, and you can modify the file manually. Nonetheless, you do not really care about these problems due to the small scale of the project and the idea that you will rarely need that information again.
Now, think of a second scenario in which you are the CEO of a company and you need to store a bunch of information related to the system of the company, the employees and the collaborators. It is obvious that you would prefer a solution that improves the way that information is stored and retrieved from the system, and the security of the data as well. For example, if you are thinking of doing something similar like the first example, it is more likely to end up with an inconsistent storage structure, which doesn't follow a certain rule about how the data is stored. Moreover, the file system doesn't provide any tool that allows to easily handle and retrieve information from the system. If such considerations are excluded from this scale of projects, it is more likely the company to encounter problems related to accessing and updating its own information. But, is there any other solution? The answer is yes, a database.
What is a database and how is related to a database management system?
A database is a storage unit that allows to easily and securely store data with either a structured or unstructured way, providing all the necessary tools to interact with the stored information. More importantly, a database is equipped with a tool called DataBase Management System (DBMS) that has been programmed to interact with the database by storing, modifying and querying the database.
A DBMS may fall in one of the three below architectural paradigms:
Relational Database Management System (RDBMS)
Object-Oriented Database Management System (OODBMS)
Object-Relational Database Management System (ORDBMS)
I will attempt to briefly explain the concepts of each DBMS, and in later articles I will cover the general functionality of each system, explaining any definition associated with it.
Relational Database Management System?
A Relational Database Management System (RDBMS) is defined as the DBMS that interacts with a relational database. Essentially, a relational database is a collection of one or more relations, where each relation is a table with rows (tuples or records) and columns (attributes) ― this structure is often visualised as an excel spreadsheet ―. The foundations of this architecture are based on relational algebra and the speculation that a relation represents a set. This means that a row in a table must be uniquely defined, either by specifying a single or multiple columns ― this rule is commonly referred as primary key ―. The existence of a primary key allows to connect a parent relation with a children relation and associate them. When I say that two tables are associated, I mean that the values of a primary key exist in another table ― children table, without a specific order ―, and relates information that appears in the children table. The column in the children table that contains the values related to the primary key is commonly referred foreign key.
It's easier to get a grasp of the whole idea by a simple example. Let’s say that the table Users contains the users that have been registered in a website. Until now, two users have been stored in the database, with each user to be described by four attributes ( id, first_name , last_name , age). The primary key of a user is expected to uniquely define a user, which means that no other user will appear in the relation with the same value. In theory, this value will be the result of first_name, last_name and date_of_birth attributes ― for example John is uniquely defined by the combination of John Doe, borned at 1st January 2000 ―,however, to minimize complexity and improve performance, databases abstract this value as a numeric id. As a result, if we search for an id of 1 in the relations of Users, we will get the record of John Doe, borned at 1st January 2000, which doesn't exist in any other row.
id (pk) first_name last_name dae_of_birth 1 John Doe 2000-01-01 2 Paul Foo 1996-07-23
In order to record the user’s activity, we have also created an additional table, called Logins. This relation consists of the attributes of id, user_id and time_of_login. Similarly as Users relation, Logins relation has an attribute that doesn’t allow duplicate values (id), which is the primary key of the current table, however, it appears to has an additional column that records the users id. This column is the foreign key of Logins relation and connects the Users relation with the Logins.
id (pk) user_id (fid) time_of_login 1 1 2019-01-04 15:33:00 2 2 2019-01-04 15:10:00 3 1 2019-01-05 11:07:30 4 1 2019-01-06 15:59:15
The power of RDBMS comes when we use SQL to handle and retrieve information. SQL (Structured Query Language) is a high-level language that is the de-facto language for RDBMS ― SQLite is an example of such a system ―. Continuing from our example above, a SQL statement may be used to exploit the appearance of a foreign key in the Logins table. We can aggregate the information contained in both tables, extracting a single table that gathers all the information related to a user.
SELECT u.first_name, u.last_name, u.date_of_birth, l.time_of_loginFROM Users as u, Logins as lWHERE u.id = l.user_id;
first_name last_name date_of_birth time_of_login John Doe 2000-1-1 2019-01-04 15:33:00 Paul Foo 1996-07-23 2019-01-04 15:10:00 John Doe 2000-1-1 2019-01-05 11:07:30 John Doe 2000-1-1 2019-01-06 15:59:15
Object-Oriented Database Management System
While a RDBMS is table-oriented, meaning that its basic concepts are based on relations ― have a specific schema ―, an Object-Oriented Database Management System (OODBMS) is strongly influenced by the concept of Object-Oriented Programming (OOP). Because of that, the information is stored in the form of objects, similar to JSON, which is a lightweight key-value data format. A key-value pair in a OODBMS is analogous to an attribute in a RDBMS. Such a format does not require a specific structure, and therefore such databases are called schemaless.
The smallest unit that may be stored in an OODBMS is the document/object, which is equivalent with a record in a RDBMS. A larger unit is the collection, which can store a number of documents that doesn't necessarily have the same structure ― schemaless ―, and is analogous to a table unit in a RDBMS. One limitation of OODBMS is the lack of SQL language ― this is one of the reasons that they are also referred to No-SQL databases ―, however most of them implement their own query language. For example, if you install MongoDB you will see that a JavaScript interpreter language is installed by default. It's worth to note that there are some No-SQL databases that may support a SQL language, however this is not so often. A huge advantage of No-SQL databases is their increased performance, which is highly influenced by the low complexity of their data format. This is one of the reasons that these type of databases are preferred for storing and retrieval procedures. MongoDB and Cassandra DB are a couple of databases that support an OODBMS.
Object-Relational Database Management System
The last architectural paradigm that most of the databases support is the Object-Relational Database System (ORDBMS). ORDBMS is a hybrid model between a RDBMS and OODBMS, because it inherits most functionalities from both systems. The ORDBMS is a bridge between a relational and an object-oriented paradigm as it preserves the basic principles of RDBMS with SQL language, being capable at the same time to store unstructured data without a specific format. PostgreSQL, Microsoft SQL Server (MSSQL) and MySQL are some examples of ORDBMS.