The Importance of Limiting Null Values in Databases: Ensuring Data Integrity and Accurate Queries

Null values are often a result of design choices in relational databases, but their use can introduce significant challenges for data integrity and query accuracy. In this article, we explore why null values should be limited in a database, discuss the concept of null values in SQL, and work through practical examples of the issues that arise when nulls are allowed.

Null Values in SQL

In SQL, a null is a placeholder marking a component of a row (tuple) whose value is unknown. Nulls are often a consequence of how databases are built and maintained, but it is important to understand that they complicate data processing and querying. Stricter readings of relational theory forbid null attributes altogether because they can lead to ambiguities and logical inconsistencies.

The Argument Against Null Values

The primary argument against null value usage is rooted in the concept of three-valued logic (3VL). In a standard truth table, we typically have two values: True and False. However, in 3VL, a third value, Unknown, is introduced. This results in some unusual outcomes, such as:

Expression             Result
Unknown AND Unknown    Unknown
Unknown AND True       Unknown
True AND Unknown       Unknown
Unknown OR Unknown     Unknown
Unknown OR False       Unknown
False OR Unknown       Unknown

When a query touches null data, these Unknown results propagate through its conditions and can produce unexpected output. This is particularly problematic in relational database management systems (RDBMS), where data integrity and query accuracy are paramount.
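The truth table above can be checked against a real engine. The following is a minimal sketch, assuming Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for whichever RDBMS you use; SQLite represents Unknown as NULL, which Python receives as None.

```python
import sqlite3

# In-memory SQLite database (an assumption: any SQL engine with 3VL would do).
conn = sqlite3.connect(":memory:")

# The six expressions from the truth table; 1 stands for True, 0 for False.
expressions = [
    "NULL AND NULL",
    "NULL AND 1",
    "1 AND NULL",
    "NULL OR NULL",
    "NULL OR 0",
    "0 OR NULL",
]

results = {}
for expr in expressions:
    (value,) = conn.execute(f"SELECT {expr}").fetchone()
    results[expr] = value
    print(f"{expr:15} -> {value}")  # None means Unknown

# A comparison with NULL is also Unknown, so this WHERE clause keeps no rows.
rows_kept = conn.execute("SELECT 1 WHERE NULL = NULL").fetchall()
print("rows kept by WHERE NULL = NULL:", rows_kept)
```

Every expression comes back as None, and the final query returns an empty list, because a WHERE clause keeps a row only when its condition is True.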

Practical Example: Data Entry Fields

Consider a simple data entry form that collects name, address, and city. If the address and city fields accept nulls, a form can be submitted and stored without those fields being completed. Such records violate data integrity and can lead to operational issues. A better approach is to declare these fields NOT NULL, so that every stored record is complete and usable.
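A rough sketch of that approach follows, again assuming Python's sqlite3 module and an in-memory database. The table name entry_form, the helper submit, and the sample address are hypothetical, introduced only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for the form; every field is declared NOT NULL.
conn.execute(
    """
    CREATE TABLE entry_form (
        name    TEXT NOT NULL,
        address TEXT NOT NULL,
        city    TEXT NOT NULL
    )
    """
)


def submit(name, address, city):
    """Try to store one form submission; return True if the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO entry_form (name, address, city) VALUES (?, ?, ?)",
                (name, address, city),
            )
        return True
    except sqlite3.IntegrityError:
        # SQLite raises IntegrityError when a NOT NULL constraint is violated.
        return False


complete = submit("Jackson", "1 High Street", "London")  # illustrative data
incomplete = submit("Arafat", None, None)                # address and city missing
stored = conn.execute("SELECT COUNT(*) FROM entry_form").fetchone()[0]
print(complete, incomplete, stored)
```

The incomplete submission is rejected by the database itself, so only the complete record is stored, regardless of what the application layer let through.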

Example Queries

Let's illustrate this with two example tables: CLIENT and SUPPLIER.

CREATE TABLE CLIENT (
    ID INT NOT NULL AUTO_INCREMENT,
    SURNAME VARCHAR(128) NOT NULL,
    CITY VARCHAR(64) NULL,
    PRIMARY KEY (ID)
);

CREATE TABLE SUPPLIER (
    ID INT NOT NULL AUTO_INCREMENT,
    SURNAME VARCHAR(128) NOT NULL,
    CITY VARCHAR(128) NOT NULL,
    PRIMARY KEY (ID)
);

Populating these tables with some example data:

SUPPLIER

ID  SURNAME     CITY
1   'Atkinson'  'London'
2   'McKenzie'  'Moscow'

CLIENT

ID  SURNAME    CITY
1   'Jackson'  NULL
2   'Arafat'   NULL

Now, let's write a query to find the clients' cities in which we do not have suppliers yet:

SELECT * FROM CLIENT C JOIN SUPPLIER S ON    WHERE  IS NULL OR  IS NULL

The result of this query might not be what you expect. SQL evaluates join and filter conditions using 3VL: any comparison involving a null CITY yields 'Unknown', and 'Unknown OR Unknown' is still 'Unknown'. Because only rows whose condition evaluates to True are kept, the query does not return any data, even though there are clients with unknown cities.
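The behaviour can be reproduced with a short sketch. It assumes Python's sqlite3 module, the two tables and sample rows from this article (with both client cities null), and, since the column references are not spelled out in the query above, that the join and the filter both refer to the CITY columns. INTEGER PRIMARY KEY replaces the AUTO_INCREMENT syntax, which SQLite does not use. The LEFT JOIN variant is one common alternative, not necessarily the query the author intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CLIENT (
        ID INTEGER PRIMARY KEY,
        SURNAME VARCHAR(128) NOT NULL,
        CITY VARCHAR(64) NULL
    );
    CREATE TABLE SUPPLIER (
        ID INTEGER PRIMARY KEY,
        SURNAME VARCHAR(128) NOT NULL,
        CITY VARCHAR(128) NOT NULL
    );
    INSERT INTO SUPPLIER (ID, SURNAME, CITY)
        VALUES (1, 'Atkinson', 'London'), (2, 'McKenzie', 'Moscow');
    INSERT INTO CLIENT (ID, SURNAME, CITY)
        VALUES (1, 'Jackson', NULL), (2, 'Arafat', NULL);
    """
)

# Assumed reading of the query: join and filter on the CITY columns.
# NULL = 'London' is Unknown, so the inner join discards every client row
# before the WHERE clause is ever applied.
inner_rows = conn.execute(
    """
    SELECT * FROM CLIENT C
    JOIN SUPPLIER S ON C.CITY = S.CITY
    WHERE C.CITY IS NULL OR S.CITY IS NULL
    """
).fetchall()

# A LEFT JOIN keeps clients that found no matching supplier; those rows
# carry NULL in the supplier columns and can be selected with IS NULL.
outer_rows = conn.execute(
    """
    SELECT C.SURNAME, C.CITY FROM CLIENT C
    LEFT JOIN SUPPLIER S ON C.CITY = S.CITY
    WHERE S.ID IS NULL
    ORDER BY C.ID
    """
).fetchall()

print("inner join:", inner_rows)
print("left join: ", outer_rows)
```

The inner join returns no rows, while the LEFT JOIN lists both clients. Even then the answer is only partly useful: the cities are null, so the query can say that no supplier matches but not which city is missing one.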

This issue highlights the potential pitfalls of using null values, especially in complex queries involving multiple tables. It underscores the importance of strict data validation and minimizing the use of null values wherever possible.

Conclusion

Avoiding null values is not merely a matter of pedantry. It is a best practice that ensures data integrity and improves the reliability and accuracy of database operations. While the concept of null values is fundamental in SQL, their usage can introduce subtle but significant challenges. By limiting the use of nulls and validating data at the point of entry, database designers and developers can create more robust and dependable systems.