Description
Motivation
There's a bunch of issues around handling object names that exist due to the existing design assumptions (e.g. #4357). The assumptions are:
- Only the identifiers that are SQL keywords should be quoted. The list of keywords managed by the DBAL is always up to date.
- Object names are case-insensitive on all platforms under any circumstances.
- Unqualified object names cannot contain dots (
.
). - Object names cannot contain special characters (e.g.
`
).
The above assumptions lead to the API where any name is represented as a string which then has to be parsed to identify its meaning. E.g.:
- The quotes are stripped from a quoted name but without considering the escaping rules.
- A dot in the middle of the name is considered to be the delimiter between the name and its qualifier.
Proposal
This design proposal attempts to replace parsing the string with representing names as domain objects.
The name interface
The corner-stone interface describing object names will be Name
:
/**
* SQL object name.
*/
interface Name
{
/**
* Returns object name representation as an SQL literal.
*/
public function toLiteral(NameBuilder $builder): string;
/**
* Returns object name representation as an SQL identifier.
*/
public function toIdentifier(NameBuilder $builder): string;
}
It will represent any name: qualified or unqualified; quoted or unquoted and will be used by all SQL objects like tables, views, columns, databases, etc.
Note, currently, there's no plan to expose the raw value of the name. This will make it easier to abuse it in the raw form which may be the source of all sorts of issues (e.g. SQL injections).
A name can be used in two forms in SQL, hence the methods:
- A literal. The name will be represented as an SQL literal primarily during schema introspection. For instance:
SELECT * FROM columns WHERE table = <literal>
- An identifier. The name will be represented as an SQL identifier when the corresponding object is used an an SQL statement. For instance:
SELECT * FROM <identifier>
The SQL Builder interface family
The string representations of the name will be built by an SQL builder that is specific to the given database platform. For the sake of simplicity, we can consider the platform and the SQL builder synonymous for now.
/**
* SQL builder that builds SQL representing object names. Each database platform must provide an implementation
* of this interface according to its logic of handling object names.
*/
interface NameBuilder
{
/**
* Builds an SQL literal of an unquoted object name.
*/
public function buildUnquotedNameLiteral(string $name): string;
/**
* Builds an SQL identifier of an unquoted object name.
*/
public function buildUnquotedNameIdentifier(string $name): string;
/**
* Builds an SQL literal of a quoted object name.
*/
public function buildQuotedNameLiteral(string $name): string;
/**
* Builds an SQL identifier of a quoted object name.
*/
public function buildQuotedNameIdentifier(string $name): string;
}
The need to have different methods for representing names as literals and identifiers should be clear from the above. The need to have different methods for quoted and unquoted names will be explained below.
Unqualified and qualified names
Before we get to the details of the unquoted and quoted names, we need to establish a foundation regarding the unqualified and qualified names.
An unqualified name is the "given" name object without the reference to its schema or catalog (e.g. "accounts"):
/**
* An unqualified name represents the own name of an SQL object within its schema or catalog.
*/
abstract class UnqualifiedName implements Name
{
protected string $name;
public function __construct(string $name)
{
$this->name = $name;
}
}
A qualified name consists of the qualifier (which can be a qualified name itself) and an unqualified name. For instance, a table named "accounts" contained in the "accounting" schema can be referenced as accounting.accounts
, or even more specifically, customer1.accounting.accounts
where "customer1" is the name of the database or a catalog to which the "accounting" schema belongs.
/**
* A qualified name consists of an unqualified name and a qualifier.
*/
final class QualifiedName implements Name
{
private Name $qualifier;
private UnqualifiedName $name;
public function __construct(Name $qualifier, UnqualifiedName $name)
{
$this->qualifier = $qualifier;
$this->name = $name;
}
public function toLiteral(NameBuilder $builder): string
{
throw new LogicException("A qualified name cannot be converted to an SQL literal.");
}
public function toIdentifier(NameBuilder $builder): string
{
return $this->qualifier->toIdentifier($builder) . '.' . $this->name->toIdentifier($builder);
}
}
Note that a qualified name is a compound object, so it can be used as an identifier but there's no scenario where it would be used as a literal. Hence, the corresponding method will not be implemented and isn't supposed to be used.
Unquoted and quoted names
An unqualified name can be unquoted or quoted. There are two reasons why a name may need to be quoted in SQL:
-
The name represents an SQL keyword. The quoting may be needed for the SQL parser on a certain database platform to interpret the token as an identifier rather than a keyword.
The following query is syntactically invalid in MySQL:
SELECT * FROM from
But this one is valid:
SELECT * FROM `from`
-
The that is not fully upper-cased must retain its original case on the platforms like Oracle and IBM DB2.
In the the following query, the name will be internally converted to the upper case:
SELECT * FROM accounts -- converted internally to SELECT * FROM ACCOUNTS
But in this one it will remain lower-cased:
SELECT * FROM "accounts"
Proper design and implementation of the casing aspect will allow us to quote identifiers unconditionally (i.e. always) which will remove the requirement to maintain the keywords lists and improve the stability and security aspects of the API.
/**
* An unquoted name is an unqualified name that is not quoted.
*/
final class UnquotedName extends UnqualifiedName
{
public function toLiteral(NameBuilder $builder): string
{
return $builder->buildUnquotedNameLiteral($this->name);
}
public function toIdentifier(NameBuilder $builder): string
{
return $builder->buildUnquotedNameIdentifier($this->name);
}
}
/**
* An unquoted name is an unqualified name that is quoted.
*/
final class QuotedName extends UnqualifiedName
{
public function toLiteral(NameBuilder $builder): string
{
return $builder->buildQuotedNameLiteral($this->name);
}
public function toIdentifier(NameBuilder $builder): string
{
return $builder->buildQuotedNameIdentifier($this->name);
}
}
Unquoted name normalization
Depending on the destination database platform, an unquoted name may need to be normalized before getting quoted. Currently, this is implemented in an obscure and non-systematic way like the following:
dbal/src/Platforms/OraclePlatform.php
Line 514 in 03a976a
dbal/src/Platforms/DB2Platform.php
Line 254 in 03a976a
Instead, we can introduce an interface that would describe this concern and be implemented by the platforms as required:
final class DefaultUnquotedNameNormalizer implements UnquotedNameNormalizer
{
public function normalizeUnquotedName(string $name): string
{
return $name;
}
}
final class UpperCasingNameNormalizer implements UnquotedNameNormalizer
{
public function normalizeUnquotedName(string $name): string
{
return strtoupper($name);
}
}
The implementations of the NameBuilder
interface will depend on UnquotedNameNormalizer
.
Object sets
There is a common component missing in the Schema API that would allow implementing a set of named database objects. Currently, the Schema
and the Table
classes have their own implementations of the set of Table
s and Column
s. A unified API is needed because the object names will be considered not only depending on their value but also depending on the platform in which the objects will be created.
Breaking API changes
Once the above APIs are implemented and solidified, we need to transition the APIs under the Schema
namespace to use the names expressed as objects instead of plain strings:
- The objects that reside immediately within a schema (e.g. tables), instead of
string $name
will require anUnqualifiedName $name
in their constructor. The same applies to the objects referencing such objects (e.g. the referenced table of a foreign key). - All other objects like columns, indexes, etc. will require an
UnqualifiedName $name
.
To implement that, some breaking API changes are needed around schema introspection. Specifically, in the implementation of the AbstractSchemaManager::list*()
. See the "Reasoning behind the design changes" section in #4548.
This is where we're trying squeeze a qualified name into a string:
dbal/src/Schema/PostgreSQLSchemaManager.php
Line 185 in cc378c3
And then parse:
dbal/src/Schema/AbstractAsset.php
Line 47 in dc4ad2a
And miserably fail: #4708.
Get rid of AbstractAsset
- All objects have a name but for some of them it's nullable (e.g. constraints) while for others it's not (e.g. tables). Keeping it defined in a base class is a static analysis nightmare.
- Some objects allow a qualified name (e.g. tables when used with PostgreSQL) but others don't (e.g. columns when defined within the table). Keeping the name defined in the base class again makes it challenging for static analysis.
- Generation of nested object names (e.g. when creating an index on the table) is definitely not the responsibility of an abstract asset.