Skip to content

RFC: Improve the APIs around handling SQL object names #4772

Open
@morozov

Description

@morozov

Motivation

There's a bunch of issues around handling object names that exist due to the existing design assumptions (e.g. #4357). The assumptions are:

  1. Only the identifiers that are SQL keywords should be quoted. The list of keywords managed by the DBAL is always up to date.
  2. Object names are case-insensitive on all platforms under any circumstances.
  3. Unqualified object names cannot contain dots (.).
  4. Object names cannot contain special characters (e.g. `).

The above assumptions lead to the API where any name is represented as a string which then has to be parsed to identify its meaning. E.g.:

  1. The quotes are stripped from a quoted name but without considering the escaping rules.
  2. A dot in the middle of the name is considered to be the delimiter between the name and its qualifier.

Proposal

This design proposal attempts to replace parsing the string with representing names as domain objects.

The name interface

The corner-stone interface describing object names will be Name:

/**
 * SQL object name.
 */
interface Name
{
    /**
     * Returns object name representation as an SQL literal.
     */
    public function toLiteral(NameBuilder $builder): string;

    /**
     * Returns object name representation as an SQL identifier.
     */
    public function toIdentifier(NameBuilder $builder): string;
}

It will represent any name: qualified or unqualified; quoted or unquoted and will be used by all SQL objects like tables, views, columns, databases, etc.

Note, currently, there's no plan to expose the raw value of the name. This will make it easier to abuse it in the raw form which may be the source of all sorts of issues (e.g. SQL injections).

A name can be used in two forms in SQL, hence the methods:

  1. A literal. The name will be represented as an SQL literal primarily during schema introspection. For instance:
    SELECT * FROM columns WHERE table = <literal>
  2. An identifier. The name will be represented as an SQL identifier when the corresponding object is used an an SQL statement. For instance:
    SELECT * FROM <identifier>

The SQL Builder interface family

The string representations of the name will be built by an SQL builder that is specific to the given database platform. For the sake of simplicity, we can consider the platform and the SQL builder synonymous for now.

/**
 * SQL builder that builds SQL representing object names. Each database platform must provide an implementation
 * of this interface according to its logic of handling object names.
 */
interface NameBuilder
{
    /**
     * Builds an SQL literal of an unquoted object name.
     */
    public function buildUnquotedNameLiteral(string $name): string;

    /**
     * Builds an SQL identifier of an unquoted object name.
     */
    public function buildUnquotedNameIdentifier(string $name): string;

    /**
     * Builds an SQL literal of a quoted object name.
     */
    public function buildQuotedNameLiteral(string $name): string;

    /**
     * Builds an SQL identifier of a quoted object name.
     */
    public function buildQuotedNameIdentifier(string $name): string;
}

The need to have different methods for representing names as literals and identifiers should be clear from the above. The need to have different methods for quoted and unquoted names will be explained below.

Unqualified and qualified names

Before we get to the details of the unquoted and quoted names, we need to establish a foundation regarding the unqualified and qualified names.

An unqualified name is the "given" name object without the reference to its schema or catalog (e.g. "accounts"):

/**
 * An unqualified name represents the own name of an SQL object within its schema or catalog.
 */
abstract class UnqualifiedName implements Name
{
    protected string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

A qualified name consists of the qualifier (which can be a qualified name itself) and an unqualified name. For instance, a table named "accounts" contained in the "accounting" schema can be referenced as accounting.accounts, or even more specifically, customer1.accounting.accounts where "customer1" is the name of the database or a catalog to which the "accounting" schema belongs.

/**
 * A qualified name consists of an unqualified name and a qualifier.
 */
final class QualifiedName implements Name
{
    private Name $qualifier;
    private UnqualifiedName $name;

    public function __construct(Name $qualifier, UnqualifiedName $name)
    {
        $this->qualifier = $qualifier;
        $this->name      = $name;
    }

    public function toLiteral(NameBuilder $builder): string
    {
        throw new LogicException("A qualified name cannot be converted to an SQL literal.");
    }

    public function toIdentifier(NameBuilder $builder): string
    {
        return $this->qualifier->toIdentifier($builder) . '.' . $this->name->toIdentifier($builder);
    }
}

Note that a qualified name is a compound object, so it can be used as an identifier but there's no scenario where it would be used as a literal. Hence, the corresponding method will not be implemented and isn't supposed to be used.

Unquoted and quoted names

An unqualified name can be unquoted or quoted. There are two reasons why a name may need to be quoted in SQL:

  1. The name represents an SQL keyword. The quoting may be needed for the SQL parser on a certain database platform to interpret the token as an identifier rather than a keyword.

    The following query is syntactically invalid in MySQL:

    SELECT * FROM from

    But this one is valid:

    SELECT * FROM `from`
  2. The that is not fully upper-cased must retain its original case on the platforms like Oracle and IBM DB2.

    In the the following query, the name will be internally converted to the upper case:

    SELECT * FROM accounts -- converted internally to SELECT * FROM ACCOUNTS

    But in this one it will remain lower-cased:

    SELECT * FROM "accounts"

Proper design and implementation of the casing aspect will allow us to quote identifiers unconditionally (i.e. always) which will remove the requirement to maintain the keywords lists and improve the stability and security aspects of the API.

/**
 * An unquoted name is an unqualified name that is not quoted.
 */
final class UnquotedName extends UnqualifiedName
{
    public function toLiteral(NameBuilder $builder): string
    {
        return $builder->buildUnquotedNameLiteral($this->name);
    }

    public function toIdentifier(NameBuilder $builder): string
    {
        return $builder->buildUnquotedNameIdentifier($this->name);
    }
}

/**
 * An unquoted name is an unqualified name that is quoted.
 */
final class QuotedName extends UnqualifiedName
{
    public function toLiteral(NameBuilder $builder): string
    {
        return $builder->buildQuotedNameLiteral($this->name);
    }

    public function toIdentifier(NameBuilder $builder): string
    {
        return $builder->buildQuotedNameIdentifier($this->name);
    }
}

Unquoted name normalization

Depending on the destination database platform, an unquoted name may need to be normalized before getting quoted. Currently, this is implemented in an obscure and non-systematic way like the following:

  1. return $identifier->isQuoted() ? $identifier : new Identifier(strtoupper($name));
  2. WHERE UPPER(c.tabname) = UPPER(" . $table . ')

Instead, we can introduce an interface that would describe this concern and be implemented by the platforms as required:

final class DefaultUnquotedNameNormalizer implements UnquotedNameNormalizer
{
    public function normalizeUnquotedName(string $name): string
    {
        return $name;
    }
}

final class UpperCasingNameNormalizer implements UnquotedNameNormalizer
{
    public function normalizeUnquotedName(string $name): string
    {
        return strtoupper($name);
    }
}

The implementations of the NameBuilder interface will depend on UnquotedNameNormalizer.

Object sets

There is a common component missing in the Schema API that would allow implementing a set of named database objects. Currently, the Schema and the Table classes have their own implementations of the set of Tables and Columns. A unified API is needed because the object names will be considered not only depending on their value but also depending on the platform in which the objects will be created.

Breaking API changes

Once the above APIs are implemented and solidified, we need to transition the APIs under the Schema namespace to use the names expressed as objects instead of plain strings:

  1. The objects that reside immediately within a schema (e.g. tables), instead of string $name will require an UnqualifiedName $name in their constructor. The same applies to the objects referencing such objects (e.g. the referenced table of a foreign key).
  2. All other objects like columns, indexes, etc. will require an UnqualifiedName $name.

To implement that, some breaking API changes are needed around schema introspection. Specifically, in the implementation of the AbstractSchemaManager::list*(). See the "Reasoning behind the design changes" section in #4548.

This is where we're trying squeeze a qualified name into a string:

return $table['schema_name'] . '.' . $table['table_name'];

And then parse:

if (strpos($name, '.') !== false) {

And miserably fail: #4708.

Get rid of AbstractAsset

  1. All objects have a name but for some of them it's nullable (e.g. constraints) while for others it's not (e.g. tables). Keeping it defined in a base class is a static analysis nightmare.
  2. Some objects allow a qualified name (e.g. tables when used with PostgreSQL) but others don't (e.g. columns when defined within the table). Keeping the name defined in the base class again makes it challenging for static analysis.
  3. Generation of nested object names (e.g. when creating an index on the table) is definitely not the responsibility of an abstract asset.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions