RFC: Improve the APIs around handling SQL object names

## Motivation

There's a bunch of issues around handling object names that exist due to the existing design assumptions (e.g. https://github.com/doctrine/dbal/issues/4357). The assumptions are:

1. Only the identifiers that are SQL keywords should be quoted. The list of keywords managed by the DBAL is always up to date.
2. Object names are case-insensitive on all platforms under any circumstances.
3. Unqualified object names cannot contain dots (`.`).
4. Object names cannot contain special characters (e.g. `` ` ``).

The above assumptions lead to the API where any name is represented as a string which then has to be _parsed_ to identify its meaning. E.g.:

1. The quotes are stripped from a quoted name but without considering the escaping rules.
2. A dot in the middle of the name is considered to be the delimiter between the name and its qualifier.

## Proposal

This design proposal attempts to replace parsing the string with representing names as domain objects.

### The name interface

The corner-stone interface describing object names will be `Name`:
```php
/**
 * SQL object name.
 */
interface Name
{
    /**
     * Returns object name representation as an SQL literal.
     */
    public function toLiteral(NameBuilder $builder): string;

    /**
     * Returns object name representation as an SQL identifier.
     */
    public function toIdentifier(NameBuilder $builder): string;
}
```
It will represent any name: qualified or unqualified; quoted or unquoted and will be used by all SQL objects like tables, views, columns, databases, etc.

Note, currently, there's no plan to expose the raw value of the name. This will make it easier to abuse it in the raw form which may be the source of all sorts of issues (e.g. SQL injections).

A name can be used in two forms in SQL, hence the methods:

1. **A literal**. The name will be represented as an SQL literal primarily during schema introspection. For instance:
   ```sql
   SELECT * FROM columns WHERE table = <literal>
   ```
2. **An identifier.** The name will be represented as an SQL identifier when the corresponding object is used an an SQL statement. For instance:
   ```sql
   SELECT * FROM <identifier>
   ```

### The SQL Builder interface family

The string representations of the name will be built by an SQL builder that is specific to the given database platform. For the sake of simplicity, we can consider the platform and the SQL builder synonymous for now.

```php
/**
 * SQL builder that builds SQL representing object names. Each database platform must provide an implementation
 * of this interface according to its logic of handling object names.
 */
interface NameBuilder
{
    /**
     * Builds an SQL literal of an unquoted object name.
     */
    public function buildUnquotedNameLiteral(string $name): string;

    /**
     * Builds an SQL identifier of an unquoted object name.
     */
    public function buildUnquotedNameIdentifier(string $name): string;

    /**
     * Builds an SQL literal of a quoted object name.
     */
    public function buildQuotedNameLiteral(string $name): string;

    /**
     * Builds an SQL identifier of a quoted object name.
     */
    public function buildQuotedNameIdentifier(string $name): string;
}
```
The need to have different methods for representing names as literals and identifiers should be clear from the above. The need to have different methods for quoted and unquoted names will be explained below.

### Unqualified and qualified names

Before we get to the details of the unquoted and quoted names, we need to establish a foundation regarding the unqualified and qualified names.

An unqualified name is the "given" name object without the reference to its schema or catalog (e.g. "accounts"):

```php
/**
 * An unqualified name represents the own name of an SQL object within its schema or catalog.
 */
abstract class UnqualifiedName implements Name
{
    protected string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}
```

A qualified name consists of the qualifier (which can be a qualified name itself) and an unqualified name. For instance, a table named "accounts" contained in the "accounting" schema can be referenced as `accounting.accounts`, or even more specifically, `customer1.accounting.accounts` where "customer1" is the name of the database or a catalog to which the "accounting" schema belongs.

```php
/**
 * A qualified name consists of an unqualified name and a qualifier.
 */
final class QualifiedName implements Name
{
    private Name $qualifier;
    private UnqualifiedName $name;

    public function __construct(Name $qualifier, UnqualifiedName $name)
    {
        $this->qualifier = $qualifier;
        $this->name      = $name;
    }

    public function toLiteral(NameBuilder $builder): string
    {
        throw new LogicException("A qualified name cannot be converted to an SQL literal.");
    }

    public function toIdentifier(NameBuilder $builder): string
    {
        return $this->qualifier->toIdentifier($builder) . '.' . $this->name->toIdentifier($builder);
    }
}
```

Note that a qualified name is a compound object, so it can be used as an identifier but there's no scenario where it would be used as a literal. Hence, the corresponding method will not be implemented and isn't supposed to be used.

### Unquoted and quoted names

An unqualified name can be unquoted or quoted. There are two reasons why a name may need to be quoted in SQL:

1. The name represents an SQL keyword. The quoting may be needed for the SQL parser on a certain database platform to interpret the token as an identifier rather than a keyword.

   The following query is syntactically invalid in MySQL:
   ```sql
   SELECT * FROM from
   ```

   But this one is valid:
   ```sql
   SELECT * FROM `from`
   ```
2. The that is not fully upper-cased must retain its original case on the platforms like Oracle and IBM DB2.

   In the the following query, the name will be internally converted to the upper case:
   ```sql
   SELECT * FROM accounts -- converted internally to SELECT * FROM ACCOUNTS
   ```

   But in this one it will remain lower-cased:
   ```sql
   SELECT * FROM "accounts"
   ```

Proper design and implementation of the casing aspect will allow us to quote identifiers _unconditionally_ (i.e. _always_) which will remove the requirement to maintain the keywords lists and improve the stability and security aspects of the API.

```php
/**
 * An unquoted name is an unqualified name that is not quoted.
 */
final class UnquotedName extends UnqualifiedName
{
    public function toLiteral(NameBuilder $builder): string
    {
        return $builder->buildUnquotedNameLiteral($this->name);
    }

    public function toIdentifier(NameBuilder $builder): string
    {
        return $builder->buildUnquotedNameIdentifier($this->name);
    }
}

/**
 * An unquoted name is an unqualified name that is quoted.
 */
final class QuotedName extends UnqualifiedName
{
    public function toLiteral(NameBuilder $builder): string
    {
        return $builder->buildQuotedNameLiteral($this->name);
    }

    public function toIdentifier(NameBuilder $builder): string
    {
        return $builder->buildQuotedNameIdentifier($this->name);
    }
}
```
### Unquoted name normalization

Depending on the destination database platform, an unquoted name may need to be normalized before getting quoted. Currently, this is implemented in an obscure and non-systematic way like the following:
1. https://github.com/doctrine/dbal/blob/03a976a687a523e93f5f859f18590f9f47468e0b/src/Platforms/OraclePlatform.php#L514
2. https://github.com/doctrine/dbal/blob/03a976a687a523e93f5f859f18590f9f47468e0b/src/Platforms/DB2Platform.php#L254

Instead, we can introduce an interface that would describe this concern and be implemented by the platforms as required:
```php
final class DefaultUnquotedNameNormalizer implements UnquotedNameNormalizer
{
    public function normalizeUnquotedName(string $name): string
    {
        return $name;
    }
}

final class UpperCasingNameNormalizer implements UnquotedNameNormalizer
{
    public function normalizeUnquotedName(string $name): string
    {
        return strtoupper($name);
    }
}
```

The implementations of the `NameBuilder` interface will depend on `UnquotedNameNormalizer`.

### Object sets

There is a common component missing in the Schema API that would allow implementing a set of named database objects. Currently, the `Schema` and the `Table` classes have their own implementations of the set of `Table`s and `Column`s. A unified API is needed because the object names will be considered not only depending on their value but also depending on the platform in which the objects will be created.

## Breaking API changes

Once the above APIs are implemented and solidified, we need to transition the APIs under the `Schema` namespace to use the names expressed as objects instead of plain strings:

1. The objects that reside immediately within a schema (e.g. tables), instead of `string $name` will require an `UnqualifiedName $name` in their constructor. The same applies to the objects referencing such objects (e.g. the referenced table of a foreign key).
2. All other objects like columns, indexes, etc. will require an `UnqualifiedName $name`.

To implement that, some breaking API changes are needed around schema introspection. Specifically, in the implementation of the `AbstractSchemaManager::list*()`. See the "Reasoning behind the design changes" section in https://github.com/doctrine/dbal/pull/4548.

This is where we're trying squeeze a qualified name into a string:

https://github.com/doctrine/dbal/blob/cc378c34815cf3377312888a71047b7e167f7a76/src/Schema/PostgreSQLSchemaManager.php#L185

And then parse:
https://github.com/doctrine/dbal/blob/dc4ad2a0d7f72e9e55dd1f9e2a341c068fd2b89f/src/Schema/AbstractAsset.php#L47

And miserably fail: https://github.com/doctrine/dbal/pull/4708.

### Get rid of `AbstractAsset`
1. All objects have a name but for some of them it's nullable (e.g. constraints) while for others it's not (e.g. tables). Keeping it defined in a base class is a static analysis nightmare.
2. Some objects allow a qualified name (e.g. tables when used with PostgreSQL) but others don't (e.g. columns when defined within the table). Keeping the name defined in the base class again makes it challenging for static analysis.
3. Generation of nested object names (e.g. when creating an index on the table) is definitely not the responsibility of an abstract asset.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

RFC: Improve the APIs around handling SQL object names #4772

Motivation

Proposal

The name interface

The SQL Builder interface family

Unqualified and qualified names

Unquoted and quoted names

Unquoted name normalization

Object sets

Breaking API changes

Get rid of `AbstractAsset`

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

RFC: Improve the APIs around handling SQL object names #4772

Description

Motivation

Proposal

The name interface

The SQL Builder interface family

Unqualified and qualified names

Unquoted and quoted names

Unquoted name normalization

Object sets

Breaking API changes

Get rid of AbstractAsset

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Get rid of `AbstractAsset`