
[Fix] type issue in databricks_sql_table #4422

Open · wants to merge 7 commits into main
Conversation

@ian-norris-ncino ian-norris-ncino commented Jan 21, 2025

Changes

Updated the type check to compare only the first word of the type string.
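
For context, a minimal sketch of what that first-word comparison could look like in Go (the function names are illustrative assumptions, not the provider's actual identifiers):

```go
package main

import "strings"

// baseType takes only the first whitespace-separated word of a type string,
// e.g. "TIMESTAMP DEFAULT current_timestamp()" -> "TIMESTAMP".
func baseType(columnType string) string {
	fields := strings.Fields(columnType)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// typesMatch compares the configured and API-returned types by their
// first word only, ignoring case.
func typesMatch(configured, actual string) bool {
	return strings.EqualFold(baseType(configured), baseType(actual))
}
```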

Resolves #4421

Tests

  • make test run locally
  • relevant change in docs/ folder
  • covered with integration tests in internal/acceptance
  • using Go SDK
  • using TF Plugin Framework

@ian-norris-ncino ian-norris-ncino requested review from a team as code owners January 21, 2025 19:34
@ian-norris-ncino ian-norris-ncino requested review from parthban-db and removed request for a team January 21, 2025 19:34
@alexott alexott changed the title Fix type issue in databricks_sql_table [Fix] type issue in databricks_sql_table Jan 22, 2025
@alexott
Contributor

alexott commented Jan 23, 2025

You also need to think about how to handle cases like a decimal type declared as DECIMAL(3, 2): it will be converted into "DECIMAL(3,", and this is not correct. So we'll need a somewhat more sophisticated approach.

@ian-norris-ncino
Author

ian-norris-ncino commented Jan 23, 2025

I used regex instead. I looked at this document and it seems all the types use parentheses to signify arguments.
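
A sketch of what that regex-based extraction might look like (the pattern here is my guess at the approach, not the PR's exact expression; complex types such as STRUCT<...> or multi-word INTERVAL types would still need special handling):

```go
package main

import "regexp"

// Keep the type name plus any parenthesized arguments, dropping trailing
// clauses such as DEFAULT or COMMENT:
//   "DECIMAL(3, 2) DEFAULT 1.0"             -> "DECIMAL(3, 2)"
//   "TIMESTAMP DEFAULT current_timestamp()" -> "TIMESTAMP"
var typeRe = regexp.MustCompile(`^\s*([a-zA-Z_]+(?:\s*\([^)]*\))?)`)

func baseType(columnType string) string {
	if m := typeRe.FindStringSubmatch(columnType); m != nil {
		return m[1]
	}
	return columnType
}
```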

@alexott
Contributor

alexott commented Jan 24, 2025

Integration tests have failed with the following error:

cannot create sql table: cannot execute CREATE TABLE `main`.`sheghbiihgiaj`.`fjdhiaajkejh` (`name` TIMESTAMP DEFAULT current_timestamp() COMMENT 'comment')
  | COMMENT 'this table is managed by terraform'
  | TBLPROPERTIES ('delta.minWriterVersion'='7', 'this'='that', 'delta.feature.columnMapping'='supported', 'delta.feature.invariants'='supported', 'something'='else', 'delta.columnMapping.mode'='name', 'delta.minReaderVersion'='3');: [WRONG_COLUMN_DEFAULTS_FOR_DELTA_FEATURE_NOT_ENABLED] Failed to execute CREATE TABLE command because it assigned a column DEFAULT value,
  | but the corresponding table feature was not enabled. Please retry the command again
  | after executing ALTER TABLE tableName SET
  | TBLPROPERTIES('delta.feature.allowColumnDefaults' = 'supported'). SQLSTATE: 0AKDE

@ian-norris-ncino
Author

I updated the template generation to include the needed Delta flag.
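
Roughly, the generated test config's properties block would gain the flag the error message asks for (a sketch of the shape, not the actual template):

```hcl
properties = {
  "this"                              = "that"
  "something"                         = "else"
  "delta.feature.allowColumnDefaults" = "supported"
}
```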

@ian-norris-ncino
Author

@alexott Anything else I can do?

@alexott
Contributor

alexott commented Jan 28, 2025

It looks like there is a syntax error; integration tests are failing with:

Error: Missing key/value separator
  | 
  |   on terraform_plugin_test.tf line 23, in resource "databricks_sql_table" "this":
  |   20:     properties         = {
  |   21:         "this"                        = "that"
  |   22:         "something"                   = "else"
  |   23:         "delta.feature.allowColumnDefaults = "supported"
  | 
  | Expected an equals sign ("=") to mark the beginning of the attribute value.
  | If you intended to given an attribute name containing periods or spaces,
  | write the name in quotes to create a string literal.
  | 
  | Error: Invalid multi-line string
  | 
  |   on terraform_plugin_test.tf line 23, in resource "databricks_sql_table" "this":
  |   23:         "delta.feature.allowColumnDefaults = "supported"
  |   24:         "delta.feature.columnMapping" = "supported"
  | 
  | Quoted strings may not be split over multiple lines. To produce a multi-line
  | string, either use the \n escape to represent a newline character or use the
  | "heredoc" multi-line template syntax.
  | 
  | Error: Invalid multi-line string
  | 
  |   on terraform_plugin_test.tf line 24, in resource "databricks_sql_table" "this":
  |   24:         "delta.feature.columnMapping" = "supported"
  |   25:         "delta.feature.invariants"    = "supported"
  | 

@ian-norris-ncino
Author

🤦‍♂️ just updated.

Contributor

@alexott alexott left a comment

In general this looks good, but we need to think about how the change of the type may affect apply/read (I'm not sure it won't lead to configuration drift).

The integration test is passing, but we need to fix our build to make it green overall.

Comment on lines -546 to +551

```diff
- return caseInsensitiveColumnType
+ return normalizedColumnType
```
Contributor

I'm not sure about this change; we need to think about how changes like decimal(12, 2) -> decimal(12,2) will affect the plan/apply and subsequent read operations.
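
One plausible reading of what a normalizedColumnType helper could do, and why it raises the drift question (a sketch under that assumption; the PR's actual helper may differ): if the provider stores the normalized form in state while the user's config keeps the spaced form, both sides must be normalized before comparison, or every plan will show a cosmetic diff.

```go
package main

import "strings"

// Hypothetical normalization: lowercase and strip spaces so that
// "DECIMAL(12, 2)" and "decimal(12,2)" compare equal. Whatever form is
// written to state, the same normalization must be applied to both the
// configured and the API-returned type to avoid spurious drift.
func normalizeColumnType(t string) string {
	return strings.ToLower(strings.ReplaceAll(t, " ", ""))
}
```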

Author

Yeah, let me know what you think the best approach is, and I'm happy to implement it.

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/terraform

Inputs:

  • PR number: 4422
  • Commit SHA: 0ee9ac0e2adb646c86d5af70af0308ff0bac4d64

Checks will be approved automatically on success.

@alexott alexott temporarily deployed to test-trigger-is January 29, 2025 15:25 — with GitHub Actions Inactive
@ian-norris-ncino
Author

@alexott Any thoughts on how we can proceed?

@mgyucht
Contributor

mgyucht commented Feb 26, 2025

Hi @ian-norris-ncino, thank you for this contribution.

I've gone over this change carefully with @alexott. While I recognize it solves your immediate issue with being able to specify a timestamp column's default value, I am concerned about merging the change as it is.

We need to be able to accurately compare the type of a column as returned by the Tables API with what is specified in a user's config. I have two concerns about this:

  1. Using regular expressions to parse the type provided by users has certain downsides. For instance, there are many column attributes that can be expressed in the type: how can we use regexes to properly identify all of them?
  2. Currently, this PR actually ignores the other attributes specified by the user. This makes the diff checking incorrect: the provider only checks whether the base type matches the column's type, but it doesn't verify that the other annotations are set properly. For example, if the default expression were removed or changed to some other value, the provider would not detect that.

The current implementation is far from ideal, and I'll be the first to admit that, but I don't think this is the right direction to start addressing it. To better support this, I would propose the following pathway:

  1. Define a struct that captures a column's definition, including all attributes. It's OK if this struct doesn't include everything from the start, but it needs to be easily extensible for new attributes.
  2. Implement methods that deserialize the TF config and the Tables API response into this type.
  3. Define a struct that captures the difference between the existing and configured type.
  4. Define a method to compute this difference given the existing and configured type.
  5. Implement a method to convert a given difference into a set of SQL statements that alter the column to achieve the configured type.

I don't know how easy this will be, as I don't know whether there is a complete reference for type_json. https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/sql-ref-syntax-aux-describe-table is a starting point, but I think there are many fields that are not documented there (such as a metadata field which includes the default values).
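
As a rough illustration of steps 1, 3, and 4 in that pathway (every name and field below is an assumption for discussion, not an agreed design):

```go
package main

// ColumnDefinition is a sketch of the struct step 1 describes: a common
// shape that both the TF config and the Tables API response (step 2)
// would be parsed into.
type ColumnDefinition struct {
	Name        string
	BaseType    string // e.g. "decimal", "timestamp"
	Precision   *int   // set for decimal types
	Scale       *int   // set for decimal types
	Nullable    bool
	Comment     string
	DefaultExpr string // e.g. "current_timestamp()"
	// Extensible: generated columns, collation, masks, ...
}

// ColumnDiff (step 3) records how the configured definition differs from
// the existing one; step 5 would turn a non-empty diff into
// ALTER TABLE ... ALTER COLUMN statements.
type ColumnDiff struct {
	TypeChanged    bool
	DefaultChanged bool
	CommentChanged bool
}

// computeDiff is step 4: a field-by-field comparison. A real version
// would also compare precision, scale, and nullability.
func computeDiff(existing, configured ColumnDefinition) ColumnDiff {
	return ColumnDiff{
		TypeChanged:    existing.BaseType != configured.BaseType,
		DefaultChanged: existing.DefaultExpr != configured.DefaultExpr,
		CommentChanged: existing.Comment != configured.Comment,
	}
}
```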

@ian-norris-ncino
Author

Thanks @mgyucht and @alexott for thinking deeply about this. I've started getting deeper into an implementation, and it comes with many complexities, mainly around parsing the user-provided type string into a structure shared with the Get Table API response. I wonder if it makes sense to add additional keys to the resource for default values, precision, scale, and interval types so as to match the Get API? An example would then look like:

```hcl
column {
  name     = "updated_at"
  type     = "timestamp"
  default  = "current_timestamp()"
  comment  = ""
  nullable = false
}

column {
  name           = "value"
  type           = "decimal"
  type_precision = 10
  type_scale     = 0
  comment        = ""
  nullable       = false
}
```

There would be some considerations around parsing the type_json metadata, but this would make handling the user input much easier. What do you think?
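
If it helps the discussion, here's a minimal sketch of how such keys could be folded back into the type string the SQL layer needs (the field names mirror the example above and are part of the proposal, not the provider's current schema):

```go
package main

import "fmt"

// columnSpec mirrors the proposed HCL keys above.
type columnSpec struct {
	Type          string
	TypePrecision *int
	TypeScale     *int
}

// typeString folds the separate keys back into a single type string,
// e.g. {"decimal", 10, 0} -> "decimal(10,0)".
func (c columnSpec) typeString() string {
	if c.TypePrecision != nil && c.TypeScale != nil {
		return fmt.Sprintf("%s(%d,%d)", c.Type, *c.TypePrecision, *c.TypeScale)
	}
	return c.Type
}
```

The rendered string could then be compared against the type returned by the API, while the structured fields map more naturally onto the type_json metadata.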
