[Bug]: spannerMetadataTableName missing from Dataflow Jobs UI #2301

Open
oulin-coder opened this issue Apr 3, 2025 · 0 comments
Labels
bug Something isn't working needs triage p2

Related Template(s)

SpannerChangeStreamsToBigQuery

Template Version

v2

What happened?

I'm not sure if this is the right place to file this bug, but here's our situation:

We use a custom version of the SpannerChangeStreamsToBigQuery template, modified to support null primary keys and a few additional data types such as FLOAT32. Recently we could no longer make in-place updates to running jobs because the new versions were incompatible: we had upgraded apache-beam to 2.63.0 after a warning in the Dataflow UI that our previous version, 2.54.0, is deprecated. So we stopped the existing job and started a replacement job.

Previously, our job showed the newly created spannerMetadataTableName in the Dataflow Jobs UI under Job Info (screenshot). However, the replacement job does not show this parameter (screenshot). We also tried running gcloud dataflow jobs describe <job id> --full, but the parameter is not in the response either.

We eventually found the metadata table name by digging through our logs for a Spanner audit log entry that creates a table prefixed with "Metadata_dataflow_metadata_" (see log output for full log) around the time we started the new job. Nothing in that entry links it to the Dataflow job name or job ID. Given that we need this metadata table name for future in-place updates of the job, this seems prohibitively difficult.
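For anyone in the same situation, here is a minimal sketch of the log-digging step: it pulls metadata table names out of an exported audit log entry shaped like the one under "Relevant log output". The function name `metadata_tables` and the sample entry are illustrative assumptions, and the `Metadata_` prefix is taken only from the single log entry in this report, so it may not hold for other template versions.

```python
import re

# Matches the table name at the start of statements such as
# "CREATE TABLE IF NOT EXISTS Metadata_dataflow_metadata_<uuid> (...)".
# CREATE INDEX statements do not match because the pattern is anchored.
_CREATE_TABLE = re.compile(r"CREATE TABLE IF NOT EXISTS\s+(Metadata_\w+)")


def metadata_tables(entry: dict) -> list[str]:
    """Return metadata table names created by one UpdateDatabaseDdl audit log entry."""
    payload = entry.get("protoPayload", {})
    if not payload.get("methodName", "").endswith("UpdateDatabaseDdl"):
        return []
    names = []
    for statement in payload.get("response", {}).get("statements", []):
        match = _CREATE_TABLE.match(statement)
        if match:
            names.append(match.group(1))
    return names


# Hypothetical, shortened entry with the same structure as the log in this report.
sample_entry = {
    "protoPayload": {
        "methodName": "google.spanner.admin.database.v1.DatabaseAdmin.UpdateDatabaseDdl",
        "response": {
            "statements": [
                "CREATE TABLE IF NOT EXISTS Metadata_example_1234 (\n  PartitionToken STRING(MAX) NOT NULL\n) PRIMARY KEY(PartitionToken)",
                "CREATE INDEX IF NOT EXISTS WatermarkIdx_example_1234 ON Metadata_example_1234(Watermark)",
            ]
        },
    },
    "timestamp": "2025-04-02T21:01:08.428021563Z",
}

print(metadata_tables(sample_entry))
```

This only narrows the search. Because the entry carries no job name or job ID, matching a table to a job still depends on comparing the entry's timestamp with the job's start time.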

The fact that the metadata table used to show up in Dataflow Jobs UI and no longer does after upgrading apache-beam seems like a bug.
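A possible workaround until this is resolved, sketched under assumptions: if the template accepts spannerMetadataTableName as a launch parameter, the table name could be derived from the job name up front rather than discovered afterwards. The helper `metadata_table_name`, the `Metadata_` prefix, and the 128-character cap are assumptions for illustration, not something the template is confirmed to require.

```python
import re


def metadata_table_name(job_name: str, prefix: str = "Metadata_") -> str:
    """Derive a predictable metadata table name from a Dataflow job name.

    Assumption: identifiers may contain only letters, digits, and underscores
    and are capped at 128 characters; check the Spanner naming rules before
    relying on this.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", job_name)
    return (prefix + cleaned)[:128]


# Hypothetical job name; the result would be passed as spannerMetadataTableName.
print(metadata_table_name("spanner-to-bq-prod"))
```

With a derived name, the table needed for a later in-place update can be recomputed from the job name alone, without relying on the Jobs UI or audit logs.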

Relevant log output

{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {},
    "serviceName": "spanner.googleapis.com",
    "methodName": "google.spanner.admin.database.v1.DatabaseAdmin.UpdateDatabaseDdl",
    "resourceName": "projects/chorus-scout/instances/scout/databases/dataflow-metadata",
    "response": {
      "commitTimestamps": [
        "2025-04-02T21:01:08.291456Z",
        "2025-04-02T21:01:08.291456Z",
        "2025-04-02T21:01:08.291456Z"
      ],
      "database": "projects/chorus-scout/instances/scout/databases/dataflow-metadata",
      "statements": [
        "CREATE TABLE IF NOT EXISTS Metadata_dataflow_metadata_83683738_c4cf_4d7f_a20e_a28501031b26 (\n  PartitionToken STRING(MAX) NOT NULL,\n  ParentTokens ARRAY<STRING(MAX)> NOT NULL,\n  StartTimestamp TIMESTAMP NOT NULL,\n  EndTimestamp TIMESTAMP NOT NULL,\n  HeartbeatMillis INT64 NOT NULL,\n  State STRING(MAX) NOT NULL,\n  Watermark TIMESTAMP NOT NULL,\n  CreatedAt TIMESTAMP NOT NULL OPTIONS (\n    allow_commit_timestamp = true\n  ),\n  ScheduledAt TIMESTAMP OPTIONS (\n    allow_commit_timestamp = true\n  ),\n  RunningAt TIMESTAMP OPTIONS (\n    allow_commit_timestamp = true\n  ),\n  FinishedAt TIMESTAMP OPTIONS (\n    allow_commit_timestamp = true\n  ),\n) PRIMARY KEY(PartitionToken), ROW DELETION POLICY (OLDER_THAN(FinishedAt, INTERVAL 1 DAY))",
        "CREATE INDEX IF NOT EXISTS WatermarkIdx_dataflow_metadata_83683738_c4cf_4d7f_a20e_a2850103 ON Metadata_dataflow_metadata_83683738_c4cf_4d7f_a20e_a28501031b26(Watermark) STORING (State)",
        "CREATE INDEX IF NOT EXISTS CreatedAtIdx_dataflow_metadata_83683738_c4cf_4d7f_a20e_a2850103 ON Metadata_dataflow_metadata_83683738_c4cf_4d7f_a20e_a28501031b26(CreatedAt, StartTimestamp)"
      ],
      "@type": "type.googleapis.com/google.spanner.admin.database.v1.UpdateDatabaseDdlMetadata"
    }
  },
  "insertId": "1u2stawa0",
  "resource": {
    "type": "spanner_instance",
    "labels": {
      "instance_id": "scout",
      "instance_config": "",
      "location": "us-central1",
      "project_id": "chorus-scout"
    }
  },
  "timestamp": "2025-04-02T21:01:08.428021563Z",
  "severity": "NOTICE",
  "logName": "projects/chorus-scout/logs/cloudaudit.googleapis.com%2Factivity",
  "operation": {
    "id": "projects/chorus-scout/instances/scout/databases/dataflow-metadata/operations/r7d05288e_079a_41e3_ac46_dc91adffec0c",
    "producer": "spanner.googleapis.com",
    "last": true
  },
  "receiveTimestamp": "2025-04-02T21:01:10.355192518Z"
}
@oulin-coder oulin-coder added bug Something isn't working needs triage p2 labels Apr 3, 2025