
Connection problems to Oracle in tests when another db is used in another resource first #46542

Open
@holly-cummins


Describe the bug

See discussion in https://quarkusio.zulipchat.com/#narrow/channel/187038-dev/topic/Cannot.20connect.20to.20Oracle.20DB.20in.20Otel.20JDBC.20Instrumentation.

This is a future bug: it is introduced by my changes in #34681. I don't think it should block that work item, so I'm creating this issue to track it.

The problem is an interaction between #34681 and something merged between February 7th and February 17th.

To reproduce

This needs #34681.
I've worked around the issue, so to recreate it, remove

@TestProfile(OracleOpenTelemetryJdbcInstrumentationTest.SomeProfile.class)

from OracleOpenTelemetryJdbcInstrumentationTest.

It may also be necessary to set @Order(1) on the PostgreSQL test and a higher order on the Oracle test, and then run with

QUARKUS_TEST_CLASS_ORDERER='org.junit.jupiter.api.ClassOrderer$OrderAnnotation' mvn -Dno-test-kubernetes -Dno-build-cache -Dtest-containers -Dstart-containers -f integration-tests/opentelemetry-jdbc-instrumentation clean install

What's going on?

We see connection failures to Oracle if the test that talks to Oracle runs after a test that uses MariaDB or PostgreSQL.

11:41:54.971 Oracle Database:2025-02-27T11:41:54.615234+00:00
11:41:54.971 Oracle Database:Fatal NI connect error 12637 [Time : 27-FEB-2025 11:41:54] [NS errors [12637:TNS-12637: Packet receive failed] 12532] [NT errors [0 0] 0] [Oracle errors [0 ] 0] [Connecting to: (ADDRESS=(PROTOCOL=tcp)(HOST=172.17.0.1)(PORT=51374))(service_name=freepdb1)(connection_id=8yPPIJx9RumMukcbl2X5kA==)] [PID: 225]
11:41:54.971 Oracle Database:2025-02-27T11:41:54.615993+00:00
11:41:54.971 Oracle Database:opiodr aborting process unknown ospid (225) as a result of ORA-609

which then leads to uglier errors like

Caused by: java.sql.SQLRecoverableException: ORA-17002: I/O error: java.lang.NoClassDefFoundError: Could not initialize class oracle.net.nt.Clock, connect lapse 1 ms., Authentication lapse 0 ms.

The test classloading rewrite runs augmentation and creates the runtime classloader eagerly, during test discovery. The problem seems to occur when augmentation happens, then a test runs with Oracle disabled, and then a test runs with Oracle enabled. It's almost as though some state 'sticks' during the first run, at a classloader level above the runtime classloader.
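One general JVM behaviour fits the `Could not initialize class oracle.net.nt.Clock` wording, though it is not confirmed as the cause here. If a class's static initializer fails once, every later use of that class in the same classloader fails with `NoClassDefFoundError`, even if the conditions that caused the failure have since changed. The sketch below is purely illustrative: `Fragile` and the `demo.enabled` property are hypothetical stand-ins and have nothing to do with the Oracle driver's actual code.

```java
public class StickyInitDemo {
    // Hypothetical stand-in for a class with a static initializer
    // that depends on the environment at first use.
    static class Fragile {
        static final long START;
        static {
            if (!Boolean.getBoolean("demo.enabled")) {
                throw new IllegalStateException("not enabled at first use");
            }
            START = System.nanoTime();
        }
        static long start() { return START; }
    }

    // Touches Fragile and reports either "ok" or the simple name of the failure.
    static String touch() {
        try {
            Fragile.start();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.setProperty("demo.enabled", "false");
        String first = touch();   // static initializer runs and fails
        System.setProperty("demo.enabled", "true");
        String second = touch();  // initializer is NOT retried
        System.out.println(first + " -> " + second);
    }
}
```

In a fresh JVM the first call reports `ExceptionInInitializerError` and the second reports `NoClassDefFoundError`, despite the property having been "fixed" in between. The failed state lasts as long as the defining classloader does, so a class defined by a loader that outlives the runtime classloader would keep failing across test restarts.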

Working around the problem

There are three ways to bypass the issue, which give some clues about the nature of the problem:

  • Use @Order to force the Oracle test to run first.
  • Set a @TestProfile on the Oracle test. This means it gets a whole new curated application and doesn't share anything with the other databases' tests.
  • On the PostgreSqlLifecycleManager, set quarkus.hibernate-orm.oracle.active and add the config for the Oracle datasource. It's surprising that this works, and it suggests that some state is sticking somewhere it shouldn't.

Since the problem seems to be some state leaking from higher classloaders into the runtime classloader through a restart, I tried reproducing it in dev mode, but connections to Oracle databases worked perfectly after config changes.
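To make the suspected mechanism concrete, here is a minimal sketch of how static state owned by a parent classloader survives across short-lived child classloaders. It uses plain `URLClassLoader` with default parent-first delegation, which is an assumption made for illustration and not a model of Quarkus's actual classloader hierarchy. `Shared` and `runOnce` are hypothetical names.

```java
import java.lang.reflect.Field;
import java.net.URL;
import java.net.URLClassLoader;

public class SharedParentStateDemo {
    // Hypothetical class defined by the parent loader, standing in for
    // anything that lives above the per-run classloader.
    public static class Shared {
        public static int counter = 0;
    }

    // Simulates one "run": creates a fresh child loader, resolves Shared
    // through it, and increments the static counter reflectively.
    static int runOnce() throws Exception {
        ClassLoader parent = SharedParentStateDemo.class.getClassLoader();
        try (URLClassLoader child = new URLClassLoader(new URL[0], parent)) {
            // The child has no URLs of its own, so the lookup is
            // delegated to the parent and returns the parent's class.
            Class<?> c = child.loadClass(Shared.class.getName());
            Field f = c.getField("counter");
            f.setInt(null, f.getInt(null) + 1);
            return f.getInt(null);
        }
    }

    public static void main(String[] args) throws Exception {
        int first = runOnce();
        int second = runOnce();
        // The second run sees the value left behind by the first one.
        System.out.println(first + " then " + second);
    }
}
```

Each call throws away its child loader, yet the counter keeps growing, because the class and its statics belong to the parent. If something similar happens with driver or pool state during the first test run, recreating only the runtime classloader would not reset it.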

Changes it isn't

I've hand-reverted likely-looking changes in the change window, but haven't found the one that exposed the problem. It isn't


Labels: area/agroal, area/jdbc, kind/bug
