Skip to content

ClassNotFoundException using PL/Java 1.6.8 on Postgres 17.2 #523

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
daddeo opened this issue Apr 4, 2025 · 8 comments
Open

ClassNotFoundException using PL/Java 1.6.8 on Postgres 17.2 #523

daddeo opened this issue Apr 4, 2025 · 8 comments

Comments

@daddeo
Copy link

daddeo commented Apr 4, 2025

We have been using PL/Java for a couple of years now and created our extension initially with 1.5.6 on Postgres 13. Over the years we have migrated to Postgres 15 then 16 and now 17.2. Until recently, we haven't had any serious issues. Our extension is fairly simple and provides XML and JSON validation services to SQL for generated data from business processes. We use install_jar to create entries in SQLJ for our extension JAR and GSON 2.10.1. We run set_classpath for each of our client schemas.

This has all worked well until 17.2 (using PL/Java 1.6.8). Now we are getting ClassNotFoundException on a static class deep within the extension inner workings.

Our SQL calls runPublishValidation(3 parameters) which uses PL/Java to call Publish.validatte(...). Within the extension Publish.validate creates an instance of the non-static class Validation and calls it's run() method. The run() method performs 10 distinct validation checks, think of them as steps. There are several static helper classes for XML and JSON handling, think of them as utility classes. These static utility classes are used extensively in each of the 10 steps.

What we are seeing is that on the 10th step, PL/Java is reporting ClassNotFoundException on Helpers.processSubstitution(). Again, this class method will have been called dozens of times prior to the 10th step invocation.

This is the call stack I get:

25 Mar 25 14:28:16 org.postgresql.pljava.sqlj.Loader Failed to load class
java.sql.SQLException: An attempt was made to call a PostgreSQL backend function after an elog(ERROR) had been issued
	at [email protected]/org.postgresql.pljava.internal.Oid._forSqlType(Native Method)
	at [email protected]/org.postgresql.pljava.internal.Oid.lambda$forSqlType$1(Oid.java:68)
	at [email protected]/org.postgresql.pljava.internal.Backend.doInPG(Backend.java:89)
	at [email protected]/org.postgresql.pljava.internal.Oid.forSqlType(Oid.java:68)
	at [email protected]/org.postgresql.pljava.jdbc.SPIPreparedStatement.setObject(SPIPreparedStatement.java:238)
	at [email protected]/org.postgresql.pljava.jdbc.SPIPreparedStatement.setObject(SPIPreparedStatement.java:223)
	at [email protected]/org.postgresql.pljava.jdbc.SPIPreparedStatement.setInt(SPIPreparedStatement.java:124)
	at [email protected]/org.postgresql.pljava.sqlj.Loader.lambda$findClass$5(Loader.java:462)
	at [email protected]/org.postgresql.pljava.sqlj.Loader.findClass(Loader.java:464)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
	at schema:dimbuild_sprvl//com.iri.udb.pljava.publish.Validation.checkAttrSpecMeta(Validation.java:211)
	at schema:dimbuild_sprvl//com.iri.udb.pljava.publish.Validation.run(Validation.java:159)
	at schema:dimbuild_sprvl//com.iri.udb.pljava.Publish.validate(Publish.java:75)
	at schema:dimbuild_sprvl//com.iri.udb.pljava.Publish.validate(Publish.java:48)
	at [email protected]/org.postgresql.pljava.internal.EntryPoints.lambda$invocable$0(EntryPoints.java:130)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
	at [email protected]/org.postgresql.pljava.internal.EntryPoints.doPrivilegedAndUnwrap(EntryPoints.java:312)
	at [email protected]/org.postgresql.pljava.internal.EntryPoints.invoke(EntryPoints.java:158)

Exception mv_spec_extract_data ERROR: java.lang.NoClassDefFoundError: com/iri/udb/pljava/common/Helpers

We have tried many different approaches to narrow down what is happening from pljava setting changes in the postgresql.conf file to flatten away the static class into the Validation class implementation to use Class.forName to dynamically load the class. Nothing helped. Then we decided to add -Djava.class.path={our jars} to pljava.vmoptions and this seems to have corrected the problem, although introduces another problem for us. We run in many different environments, on many database servers and this will be a maintenance and setup nightmare for us. We like using the SQLJ tables to manage the distribution of the extension.

As this point, I will stop typing and let you digest (and/or figure out) what I've said and ask questions.

@jcflack
Copy link
Contributor

jcflack commented Apr 5, 2025

Goodness, that's a lot to digest, and it sounds frustrating.

The best thing about it, from your description, is that it seems to be deterministic and repeatable, you can count on seeing it on the tenth step.

On first impression, it looks not so much that findClass is failing, as that something already failed and didn't get properly recovered: the error message An attempt was made to call a PostgreSQL backend function after an elog(ERROR) had been issued is the giveaway.

The sqlj storage for classes is convenient, but it is kind of at the mercy of PostgreSQL when there has been an error during a transaction. PostgreSQL is simply not going to allow any more queries via SPI, even innocent read-only ones like PL/Java trying to load a class, from the moment of that error to the moment the transaction is fully rolled back (unless there was a savepoint; a PL/Java savepoint can be rolled back and counts as recovering from an error, so queries can be made again).

It can get ugly if the code involved in the error recovery and savepoint rollback happens to need a class loaded.

So it seems that a first mystery to investigate would be: what was the PostgreSQL error that had already been ereported, causing the SPI-locked-out state that was then the cause of the exception you later got.

If that earlier error was caught in a Java catch block and has not yet had a chance to bubble out and be logged, connecting a debugger might be the best way to catch it. Are you compiling the Java code with -g?

Another thing worth remembering is that NoClassDefFoundError is not the normal thing thrown when a class you're looking for isn't found by the loader. If you're looking for Foo and the loader can't find it, you get ClassNotFoundException for Foo.

NoClassDefFoundErroris thrown during linkage, if a class got successfully loaded but some other class mentioned in its constant pool isn't getting resolved. (In this case, that might only be because of some other error that locked out queries.)

@daddeo
Copy link
Author

daddeo commented Apr 7, 2025

I do realize that this is a complex problem and I potentially overloaded you on my initial statement, but we do also get ClassNotFoundException for the same static class Helpers. It seems to vary which one we get.

I tried "flattening" the Helpers class directly into the Validation class as member functions and then the error moves to reporting it cannot find GSON classes.

I have also tried to call the Helpers method dynamically using Class.forName() and then clazz.processSubstitutions(...), but did not change the outcome.

I have pulled the SQL being executed to a vary small CTE to feed parameter 1 and 2 to the Publish.validate method, such that I wholly control the execution. I run it from DBeaver and get ClassNotFoundException, but as soon as I add my jars to the vmoptions through -Djava.class.path= then it all works.

I feel like I am missing something obvious or simple that i've overlooked or blinded myself to, but what?

@jcflack
Copy link
Contributor

jcflack commented Apr 7, 2025

PL/Java's classloader is failing at that point because an earlier PostgreSQL error has not been fully handled, leaving PostgreSQL internals in an unusable state until a rollback of the transaction or of a savepoint, which has not yet occurred. There is nothing PL/Java's classloader can or could do to succeed in that situation.

I have had to solve exactly similar issues in the past. The key is always to find out what earlier PostgreSQL error got raised and has not been handled yet. (I'm not saying that preceding error can't also turn out to be PL/Java's fault. Sometimes it does. But the key is always to first find out what that error was. Once that is known, the picture often comes together quickly.)

If that earlier error was caught in a Java catch block and has not yet had a chance to bubble out and be logged, connecting a debugger might be the best way to catch it. Are you compiling the Java code with -g?

@jcflack
Copy link
Contributor

jcflack commented Apr 7, 2025

If you are running 1.6.8, good places to set a debugger breakpoint will be:

With a C debugger such as gdb, use a breakpoint on Exception.c line 174, and use where or bt to get a stack trace when the breakpoint is reached.

With a Java debugger such as jdb, use a breakpoint on org.postgresql.pljava.internal.ServerException line 30. When the breakpoint is reached, you can use where to get a stack trace, and dump errorData to see what is known about the error.

More information on how to attach a Java debugger to a running PostgreSQL backend is here.

Once we have that information on whatever earlier error is getting in the way, it will be a lot clearer where to go next.

@jcflack
Copy link
Contributor

jcflack commented Apr 8, 2025

Hi,

Do you think you will have time to get any stack traces for me to look at?

@daddeo
Copy link
Author

daddeo commented Apr 9, 2025

Hi, not ignoring you, been really busy with fires at work and I am trying to get permissions for a replica postgres server with my client schema and rights. Corporate red tape is making this much harder for me. We are not allowed many rights on the DB hosts and I'm not entirely sure I will be able to even attach a debugger after all this. I am trying though, but it's a slog.

In the meantime, do you have any further suggestions that I might do aside from attaching? I was looking at potentially using logback to see what i can get out of it.

@jcflack
Copy link
Contributor

jcflack commented Apr 9, 2025

I understand.

Do you think the problem could be replicable in a PostgreSQL instance with your software but different data, or fake data, or not all your data?

I routinely use a test harness (like this one that PL/Java includes, which is based on this one that PostgreSQL includes) to just spin up PostgreSQL as me on some random box without making a permanent installation or using privileged ports or any of that.

Could that be bureaucratically simpler?

I do see here that if you set the logging level to DEBUG2 you should at least get a message with the name of the function where the earlier error happens. Not as good as a stack trace (and maybe even completely unenlightening without a stack trace) but at least it's something to do.

@jcflack
Copy link
Contributor

jcflack commented Apr 28, 2025

Any news?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants