Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3985: [JAVA] adding default serializable pacakge to trusted package list #3333

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

abhilakshyadobhal
Copy link

@abhilakshyadobhal abhilakshyadobhal commented Mar 8, 2025

What is the purpose of the change

With this change, the default packages will always be going to be part of trusted packages.

What is the fix?

  • Static Block Enhancement:
    The static block has been updated to always include both user-defined packages (if any) and the default packages (java.lang,java.math,java.io,java.net,org.apache.avro.reflect), while removing duplicate entries.

  • Constructor Updates:
    Constructors in SpecificDatumReader now call initializeTrustedPackages(), ensuring that every instance is correctly populated with the trusted packages.

Verifying this change

This change is already covered by existing tests, such as (please describe tests).

Documentation

  • Feature Impact:
    This commit is a bug fix for the package initialization in SpecificDatumReader and does not introduce any new features.

  • Code Documentation:
    Code comments, especially in the initializeTrustedPackages() method, have been updated to clearly explain the logic behind the inclusion of both user-defined and default packages.

JIRA Ticket

@github-actions github-actions bot added the Java Pull Requests for Java binding label Mar 8, 2025
@martin-g martin-g requested a review from jbonofre March 8, 2025 13:56
@KalleOlaviNiemitalo
Copy link
Contributor

KalleOlaviNiemitalo commented Mar 8, 2025

Previously, if the org.apache.avro.SERIALIZABLE_PACKAGES system property was set as "*", then only that was added to SERIALIZABLE_PACKAGES and to trustedPackages, and trustAllPackages() saw that trustedPackages was a one-element list containing only "*" so it returned true.

Now with this PR, trustedPackages will contain the default trusted packages as well, so it won't be a one-element list, and trustAllPackages() will return false.

That's why I think this should not be merged as is.

Is it possible to add a test case for that regression? I'm not sure how to isolate system property changes from other tests running in parallel.

@KalleOlaviNiemitalo
Copy link
Contributor

HADOOP-19315 "Bump avro from 1.9.2 to 1.11.4" was already marked as fixed. If an Avro change is still needed then perhaps that should be a separate AVRO issue in Jira.

* and any duplicate entries are avoided.</p>
*
* <p>Before adding the packages, the list is cleared to prevent duplicate entries if this method is invoked
* multiple times, ensuring that the list remains consistent and up-to-date across all instances.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "up-to-date across all instances" mean here?

  • AFAICS, SERIALIZABLE_PACKAGES is not modified after it has been initialised, so there is no question of the trustedPackages list being "up-to-date" with SERIALIZABLE_PACKAGES.
  • As the method only modified the trustedPackages list in one instance, the "across all instances" wording seems misleading.

Out of curiosity, were parts of the PR description and the javadoc comment created using generative AI? The Apache Software Foundation publishes some guidelines for that at ASF GENERATIVE TOOLING GUIDANCE.

@KalleOlaviNiemitalo
Copy link
Contributor

#3330 fixes AVRO-3985 so that trustedPackages is initialized no matter which constructor of SpecificDatumReader is used. So if that pull request is merged first, then the only user-visible change remaining in this #3333 will be that the default serializable packages are always included. At that point, this PR should be retitled.

Copy link
Contributor

@KalleOlaviNiemitalo KalleOlaviNiemitalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait for #3330 to be merged first, which has an issue in the AVRO project.

Delete the initializeTrustedPackages() calls from those constructors that call this(…).

Retitle this pull request when it no longer adds initializeTrustedPackages() calls in constructors due to the changes above.

Fix the org.apache.avro.SERIALIZABLE_PACKAGES = * case so that trustAllPackages() returns true again. Add a test for that if possible.

}

private final List<String> trustedPackages = new ArrayList<>();

public SpecificDatumReader() {
this(null, null, SpecificData.get());
initializeTrustedPackages();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling initializeTrustedPackages() here is not necessary because this constructor calls another constructor of the same class this(null, null, SpecificData.get()) and that constructor will call initializeTrustedPackages(). It should be called only after the super constructor.

This PR makes similar unnecessary changes to other delegating constructors too.

Copy link
Author

@abhilakshyadobhal abhilakshyadobhal Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have updated the PR, and will rebase once the other change #3330 gets merged.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KalleOlaviNiemitalo
The PR #3330 has been merged and I have rebased this branch with the main. Can you review the changes again.

@abhilakshyadobhal abhilakshyadobhal changed the title HADOOP-19315: [Java] Ensure trustedPackages is initialized in all constructors of SpecificDatumReader AVRO-3985: [JAVA] adding default serializable pacakge to trusted package list Mar 11, 2025
…icDatumReader

This commit addresses an issue in Avro's SpecificDatumReader where the trustedPackages list was not properly
initialized in every constructor. Although the issue is tracked under HADOOP-19315 in the Hadoop JIRA board,
the underlying problem exists in Avro.

Changes include:
- Modifying the static initialization block to combine both user-defined packages from the
  org.apache.avro.SERIALIZABLE_PACKAGES system property and the default packages
  ("java.lang,java.math,java.io,java.net,org.apache.avro.reflect"), ensuring that duplicates are removed.
- Updating the initializeTrustedPackages() method to call clear() before adding packages, preventing duplicate entries.
- Ensuring that initializeTrustedPackages() is invoked in every constructor of SpecificDatumReader so that each instance
  always has the correct trustedPackages list.

This improvement ensures that custom packages specified via the system property are correctly recognized as trusted,
preventing security errors during deserialization.
"java.lang,java.math,java.io,java.net,org.apache.avro.reflect").split(",");
String defaultPackages = "java.lang,java.math,java.io,java.net,org.apache.avro.reflect";

String userDefinedPackages = System.getProperty("org.apache.avro.SERIALIZABLE_PACKAGES", "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply returning defaultPackages in default value of System.getProperty() ?

If the user defines his own value, maybe it would be surprising to always add defaultPackages.

Mabye more "logic" to have defaultPackages by default, but not if the user defines his own packages.

Copy link
Author

@abhilakshyadobhal abhilakshyadobhal Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's noted that we can do that.
Even though Avro already knows some default packages that are considered trustable, if a user defines custom packages and forgets to include these default ones, it may lead to errors when deserializing objects that rely on them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got your point, but if the user doesn't want the default packages (for security reasons), he can't really remove these packages from the trusted ones.
Maybe more a warn message or documentation ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jbonofre I have updated the PR, pls check if it looks good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it looks good to me !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java Pull Requests for Java binding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants