
Maven update-only execution taking over 1 hour to process 126,026 records #7628


Open
andrewnewsome opened this issue May 4, 2025 · 7 comments

@andrewnewsome

andrewnewsome commented May 4, 2025

Hello. Firstly, apologies if this has already been covered in another ticket.
Secondly, I just wanted to express my gratitude for providing this awesome tool allowing me to perform vulnerability checks within a Maven pipeline.

Finally - to my question.
I have a GitHub Actions workflow which runs the Maven org.owasp:dependency-check-maven:update-only goal, with the data being persisted in a GCP-hosted Postgres database. The GitHub runner is also a GCP VM instance running in the same region as the database.

The execution is taking 2 mins to download the NVD API data, but then over 1 hour to process - please see the log below.

Admittedly the DB's resources are on the low side, but I can see from the GCP database metrics that CPU utilization never exceeds 25% usage, and memory doesn't appear to exceed 50% usage.

Can you please explain why it could be taking so long to process the downloaded data?
Thanks.

Sun, 04 May 2025 09:03:43 GMT
[INFO] NVD API has 126,026 records in this update
Sun, 04 May 2025 09:03:50 GMT
[INFO] Downloaded 10,000/126,026 (8%)
...removed for brevity....
Sun, 04 May 2025 09:05:51 GMT
[INFO] Downloaded 126,026/126,026 (100%)
Sun, 04 May 2025 09:05:51 GMT
[INFO] Completed processing batch 1/64 (2%) in 10,358ms
Sun, 04 May 2025 09:05:51 GMT
[INFO] Completed processing batch 2/64 (3%) in 15,648ms
Sun, 04 May 2025 09:05:51 GMT
[INFO] Completed processing batch 3/64 (5%) in 23,817ms
Sun, 04 May 2025 09:05:51 GMT
...removed for brevity....
Sun, 04 May 2025 10:16:03 GMT
[INFO] Completed processing batch 64/64 (100%) in 4,210,792ms

UPDATE: Here is the Maven plugin configuration:

            <plugin>
                <groupId>org.owasp</groupId>
                <artifactId>dependency-check-maven</artifactId>
                <version>12.1.1</version>
                <dependencies>
                    <dependency>
                        <groupId>org.postgresql</groupId>
                        <artifactId>postgresql</artifactId>
                        <version>42.7.5</version>
                    </dependency>
                </dependencies>
                <configuration>
                    <databaseDriverName>org.postgresql.Driver</databaseDriverName>
                    <connectionString>${owasp.jdbc.url}</connectionString>
                    <databaseUser>${env.DB_USER}</databaseUser>
                    <databasePassword>${env.DB_PASSWORD}</databasePassword>
                    <autoUpdate>false</autoUpdate>
                    <nvdApiKey>${env.NVD_API_KEY}</nvdApiKey>
                    <nvdApiDelay>8000</nvdApiDelay>
                </configuration>
            </plugin>
@jeremylong
Collaborator

Using an external database for the initial data load can take a long time. However, once it is set up you should be able to update it daily without a large processing time.

@andrewnewsome
Author

OK, thanks @jeremylong. Can you think of anything that can be changed to increase performance? 100,000 doesn't seem like much to persist so I presume more is happening than just read API response data -> store in DB.

@jeremylong
Collaborator

Possibly tinkering with the indexes? I haven't really spent much time on performance with the external databases. The process does a lot of lookups, and then inserts a record if it isn't found.

@aikebah
Collaborator

aikebah commented May 4, 2025

@andrewnewsome As a side note: 100k is maybe not that much, but it means roughly a third to a half of the entire NVD listing got a refresh (the total NVD is around 300k entries), which partly explains why this update looked more like a fresh load from scratch from a timing perspective.

@andrewnewsome
Author

Thanks @aikebah. Is it the download of the definitions from the NVD API which is expected to be the time-consuming part? I understand this can be the case, especially if no NVD API key is used. In my case, I'm using a key, and from the logs it appears the download takes around 2 mins. It is the processing of the data that is taking an hour. Is this to be expected?

At this moment, I cannot rule out a slow connection to the DB, but I'd hope not given the region/location settings.
I'll test with a local DB, and compare performance. Thanks

@aikebah
Collaborator

aikebah commented May 5, 2025

@andrewnewsome Not the downloads (those all happen first; when the log says '100%' after the two minutes, the download has completed). It's the processing of the large number of entries that takes the time, because it involves a lot of interactions with the DB. This is more pronounced with a remote DB server, due to the added latency, than with local processing against the H2 database.
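
To put a number on this, here is a rough back-of-the-envelope calculation using only the figures from the log earlier in the thread. It assumes the per-batch timings are cumulative elapsed time, which seems likely because the final value (about 70 minutes) matches the wall-clock gap between 09:05:51 and 10:16:03. The round-trip time is a made-up placeholder, not something measured here.

```python
# Figures copied from the log posted earlier in this thread.
records = 126_026
batches = 64
total_ms = 4_210_792  # elapsed time reported when batch 64/64 completed

ms_per_record = total_ms / records
records_per_batch = records / batches
total_minutes = total_ms / 60_000

# ASSUMPTION: hypothetical average round-trip time to the remote database.
# This was not measured in the thread; substitute your own measurement.
assumed_round_trip_ms = 1.0
round_trips_budget = ms_per_record / assumed_round_trip_ms

print(f"total processing time: {total_minutes:.1f} min")
print(f"average cost per record: {ms_per_record:.1f} ms")
print(f"records per batch: {records_per_batch:.0f}")
print(f"round trips that fit in that budget per record: {round_trips_budget:.0f}")
```

This works out to roughly 33 ms per record. With low DB CPU and memory usage, a per-record cost of that size would be consistent with many small sequential statements each waiting on the network, rather than with the database being overloaded.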

@andrewnewsome
Author

Hello @aikebah. I've attempted to investigate this more, and it does look like it is just the sheer number of DB commands causing the delay, along with the latency of communicating with the DB. I clearly observed reasonable timings when the Postgres database was deployed on the same machine as the Maven process. I daresay that some performance improvements could be made by batching 'inserts', but I'm not sure how much of an improvement that would bring.
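
For illustration only, here is a minimal sketch of the difference between a lookup-then-insert loop and batched inserts. It is not dependency-check's code: it uses Python's built-in sqlite3 as a stand-in for Postgres, and the table `vuln`, its columns, and the synthetic IDs are all hypothetical. It only counts client calls, since an in-memory database has no network latency to measure.

```python
import sqlite3


def make_db():
    # Hypothetical table, not the real dependency-check schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vuln (cve_id TEXT PRIMARY KEY, summary TEXT)")
    return conn


def load_row_by_row(conn, rows):
    """Lookup, then insert if not found: up to two statements per row."""
    statements = 0
    for cve_id, summary in rows:
        found = conn.execute(
            "SELECT 1 FROM vuln WHERE cve_id = ?", (cve_id,)
        ).fetchone()
        statements += 1
        if found is None:
            conn.execute(
                "INSERT INTO vuln (cve_id, summary) VALUES (?, ?)",
                (cve_id, summary),
            )
            statements += 1
    conn.commit()
    return statements


def load_batched(conn, rows, batch_size=1000):
    """One client call per chunk; existing rows are skipped by the DB itself."""
    calls = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # SQLite spelling; PostgreSQL would use INSERT ... ON CONFLICT DO NOTHING.
        conn.executemany(
            "INSERT OR IGNORE INTO vuln (cve_id, summary) VALUES (?, ?)",
            chunk,
        )
        calls += 1
    conn.commit()
    return calls


# Synthetic placeholder rows, not real NVD data.
rows = [(f"CVE-0000-{i:04d}", "placeholder") for i in range(2500)]

db_a, db_b = make_db(), make_db()
stmts = load_row_by_row(db_a, rows)
calls = load_batched(db_b, rows)
count_a = db_a.execute("SELECT COUNT(*) FROM vuln").fetchone()[0]
count_b = db_b.execute("SELECT COUNT(*) FROM vuln").fetchone()[0]

print(f"row by row: {stmts} statements, {count_a} rows stored")
print(f"batched:    {calls} calls, {count_b} rows stored")
```

Both approaches store the same rows, but the batched version issues far fewer client calls. Against a remote database each call costs at least one round trip, so that is where any saving would come from. How much it would help in practice depends on how the real update logic is structured, which this sketch does not model.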

I did, however, notice that the library is using the apache-commons-dbcp2 library for database connection pooling, and I think using the HikariCP library instead may offer better performance. Just a suggestion. Please let me know if I can help at all with that. Thanks

3 participants