Maven update-only execution taking over 1 hour to process 126,026 records #7628
Comments
Using an external database for the initial data load can take a long time. However, once it is set up you should be able to update it daily without a long processing time.
OK, thanks @jeremylong. Can you think of anything that can be changed to increase performance? 100,000 doesn't seem like much to persist, so I presume more is happening than just read API response data -> store in DB.
Possibly tinkering with the indexes? I haven't really spent much time on performance with the external databases. The process does a lot of lookups, each followed by an insert if the entry isn't found.
@andrewnewsome As a side note: 100k is maybe not that much, but it's approximately 1/3 to 1/2 of the entire NVD listings that got a refresh (total NVD is around 300k entries), so that partly explains why this update looked more like a fresh load-from-scratch from a timing perspective.
Thanks @aikebah. Is the download of the definitions from the NVD API expected to be the time-consuming part? I understand this can be the case, especially if no NVD API key is used. In my case, I'm using a key and from the logs it appears the download takes around 2 mins. It is the processing of the data that is taking an hour. Is this to be expected? At this moment, I cannot rule out a slow connection to the DB, but I'd hope not given the region/location settings.
@andrewnewsome Not the downloads (those all happen first; when the logs say '100%' after the two minutes, the download has completed). It's the processing of the large number of entries that takes the time, due to a lot of interactions with the DB. This is more pronounced with a remote DB server, because of the added latency, than with local processing against the H2 database.
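To put rough numbers on the latency explanation, here is a back-of-the-envelope sketch. The record count (126,026) and the "over 1 hour" duration come from this issue; the number of round trips per record and the latency values are assumptions picked purely for illustration, not measurements of what dependency-check actually does. One hour spread over 126,026 records works out to roughly 28.6 ms per record, which is plausible for a handful of sequential lookups and inserts against a remote server.

```java
// Back-of-the-envelope model: sequential DB round trips dominate the run time.
class LatencyEstimate {

    /** Average wall-clock milliseconds spent per record. */
    static double millisPerRecord(long records, double totalSeconds) {
        return totalSeconds * 1000.0 / records;
    }

    /** Estimated total seconds if each record costs a fixed number of sequential round trips. */
    static double estimatedSeconds(long records, int roundTripsPerRecord, double latencyMillis) {
        return records * roundTripsPerRecord * latencyMillis / 1000.0;
    }

    public static void main(String[] args) {
        long records = 126_026;          // from the issue title
        double observedSeconds = 3600.0; // "over 1 hour", used as a lower bound

        System.out.printf("Observed cost per record: %.1f ms%n",
                millisPerRecord(records, observedSeconds));

        // ASSUMPTIONS (illustrative only): 10 round trips per record,
        // 3 ms per round trip to a remote DB, 0.1 ms to a local one.
        int roundTrips = 10;
        System.out.printf("Remote DB estimate: %.0f s%n",
                estimatedSeconds(records, roundTrips, 3.0));
        System.out.printf("Local DB estimate:  %.0f s%n",
                estimatedSeconds(records, roundTrips, 0.1));
    }
}
```

Under these assumed values the remote case lands at about an hour and the local case at a couple of minutes, so the gap between the two setups can be explained by per-statement latency alone, without the database being CPU- or memory-bound.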
Hello @aikebah. I've attempted to investigate this more, and it does look like it is just the sheer amount of DB commands that are causing the delay, along with the latency communicating with the DB. I clearly observed reasonable timings when the postgres database was deployed locally to the Maven process. I daresay that some performance improvements could be made by batching 'inserts', but I'm not sure how much of an improvement that would cause. I did, however, notice that the library is using the
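For anyone wondering what batching 'inserts' could look like, here is a minimal JDBC sketch. It is not taken from the dependency-check code base: the table `example_vulnerability`, the column `cve_id`, and the class and method names are all hypothetical. It only uses the standard `PreparedStatement.addBatch()` and `executeBatch()` calls to queue many rows before sending them to the server.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: queue inserts and send them in groups instead of one by one.
class BatchInsertSketch {

    /** Splits a list into consecutive chunks of at most batchSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            chunks.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return chunks;
    }

    /** Inserts all ids using one executeBatch() call per chunk; returns the number of batches sent. */
    static int insertInBatches(Connection conn, List<String> ids, int batchSize) throws SQLException {
        // Hypothetical table and column, for illustration only.
        String sql = "INSERT INTO example_vulnerability (cve_id) VALUES (?)";
        int batchesSent = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (List<String> batch : chunk(ids, batchSize)) {
                for (String id : batch) {
                    ps.setString(1, id);
                    ps.addBatch();   // queued on the client, nothing sent yet
                }
                ps.executeBatch();   // the whole chunk is handed to the driver at once
                batchesSent++;
            }
        }
        return batchesSent;
    }
}
```

How much this saves depends on the JDBC driver, since the driver decides how a batch is actually transmitted. It also does not help with the lookup-before-insert pattern mentioned earlier in the thread, because each lookup still needs its own round trip.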
Hello. Firstly, apologies if this has already been covered in another ticket.
Secondly, I just wanted to express my gratitude for providing this awesome tool allowing me to perform vulnerability checks within a Maven pipeline.
Finally - to my question.
I have a GitHub Actions workflow which runs the Maven org.owasp:dependency-check-maven:update-only goal, with the data being persisted in a GCP hosted Postgres database. The GitHub runner is also a GCP VM instance running in the same region as the database. The execution is taking 2 mins to download the NVD API data, but then over 1 hour to process - please see the log below.
Admittedly the DB's resources are on the low side, but I can see from the GCP database metrics that CPU utilization never exceeds 25%, and memory usage doesn't appear to exceed 50%.
Can you please explain why it could be taking so long to process the downloaded data?
Thanks.
UPDATE: Here is the Maven plugin configuration: