Skip to content

add system config options for success / failure levels to write blocking / non-blocking activations to db #5169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

bdoyle0182
Copy link
Contributor

Description

This PR adds an additional config option to allow turning off writes to the db for successful non-blocking activations. Our use case does not have any need for writes for any activations in the success case anymore and this will greatly help us scale. We don't use sequential actions, have external logging, use events topic for activation metrics, and write data to a db for an activation off our consumers for the completed* topics where needed.

This is a follow up to #4881 where this was implemented but only allowed configuration for blocking activations.

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@bdoyle0182 bdoyle0182 closed this Nov 29, 2021
@bdoyle0182 bdoyle0182 reopened this Nov 29, 2021
@bdoyle0182 bdoyle0182 closed this Nov 29, 2021
@bdoyle0182 bdoyle0182 reopened this Nov 29, 2021
protected val unstoredLogsEnabledConfig = loadConfigOrThrow[Boolean](ConfigKeys.unstoredLogsEnabled)

/**
* Checks if an activation should be stored in database and stores it.
*
* @param activation activation to store
* @param isBlockingActivation is activation blocking
* @param disableBlockingStore do not store activation if successful and blocking
* @param disableNonBlockingStore do not store activation if successful and non-blocking
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Widening the signature here this way strikes me as confusing - couldn't this be better done with an enum: always store, never store, store for non blocking only, etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my question would be how could you do that without breaking the config that already exists for disableBlockingStore that's presumably already being used?

The other thing is I think people still want to distinguish between successful and failures so they only want to turn off writes for successes. I'm not sure what the best solution is or what options should be offered.

I think I understand your suggestion in that there could be an enum set with values ALWAYS_STORE_SUCCESSES, STORE_BLOCKING_SUCCESSES, STORE_NONBLOCKING_SUCCESSES, STORE_NO_SUCCESSES. However I think doing this would break deployments unless we kept it as two booleans in the config as currently is in the pr and built the enum from the boolean values. But I think that defeats the purpose of setting up the enum at all.

And then on our discussion for our use case specifically with only needing specific failure types written (developer and system). I think that should be a separate check which could either be an enum or just a boolean of storeApplication Errors since I think developer and system errors both are very similar in what they mean. The divider in failures to me is really just whether it's a user application error or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two orthogonal issues - one is configuration and making that backward compatible. The other is the internal interface and implementation.

I think the distinctions you want to make suggests this alternate/simpler enum:

  • STORE_ALWAYS (default)
  • STORE_ON_FAILURE
  • STORE_NON_BLOCKING

From a configuration point of view, you can continue to accept the current configuration property (perhaps with a deprecation warning) and introduce a new configuration that maps to the enum. It would be an error to specify both configurations properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that setup. The one case that I think is confusing is the existing boolean and wanting failed blocking activations to still be stored. Because setting the current boolean means store failed blocking activations and non-blocking activations. So that falls between STORE_ON_FAILURE and STORE_NON_BLOCKING

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that STORE_ON_FAILURE applies to everything. I suppose we can have two modes for this: STORE_ON_FAILURE_BLOCKING_AND_NON_BLOCKING for sync/async and STORE_ON_FAILURE_NON_BLOCKING_ONLY (and so OK there are four again :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I have. I can't think of a scenario where someone would want a subset of non blocking activations written, but not a larger subset for blocking. I can only think of wanting a subset of blocking and wanting a larger subset for non-blocking. I'll update the pr if you're good with this

STORE_ALWAYS
STORE_NON_BLOCKING_OR_ON_BLOCKING_FAILURE
STORE_ON_FAILURE
STORE_ON_NON_APPLICATION_FAILURE

Copy link
Member

@rabbah rabbah Jan 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STORE_NON_BLOCKING_OR_ON_BLOCKING_FAILURE and STORE_ON_FAILURE are redundant for blocking invokes, no? That is, to store all non blocking I'd use STORE_NON_BLOCKING_OR_ON_BLOCKING_FAILURE and to only store non blocking failures, I'd use STORE_ON_FAILURE and to store blocking failures, i can use either. (I think the parameters should be orthogonal.)

I like the addition of STORE_ON_NON_APPLICATION_FAILURE - It further refines the on failure mode - so this would capture system errors and uncaught exceptions in the end user code.

Another crank on your suggestions nets us here, perhaps?

STORE_ALWAYS (default)
STORE_NON_BLOCKING
STORE_FAILURES
STORE_FAILURES_NOT_APPLICATION_ERRORS

Copy link
Contributor Author

@bdoyle0182 bdoyle0182 Jan 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea so what I was originally going to say as well is that I think we're just getting close to having a separate config for each invocation type, blocking and non-blocking. In your list, the original person that created the blocking config wanted only blocking failures and all non-blocking; which is why I had STORE_NON_BLOCKING_OR_ON_BLOCKING_FAILURE. Almost everyone would want this because not storing any non-blocking means you can't run sequences I believe, we don't have any need for that feature which is why we can turn off non-blocking. But you are correct that would mean the parameters aren't orthogonal, which is why I think we're at the point of just having two separate params with the same enum.

STORE_ALWAYS
STORE_FAILURES
STORE_FAILURES_NOT_APPLICATION_ERRORS

@rabbah rabbah closed this Nov 30, 2021
@rabbah rabbah reopened this Nov 30, 2021
@codecov-commenter
Copy link

codecov-commenter commented Nov 30, 2021

Codecov Report

Merging #5169 (666b707) into master (e172168) will increase coverage by 28.43%.
The diff coverage is 71.42%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #5169       +/-   ##
===========================================
+ Coverage   44.49%   72.92%   +28.43%     
===========================================
  Files         237      238        +1     
  Lines       13922    13941       +19     
  Branches      593      588        -5     
===========================================
+ Hits         6194    10166     +3972     
+ Misses       7728     3775     -3953     
Impacted Files Coverage Δ
...nwhisk/core/database/ArtifactActivationStore.scala 92.85% <ø> (+92.85%) ⬆️
...e/elasticsearch/ElasticSearchActivationStore.scala 81.76% <ø> (+22.94%) ⬆️
...isk/core/database/memory/NoopActivationStore.scala 0.00% <0.00%> (ø)
.../openwhisk/core/loadBalancer/FPCPoolBalancer.scala 33.73% <0.00%> (-0.14%) ⬇️
...he/openwhisk/core/invoker/FPCInvokerReactive.scala 0.00% <0.00%> (ø)
...ache/openwhisk/core/database/ActivationStore.scala 78.26% <69.23%> (-14.05%) ⬇️
.../scala/org/apache/openwhisk/core/WhiskConfig.scala 95.75% <100.00%> (+0.05%) ⬆️
...openwhisk/core/database/ActivationStoreLevel.scala 100.00% <100.00%> (ø)
...rg/apache/openwhisk/core/controller/Triggers.scala 81.02% <100.00%> (+81.02%) ⬆️
...isk/core/controller/actions/PrimitiveActions.scala 86.41% <100.00%> (+86.41%) ⬆️
... and 138 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e172168...666b707. Read the comment docs.

@rabbah
Copy link
Member

rabbah commented Nov 30, 2021

I shared some feedback in slack on this PR. I can post those comments here as well.

@bdoyle0182 bdoyle0182 force-pushed the add-config-to-not-save-successful-nonblocking branch from edde4b0 to ecf5d05 Compare January 13, 2022 00:31
@bdoyle0182 bdoyle0182 changed the title add config option to disable successful non-blocking activation writes to db add system config options for success / failure levels to write blocking / non-blocking activations to db Jan 13, 2022
@bdoyle0182 bdoyle0182 closed this Jan 13, 2022
@bdoyle0182 bdoyle0182 reopened this Jan 13, 2022
@bdoyle0182 bdoyle0182 force-pushed the add-config-to-not-save-successful-nonblocking branch from ecf5d05 to 7916d90 Compare January 13, 2022 00:59
@bdoyle0182 bdoyle0182 closed this Jan 13, 2022
@bdoyle0182 bdoyle0182 reopened this Jan 13, 2022
@bdoyle0182 bdoyle0182 closed this Jan 13, 2022
@bdoyle0182 bdoyle0182 reopened this Jan 13, 2022
@style95 style95 closed this Jan 13, 2022
@style95 style95 reopened this Jan 13, 2022
Copy link
Member

@rabbah rabbah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I still am not convinced we need two different logging levels for blocking vs non blocking but this doesn't preclude us from still collapsing the two flags in the future.

Also while the old setting is deprecated, it is taking precedence - is this the intent? I thought the idea was to check if the new fields are set and if so, use their values (they supersede) and if not, use the value implied by the deprecated property.

@bdoyle0182 bdoyle0182 force-pushed the add-config-to-not-save-successful-nonblocking branch from 6ea9983 to 7bbac30 Compare January 19, 2022 00:36
@bdoyle0182
Copy link
Contributor Author

@rabbah Yea I believe that's what we discussed, but I made it so the store level config is required to have. I updated it such that the deprecated config is only used if it is set to true and the new blocking config has not been changed from the default value which is STORE_ALWAYS

Brendan Doyle and others added 2 commits January 26, 2022 14:53
if (disableStoreResultConfig) ActivationStoreLevel.STORE_FAILURES else ActivationStoreLevel.STORE_ALWAYS
}
}
protected val storeNonBlockingResultLevelConfig =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also be in the try and ignored/set to false if it fails?
this lgtm in general, just a question of how long you want to tolerate the deprecated flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss a date to remove it since we're not really following consistent releases and I'll send an email announcing the plan to remove it.

I don't think it needs to be moved since the existing behavior is to fail if it doesn't exist which maybe was unnecessary originally and we don't need to keep it long

@bdoyle0182 bdoyle0182 merged commit 8be1505 into apache:master Feb 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants