Add modeling service for abuse prevention #551
c699155
to
2556985
Compare
Just some comments, feel free to ignore in the name of expeditiousness.
// This is probably overkill, but it enables us to pick a different curve in
// the future, if we want.
degree := 2
This is really only a comment, since your degree is quite low, and it's probably not relevant because I believe this just ends up being a linear regression right now. But higher-order polynomials could put you in a world where you're sensitive to things like weekends skewing your data, among other things.
I spent most of yesterday playing with models. Choosing a higher-degree polynomial gives us a much higher R² value, but it causes significant skew in the data given how small the set is.
But yeah, when I did an 11th-degree polynomial, it perfectly predicted all values within 5 of the curve, but then the "next" value was 14 orders of magnitude higher.
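To make the trade-off concrete, here is a minimal sketch of a least-squares polynomial fit with a configurable degree. It is not the code from this PR: `polyFit`, `evalPoly`, and the sample data are hypothetical names and values chosen for illustration, and the solver uses plain normal equations, which become numerically fragile at high degrees.

```go
package main

import (
	"fmt"
	"math"
)

// polyFit returns least-squares coefficients c[0..degree] such that
// y ≈ c[0] + c[1]*x + ... + c[degree]*x^degree. It solves the normal
// equations with Gaussian elimination and partial pivoting.
// Hypothetical helper, not the PR's implementation.
func polyFit(xs, ys []float64, degree int) ([]float64, error) {
	n := degree + 1
	if len(xs) != len(ys) || len(xs) < n {
		return nil, fmt.Errorf("need at least %d matching points, got %d and %d", n, len(xs), len(ys))
	}
	// Build the augmented matrix [AᵀA | Aᵀy].
	m := make([][]float64, n)
	for i := range m {
		m[i] = make([]float64, n+1)
		for j := 0; j < n; j++ {
			for _, x := range xs {
				m[i][j] += math.Pow(x, float64(i+j))
			}
		}
		for k, x := range xs {
			m[i][n] += math.Pow(x, float64(i)) * ys[k]
		}
	}
	// Forward elimination with partial pivoting.
	for col := 0; col < n; col++ {
		pivot := col
		for r := col + 1; r < n; r++ {
			if math.Abs(m[r][col]) > math.Abs(m[pivot][col]) {
				pivot = r
			}
		}
		if math.Abs(m[pivot][col]) < 1e-12 {
			return nil, fmt.Errorf("singular system at column %d", col)
		}
		m[col], m[pivot] = m[pivot], m[col]
		for r := col + 1; r < n; r++ {
			f := m[r][col] / m[col][col]
			for c := col; c <= n; c++ {
				m[r][c] -= f * m[col][c]
			}
		}
	}
	// Back substitution.
	coef := make([]float64, n)
	for i := n - 1; i >= 0; i-- {
		sum := m[i][n]
		for j := i + 1; j < n; j++ {
			sum -= m[i][j] * coef[j]
		}
		coef[i] = sum / m[i][i]
	}
	return coef, nil
}

// evalPoly evaluates the polynomial at x using Horner's method.
func evalPoly(coef []float64, x float64) float64 {
	y := 0.0
	for i := len(coef) - 1; i >= 0; i-- {
		y = y*x + coef[i]
	}
	return y
}

func main() {
	xs := []float64{0, 1, 2, 3, 4, 5}
	ys := []float64{1, 3, 7, 13, 21, 31} // made-up data: exactly x*x + x + 1
	degree := 2
	coef, err := polyFit(xs, ys, degree)
	if err != nil {
		panic(err)
	}
	fmt.Printf("predicted next value: %.2f\n", evalPoly(coef, 6))
}
```

Raising `degree` toward the number of data points lets the curve pass through every sample, which is the regime where the extrapolated "next" value can swing wildly.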
// be over at 00:00 UTC, and we don't want to generate a partial model.
ys = ys[:len(ys)-1]

// Reverse the list - it came in reversed because we sorted by date DESC, but
Wanna just ask for the results ascending above? I am not enough of an SQL hacker to know if there's a query that does what you want there.
We're building the regression model based on the past 21 days of data, so we need the SQL query to be "the last 21 records" (order by date DESC). I don't think there's a SQL way to ORDER BY and take the last elements.
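For reference, restoring chronological order after a `date DESC` query is a small in-place operation. The following is a minimal sketch, assuming the values arrive as a `[]float64`; `reverse` is a hypothetical helper name, not necessarily what the PR uses.

```go
package main

import "fmt"

// reverse flips ys in place so the oldest value comes first.
// Hypothetical helper for illustration.
func reverse(ys []float64) {
	for i, j := 0, len(ys)-1; i < j; i, j = i+1, j-1 {
		ys[i], ys[j] = ys[j], ys[i]
	}
}

func main() {
	ys := []float64{30, 20, 10} // newest first, as returned by ORDER BY date DESC
	reverse(ys)
	fmt.Println(ys) // prints [10 20 30]
}
```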
pkg/controller/modeler/modeler.go
Outdated
// Require some reasonable number of days of history before attempting to
// build a model.
if l := len(ys); l < 14 {
What if there are gaps? Like, we actually have 14 days of data, but some of those days are 0.
As implemented, they are skipped. There wouldn't be a corresponding date, so there'd be no corresponding zero.
The original query I wrote actually took that into account, using generate_series and a cross join. However, a significant spike-drop-spike severely throws off the model (e.g. 100, 0, 80), so I'd rather just exclude zeros for now during modeling.
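If a gap-filled series were ever used (for example, one built with generate_series), the same exclusion could be applied in code before the history check. This is only a sketch under that assumption; `dropZeros`, `enoughHistory`, and `minHistory` are hypothetical names, with the threshold mirroring the 14 in the diff above.

```go
package main

import "fmt"

// minHistory is a hypothetical constant mirroring the 14-day check in the diff.
const minHistory = 14

// dropZeros returns a copy of ys without zero-valued days, leaving ys untouched.
func dropZeros(ys []float64) []float64 {
	out := make([]float64, 0, len(ys))
	for _, y := range ys {
		if y != 0 {
			out = append(out, y)
		}
	}
	return out
}

// enoughHistory reports whether enough non-zero days remain to build a model.
func enoughHistory(ys []float64) bool {
	return len(dropZeros(ys)) >= minHistory
}

func main() {
	ys := []float64{100, 0, 80} // the spike-drop-spike case from the comment
	fmt.Println(dropZeros(ys), enoughHistory(ys))
}
```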
@mikehelmick updated, PTAL
Merge conflicts.
a3606ab
to
18bd424
Compare
Rebased @icco
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: icco, sethvargo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Part of GH-534
Release Note