-
Notifications
You must be signed in to change notification settings - Fork 212
Improve randomness uniformity #120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lib/rollout.rb
Outdated
@@ -88,7 +88,7 @@ def id_user_by | |||
end | |||
|
|||
def user_in_percentage?(user) | |||
Zlib.crc32(user_id_for_percentage(user)) % 100_000 < @percentage * 1_000 | |||
Zlib.crc32(user_id_for_percentage(user)) < (2**32 - 1) / 100.0 * @percentage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please extract (2**32 - 1) / 100.0
into the constant? It'd be very inefficient to calculate this value on each request.
@reneklacan added |
I used following simple dummy script to get an approximate improvement in accuracy require 'zlib'
RAND_BASE = (2**32 - 1) / 100.0
def impl_new(user_id, percentage)
Zlib.crc32(user_id.to_s) < RAND_BASE * percentage
end
def impl_old(user_id, percentage)
Zlib.crc32(user_id.to_s) % 100_000 < percentage * 1_000
end
COUNT = 1_000_000
dataset_1 = (1..COUNT).to_a
dataset_2 = COUNT.times.inject([7]) { |memo, i| memo + [memo[-1] + 7] }
dataset_3 = COUNT.times.inject([7]) { |memo, i| memo + [memo[-1] + rand(1..30)] }
test_cases = [
{ user_ids: dataset_1, percentage: 25 },
{ user_ids: dataset_1, percentage: 50 },
{ user_ids: dataset_1, percentage: 75 },
{ user_ids: dataset_2, percentage: 17 },
{ user_ids: dataset_2, percentage: 39 },
{ user_ids: dataset_2, percentage: 71 },
{ user_ids: dataset_3, percentage: 21 },
{ user_ids: dataset_3, percentage: 47 },
{ user_ids: dataset_3, percentage: 82 },
]
errors_old = []
errors_new = []
test_cases.each_with_index do |test_case, i|
user_ids = test_case.fetch(:user_ids)
percentage = test_case.fetch(:percentage)
real_percentage_old = user_ids.select { |uid| impl_old(uid, percentage) }.count.to_f / user_ids.count * 100
real_percentage_new = user_ids.select { |uid| impl_new(uid, percentage) }.count.to_f / user_ids.count * 100
errors_old << (real_percentage_old - percentage).abs
errors_new << (real_percentage_new - percentage).abs
puts
puts "Test Case #{i}:"
puts "- Expected: #{percentage}"
puts "- Real Old: #{real_percentage_old}"
puts "- Real New: #{real_percentage_new}"
end
puts
puts "---"
puts
puts "Average Old Error: #{errors_old.reduce(&:+)/errors_old.count}"
puts "Average New Error: #{errors_new.reduce(&:+)/errors_new.count}"
puts
puts "Total Old Error: #{errors_old.reduce(&:+)}"
puts "Total New Error: #{errors_new.reduce(&:+)}" And the output is:
So quite a nice improvement. |
@p-salido Thank you for your contribution! |
Small improvement for #58