Skip to content

Improve randomness uniformity #120

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 13, 2017
Merged

Conversation

p-salido
Copy link

Small improvement for #58

lib/rollout.rb Outdated
@@ -88,7 +88,7 @@ def id_user_by
end

def user_in_percentage?(user)
Zlib.crc32(user_id_for_percentage(user)) % 100_000 < @percentage * 1_000
Zlib.crc32(user_id_for_percentage(user)) < (2**32 - 1) / 100.0 * @percentage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please extract (2**32 - 1) / 100.0 into the constant? It'd be very inefficient to calculate this value on each request.

@p-salido
Copy link
Author

@reneklacan added

@reneklacan
Copy link
Contributor

I used following simple dummy script to get an approximate improvement in accuracy

require 'zlib'

RAND_BASE = (2**32 - 1) / 100.0

def impl_new(user_id, percentage)
  Zlib.crc32(user_id.to_s) < RAND_BASE * percentage
end

def impl_old(user_id, percentage)
  Zlib.crc32(user_id.to_s) % 100_000 < percentage * 1_000
end

COUNT = 1_000_000

dataset_1 = (1..COUNT).to_a
dataset_2 = COUNT.times.inject([7]) { |memo, i| memo + [memo[-1] + 7] }
dataset_3 = COUNT.times.inject([7]) { |memo, i| memo + [memo[-1] + rand(1..30)] }

test_cases = [
  { user_ids: dataset_1, percentage: 25 },
  { user_ids: dataset_1, percentage: 50 },
  { user_ids: dataset_1, percentage: 75 },
  { user_ids: dataset_2, percentage: 17 },
  { user_ids: dataset_2, percentage: 39 },
  { user_ids: dataset_2, percentage: 71 },
  { user_ids: dataset_3, percentage: 21 },
  { user_ids: dataset_3, percentage: 47 },
  { user_ids: dataset_3, percentage: 82 },
]

errors_old = []
errors_new = []

test_cases.each_with_index do |test_case, i|
  user_ids = test_case.fetch(:user_ids)
  percentage = test_case.fetch(:percentage)

  real_percentage_old = user_ids.select { |uid| impl_old(uid, percentage) }.count.to_f / user_ids.count * 100
  real_percentage_new = user_ids.select { |uid| impl_new(uid, percentage) }.count.to_f / user_ids.count * 100

  errors_old << (real_percentage_old - percentage).abs
  errors_new << (real_percentage_new - percentage).abs

  puts
  puts "Test Case #{i}:"
  puts "- Expected: #{percentage}"
  puts "- Real Old: #{real_percentage_old}"
  puts "- Real New: #{real_percentage_new}"
end

puts
puts "---"

puts
puts "Average Old Error: #{errors_old.reduce(&:+)/errors_old.count}"
puts "Average New Error: #{errors_new.reduce(&:+)/errors_new.count}"

puts
puts "Total Old Error: #{errors_old.reduce(&:+)}"
puts "Total New Error: #{errors_new.reduce(&:+)}"

And the output is:

Test Case 0:
- Expected: 25
- Real Old: 24.9711
- Real New: 24.9995

Test Case 1:
- Expected: 50
- Real Old: 49.9973
- Real New: 49.9989

Test Case 2:
- Expected: 75
- Real Old: 75.00489999999999
- Real New: 74.9995

Test Case 3:
- Expected: 17
- Real Old: 16.931783068216934
- Real New: 17.000182999817

Test Case 4:
- Expected: 39
- Real Old: 38.93256106743893
- Real New: 39.003360996639

Test Case 5:
- Expected: 71
- Real Old: 70.92442907557093
- Real New: 71.01422898577101

Test Case 6:
- Expected: 21
- Real Old: 20.985379014620985
- Real New: 21.039978960021042

Test Case 7:
- Expected: 47
- Real Old: 47.019352980647014
- Real New: 47.02295297704702

Test Case 8:
- Expected: 82
- Real Old: 82.01721798278201
- Real New: 82.03641796358202

---

Average Old Error: 0.03321319306458213
Average New Error: 0.013246986986344843

Total Old Error: 0.2989187375812392
Total New Error: 0.11922288287710359

So quite a nice improvement.

@reneklacan
Copy link
Contributor

@p-salido Thank you for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants