-
Notifications
You must be signed in to change notification settings - Fork 3
Closed
Description
Results of find(VPTree(data, metric), query, r)
may include only some of the indices for which metric(query, data[i]) == r
. Exactly which subset is included varies between runs of a test script. At first I thought there might be some kind of race condition in _find_threaded
but it happens in non-threaded trees as well.
Results are always the same for a given VPTree
instance but vary between instances created with the same arguments. Seeding the RNG before construction removes this variation, so it must be due to the random shuffle of the data before building the tree.
Reproducing example:
using VPTrees
using DataStructures: counter
function build_and_find(data, metric, query, r, threaded)
tree = VPTree(data, metric, threaded=threaded)
return sort(find(tree, query, r))
end
hamming(a, b) = count_ones(xor(a, b))
query = 0b00000001
data = [
# 3 @ D == 1
0b0000000,
0b0010001,
0b0000101,
# 3 @ D == 2
0b0000100,
0b0101001,
0b0010011,
# 4 @ D == 3
0b1001000,
0b0010010,
0b0011011,
0b1100101,
# 10 @ D >= 4
0b01011000,
0b11101010,
0b11011000,
0b11001110,
0b00111100,
0b01100100,
0b11100000,
0b00111010,
0b01001010,
0b00111010,
]
print(hamming.(query, data))
[1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 6, 5, 6, 5, 4, 4, 5, 4, 5]
counter(build_and_find(data, hamming, query, 3, false) for _ in 1:10000)
DataStructures.Accumulator{Array{Int64,1},Int64} with 11 entries:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] => 7616
[1, 2, 3, 4, 5, 6, 7, 8, 10] => 232
[1, 2, 3, 4, 5, 6, 7, 10] => 83
[1, 2, 3, 4, 5, 6, 7, 9, 10] => 585
[1, 2, 3, 4, 5, 6, 8, 10] => 13
[1, 2, 3, 4, 5, 6, 7, 8] => 47
[1, 2, 3, 4, 5, 6, 7, 9] => 15
[1, 2, 3, 4, 5, 6, 7, 8, 9] => 474
[1, 2, 3, 4, 5, 6, 8, 9] => 241
[1, 2, 3, 4, 5, 6, 9, 10] => 26
[1, 2, 3, 4, 5, 6, 8, 9, 10] => 668
counter(build_and_find(data, hamming, query, 3, true) for _ in 1:10000)
DataStructures.Accumulator{Array{Int64,1},Int64} with 11 entries:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] => 7564
[1, 2, 3, 4, 5, 6, 7, 8, 10] => 241
[1, 2, 3, 4, 5, 6, 7, 10] => 77
[1, 2, 3, 4, 5, 6, 7, 9, 10] => 574
[1, 2, 3, 4, 5, 6, 8, 10] => 21
[1, 2, 3, 4, 5, 6, 7, 8] => 51
[1, 2, 3, 4, 5, 6, 7, 9] => 23
[1, 2, 3, 4, 5, 6, 7, 8, 9] => 499
[1, 2, 3, 4, 5, 6, 8, 9] => 256
[1, 2, 3, 4, 5, 6, 9, 10] => 44
[1, 2, 3, 4, 5, 6, 8, 9, 10] => 650
Metadata
Metadata
Assignees
Labels
No labels