Skip to content

Commit 306a7cb

Browse files
authored
new fast isValidCell (#968)
* first real change * just for me * oooof clang format version thing? * fixed clang format * port over old new solution * try old * check that _isValidCell_const fails on draft PRs * try old again * ok, see if this passes * bah * everyone should like this one * who doesn't change their mind about comment formats? * playing * comment text * try inline * do you even bench? * confirm this works on my mac, but fails on other arch * some timings * timings * temporarily adding logs used to make plots * comment out _isValidCell_old * fail-fast: false * missed one! * try defined(_M_X64) * define some helper functions * more helpers * boop * notes * function names * move functions around * formatting * benchmarks * remove dev files * snake to camel * drop pos * drop some conditionals * intention more clear * organize * little nicer * remove justfile and remove fail fast * comment * bench for is base cell test * maybe working with extern * back to isBaseCellPentagonArr * TODO: #984 * changelog * remove files * remove more files
1 parent 42450ce commit 306a7cb

File tree

3 files changed

+188
-33
lines changed

3 files changed

+188
-33
lines changed

CHANGELOG.md

+1
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ The public API of this library consists of the functions declared in file
99

1010
### Changed
1111
- Moved `ContainmentMode` enum for `polygonToCellsExperimental` to `h3api.h`. (#958)
12+
- Faster `isValidCell` (#968)
1213

1314
## [4.2.0] - 2024-12-04
1415
### Added

src/apps/testapps/testH3Index.c

+20
Original file line numberDiff line numberDiff line change
@@ -142,6 +142,26 @@ SUITE(h3Index) {
142142
"isValidCell failed on deleted subsequence");
143143
}
144144

145+
TEST(moreDeletedSubsequenceInvalid) {
146+
H3Index p = 0x80c3fffffffffff; // res 0 pentagon
147+
H3Index h;
148+
149+
for (int res = 1; res <= 15; res++) {
150+
t_assertSuccess(H3_EXPORT(cellToCenterChild)(p, res, &h));
151+
t_assert(H3_EXPORT(isValidCell)(h), "should be a valid pentagon");
152+
153+
for (int d = 0; d <= 6; d++) {
154+
H3_SET_INDEX_DIGIT(h, res, d);
155+
if (d == 1) {
156+
t_assert(!H3_EXPORT(isValidCell)(h),
157+
"fail on deleted subsequence");
158+
} else {
159+
t_assert(H3_EXPORT(isValidCell)(h), "should be valid");
160+
}
161+
}
162+
}
163+
}
164+
145165
TEST(h3ToString) {
146166
const size_t bufSz = 17;
147167
char buf[17] = {0};

src/h3lib/lib/h3Index.c

+167-33
Original file line numberDiff line numberDiff line change
@@ -119,50 +119,184 @@ H3Error H3_EXPORT(h3ToString)(H3Index h, char *str, size_t sz) {
119119
return E_SUCCESS;
120120
}
121121

122-
/**
123-
* Returns whether or not an H3 index is a valid cell (hexagon or pentagon).
124-
* @param h The H3 index to validate.
125-
* @return 1 if the H3 index if valid, and 0 if it is not.
126-
*/
127-
int H3_EXPORT(isValidCell)(H3Index h) {
128-
if (H3_GET_HIGH_BIT(h) != 0) return 0;
122+
/*
123+
The top 8 bits of any cell should be a specific constant:
129124
130-
if (H3_GET_MODE(h) != H3_CELL_MODE) return 0;
125+
- The 1 high bit should be `0`
126+
- The 4 mode bits should be `0001` (H3_CELL_MODE)
127+
- The 3 reserved bits should be `000`
131128
132-
if (H3_GET_RESERVED_BITS(h) != 0) return 0;
129+
In total, the top 8 bits should be `0_0001_000`
130+
*/
131+
static inline bool _hasGoodTopBits(H3Index h) {
132+
h >>= (64 - 8);
133+
return h == 0b00001000;
134+
}
133135

134-
int baseCell = H3_GET_BASE_CELL(h);
135-
if (NEVER(baseCell < 0) || baseCell >= NUM_BASE_CELLS) {
136-
// Base cells less than zero can not be represented in an index
137-
return 0;
138-
}
136+
/* Check that no digit from 1 to `res` is 7 (INVALID_DIGIT).
139137
140-
int res = H3_GET_RESOLUTION(h);
141-
if (NEVER(res < 0 || res > MAX_H3_RES)) {
142-
// Resolutions less than zero can not be represented in an index
143-
return 0;
144-
}
138+
MHI = 0b100100100100100100100100100100100100100100100;
139+
MLO = MHI >> 2;
145140
146-
bool foundFirstNonZeroDigit = false;
147-
for (int r = 1; r <= res; r++) {
148-
Direction digit = H3_GET_INDEX_DIGIT(h, r);
141+
| d | d & MHI | ~d | ~d - MLO | d & MHI & (~d - MLO) | result |
142+
|-----|---------|-----|----------|----------------------|---------|
143+
| 000 | 000 | | | 000 | OK |
144+
| 001 | 000 | | | 000 | OK |
145+
| 010 | 000 | | | 000 | OK |
146+
| 011 | 000 | | | 000 | OK |
147+
| 100 | 100 | 011 | 010 | 000 | OK |
148+
| 101 | 100 | 010 | 001 | 000 | OK |
149+
| 110 | 100 | 001 | 000 | 000 | OK |
150+
| 111 | 100 | 000 | 111* | 100 | invalid |
149151
150-
if (!foundFirstNonZeroDigit && digit != CENTER_DIGIT) {
151-
foundFirstNonZeroDigit = true;
152-
if (_isBaseCellPentagon(baseCell) && digit == K_AXES_DIGIT) {
153-
return 0;
154-
}
155-
}
152+
*: carry happened
153+
154+
155+
Note: only care about identifying the *lowest* 7.
156+
157+
Examples with multiple digits:
158+
159+
| d | d & MHI | ~d | ~d - MLO | d & MHI & (~d - MLO) | result |
160+
|---------|---------|---------|----------|----------------------|---------|
161+
| 111.111 | 100.100 | 000.000 | 110.111* | 100.100 | invalid |
162+
| 110.111 | 100.100 | 001.000 | 111.111* | 100.100 | invalid |
163+
| 110.110 | 100.100 | 001.001 | 000.000 | 000.000 | OK |
164+
165+
*: carry happened
166+
167+
In the second example with 110.111, we "misidentify" the 110 as a 7, due
168+
to a carry from the lower bits. But this is OK because we correctly
169+
identify the lowest (only, in this example) 7 just before it.
156170
157-
if (NEVER(digit < CENTER_DIGIT) || digit >= NUM_DIGITS) return 0;
171+
We only have to worry about carries affecting higher bits in the case of
172+
a 7; all other digits (0--6) don't cause a carry when computing ~d - MLO.
173+
So even though a 7 can affect the results of higher bits, this is OK
174+
because we will always correctly identify the lowest 7.
175+
176+
For further notes, see the discussion here:
177+
https://github.com/uber/h3/pull/496#discussion_r795851046
178+
*/
179+
static inline bool _hasAny7UptoRes(H3Index h, int res) {
180+
const uint64_t MHI = 0b100100100100100100100100100100100100100100100;
181+
const uint64_t MLO = MHI >> 2;
182+
183+
int shift = 3 * (15 - res);
184+
h >>= shift;
185+
h <<= shift;
186+
h = (h & MHI & (~h - MLO));
187+
188+
return h != 0;
189+
}
190+
191+
/* Check that all unused digits after `res` are set to 7 (INVALID_DIGIT).
192+
193+
Bit shift to avoid looping through digits.
194+
*/
195+
static inline bool _hasAll7AfterRes(H3Index h, int res) {
196+
// NOTE: res check is needed because we can't shift by 64
197+
if (res < 15) {
198+
int shift = 19 + 3 * res;
199+
200+
h = ~h;
201+
h <<= shift;
202+
h >>= shift;
203+
204+
return h == 0;
158205
}
206+
return true;
207+
}
208+
209+
/*
210+
Get index of first nonzero bit of an H3Index.
211+
212+
When available, use compiler intrinsics, which should be fast.
213+
If not available, fall back to a loop.
214+
*/
215+
static inline int _firstOneIndex(H3Index h) {
216+
#if defined(__GNUC__) || defined(__clang__)
217+
return 63 - __builtin_clzll(h);
218+
#elif defined(_MSC_VER) && defined(_M_X64) // doesn't work on win32
219+
unsigned long index;
220+
_BitScanReverse64(&index, h);
221+
return (int)index;
222+
#else
223+
// Portable fallback
224+
int pos = 63 - 19;
225+
H3Index m = 1;
226+
while ((h & (m << pos)) == 0) pos--;
227+
return pos;
228+
#endif
229+
}
159230

160-
for (int r = res + 1; r <= MAX_H3_RES; r++) {
161-
Direction digit = H3_GET_INDEX_DIGIT(h, r);
162-
if (digit != INVALID_DIGIT) return 0;
231+
/*
232+
One final validation just for cells whose base cell (res 0)
233+
is a pentagon.
234+
235+
Pentagon cells start with a sequence of 0's (CENTER_DIGIT's).
236+
The first nonzero digit can't be a 1 (i.e., "deleted subsequence",
237+
PENTAGON_SKIPPED_DIGIT, or K_AXES_DIGIT).
238+
239+
We can check that (in the lower 45 = 15*3 bits) the position of the
240+
first 1 bit isn't divisible by 3.
241+
*/
242+
static inline bool _hasDeletedSubsequence(H3Index h, int base_cell) {
243+
// TODO: https://github.com/uber/h3/issues/984
244+
static const bool isBaseCellPentagonArr[128] = {
245+
[4] = 1, [14] = 1, [24] = 1, [38] = 1, [49] = 1, [58] = 1,
246+
[63] = 1, [72] = 1, [83] = 1, [97] = 1, [107] = 1, [117] = 1};
247+
248+
if (isBaseCellPentagonArr[base_cell]) {
249+
h <<= 19;
250+
h >>= 19;
251+
252+
if (h == 0) return false; // all zeros: res 15 pentagon
253+
return _firstOneIndex(h) % 3 == 0;
163254
}
255+
return false;
256+
}
164257

165-
return 1;
258+
/**
259+
* Returns whether or not an H3 index is a valid cell (hexagon or pentagon).
260+
* @param h The H3 index to validate.
261+
* @return 1 if the H3 index if valid, and 0 if it is not.
262+
*/
263+
int H3_EXPORT(isValidCell)(H3Index h) {
264+
/*
265+
Look for bit patterns that would disqualify an H3Index from
266+
being valid. If identified, exit early.
267+
268+
For reference the H3 index bit layout:
269+
270+
| Region | # bits |
271+
|------------|--------|
272+
| High | 1 |
273+
| Mode | 4 |
274+
| Reserved | 3 |
275+
| Resolution | 4 |
276+
| Base Cell | 7 |
277+
| Digit 1 | 3 |
278+
| Digit 2 | 3 |
279+
| ... | ... |
280+
| Digit 15 | 3 |
281+
282+
Speed benefits come from using bit manipulation instead of loops,
283+
whenever possible.
284+
*/
285+
if (!_hasGoodTopBits(h)) return false;
286+
287+
// No need to check resolution; any 4 bits give a valid resolution.
288+
const int res = H3_GET_RESOLUTION(h);
289+
290+
// Get base cell number and check that it is valid.
291+
const int bc = H3_GET_BASE_CELL(h);
292+
if (bc >= NUM_BASE_CELLS) return false;
293+
294+
if (_hasAny7UptoRes(h, res)) return false;
295+
if (!_hasAll7AfterRes(h, res)) return false;
296+
if (_hasDeletedSubsequence(h, bc)) return false;
297+
298+
// If no disqualifications were identified, the index is a valid H3 cell.
299+
return true;
166300
}
167301

168302
/**

0 commit comments

Comments
 (0)