Open
Description
The current default hash is sensitive to whitespace, which is not great, and ought to be replaced with something better... OTOH this might work:
def robust_hash_python_code_tokenize(code):
try:
tokens = []
g = tokenize.generate_tokens(io.StringIO(code).readline)
for toknum, tokval, _, _, _ in g:
if toknum not in (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER, tokenize.DEDENT, tokenize.INDENT):
tokens.append((toknum, tokval))
token_string = str(tokens).encode('utf-8')
return hashlib.sha256(token_string).hexdigest()
except tokenize.TokenError:
return None
See also airspeed-velocity/asv#1456 (comment)
Metadata
Metadata
Assignees
Labels
No labels