[ty] First cut at semantic token provider #19108
Conversation
CodSpeed WallTime Performance Report: merging #19108 will not alter performance.
Thanks for the PR! I'd prefer for Micha or Dhruv (who know the LSP and our plans in that area better) to review this; just one initial comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great start, thank you for doing this!
I've been reviewing this for an hour and need to take a lunch break, but here are my initial comments. I'll look at the remaining parts after the lunch break.
Ok, my review is finally done, this is an awesome PR!
The test cases are extremely thorough, thank you for writing them!
I think the only required changes are to make sure that we convert the ty location values back to the LSP values in the common function for the semantic tokens handler, and to remove some of the calls in the visitor implementation that might lead to duplicate tokens.
A lot of my review comments could be considered suggestions, so feel free to either apply or discard them. It's totally fine if those are done as follow-ups. I can also own them as follow-ups to this PR if you prefer.
- …None` means "entire document".
- …er than `Option<SemanticTokens>`.
- …g `tokens` as a public field.
@dhruvmanila, thanks for the thorough and insightful code review comments! I think I've addressed them all.
This is great! Thank you for taking this on and addressing all the review comments!
```rust
/// Convert to LSP modifier indices for encoding
pub fn to_lsp_indices(self) -> Vec<u32> {
```
I don't really understand why we need this method. Can we not just use the contained `u32` directly, which is what the LSP response is expecting? The usage of `to_lsp_indices` is constructing the same `u32` as the one that's contained in `SemanticTokenModifier`:

```rust
let token_modifiers = token
    .modifiers
    .to_lsp_indices()
    .into_iter()
    .fold(0u32, |acc, modifier_index| acc | (1 << modifier_index));
```
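To illustrate, here is a minimal sketch of the simplification, assuming `SemanticTokenModifier` is a bitflags-style wrapper whose contained `u32` already uses one bit per LSP modifier (the type below is a hypothetical stand-in, not the PR's actual definition):

```rust
use bitflags::bitflags;

bitflags! {
    /// Hypothetical stand-in: one bit per LSP modifier, stored directly
    /// as the u32 the LSP response expects.
    #[derive(Clone, Copy)]
    pub struct SemanticTokenModifier: u32 {
        const DEFINITION = 1 << 0;
        const READONLY = 1 << 1;
    }
}

fn main() {
    let modifiers = SemanticTokenModifier::DEFINITION | SemanticTokenModifier::READONLY;
    // No need to expand into indices and fold them back into a bitset:
    // the contained u32 is already the encoded value.
    let token_modifiers: u32 = modifiers.bits();
    assert_eq!(token_modifiers, 0b11);
}
```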
```rust
fn visit_body(&mut self, body: &[Stmt]) {
```
This isn't required because the `SourceOrderVisitor` already has a `visit_body` method with the same logic as this.
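For reference, the default that the trait already provides is equivalent to a plain loop over the statements, roughly this sketch (the exact code lives in the `SourceOrderVisitor` trait, so this is an approximation):

```rust
// Sketch of the trait-provided default; overriding it with identical
// logic adds nothing.
fn visit_body(&mut self, body: &[Stmt]) {
    for stmt in body {
        self.visit_stmt(stmt);
    }
}
```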
```rust
ast::Expr::StringLiteral(string_literal) => {
    // For implicitly concatenated strings, emit separate tokens for each string part
    for string_part in &string_literal.value {
        self.add_token(
            string_part.range(),
            SemanticTokenType::String,
            SemanticTokenModifier::empty(),
        );
    }
    walk_expr(self, expr);
}
ast::Expr::BytesLiteral(bytes_literal) => {
    // For implicitly concatenated bytes, emit separate tokens for each bytes part
    for bytes_part in &bytes_literal.value {
        self.add_token(
            bytes_part.range(),
            SemanticTokenType::String,
            SemanticTokenModifier::empty(),
        );
    }
    walk_expr(self, expr);
}
```
I think we should be implementing the `visit_string_literal` and `visit_bytes_literal` methods, which are part of the `SourceOrderVisitor` trait, here instead, because otherwise this is going to miss the string literal tokens that are inside an f-string. For example, in `f"foo {'nested'}"`, the `'nested'` is a `StringLiteral`, but this logic won't emit a string token for it because it's not part of an `Expr::StringLiteral` but an `Expr::FString`:

```rust
fn visit_string_literal(&mut self, string_literal: &ast::StringLiteral) {
    self.add_token(
        string_literal.range(),
        SemanticTokenType::String,
        SemanticTokenModifier::empty(),
    );
}

fn visit_bytes_literal(&mut self, bytes_literal: &ast::BytesLiteral) {
    self.add_token(
        bytes_literal.range(),
        SemanticTokenType::String,
        SemanticTokenModifier::empty(),
    );
}
```

Can we also add a test case using f-strings if it's not already present?
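A hypothetical sketch of such a test, reusing the `cursor_test` helper from the existing tests (the `semantic_tokens()` accessor and the token field names are assumptions, not the PR's actual API):

```rust
#[test]
fn test_semantic_tokens_in_fstring() {
    let test = cursor_test(
        r#"
x = f"foo {'nested'}"<CURSOR>
"#,
    );

    // Hypothetical accessor; the real test would use whatever helper the
    // other semantic-token tests in this PR use to collect tokens.
    let tokens = test.semantic_tokens();

    // The 'nested' literal sits inside the f-string, so a String token for
    // it is only emitted if visit_string_literal is implemented.
    assert!(tokens
        .iter()
        .any(|token| token.token_type == SemanticTokenType::String));
}
```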
```rust
ast::Expr::NumberLiteral(_) => {
    self.add_token(
        expr.range(),
        SemanticTokenType::Number,
        SemanticTokenModifier::empty(),
    );
    walk_expr(self, expr);
}
ast::Expr::BooleanLiteral(_) => {
    self.add_token(
        expr.range(),
        SemanticTokenType::BuiltinConstant,
        SemanticTokenModifier::empty(),
    );
    walk_expr(self, expr);
}
ast::Expr::NoneLiteral(_) => {
    self.add_token(
        expr.range(),
        SemanticTokenType::BuiltinConstant,
        SemanticTokenModifier::empty(),
    );
    walk_expr(self, expr);
}
```
These `walk_expr` calls look redundant, as there are no expressions nested inside any of these expression variants.
Suggested change:

```diff
 ast::Expr::NumberLiteral(_) => {
     self.add_token(
         expr.range(),
         SemanticTokenType::Number,
         SemanticTokenModifier::empty(),
     );
-    walk_expr(self, expr);
 }
 ast::Expr::BooleanLiteral(_) => {
     self.add_token(
         expr.range(),
         SemanticTokenType::BuiltinConstant,
         SemanticTokenModifier::empty(),
     );
-    walk_expr(self, expr);
 }
 ast::Expr::NoneLiteral(_) => {
     self.add_token(
         expr.range(),
         SemanticTokenType::BuiltinConstant,
         SemanticTokenModifier::empty(),
     );
-    walk_expr(self, expr);
 }
```
```rust
#[test]
fn test_semantic_tokens_variables() {
    let test = cursor_test(
        "
x = 42
y = 'hello'<CURSOR>
",
    );
```
Sorry, I totally missed this in my first review, and it would've probably saved you some time. We've been using snapshot-based testing for individual IDE capabilities, like the hover example below:

ruff/crates/ty_ide/src/hover.rs, lines 168 to 202 in 28ab61d:
````rust
#[test]
fn hover_member() {
    let test = cursor_test(
        r#"
        class Foo:
            a: int = 10

            def __init__(a: int, b: str):
                self.a = a
                self.b: str = b

        foo = Foo()
        foo.<CURSOR>a
        "#,
    );

    assert_snapshot!(test.hover(), @r"
    int
    ---------------------------------------------
    ```text
    int
    ```
    ---------------------------------------------
    info[hover]: Hovered content is
     --> main.py:10:9
       |
     9 | foo = Foo()
    10 | foo.a
       |     ^^^^-
       |     |   |
       |     |   Cursor offset
       |     source
       |
    ");
}
````
The `assert_snapshot!` macro is important here. What happens is that we create a human-readable output format for these requests and use `assert_snapshot!(test.hover(), @"")`. Then, running the tests will automatically replace the `@""` (second argument) with the generated content from the `test.hover()` call. And finally, we verify that the generated output is correct. The output format is decided by us and is based on what information is required to verify the implementation.
I don't think we need to change anything in this PR but it'd be useful to convert these tests into snapshots instead. I'll open a new issue to keep track of it and it would be a good "help wanted" issue a contributor could pick up.
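For illustration, a converted version of one of this PR's tests might look like the sketch below, assuming a `semantic_tokens()` helper that renders the collected tokens in a human-readable format (both the helper and its output format are hypothetical):

```rust
#[test]
fn semantic_tokens_variables() {
    let test = cursor_test(
        "
x = 42
y = 'hello'<CURSOR>
",
    );

    // Run the test once with an empty inline snapshot; insta replaces the
    // second argument with the generated output, which is then reviewed.
    assert_snapshot!(test.semantic_tokens(), @"");
}
```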
```rust
debug_assert!(
    self.tokens.is_empty() || self.tokens.last().unwrap().start() <= range.start(),
    "Tokens must be added in file order: previous token starts at {:?}, new token starts at {:?}",
    self.tokens.last().map(SemanticToken::start),
    range.start()
);
```
Should we have a strict `<` check instead of `<=`? I'm asking because adding duplicate tokens (same range) wouldn't raise an assertion here.
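A sketch of the stricter variant being suggested, using the same fields as the assertion above:

```rust
// A strict `<` on start offsets rejects a duplicate token that begins at
// the same position as the previous one, which `<=` silently accepts.
debug_assert!(
    self.tokens.is_empty() || self.tokens.last().unwrap().start() < range.start(),
    "Tokens must be added in strictly increasing file order: previous token starts at {:?}, new token starts at {:?}",
    self.tokens.last().map(SemanticToken::start),
    range.start()
);
```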
This PR implements a basic semantic token provider for ty's language server. This allows for more accurate semantic highlighting / coloring within editors that support this LSP functionality.
Here are screenshots that show how code appears in VS Code using the "rainbow" theme, both before and after this change.
The token types and modifier tags in this implementation largely mirror those used in Microsoft's default language server for Python.
The implementation supports two LSP interfaces. The first provides semantic tokens for an entire document, and the second returns semantic tokens for a requested range within a document.
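For context, these correspond to the `textDocument/semanticTokens/full` and `textDocument/semanticTokens/range` requests from the LSP specification. The handler shapes below only illustrate that split using the `lsp_types` crate's parameter types; they are not ty's actual handler signatures:

```rust
use lsp_types::{SemanticTokens, SemanticTokensParams, SemanticTokensRangeParams};

// textDocument/semanticTokens/full: tokens for the whole document.
fn semantic_tokens_full(params: SemanticTokensParams) -> Option<SemanticTokens> {
    let _uri = params.text_document.uri;
    None // compute tokens for the entire file here
}

// textDocument/semanticTokens/range: tokens for a sub-range only, which lets
// editors color just the visible region of a large file.
fn semantic_tokens_range(params: SemanticTokensRangeParams) -> Option<SemanticTokens> {
    let (_uri, _range) = (params.text_document.uri, params.range);
    None // compute tokens restricted to the requested range here
}
```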
The PR includes unit tests. It also includes comments that document known limitations and areas for future improvements.