pkg/cover: parse Rust DWARF #6000

a-nogikh opened this issue May 5, 2025 · 8 comments

@a-nogikh
Collaborator

a-nogikh commented May 5, 2025

Rust splits crates into Code Generation Units, which end up in DW_TAG_compile_unit.

0x0000000b: DW_TAG_compile_unit
              DW_AT_producer    ("clang LLVM (rustc version 1.86.0 (05f9846f8 2025-03-31))")
              DW_AT_language    (DW_LANG_Rust)
              DW_AT_name        ("drivers/android/binder/rust_binder.rs/@/rust_binder.7515a52ef019218e-cgu.0")
              DW_AT_stmt_list   (0x00000000)
              DW_AT_comp_dir    ("/android")
              DW_AT_low_pc      (0x0000000000000000)
              DW_AT_ranges      (0x0005f850
                 [0x0000000000000010, 0x00000000000235d7)
                 [0x00000000000235f0, 0x000000000006dbe3)
                 [0x000000000006dc00, 0x000000000006dc43)
                 [0x0000000000000010, 0x000000000000063d)
                 [0x0000000000000650, 0x000000000000066a))

At the same time, all .rs files that belong to a crate are essentially squashed into the same compile unit. This goes against the assumptions of our pkg/cover code, which first extracts the compilation units

for _, unit := range rg.Units {
	files[unit.Name] = &file{
		module: unit.Module.Name,

and then maps addr2line results to them.

If we just strip the /@/.. part off the unit name, our coverage reports will only highlight coverage from the single .rs file named there, not from everything that was included via mod NAME;.
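For illustration, here is a minimal sketch of that naive stripping, using the unit name from the dump above. The helper name stripCGUSuffix is hypothetical and not part of syzkaller; the sketch only shows that the result names a single file, so anything pulled in via mod NAME; would stay unattributed.

```go
package main

import (
	"fmt"
	"strings"
)

// stripCGUSuffix is a hypothetical helper: it drops everything starting at
// the "/@/" separator that appears in DW_AT_name of Rust compile units.
// Names without the separator are returned unchanged.
func stripCGUSuffix(unitName string) string {
	base, _, _ := strings.Cut(unitName, "/@/")
	return base
}

func main() {
	name := "drivers/android/binder/rust_binder.rs/@/rust_binder.7515a52ef019218e-cgu.0"
	// Only the crate root file survives; the other .rs files of the crate
	// are not recoverable from the unit name.
	fmt.Println(stripCGUSuffix(name))
}
```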

We could theoretically traverse the DWARF info and collect the files mentioned in DW_AT_call_file and DW_AT_decl_file, but I wonder if there's a more elegant solution.

@a-nogikh a-nogikh changed the title pkg/report: recognize Rust DWARF pkg/report: parse Rust DWARF May 5, 2025
@dvyukov
Collaborator

dvyukov commented May 5, 2025

We could also switch to our own DWARF parsing:
#4585 (comment)

After I switched objdump to manual binary parsing, addr2line is the last slow part. Manual parsing of all DWARF looks super fast in my experiments, so we could drop tons of complexity related to lazy parsing, compile units, etc.

@a-nogikh a-nogikh changed the title pkg/report: parse Rust DWARF pkg/cover: parse Rust DWARF May 5, 2025
@a-nogikh
Collaborator Author

a-nogikh commented May 5, 2025

There's still a question of what to look for in the DWARF info in this case :)

I wish there were some explicit records as to what .rs source files went into a CGU, but I don't see anything useful apart from DW_AT_call_file and DW_AT_decl_file.

@dvyukov
Collaborator

dvyukov commented May 5, 2025

We won't need CGUs or compile units; we just map all callback PCs to file:line info. Basically, scan the whole DWARF and extract all PC -> file:line info.
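A rough sketch of what such a flat PC -> file:line table could look like once the DWARF has been scanned. The lineInfo and lineTable types are hypothetical, the rows are filled in by hand rather than read from DWARF, and the sketch ignores end-of-sequence markers, so a PC past the last row still resolves to that row.

```go
package main

import (
	"fmt"
	"sort"
)

// lineInfo is a hypothetical flattened row of the DWARF line table.
type lineInfo struct {
	PC   uint64
	File string
	Line int
}

// lineTable holds rows sorted by PC.
type lineTable []lineInfo

// newLineTable copies the rows and sorts them by PC.
func newLineTable(rows []lineInfo) lineTable {
	tbl := append(lineTable(nil), rows...)
	sort.Slice(tbl, func(i, j int) bool { return tbl[i].PC < tbl[j].PC })
	return tbl
}

// lookup returns the row with the greatest PC that is <= pc.
// It reports false if pc is below the first row.
func (tbl lineTable) lookup(pc uint64) (lineInfo, bool) {
	i := sort.Search(len(tbl), func(i int) bool { return tbl[i].PC > pc })
	if i == 0 {
		return lineInfo{}, false
	}
	return tbl[i-1], true
}

func main() {
	tbl := newLineTable([]lineInfo{
		{PC: 0xffffffff82984358, File: "/android/rust/compiler_builtins.rs", Line: 34},
		{PC: 0xffffffff82984350, File: "/android/rust/compiler_builtins.rs", Line: 33},
	})
	if info, ok := tbl.lookup(0xffffffff82984354); ok {
		fmt.Printf("%s:%d\n", info.File, info.Line)
	}
}
```

With a table like this, no per-unit bookkeeping is needed at lookup time; a binary search per callback PC is enough.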

@a-nogikh
Collaborator Author

a-nogikh commented May 7, 2025

After an offline discussion with @dvyukov:

It would be best to teach pkg/cover/backend to split a single Rust compilation unit into a set of fake per-file compilation units:

unit := &CompileUnit{
	ObjectUnit: ObjectUnit{
		Name: attrName.(string),
	},
	Module: module,
}
units = append(units, unit)
ranges1, err := debugInfo.Ranges(ent)
if err != nil {
	return nil, nil, err
}

The question is still how to extract these file names and how to split the address range.

Getting the address ranges of a whole compilation unit is straightforward: they are in its direct DW_AT_ranges attribute:

0x0000000b: DW_TAG_compile_unit
< ... >
              DW_AT_low_pc      (0x0000000000000000)
              DW_AT_ranges      (0x0005f850
                 [0x0000000000000010, 0x00000000000235d7)
                 [0x00000000000235f0, 0x000000000006dbe3)
                 [0x000000000006dc00, 0x000000000006dc43)
                 [0x0000000000000010, 0x000000000000063d)
                 [0x0000000000000650, 0x000000000000066a))

To split it per file, it looks like we'd have to go deeper into the DWARF tree and process all entries like this:

0x00000a9c:                 DW_TAG_inlined_subroutine
                              DW_AT_abstract_origin     (0x00026560 "<...>")
                              DW_AT_ranges      (0x0000e550
                                 [0x00000000000107aa, 0x00000000000107d8)
                                 [0x00000000000107e2, 0x0000000000010865))
                              DW_AT_call_file   ("/rust/kernel/sync/arc.rs")
                              DW_AT_call_line   (404)
                              DW_AT_call_column (22)

Subprograms that are not inlined may have low_pc and high_pc instead of DW_AT_ranges:

0x00062e1a:         DW_TAG_subprogram
                      DW_AT_low_pc      (0x0000000000061230)
                      DW_AT_high_pc     (0x0000000000065d86)
                      DW_AT_frame_base  (DW_OP_reg6 RBP)
                      DW_AT_linkage_name        ("< ... >")
                      DW_AT_name        ("do_work")
                      DW_AT_decl_file   ("/drivers/android/binder/transaction.rs")
                      DW_AT_decl_line   (388)
                      DW_AT_type        (0x0004f499 "core::result::Result<bool, kernel::error::Error>")
                      DW_AT_external    (true)

... and the subprogram itself may of course also contain DW_TAG_inlined_subroutine entries that point to other files.

@a-nogikh
Collaborator Author

a-nogikh commented May 7, 2025

If we are to follow the LineReader approach as suggested here: #4585 (comment)

		// DW_LANG_Rust is 0x1c in the DWARF language encoding.
		const languageRust = 28
		language := ent.Val(dwarf.AttrLanguage)
		if language != nil && language.(int64) == languageRust {
			fmt.Printf("%s is in Rust\n", unitName)
			// Read the line table of this compilation unit.
			lr, err := debugInfo.LineReader(ent)
			if err != nil {
				panic(err)
			}
			// Collect the distinct PCs seen in the line table.
			pcs := map[uint64]bool{}
			var entry dwarf.LineEntry
			for {
				if err := lr.Next(&entry); err != nil {
					if err == io.EOF {
						break
					}
					panic(err)
				}
				pcs[entry.Address] = true
				fmt.Printf("%x: LINE: %s: %d\n", entry.Address, entry.File.Name, entry.Line)
			}
			fmt.Printf("LineReader() returned %d PCs\n", len(pcs))
		}

This piece of code does indeed extract the PCs associated with the compilation unit as well as their source files, and these PCs are within the ranges returned by debugInfo.Ranges(ent).

The small problem is that the names of DWARF compilation units are relative to the build directory (at least for kernel source files), while the LineReader results are absolute paths:

rust/compiler_builtins.rs is in Rust
ffffffff82984350: LINE: /android/rust/compiler_builtins.rs: 33
ffffffff82984358: LINE: /android/rust/compiler_builtins.rs: 34
ffffffff8298435f: LINE: /android/rust/compiler_builtins.rs: 34

Do we want to call CleanPath() from readTextRanges?
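CleanPath() is syzkaller's own helper and its behavior is not shown in this thread, so the following is only a hypothetical stand-in for the kind of normalization needed. It assumes the build directory is known (for example from DW_AT_comp_dir, "/android" in the dump above) and uses a made-up helper named relToBuildDir. Paths outside the build directory are reported with ok == false.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// relToBuildDir is a hypothetical helper: it converts an absolute path from
// the line table into a path relative to buildDir. ok is false if the file
// lives outside buildDir.
func relToBuildDir(buildDir, path string) (rel string, ok bool) {
	rel, err := filepath.Rel(buildDir, path)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", false
	}
	return rel, true
}

func main() {
	for _, p := range []string{
		"/android/rust/compiler_builtins.rs",
		"/root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs",
	} {
		rel, ok := relToBuildDir("/android", p)
		fmt.Printf("%q ok=%v\n", rel, ok)
	}
}
```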

Another concern is PCs that belong to the Rust toolchain.

ffffffff82941170: LINE: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs: 208
ffffffff829411c2: LINE: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs: 211
ffffffff829411d9: LINE: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs: 239
ffffffff82941fa0: LINE: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs: 208
ffffffff82941fae: LINE: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs: 211
ffffffff82941fce: LINE: /root/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/num.rs: 211

These same files are mentioned in many compilation units. Do we want to create fake &CompileUnit{} objects for these and collect there all PCs that are the results of inlining these library methods?

@dvyukov
Collaborator

dvyukov commented May 8, 2025

> Do we want to call CleanPath() from readTextRanges?

Since it's a hack already, I would say call whatever is necessary to make it work.

> These same files are mentioned in many compilation units. Do we want to create fake &CompileUnit{} objects for these and collect there all PCs that are the results of inlining these library methods?

We won't find/show these source files anyway. So if everything works without that, I don't think it's necessary.

a-nogikh added a commit to a-nogikh/syzkaller that referenced this issue May 9, 2025
Rust compilation units are different from C in that a single compilation
unit includes multiple source files, but we still need to tell which PC
range belong to which source file.

Infer that information from the LineEntry structures.

Cc google#6000.
a-nogikh added a commit to a-nogikh/syzkaller that referenced this issue May 12, 2025
a-nogikh added a commit to a-nogikh/syzkaller that referenced this issue May 13, 2025
github-merge-queue bot pushed a commit that referenced this issue May 13, 2025
Rust compilation units are different from C in that a single compilation
unit includes multiple source files, but we still need to tell which PC
range belong to which source file.

Infer that information from the LineEntry structures.

Cc #6000.
@a-nogikh
Collaborator Author

Let's consider it solved.

@a-nogikh
Collaborator Author

Given the huge amount of inlining, even after this change there are a number of .rs files that have 100% coverage because of

// Special mark for header files, if a file does not have coverage at all it is not shown.
totalPCs: 1,
coveredPCs: 1,
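To see why this marker shows up as 100%, here is a minimal sketch that assumes the report derives a percentage as coveredPCs over totalPCs. The fileStats type and the percent helper are hypothetical and only mirror the two field names from the snippet above; the actual report code may compute this differently.

```go
package main

import "fmt"

// fileStats mirrors the two counters from the snippet above (hypothetical type).
type fileStats struct {
	totalPCs   int
	coveredPCs int
}

// percent returns the covered share of PCs as an integer percentage,
// or 0 for a file without any PCs.
func percent(f fileStats) int {
	if f.totalPCs == 0 {
		return 0
	}
	return f.coveredPCs * 100 / f.totalPCs
}

func main() {
	marker := fileStats{totalPCs: 1, coveredPCs: 1}
	// A file whose only entry is the marker looks fully covered.
	fmt.Printf("%d%%\n", percent(marker))
}
```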

Locally, I've also covered drivers/block/rnull.rs and, even though the fuzzer does reach it, I never see any coverage attributed to that file. The covered callbacks are always inlined somewhere:

0xffffffff85fb1eb7
_RNvMNtNtNtCs43vyB533jt3_6kernel5block2mq7requestINtB2_7RequestNtCsktjF9JQNZ8U_5rnull13NullBlkDeviceE6end_okB10_
/linux/rust/kernel/block/mq/request.rs:128
_RNvXs0_CsktjF9JQNZ8U_5rnullNtB5_13NullBlkDeviceNtNtNtNtCs43vyB533jt3_6kernel5block2mq10operations10Operations8queue_rq
/linux/drivers/block/rnull.rs:69
_RNvMNtNtNtCs43vyB533jt3_6kernel5block2mq10operationsINtB2_16OperationsVTableNtCsktjF9JQNZ8U_5rnull13NullBlkDeviceE17queue_rq_callbackB1e_
/linux/rust/kernel/block/mq/operations.rs:94
0xffffffff85fb1e96
_RINvNtNtCs9jEwPDbx20M_4core4sync6atomic23atomic_compare_exchangeyECsktjF9JQNZ8U_5rnull
/usr/local/rustup/toolchains/1.87.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/sync/atomic.rs:3807
_RNvMsU_NtNtCs9jEwPDbx20M_4core4sync6atomicNtB5_9AtomicU6416compare_exchange
/usr/local/rustup/toolchains/1.87.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/sync/atomic.rs:2881
_RNvMNtNtNtCs43vyB533jt3_6kernel5block2mq7requestINtB2_7RequestNtCsktjF9JQNZ8U_5rnull13NullBlkDeviceE11try_set_endB10_
/linux/rust/kernel/block/mq/request.rs:102
_RNvMNtNtNtCs43vyB533jt3_6kernel5block2mq7requestINtB2_7RequestNtCsktjF9JQNZ8U_5rnull13NullBlkDeviceE6end_okB10_
/linux/rust/kernel/block/mq/request.rs:122
_RNvXs0_CsktjF9JQNZ8U_5rnullNtB5_13NullBlkDeviceNtNtNtNtCs43vyB533jt3_6kernel5block2mq10operations10Operations8queue_rq
/linux/drivers/block/rnull.rs:69
_RNvMNtNtNtCs43vyB533jt3_6kernel5block2mq10operationsINtB2_16OperationsVTableNtCsktjF9JQNZ8U_5rnull13NullBlkDeviceE17queue_rq_callbackB1e_
/linux/rust/kernel/block/mq/operations.rs:94

I don't get why it's not one of the files with 100% fake coverage, though. If I see it appear when I addr2line the /rawcover output, syzkaller must have seen it as well.

@a-nogikh a-nogikh reopened this May 20, 2025