Skip to content

native: for in string #24613

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 31, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 11 additions & 3 deletions vlib/v/gen/native/amd64.v
Original file line number Diff line number Diff line change
Expand Up @@ -212,6 +212,10 @@ fn (mut c Amd64) cmp(reg Amd64Register, size Size, val i64) {
c.g.println('cmp ${reg}, ${val}')
}

fn (mut c Amd64) cmp_reg2(reg Register, reg2 Register) {
c.cmp_reg(reg as Amd64Register, reg2 as Amd64Register)
}

// `cmp rax, rbx`
fn (mut c Amd64) cmp_reg(reg Amd64Register, reg2 Amd64Register) {
match reg {
Expand Down Expand Up @@ -1098,8 +1102,12 @@ fn (mut c Amd64) push(r Register) {
c.g.write8(0x50 + i32(reg) - 8)
}
c.is_16bit_aligned = !c.is_16bit_aligned
c.g.println('push ${reg}')
c.g.stack_depth++
c.g.println('push ${reg}; stack_depth:${c.g.stack_depth}')
}

fn (mut c Amd64) pop2(r Register) {
c.pop(r as Amd64Register)
}

pub fn (mut c Amd64) pop(reg Amd64Register) {
Expand All @@ -1108,8 +1116,8 @@ pub fn (mut c Amd64) pop(reg Amd64Register) {
}
c.g.write8(0x58 + i32(reg) % 8)
c.is_16bit_aligned = !c.is_16bit_aligned
c.g.println('pop ${reg}')
c.g.stack_depth--
c.g.println('pop ${reg} ; stack_depth:${c.g.stack_depth}')
}

pub fn (mut c Amd64) sub8(reg Amd64Register, val i32) {
Expand Down Expand Up @@ -3617,7 +3625,7 @@ pub fn (mut c Amd64) allocate_var(name string, size i32, initial_val Number) i32
}

// println('allocate_var(size=$size, initial_val=$initial_val)')
c.g.println('mov [rbp-${int(n).hex2()}], ${initial_val} ; Allocate var `${name}`')
c.g.println('mov [rbp-${int(n).hex2()}], ${initial_val} ; Allocate var `${name}` size: ${size}')
return c.g.stack_var_pos
}

Expand Down
8 changes: 8 additions & 0 deletions vlib/v/gen/native/arm64.v
Original file line number Diff line number Diff line change
Expand Up @@ -535,6 +535,14 @@ pub fn (mut c Arm64) add_reg2(r Register, r2 Register) {
panic('Arm64.add_reg2() not implemented')
}

fn (mut c Arm64) pop2(r Register) {
panic('Arm64.pop2() not implemented')
}

fn (mut c Arm64) cmp_reg2(reg Register, reg2 Register) {
panic('Arm64.add_reg2() not implemented')
}

fn (mut c Arm64) create_string_struct(typ ast.Type, name string, str string) {
panic('Arm64.add_reg2() not implemented')
}
Expand Down
7 changes: 6 additions & 1 deletion vlib/v/gen/native/blacklist.v
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,12 @@ const whitelist = {
'string.is_capital': false
'string.is_ascii': false
'string.is_identifier': false
// 'string.is_blank': false need for in
'string.is_blank': false
'string.indent_width': false
'string.index_u8': false
'string.last_index': true
'string.last_index_u8': false
'string.contains_u8': false
}

fn (g &Gen) is_blacklisted(name string, is_builtin bool) bool {
Expand Down
2 changes: 2 additions & 0 deletions vlib/v/gen/native/gen.v
Original file line number Diff line number Diff line change
Expand Up @@ -95,6 +95,7 @@ mut:
cmp_to_stack_top(r Register)
cmp_var_reg(var Var, reg Register, config VarConfig)
cmp_var(var Var, val i32, config VarConfig)
cmp_reg2(reg Register, reg2 Register)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use cmp_reg2 instead of cmp_reg here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because cmp_reg needs Amd64Register and not Register, so cmp_reg2 is a function that casts the Registers to Amd64Register and then calls cmp_reg. I did not change the cmp_reg function as it is already used in a lot of places, to avoid big changes. (same for pop and pop2)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a place, that will need to call the interface version, instead of the CPU specific one?
i.e. why does it need to be in the public interface at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, there are a lot of calls to it fn (mut g Gen) for_in_stmt(node ast.ForInStmt) {, similar to this:

        g.code_gen.pop2(Amd64Register.rdx)
        g.println('; push address of the len:')
        g.code_gen.push(Amd64Register.rdx) // len
        g.code_gen.mov_deref(Amd64Register.rdx, Amd64Register.rdx, ast.int_type)
        g.code_gen.cmp_reg2(main_reg, Amd64Register.rdx)

That does not seem right to me - vlib/v/gen/native/stmt.c.v is supposed to have platform independent code, not pass the names of Amd64 specific registers around 🤔 .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have something similar to:
main_reg := g.code_gen.main_reg()
for example:
sreg := g.code_gen.secondary_reg()
and then use that instead of Amd64Register.rdx in places, that are not supposed to be platform dependent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, that cleanup is better indeed to be in its own separate PR, not here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I will clean this up when I'll have a better idea on how it could be organized in a cleaner way.

cmp_zero(reg Register)
convert_bool_to_string(r Register)
convert_int_to_string(a Register, b Register)
Expand Down Expand Up @@ -131,6 +132,7 @@ mut:
mov64(r Register, val Number)
movabs(reg Register, val i64)
patch_relative_jmp(pos i32, addr i64)
pop2(r Register)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add pop2, instead of pop here?
There is no other pop method in the public interface, and it only has a single argument, not 2.

prefix_expr(node ast.PrefixExpr)
push(r Register)
ret()
Expand Down
72 changes: 70 additions & 2 deletions vlib/v/gen/native/stmt.c.v
Original file line number Diff line number Diff line change
Expand Up @@ -255,14 +255,14 @@ fn (mut g Gen) for_stmt(node ast.ForStmt) {
}

fn (mut g Gen) for_in_stmt(node ast.ForInStmt) { // Work on that
main_reg := g.code_gen.main_reg()
if node.is_range {
g.println('; for ${node.val_var} in range {')
// for a in node.cond .. node.high {

i := g.code_gen.allocate_var(node.val_var, 8, i64(0)) // iterator variable
g.println('; evaluate node.cond for lower bound:')
g.expr(node.cond) // outputs the lower loop bound (initial value) to the main reg
main_reg := g.code_gen.main_reg()
g.println('; move the result to i')
g.code_gen.mov_reg_to_var(LocalVar{i, ast.i64_type_idx, node.val_var}, main_reg) // i = node.cond // initial value

Expand Down Expand Up @@ -299,11 +299,79 @@ fn (mut g Gen) for_in_stmt(node ast.ForInStmt) { // Work on that
g.labels.addrs[end_label] = g.pos()
g.println('; label ${end_label} (end_label)')
g.println('; for ${node.val_var} in range }')
} else if node.kind == .string {
g.println('; for ${node.val_var} in string {')
// for c in my_string {

key_var := if node.key_var == '' { 'i' } else { node.key_var }
i := g.code_gen.allocate_var(key_var, 8, i64(0)) // iterator variable
c := g.code_gen.allocate_var(node.val_var, 1, i64(0)) // char variable

g.expr(node.cond) // get the address of the string variable
g.code_gen.mov_deref(Amd64Register.rdx, main_reg, ast.charptr_type)
g.println('; push address of the string chars')
g.code_gen.push(Amd64Register.rdx) // address of the string
g.code_gen.add(main_reg, g.get_field_offset(ast.string_type, 'len'))
g.println('; push address of the len:')
g.code_gen.push(main_reg) // address of the len

start := g.pos() // label-begin:

g.println('; check iterator against upper loop bound')
g.code_gen.mov_var_to_reg(main_reg, LocalVar{i, ast.i64_type_idx, key_var})
g.println('; pop address of the len:')
g.code_gen.pop2(Amd64Register.rdx)
g.println('; push address of the len:')
g.code_gen.push(Amd64Register.rdx) // len
g.code_gen.mov_deref(Amd64Register.rdx, Amd64Register.rdx, ast.int_type)
g.code_gen.cmp_reg2(main_reg, Amd64Register.rdx)
jump_addr := g.code_gen.cjmp(.jge) // leave loop i >= len

g.println('; pop address of the len:')
g.code_gen.pop2(Amd64Register.rdx) // len
g.println('; pop address of the string chars')
g.code_gen.pop2(Amd64Register.rax) // address of the string
g.println('; push address of the string chars')
g.code_gen.push(Amd64Register.rax)
g.println('; push address of the len:')
g.code_gen.push(Amd64Register.rdx) // len

g.code_gen.mov_var_to_reg(Amd64Register.rdx, LocalVar{i, ast.i64_type_idx, key_var})
g.code_gen.add_reg2(Amd64Register.rax, Amd64Register.rdx)
g.code_gen.mov_deref(Amd64Register.rax, Amd64Register.rax, ast.u8_type_idx)
g.code_gen.mov_reg_to_var(LocalVar{c, ast.u8_type_idx, node.val_var}, Amd64Register.rax) // store the char

end_label := g.labels.new_label()
g.labels.patches << LabelPatch{
id: end_label
pos: jump_addr
}
g.println('; jump to label ${end_label} (end_label)')

start_label := g.labels.new_label() // used for continue
g.labels.branches << BranchLabel{
name: node.label
start: start_label
end: end_label
}
g.stmts(node.stmts) // writes the actual body of the loop

g.labels.addrs[start_label] = g.pos() // used for continue (continue: jump before the inc)
g.println('; label ${start_label} (continue_label)')

g.code_gen.inc_var(LocalVar{i, ast.i64_type_idx, key_var})
g.labels.branches.pop()
g.code_gen.jmp_back(start) // loops

g.labels.addrs[end_label] = g.pos()
g.code_gen.pop2(Amd64Register.rdx) // len
g.code_gen.pop2(Amd64Register.rax) // address of the string
g.println('; label ${end_label} (end_label)')
g.println('; for ${node.val_var} in string }')
/*
} else if node.kind == .array {
} else if node.kind == .array_fixed {
} else if node.kind == .map {
} else if node.kind == .string {
} else if node.kind == .struct {
} else if it.kind in [.array, .string] || it.cond_type.has_flag(.variadic) {
} else if it.kind == .map {
Expand Down
6 changes: 3 additions & 3 deletions vlib/v/gen/native/tests/builtin.vv
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,6 @@ assert !a.is_ascii()
assert '_9'.is_identifier() == true
assert 'a 9'.is_identifier() == false
assert 't'.is_identifier() == true
/*
assert ''.is_blank()
assert ' '.is_blank()
assert ' \t'.is_blank()
Expand All @@ -48,18 +47,18 @@ assert !a.is_ascii()
assert 'abc'.index_u8(`A`) == -1
assert 'abc'.index_u8(`B`) == -1
assert 'abc'.index_u8(`C`) == -1
/*
assert 'abcabca'.last_index('ca')? == 5
assert 'abcabca'.last_index('ab')? == 3
assert 'abcabca'.last_index('b')? == 4
assert 'Zabcabca'.last_index('Z')? == 0
x := 'Zabcabca'.last_index('Y')
assert x == none
// TODO: `assert 'Zabcabca'.index_last('Y') == none` is a cgen error, 2023/12/04
*/
assert 'abcabca'.last_index_u8(`a`) == 6
assert 'abcabca'.last_index_u8(`c`) == 5
assert 'abcabca'.last_index_u8(`b`) == 4
assert 'Zabcabca'.last_index_u8(`Z`) == 0
//
assert 'abc'.last_index_u8(`d`) == -1
assert 'abc'.last_index_u8(`A`) == -1
assert 'abc'.last_index_u8(`B`) == -1
Expand All @@ -69,6 +68,7 @@ assert !a.is_ascii()
assert 'abc abca'.contains_u8(`c`)
assert 'abc abca'.contains_u8(` `)
assert !'abc abca'.contains_u8(`A`)
/*
assert 'Abcd'.camel_to_snake() == 'abcd'
assert 'aBcd'.camel_to_snake() == 'a_bcd'
assert 'AAbb'.camel_to_snake() == 'aa_bb'
Expand Down
21 changes: 21 additions & 0 deletions vlib/v/gen/native/tests/linux.vv
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@

fn test_for_c_in_string() {
$if windows {
println('0')
println('97')
println('1')
println('98')
println('2')
println('99')
} $else {
s := 'abc'
for i, c in s {
println(i)
println(int(c))
}
}
}

fn main() {
test_for_c_in_string()
}
6 changes: 6 additions & 0 deletions vlib/v/gen/native/tests/linux.vv.out
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
0
97
1
98
2
99
2 changes: 1 addition & 1 deletion vlib/v/gen/native/tests/native_test.v
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ fn test_native() {
continue
}
}
if test == 'fibonacci_native.vv' {
if test == 'fibonacci_native.vv' || test.contains('linux') {
if user_os == 'windows' {
println('>>> SKIPPING ${test} on windows for now')
continue
Expand Down
Loading