Skip to content

Throw a JavaScript Error instead of throwing a literal #1735

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
aqueenan opened this issue Aug 23, 2019 · 14 comments
Closed

Throw a JavaScript Error instead of throwing a literal #1735

aqueenan opened this issue Aug 23, 2019 · 14 comments

Comments

@aqueenan
Copy link

Motivation

It is considered good practice to only throw the Error object itself or an object using the Error object as base objects for user-defined exceptions as described at https://eslint.org/docs/rules/no-throw-literal.

Proposed Solution

Instead of throw x; use throw new Error(x); (except for undefined, null and Symbol as raised in #1734).

@Pauan
Copy link
Contributor

Pauan commented Aug 23, 2019

Where are you seeing this behavior? As far as I can see, wasm-bindgen does use new Error for all internal errors it throws, and also for throw_str.

@aqueenan
Copy link
Author

I'm using wrangler (the Cloudflare tool) based on their simple rust template.

  const { return_string, return_string_error, return_undefined, return_null } = wasm_bindgen
  await wasm_bindgen(wasm)

  console.log(`TESTING string`)

  try {
    const result = return_string()
    console.log(`Got result ${result}`)
  } catch (e) {
    console.log(`Got exception ${e}`)
  }

  try {
    const result = return_string_error()
    console.log(`Got result ${result}`)
  } catch (e) {
    console.log(`Got exception ${e}`)
  }

  try {
    throw 'Raw string'
  } catch (e) {
    console.log(`Got exception ${e}`)
  }

  try {
    throw new Error('Error string')
  } catch (e) {
    console.log(`Got exception ${e}`)
  }

  console.log()
  console.log(`TESTING undefined`)

  try {
    const result = return_undefined()
    console.log(`Got result ${result}`)
  } catch (e) {
    console.log(`Got exception ${e}`)
  }

  try {
    throw undefined
  } catch (e) {
    console.log(`Got exception ${e}`)
  }

  try {
    throw new Error(undefined)
  } catch (e) {
    console.log(`Got exception ${e}`)
  }

  console.log()
  console.log(`TESTING null`)

  try {
    const result = return_null()
    console.log(`Got result ${result}`)
  } catch (e) {
    console.log(`Got exception ${e}`)
  }

  try {
    throw null
  } catch (e) {
    console.log(`Got exception ${e}`)
  }

  try {
    throw new Error(null)
  } catch (e) {
    console.log(`Got exception ${e}`)
  }

Shows the result:

TESTING string
Got result Hello, wasm-worker!
Got exception Hello, wasm-worker!
Got exception Raw string
Got exception Error: Error string
TESTING undefined
Got exception undefined
Got exception undefined
Got exception Error
TESTING null
Got exception null
Got exception null
Got exception Error: null

None of the results from wasm_bindgen are wrapped in Error.
rust-cr.zip

@Pauan
Copy link
Contributor

Pauan commented Aug 23, 2019

Okay, I took a look at your code:

#[wasm_bindgen]
pub fn return_string_error() -> Result<String, JsValue> {
    Err(JsValue::from_str("Hello, wasm-worker!"))
}

Naturally if you return a &str from Rust then you'll get a string in JS. If you want an Error you should do this instead:

use js_sys::Error;

#[wasm_bindgen]
pub fn return_string_error() -> Result<String, JsValue> {
    Err(Error::new("Hello, wasm-worker!").into())
}

As for undefined and null... you shouldn't be returning those as Err, since as you say that's not standard practice in JS.

Instead you should do something else, such as using Option, returning Ok(JsValue::undefined()), or replacing them with meaningful error messages.

@aqueenan
Copy link
Author

In JavaScript, the only thing that should be thrown is an Error(). It's bad practice to throw other types. So if I return Err(something) from rust, the JavaScript wrapper should throw new Error(something). Instead wasm-bindgen is throwing a naked String.

I agree that what's returned by Err() should be thrown in JavaScript (for value types, at least). It seems to me that wasm-bindgen should automatically wrap errors in an Error because it should always wrap them in an Error for value types.

On the other hand, I can see that the behaviour matches JavaScript in allowing value types to be thrown. What I'm asking for in this feature request, is for wasm-bindgen to make it difficult to throw value types, against good JavaScript practice, by instead wrapping them in Error().

As for undefined and null, Ok(JsValue::undefined()) seems good, except in my real code I need to return a value or undefined (and for argument's sake let's allow null), just like standard JavaScript API methods. For example:

  let resultValues = {
    stringValue: 'The value',
    nullValue: null
  }
  console.log(`stringValue: ${typeof (resultValues['stringValue'])}`)
  console.log(`nullValue: ${typeof (resultValues['nullValue'])}`)
  console.log(`undefinedValue: ${typeof (resultValues['undefinedValue'])}`)
stringValue: string
nullValue: object
undefinedValue: undefined

Typical behaviour in JavaScript is to return null and undefined directly (not by throwing them). To get that behaviour from rust, we need to use Result<String, JsValue>, and for wasm-bindgen to return them rather than throwing them. Again, we can write a JavaScript wrapper around the wasm-bindgen generated JavaScript wrapper, but that seems a bit redundant.

The theme of both of these issues is that they add extra impedance to the development process by forcing the developer to manually add boilerplate scaffolding.

@aqueenan
Copy link
Author

P.S. In my real code I'm returning Vec so can't return it through a JsValue.

@Pauan
Copy link
Contributor

Pauan commented Aug 23, 2019

So if I return Err(something) from rust, the JavaScript wrapper should throw new Error(something).

I don't think that's a good idea:

  1. Although it's bad practice, JavaScript does allow for throwing non-Error types, and wasm-bindgen tries to allow for 100% compatibility with JavaScript.

  2. It creates extra surprising behavior beyond what JavaScript itself does.

  3. It requires runtime type checking, which has a performance cost.

  4. (This is the biggest reason) It creates terrible stack traces. The stack trace is based on the place where new Error is called. So if wasm-bindgen is calling new Error then the stack trace will be useless.

    Instead, it is best practice in JS to call new Error at the location where the error occurs, so that the stack traces are accurate. That is what my above code does: it calls Error::new at the place where the error occurs.

because it should always wrap them in an Error for value types.

new Error is intended for strings, not all values. And having Err sometimes throw and sometimes not based on the runtime type is very confusing.

wasm-bindgen to make it difficult to throw value types, against good JavaScript practice, by instead wrapping them in Error().

You can always return Result<T, Error> instead of Result<T, JsValue>, and that guarantees that you cannot "throw" value types.

Naturally if you use a type like JsValue then you are giving up all type safety, since it is fully dynamically typed.

except in my real code I need to return a value or undefined (and for argument's sake let's allow null), just like standard JavaScript API methods.

What is the issue with that? You can return an Option...

fn foo() -> Result<Option<String>, JsValue> {
    // ...
}

...or you can return a more meaningful error and then convert it at the boundaries if you need custom behavior:

enum MyError {
    SomeError,
    SomeOtherError(Error),
}

fn foo_() -> Result<String, MyError> {
    // ...
}

#[wasm_bindgen]
pub fn foo() -> Result<JsValue, JsValue> {
    match foo_() {
        Ok(s) => Ok(JsValue::from(s)),
        Err(MyError::SomeError) => Ok(JsValue::undefined()),
        Err(MyError::SomeOtherError(e)) => Err(e),
    }
}

...or you can return a JsValue and cast everything to it:

fn foo() -> Result<JsValue, JsValue> {
    if some_condition() {
        Ok(JsValue::from_str("foo"))

    } else {
        Ok(JsValue::undefined())
    }
}

Using Option is the most idiomatic approach, it matches how Rust itself behaves. Using meaningful errors is also good, rather than the dynamically typed JsValue. Using JsValue is least idiomatic, but most flexible.

@Pauan
Copy link
Contributor

Pauan commented Aug 23, 2019

P.S. In my real code I'm returning Vec so can't return it through a JsValue.

If it's not possible to convert something to JsValue, then that means it cannot be sent to JS, since JsValue represents every possible JS value.

For Vec, you have a few options:

@aqueenan
Copy link
Author

  1. It creates extra surprising behavior beyond what JavaScript itself does.

I think throwing value types instead of wrapping them in Error() is surprising behaviour, so I can't agree with that one. :-)

  1. (This is the biggest reason) It creates terrible stack traces. The stack trace is based on the place where new Error is called. So if wasm-bindgen is calling new Error then the stack trace will be useless.

That's a showstopper if it prevents the stack frame from being useful... but does an Error thrown from rustwasm actually work properly, cross-browser? It would depend on rustwasm implementing the same calling convention (cdecl/pascal/fastcall or whatever) as the JavaScript engine, or at least one that the JavaScript engine or Error implementation understood, and that could be engine-specific.

Thanks for your suggestion about returning both success and failure with Ok(), using Result<JsValue, JsValue>, and using js_sys::Uint8Array::from(). Combining them I get the following, which returns undefined or a Uint8Array, which is exactly the behaviour I want, and works seamlessly with JavaScript:

#[wasm_bindgen]
pub fn lookup(name: String) -> Result<JsValue, JsValue> {
    match binary_data_collection.get(&name) {
        Ok(bytes) => Ok(JsValue::from(js_sys::Uint8Array::from(bytes.as_ref()))),
        Err(_) => Ok(JsValue::undefined())
    }
}

My earlier code, which throws undefined or returns a Uint8Array was:

#[wasm_bindgen]
pub fn lookup(name: String) -> Result<Vec<u8>, JsValue> {
    match binary_data_collection.get(&name) {
        Ok(bytes) => Ok(bytes.to_vec()),
        Err(_) => Err(JsValue::undefined())
    }
}

I'd really prefer to return a Vec<T> but as you've mentioned that isn't yet implemented (#111).

This might be a general rust question, but when returning the binary data, is there any way to tell which operations result in a clone via memcpy and which simply reference the original memory block? If not, at what point does wasm-bindgen copy the data?

Similarly, if I return using a view:

unsafe { Ok(JsValue::from(js_sys::Uint8Array::view(&bytes))) }

does the original memory address get returned out of the wasm-bindgen generate JavaScript wrapper? (If that's the case I understand it would be more efficient, but would blow up when the wasm memory block is reallocated.)

@Pauan
Copy link
Contributor

Pauan commented Aug 24, 2019

I think throwing value types instead of wrapping them in Error() is surprising behaviour, so I can't agree with that one. :-)

JavaScript does not do auto-wrapping, if you do throw "foo" it will throw a string (similarly with throw undefined or throw null). You have to manually create a new Error object, just like with wasm-bindgen.

Why do you find it surprising that wasm-bindgen doesn't auto-wrap value types, but you don't find the same behavior in JavaScript surprising?

but does an Error thrown from rustwasm actually work properly, cross-browser? It would depend on rustwasm implementing the same calling convention (cdecl/pascal/fastcall or whatever) as the JavaScript engine, or at least one that the JavaScript engine or Error implementation understood, and that could be engine-specific.

There is work being done on implementing source maps and DWARF (or similar) for wasm, which will give full stack traces, breakpoints, etc.

There is no need for a specific calling convention or anything like that. It works on the same principles as JavaScript source maps, so it can work for any language compiled to wasm.

Right now you can use source maps to compile any language (e.g. Clojure) to JavaScript, and you will get correct stack traces, breakpoints, you can debug the original Clojure source code in the browser, etc. The same will be true for wasm.

Combining them I get the following, which returns undefined or a Uint8Array, which is exactly the behaviour I want, and works seamlessly with JavaScript

Stylistically, I would recommend doing this instead:

#[wasm_bindgen]
pub fn lookup(name: &str) -> Result<Option<Uint8Array>, JsValue> {
    match binary_data_collection.get(name) {
        Ok(bytes) => Ok(Some(Uint8Array::from(bytes.as_ref()))),
        Err(_) => Ok(None),
    }
}

And since you're not actually throwing any errors, you can simplify it even further:

#[wasm_bindgen]
pub fn lookup(name: &str) -> Option<Uint8Array> {
    let bytes = binary_data_collection.get(name).ok()?;
    Some(Uint8Array::from(bytes.as_ref()))
}

I'd really prefer to return a Vec<T>

Why is that? Using Uint8Array will almost certainly be much faster, since it doesn't need to do any allocations, and it can do a single fast memcpy, whereas a Vec would need to malloc memory in Rust, then do a much slower O(n) copy in JavaScript, and then free the memory for the Vec.

when returning the binary data, is there any way to tell which operations result in a clone via memcpy and which simply reference the original memory block?

There are two types of data: Rust data types and JS data types. JS data types never make a copy, they are sent as a u32 pointer reference, so they're very cheap.

The data types created by wasm-bindgen are JS data types, including JsValue, JsString, Array, Uint8Array, all of the js_sys types, all of the web_sys types, etc.

On the other hand, Rust data types are different, those require a full copy when sending to/from JS.

So when you send a Vec, String, &str, &[u8], etc., it has to make a copy. And similarly when receiving a Vec, String, &str, etc. from JS.

So when you convert a &[u8] using Uint8Array::from, it has to make an O(n) copy of the bytes, but this is fast because it's a Uint8Array so it can just do a memcpy directly from Rust's memory.

After you've created the Uint8Array, you can then cheaply send it to/from JS, since it's now a JS data type.

The exception is things like Uint8Array::view, which do not make a copy, they just reference the raw Rust memory, so they're super cheap. However, that's not memory safe (which is why it's an unsafe function), for two reasons:

  1. If the wasm memory is grown then it will invalidate the Uint8Array.

  2. It is referencing Rust memory which is on the stack, so if JS tries to use the Uint8Array after the stack frame is gone, then that would be a use-after-free.

@aqueenan
Copy link
Author

Why do you find it surprising that wasm-bindgen doesn't auto-wrap value types, but you don't find the same behavior in JavaScript surprising?

Surprising is too nice a word for the behaviour in JavaScript. :-D

When I'm implementing a wrapper I aim for it to feel native and encourage the user to fall forward to good practice (such as throwing an exception wrapped in Error).

So, it's more about expecting modern wrappers to make it easiest to follow current JavaScript good practice instead of making it easiest to follow what is now discouraged behaviour.

#[wasm_bindgen]
pub fn lookup(name: &str) -> Option<Uint8Array> {
    let bytes = binary_data_collection.get(name).ok()?;
    Some(Uint8Array::from(bytes.as_ref()))
}

That is nice. Thanks!

I'd really prefer to return a Vec<T>

Why is that? Using Uint8Array will almost certainly be much faster, since it doesn't need to do any allocations, and it can do a single fast memcpy, whereas a Vec would need to malloc memory in Rust, then do a much slower O(n) copy in JavaScript, and then free the memory for the Vec.

LOL because I've used binary wrappers in the past that could return something like Vec<T> efficiently, and hadn't considered that a JavaScript wrapper couldn't do that.

Thanks for pointing it out.

So, for performance (in cases like mine where the Vec won't change), I'd be better flattening T and using a few functions, each returning a supported array type, rather than a single function returning a Vec<T>.

when returning the binary data, is there any way to tell which operations result in a clone via memcpy and which simply reference the original memory block?

There are two types of data: Rust data types and JS data types. JS data types never make a copy, they are sent as a u32 pointer reference, so they're very cheap.

The data types created by wasm-bindgen are JS data types, including JsValue, JsString, Array, Uint8Array, all of the js_sys types, all of the web_sys types, etc.

On the other hand, Rust data types are different, those require a full copy when sending to/from JS.

So when you send a Vec, String, &str, &[u8], etc., it has to make a copy. And similarly when receiving a Vec, String, &str, etc. from JS.

So when you convert a &[u8] using Uint8Array::from, it has to make an O(n) copy of the bytes, but this is fast because it's a Uint8Array so it can just do a memcpy directly from Rust's memory.

After you've created the Uint8Array, you can then cheaply send it to/from JS, since it's now a JS data type.

Thanks. That's very useful information.

The exception is things like Uint8Array::view, which do not make a copy, they just reference the raw Rust memory, so they're super cheap. However, that's not memory safe (which is why it's an unsafe function), for two reasons:

  1. If the wasm memory is grown then it will invalidate the Uint8Array.
  2. It is referencing Rust memory which is on the stack, so if JS tries to use the Uint8Array after the stack frame is gone, then that would be a use-after-free.

Point 2 surprises me.

I thought the function returns an ofset into the the wasm memory containing the ofset of the return value which contains the start and length of the memory block. From what you've said I assume that it's on the stack.

Wouldn't the JavaScript side copy the start/length and convert it to a slice into the wasm memory? That wouldn't be affected by the stack frame going away.

@Pauan
Copy link
Contributor

Pauan commented Aug 24, 2019

So, it's more about expecting modern wrappers to make it easiest to follow current JavaScript good practice instead of making it easiest to follow what is now discouraged behaviour.

wasm-bindgen isn't a high level wrapper, it's designed to be very lightweight and low-level.

The idea is that high level wrappers are provided by other crates, like gloo.

As for encouraging good behavior, I don't think auto-wrapping is the right way to address that. Instead, I have a better idea:

Right now wasm-bindgen APIs return Result<T, JsValue>, which is technically correct but not ideal. Instead, I think they should be changed to return Result<T, Error>.

It would also be possible to change the main wrapper so that it requires Result<(), Error>. This makes it impossible to accidentally throw a value type (like a string).

This is using the static type system to force good behavior. This follows Rust's philosophy much better than allowing bad behavior and trying to fix it after the fact (which is your idea).

Wouldn't the JavaScript side copy the start/length and convert it to a slice into the wasm memory?

Wasm has a single large ArrayBuffer which it uses for all of its memory. In the case of Rust, it uses this ArrayBuffer for both the stack and the heap.

When you use a view function like Uint8Array::view, it takes the &[u8] and sends the pointer and length to JS, and JS just does this:

new Uint8Array(wasm.module.buffer, ptr, len)

The Uint8Array is a live view into Wasm's memory, so when Rust/Wasm changes the memory, the Uint8Array will also change.

This is what makes the view functions so cheap: they don't copy anything, they just point directly into Wasm's memory. In this case you can think of the Uint8Array as being the same as a &mut [u8] slice in Rust: it just contains an offset/length into existing memory.

But the ptr and len don't change, they'll always refer to those specific indexes in the Wasm memory.

So if the &[u8] (which is referred to with ptr and len) goes out of scope and is dropped, then the memory from ptr to ptr + len will be overwritten and used for something else, so the Uint8Array is now pointing to the wrong data.

To prevent that, in JavaScript you must copy the bytes before the &[u8] is dropped in Rust (this is what Uint8Array::from does). But Rust cannot force you to do that copying, so that's why the view function is unsafe: it's up to you to handle the memory correctly.

@aqueenan
Copy link
Author

As for encouraging good behavior, I don't think auto-wrapping is the right way to address that. Instead, I have a better idea:

Right now wasm-bindgen APIs return Result<T, JsValue>, which is technically correct but not ideal. Instead, I think they should be changed to return Result<T, Error>.

It would also be possible to change the main wrapper so that it requires Result<(), Error>. This makes it impossible to accidentally throw a value type (like a string).

This is using the static type system to force good behavior. This follows Rust's philosophy much better than allowing bad behavior and trying to fix it after the fact (which is your idea).

Sounds like a good solution.

Would you prefer to create a new issue for it, or take this one over?

So if the &[u8] (which is referred to with ptr and len) goes out of scope and is dropped, then the memory from ptr to ptr + len will be overwritten and used for something else, so the Uint8Array is now pointing to the wrong data.

To prevent that, in JavaScript you must copy the bytes before the &[u8] is dropped in Rust (this is what Uint8Array::from does). But Rust cannot force you to do that copying, so that's why the view function is unsafe: it's up to you to handle the memory correctly.

That would apply if the &[u8] is owned by an object on the stack, but not when it's owned by something on the heap.

@alexcrichton
Copy link
Contributor

Ok reading this over, sounds like this has largely been handled!

@aqueenan mind opening follow-up issues for any other action items?

@Pauan
Copy link
Contributor

Pauan commented Aug 28, 2019

New thread created: #1742

That would apply if the &[u8] is owned by an object on the stack, but not when it's owned by something on the heap.

I don't think that's quite true:

{
    let x = Box::new(vec![0, 1, 2]);
    unsafe { Uint8Array::view(&*x) }
}

In this case the &[u8] is heap allocated, but it's still unsafe to use, because the x variable will be dropped at the end of the scope, so accessing the Uint8Array is a use-after-free.

Also, the Uint8Array::view function accepts a &[u8], and the &[u8] slice is always stack allocated, and it's the lifetime of the &[u8] that matters for memory safety.

So I think it's best to think of it in terms of lifetimes, since that's how the Rust compiler/optimizer/type system itself does things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants