
[Java] Add accessors to get type parameters from vector classes #427

Open
asfimport opened this issue Aug 16, 2017 · 9 comments
Labels
Type: enhancement New feature or request

Comments

@asfimport

Vector classes contain private copies of each parameter in the ArrowType, but do not have any public API to access them. So, given a vector, you have to get the Field from the vector and cast its type to the correct ArrowType subclass. For example, with a TimeStampMicroTZVector and trying to get the timezone:


if (field.getType.isInstanceOf[ArrowType.Timestamp]) {
  // the cast is needed because Field.getType returns the general ArrowType
  val timeZone = field.getType.asInstanceOf[ArrowType.Timestamp].getTimezone
}

It would be more convenient to have direct accessors for these type params for the vector types that have parameters:

  • DecimalVector
  • FixedSizeBinaryVector
  • ListVector
  • TimeStamps with timezones
  • FixedSizeListVector
  • Unions

Reporter: Bryan Cutler / @BryanCutler

Note: This issue was originally created as ARROW-1361. Please see the migration documentation for further details.

@asfimport

Jacques Nadeau / @jacques-n:
I'm not sure about your particular use case...

In general, I feel like the arrow pojo/field APIs are very inadequate. We built something up at Dremio to try to improve the behavior/consolidate typical handling tasks. We've discussed how much of it would be generally useful but have yet to come up with a solution. Would love your thoughts on whether any portion of what we did would be generally helpful. Our work against Arrow types: https://github.com/dremio/dremio-oss/blob/master/sabot/logical/src/main/java/com/dremio/common/expression/CompleteType.java

@asfimport

Bryan Cutler / @BryanCutler:
Thanks for sharing this @jacques-n! If I understand correctly, I would need a visitor to be able to make a call like public String getTimezone() from the given ArrowType? That might be too heavyweight for my purposes, but I can see CompleteType being useful in some other cases.

My use case is in Spark when constructing type specific writers for a given ValueVector. When the vector is a NullableTimeStampMicroTZVector, I just want to check the time zone that is set. Here is a code sample of how I currently do it (pardon the Scala):

private def createFieldWriter(vector: ValueVector): ArrowFieldWriter = {
  vector match {
    ...
    case vector: NullableTimeStampMicroTZVector =>

      val field = vector.getField()
      val timeZone = field.getType.asInstanceOf[ArrowType.Timestamp].getTimezone
      // do something with timeZone

      new TimestampWriter(vector)
    ...

Since the vector has already been cast, it would be more convenient to just access the timezone from there instead of having to also cast the type. Then it would simplify to this:

private def createFieldWriter(vector: ValueVector): ArrowFieldWriter = {
  vector match {
    ...
    case vector: NullableTimeStampMicroTZVector =>

      val timeZone = vector.getTimezone()
      // do something with timeZone

      new TimestampWriter(vector)
    ...
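For comparison, here is a minimal Java sketch of what such an accessor might look like on the vector side. The classes below (SketchTimestampType, SketchField, SketchTimeStampMicroTZVector) are hypothetical stand-ins written for this illustration, not the Arrow classes, and the accessor is an assumption about one possible shape of the API: the cast happens once inside the vector instead of in every caller.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Arrow classes.
class SketchTimestampType {
    private final String timezone;
    SketchTimestampType(String timezone) { this.timezone = timezone; }
    String getTimezone() { return timezone; }
}

class SketchField {
    // Typed loosely on purpose: it mimics a field that only exposes a general type.
    private final Object type;
    SketchField(Object type) { this.type = type; }
    Object getType() { return type; }
}

class SketchTimeStampMicroTZVector {
    private final SketchField field;
    SketchTimeStampMicroTZVector(SketchField field) { this.field = field; }
    SketchField getField() { return field; }

    // Assumed convenience accessor: the cast lives here instead of at each call site.
    String getTimezone() {
        return ((SketchTimestampType) field.getType()).getTimezone();
    }
}

class AccessorSketchDemo {
    public static void main(String[] args) {
        SketchTimeStampMicroTZVector vector =
            new SketchTimeStampMicroTZVector(new SketchField(new SketchTimestampType("UTC")));

        // Caller-side cast, mirroring the first snippet.
        String viaCast = ((SketchTimestampType) vector.getField().getType()).getTimezone();
        // Direct accessor, mirroring the second snippet.
        String viaAccessor = vector.getTimezone();

        System.out.println(viaCast + " / " + viaAccessor);
    }
}
```

Both lookups read the same value; the only difference is where the cast is written.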

@asfimport

Jacques Nadeau / @jacques-n:
Random idea... what do you think about the idea of making Field be a generic type that allows getType() to return a specific type? For vectors, getField() would return a Field with a specific type parameter, such as ArrowType.Timestamp for NullableTimeStampMicroTZVector. This gives us a generic interface that just gets more specific as you drill inward.

@asfimport

Bryan Cutler / @BryanCutler:
@jacques-n, could you please elaborate a little on what you mean by making Field be a generic type? I'm ok with any alternatives to this, but I think you should be able to get type params from a concrete vector class without having to do a cast or stringing a bunch of calls together.

@asfimport

Jacques Nadeau / @jacques-n:
I was suggesting something along these lines:

  • we make Field into Field<T extends ArrowType>.
  • Field's getType() would change to a return type of T
  • ValueVector would return Field<? extends ArrowType>
  • Each individual vector would return a specific field generic type.

For example,

class NullableTimeStampMicroTZVector {
  Field<ArrowType.Timestamp> getField(){..}
}

Given declaration
NullableTimeStampMicroTZVector t = <>;

Then
TimeUnit.MICROSECOND == t.getField().getType().getUnit() would compile without any special casting and return true.

I find this a much easier thing to code to (especially if using code generation) as opposed to having specialized method names for each type.

I haven't thought through all the ramifications of this approach but was throwing it out there.
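To make the shape of this idea concrete, here is a small self-contained Java sketch. Every class in it (GType, GTimestamp, GField, GValueVector, GTimeStampMicroTZVector, GTimeUnit) is a hypothetical stand-in defined for this example and not part of Arrow; it only shows that a generic field plus a covariant getField() lets the call chain compile without casts.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Arrow classes.
enum GTimeUnit { SECOND, MILLISECOND, MICROSECOND, NANOSECOND }

abstract class GType {}

final class GTimestamp extends GType {
    private final GTimeUnit unit;
    private final String timezone;
    GTimestamp(GTimeUnit unit, String timezone) {
        this.unit = unit;
        this.timezone = timezone;
    }
    GTimeUnit getUnit() { return unit; }
    String getTimezone() { return timezone; }
}

// The field is generic in its type, so getType() returns T rather than the base type.
final class GField<T extends GType> {
    private final String name;
    private final T type;
    GField(String name, T type) {
        this.name = name;
        this.type = type;
    }
    String getName() { return name; }
    T getType() { return type; }
}

// The general interface only promises "some" type.
interface GValueVector {
    GField<? extends GType> getField();
}

// A concrete vector narrows the return type (covariant override).
final class GTimeStampMicroTZVector implements GValueVector {
    private final GField<GTimestamp> field;
    GTimeStampMicroTZVector(String name, String timezone) {
        this.field = new GField<>(name, new GTimestamp(GTimeUnit.MICROSECOND, timezone));
    }
    @Override
    public GField<GTimestamp> getField() { return field; }
}

class GenericFieldDemo {
    public static void main(String[] args) {
        GTimeStampMicroTZVector t = new GTimeStampMicroTZVector("ts", "UTC");

        // No cast: the static type of getType() is GTimestamp here.
        boolean isMicro = GTimeUnit.MICROSECOND == t.getField().getType().getUnit();
        String zone = t.getField().getType().getTimezone();

        // Through the general interface, only the base type is visible.
        GValueVector general = t;
        GType baseType = general.getField().getType();

        System.out.println(isMicro + " " + zone + " " + (baseType instanceof GTimestamp));
    }
}
```

The trade-off raised in the thread still applies: making the field type generic changes its declaration everywhere it is used, which this sketch does not attempt to measure.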

@asfimport
Copy link
Author

Bryan Cutler / @BryanCutler:
Thanks for clarifying @jacques-n, I am fine with that as an API, but I think that changing the definition of Field like that would impact a lot of existing code, right?

What do you think about adding a method getType() in each vector to return the specific type instance? For example, NullableTimeStampMicroTZVector.getType() would return ArrowType.Timestamp. It would still do a cast internally, but it would still be pretty simple.
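A rough Java sketch of this alternative follows. The classes (CType, CTimestamp, CDecimal, CVector and the two vector subclasses) are hypothetical stand-ins invented for this example, not the Arrow classes; the sketch assumes each concrete vector overrides getType() with a narrower return type and performs the cast internally.

```java
// Hypothetical stand-ins for illustration only; these are NOT the Arrow classes.
abstract class CType {}

final class CTimestamp extends CType {
    private final String timezone;
    CTimestamp(String timezone) { this.timezone = timezone; }
    String getTimezone() { return timezone; }
}

final class CDecimal extends CType {
    private final int precision;
    private final int scale;
    CDecimal(int precision, int scale) {
        this.precision = precision;
        this.scale = scale;
    }
    int getPrecision() { return precision; }
    int getScale() { return scale; }
}

abstract class CVector {
    // Stands in for the general type a vector would read from its field.
    private final CType type;
    CVector(CType type) { this.type = type; }
    CType getType() { return type; }
}

final class CTimeStampMicroTZVector extends CVector {
    CTimeStampMicroTZVector(String timezone) { super(new CTimestamp(timezone)); }

    // Covariant override: one internal cast, one method name shared by all vectors.
    @Override
    CTimestamp getType() { return (CTimestamp) super.getType(); }
}

final class CDecimalVector extends CVector {
    CDecimalVector(int precision, int scale) { super(new CDecimal(precision, scale)); }

    @Override
    CDecimal getType() { return (CDecimal) super.getType(); }
}

class CovariantGetTypeDemo {
    public static void main(String[] args) {
        CTimeStampMicroTZVector ts = new CTimeStampMicroTZVector("America/Los_Angeles");
        CDecimalVector dec = new CDecimalVector(38, 10);

        // Callers use the same method name and get type-specific parameters without casting.
        System.out.println(ts.getType().getTimezone());
        System.out.println(dec.getType().getPrecision() + "," + dec.getType().getScale());
    }
}
```

Compared with the generic Field idea, this leaves the field declaration alone and confines the change to the vector classes.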

@asfimport

Jacques Nadeau / @jacques-n:
On the Field change, I haven't spent that much time looking at it. Definitely could be disruptive so I'm not going that route.

I like the getType() method and prefer that over many differently named methods.

@asfimport

Bryan Cutler / @BryanCutler:
Ok, that is fine with me but I'll wait for ARROW-1463 to see how we might restructure.

@assignUser assignUser transferred this issue from apache/arrow Nov 27, 2024