Prepare a survey (or GitHub Discussion) about data sources #408

zaleslaw opened this issue Jun 19, 2023 · 12 comments
Comments

@zaleslaw
Collaborator

zaleslaw commented Jun 19, 2023

The draft list of data sources:

  1. SQL Databases based on JDBC
  2. XML
  3. Protobuf
  4. Parquet
  5. ORC
  6. SparkSQL
  7. different files on the FileSystem
  8. NoSQL databases (MongoDB, Cassandra, Ignite)
  9. Queues (Kafka)
  10. Amazon (S3)
  11. Arrow IPC (Feather v2)
  12. Apache Avro
@zaleslaw zaleslaw added the research This requires a deeper dive to gather a better understanding label Jun 19, 2023
@zaleslaw zaleslaw self-assigned this Jun 19, 2023
@zaleslaw zaleslaw added this to the 0.12.0 milestone Jun 19, 2023
@zaleslaw zaleslaw changed the title from "Prepare a survey (or GitHub Discussion about data sources)" to "Prepare a survey (or GitHub Discussion) about data sources" Jun 19, 2023
@Jolanrensen
Collaborator

Probably move this to a discussion so people can upvote and leave others :)

@zaleslaw
Collaborator Author

@Jolanrensen sorry, I want to have a Google Form and add some different questions there. It's better for analysis.

@Jolanrensen
Collaborator

sure, but it might also be nice for the community to see which types of databases other people are interested in

@zaleslaw
Collaborator Author

It would be nice to prepare notebooks with the results :)

@zaleslaw
Collaborator Author

@Jolanrensen will you share something?

@Jolanrensen
Collaborator

Maybe we should add Exposed to the list as a data source. It was suggested here first and seems to cover several DB types

@Jolanrensen
Collaborator

Also, for people wanting to do heavy operations with lots of large columns, we might want to provide interop with Multik as well

@koperagen
Collaborator

Maybe something like Google Sheets

@Jolanrensen
Collaborator

> Maybe something like Google Sheets

Like integration with their API? Could be easy, since we already have Excel support.

@koperagen
Collaborator

koperagen commented Jun 27, 2023

> > Maybe something like Google Sheets
>
> Like integration with their API? Could be easy, since we already have Excel support.

Yes, I think it might be a good step for building data processing pipelines. For example, read some data, transform it with dataframe, and write it to a Google Sheet. Or have a Google Sheet edited by a human and run dataframe processing on it when needed.
Since we have Excel support, if this integration proves to bring too little value, we can also consider only having a tutorial.
I mostly want to add it not because it's impossible to do now, but to bring attention to possible applications of our library.

@Jolanrensen
Collaborator

XML would also probably need OpenAPI support, similar to JSON

@belovrv
Collaborator

belovrv commented Jul 1, 2023

I would also add YAML to the list
