Documentation for SparkNLP Readers and Partition class #14581

paulamib123
Contributor

@paulamib123 paulamib123 commented May 19, 2025

Description

This PR adds documentation and examples for the Partition class, PartitionTransformer and various document readers in spark-nlp.

Motivation and Context

Helps users understand how to use Partition and Readers to read different types of Documents.

How Has This Been Tested?

Verified the Scala docs locally.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@paulamib123 paulamib123 added documentation DON'T MERGE Do not merge this PR labels May 19, 2025
@paulamib123 paulamib123 removed the DON'T MERGE Do not merge this PR label May 20, 2025
@paulamib123 paulamib123 requested a review from DevinTDHa May 20, 2025 05:04
@paulamib123 paulamib123 added the DON'T MERGE Do not merge this PR label May 20, 2025
Member

@DevinTDHa DevinTDHa left a comment

I commented on a minor link thing affecting

src/main/scala/com/johnsnowlabs/reader/ExcelReader.scala
src/main/scala/com/johnsnowlabs/reader/HTMLReader.scala
src/main/scala/com/johnsnowlabs/reader/PowerPointReader.scala
src/main/scala/com/johnsnowlabs/reader/TextReader.scala
src/main/scala/com/johnsnowlabs/reader/WordReader.scala
src/main/scala/com/johnsnowlabs/partition/Partition.scala

Other than that, looks good to me. Thanks!

@danilojsl if this looks good to you, we can merge it into your branch!

@danilojsl danilojsl merged commit 37e6e85 into feature/SPARKNLP-1174-Adding-PartitionTransformer May 26, 2025
@danilojsl
Contributor

I commented on a minor link thing affecting

src/main/scala/com/johnsnowlabs/reader/ExcelReader.scala
src/main/scala/com/johnsnowlabs/reader/HTMLReader.scala
src/main/scala/com/johnsnowlabs/reader/PowerPointReader.scala
src/main/scala/com/johnsnowlabs/reader/TextReader.scala
src/main/scala/com/johnsnowlabs/reader/WordReader.scala
src/main/scala/com/johnsnowlabs/partition/Partition.scala

Other than that, looks good to me. Thanks!

@danilojsl if this looks good to you, we can merge it into your branch!

LGTM, I merged it.

DevinTDHa added a commit that referenced this pull request May 26, 2025
* Update conda meta.yaml for 6.0.1 [skip test]

* added documentation to file readers

* updated docs for partition class in scala and python

* fixed typos in sparkNLPReader and added documentation for Partition class

* added parameters to Partition class and updated read function docs in Readers

* updated readers documentation with ipynb path

* updated partition description

* fixed errors in email readers

* fixed errors in email readers

* added docs for partition transformer and pdf reader

* added docs for python partition transformer and pdf reader

* added docs for python partition transformer and pdf reader

* updated docs to render partition and reader

* reverted changes in init.py

* reverted changes in imports

* updated formatting docs for pdf reader

* updated formatting of docs for spark nlp reader

* updated formatting of docs for partition

* updated formatting of docs for partition_transformer

* updating links to notebooks and partition transformer description

---------

Co-authored-by: Devin Ha <[email protected]>
Co-authored-by: Paulami Bhattacharya <[email protected]>
@DevinTDHa DevinTDHa mentioned this pull request May 26, 2025
10 tasks
@paulamib123 paulamib123 removed the DON'T MERGE Do not merge this PR label May 26, 2025
DevinTDHa added a commit that referenced this pull request May 28, 2025
3 participants