[PRE REVIEW]: Inscriptis - A Python-based HTML to text conversion library optimized for knowledge extraction from the Web #3487

Closed
whedon opened this issue Jul 12, 2021 · 27 comments

Comments

@whedon

whedon commented Jul 12, 2021

Submitting author: @AlbertWeichselbraun (Albert Weichselbraun)
Repository: https://github.com/weblyzard/inscriptis/
Version: v2.0.0
Editor: @sbenthall
Reviewers: @reality, @rlskoeser
Managing EiC: Kristen Thyng

⚠️ JOSS reduced service mode ⚠️

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/6039d24c1ea4541fd544dfc398dcb5ca"><img src="https://joss.theoj.org/papers/6039d24c1ea4541fd544dfc398dcb5ca/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/6039d24c1ea4541fd544dfc398dcb5ca/status.svg)](https://joss.theoj.org/papers/6039d24c1ea4541fd544dfc398dcb5ca)

Author instructions

Thanks for submitting your paper to JOSS, @AlbertWeichselbraun. Currently, there isn't a JOSS editor assigned to your paper.

@AlbertWeichselbraun, if you have any suggestions for potential reviewers, please mention them here in this thread (without tagging them with an @). In addition, the people on this list have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you, type:

@whedon commands
@whedon

whedon commented Jul 12, 2021

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

@whedon

whedon commented Jul 12, 2021

Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.08 s (1606.0 files/s, 111098.3 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HTML                            44            283             79           2602
Python                          50            828           1010           2035
JSON                            13              2              0            251
TeX                              1             38              0            246
reStructuredText                 5            168            143            240
Markdown                         2             88              0            234
YAML                             3             15             32             76
INI                              1              7              0             55
Bourne Shell                     1              8             11             17
make                             2              4              6             16
Dockerfile                       1              3              2             10
-------------------------------------------------------------------------------
SUM:                           123           1444           1283           5782
-------------------------------------------------------------------------------


Statistical information for the repository '4052ecee83af3d31b1884119' was
gathered on 2021/07/12.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Albert Weichselbraun           240         12158           6538           98.13
Fabian Odoni                     1             6              4            0.05
Max Goebel                       1             1              1            0.01
fabian                           2            18              6            0.13
k3njiy                           4           143            151            1.54
max                              2            15             12            0.14

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Albert Weichselbraun       3845           31.6          8.6               13.99
Fabian Odoni                  1           16.7         51.3                0.00
fabian                       15           83.3         38.9               26.67
k3njiy                       11            7.7          0.0                9.09
max                           1            6.7         31.5                0.00

@whedon

whedon commented Jul 12, 2021

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1007/s11042-019-08328-z is OK
- 10.3115/v1/D14-1162 is OK
- 10.1109/JSYST.2015.2466439 is OK
- 10.1016/j.ins.2014.03.096 is OK
- 10.3390/fi13030059 is OK
- 10.1145/3430937 is OK
- 10.1080/14740338.2018.1531847 is OK
- 10.1080/00437956.1954.11659520 is OK

MISSING DOIs

- 10.1109/hicss.2016.133 may be a valid DOI for title: Extracting Opinion Targets from Environmental Web Coverage and Social Media Streams

INVALID DOIs

- None

@whedon

whedon commented Jul 12, 2021

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@kthyng

kthyng commented Jul 12, 2021

@sbenthall Would you be able to take this submission on?

@kthyng

kthyng commented Jul 12, 2021

@whedon invite @sbenthall as editor

@whedon

whedon commented Jul 12, 2021

@sbenthall has been invited to edit this submission.

@AlbertWeichselbraun

Dear all,

Michael Reiss, a research and teaching associate at the University of Zurich's Department of Communication and Media Research (IKMZ), might be a good fit (he was the first person to request information on how to cite Inscriptis in his work).

From the JOSS reviewer list, reviewers with good Python skills and an interest in Computational Analytics, Natural Language Processing and/or Social Sciences should be a good choice as well. I have located four additional potential reviewers based on these criteria and their position in the reviewer list (preferring reviewers at the bottom of the list):

Cheers,
Albert :)

@sbenthall

@whedon assign me as editor

@whedon

whedon commented Jul 15, 2021

OK, the editor is @sbenthall

@sbenthall

@ajoer @alexhanna @mbod would you be able to review this package?

@alexhanna

Don't quite have the cycles for this now, thanks though.

@sbenthall

@reality would you be able to review this package?

@sbenthall

@nmstreethran would you be able to review this package?

@debuos512

@sbenthall Sorry for the delayed response. Yes, I would be happy to review this package.

@nmstreethran

@sbenthall Sorry, I am unable to review at the moment.

@sbenthall

Thank you, @reality. I'll assign you as a reviewer.

@sbenthall

@whedon assign @reality as reviewer

whedon assigned debuos512 and sbenthall and unassigned sbenthall Aug 2, 2021
@whedon

whedon commented Aug 2, 2021

OK, @reality is now a reviewer

@sbenthall

@rlskoeser would you be able to review this package?

@whedon

whedon commented Aug 2, 2021

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@rlskoeser

@sbenthall yes, I'd be glad to review this package.

@sbenthall

Thank you, @rlskoeser

@sbenthall

@whedon add @rlskoeser as reviewer

@whedon

whedon commented Aug 2, 2021

OK, @rlskoeser is now a reviewer

@sbenthall

@whedon start review

@whedon

whedon commented Aug 2, 2021

OK, I've started the review over in #3557.

whedon closed this as completed Aug 2, 2021
8 participants