
File Recognition - Add support for files without extensions #8744


Merged (2 commits) on Jul 22, 2022


@abujeda (Contributor) commented May 26, 2022

What this PR does / why we need it:
Use the full filename to determine the file type when the file does not have an extension.

Which issue(s) this PR closes:

Special notes for your reviewer:
The filenames supported in this PR are Makefile and Snakemake.
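A minimal Java sketch of the lookup order described above. It assumes two hypothetical in-memory maps (`BY_EXTENSION`, `BY_FULL_NAME`) rather than Dataverse's actual configuration or `FileUtil` code: a name with an extension is looked up by extension as before, and only extension-less names are matched against the full filename. The MIME type strings for Makefile and Snakemake are illustrative assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the lookup order only; not Dataverse's implementation.
public class FileTypeByName {

    // Assumed extension-based mappings (files with an extension behave as before).
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "txt", "text/plain",
            "csv", "text/csv");

    // Assumed full-filename mappings for the two names this PR supports.
    // These MIME type strings are illustrative, not necessarily what Dataverse uses.
    private static final Map<String, String> BY_FULL_NAME = Map.of(
            "Makefile", "text/x-makefile",
            "Snakemake", "text/x-snakemake");

    static Optional<String> contentTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // A leading dot (".bashrc") or trailing dot ("file.") does not count as an extension.
        boolean hasExtension = dot > 0 && dot < fileName.length() - 1;
        if (hasExtension) {
            // Extension present: look up by extension only.
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            return Optional.ofNullable(BY_EXTENSION.get(ext));
        }
        // No extension: fall back to an exact match on the full filename.
        return Optional.ofNullable(BY_FULL_NAME.get(fileName));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Makefile", "Snakemake", "data.csv", "1char"}) {
            System.out.println(name + " -> " + contentTypeFor(name).orElse("(no mapping)"));
        }
    }
}
```

Returning an empty `Optional` for unknown names (such as "1char") leaves the decision to the caller instead of forcing a generic default type.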

Suggestions on how to test this:
Upload a file without an extension, such as the ones listed above.
Files with an extension should work as before.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No

Is there a release notes update needed for this change?:
No

Additional documentation:
No

@coveralls commented May 26, 2022

Coverage increased (+0.006%) to 19.77% when pulling 017137d on adaybujeda:8740-file-recognition-based-on-filename into 567e506 on IQSS:develop.

@abujeda force-pushed the 8740-file-recognition-based-on-filename branch 3 times, most recently from 9f7ddc5 to 6afd466, June 2, 2022 07:52
@abujeda force-pushed the 8740-file-recognition-based-on-filename branch 2 times, most recently from 497bdf7 to 575e3b5, June 10, 2022 10:31
@pdurbin self-assigned this Jul 15, 2022
@pdurbin (Member) commented Jul 15, 2022

@adaybujeda can you please resolve the merge conflicts in this branch?

@abujeda force-pushed the 8740-file-recognition-based-on-filename branch from 31f8ed0 to 734f467, July 16, 2022 17:14

@abujeda (Contributor, Author) commented Jul 16, 2022

@pdurbin conflicts resolved - ready to go!

@pdurbin (Member) commented Jul 18, 2022

@adaybujeda thanks. A couple things.

FilesIT.testAddTinyFile is failing. Here's a screenshot from Jenkins:

[Screenshot from Jenkins, 2022-07-18 2:54:35 PM]

Is this expected? Dataverse can no longer figure out that a text file called "1char" is of type text/plain? Should we get this working again?

Less important (no need for you to fix this), but worth mentioning: in NetBeans I'm having a strange problem where I can't use "Test File" or "Run Focused Test Method" in FileUtilTest.java. However, this works fine in IntelliJ. This isn't specific to your changes at all; it seems to happen because the tests are within subclasses.

[Screenshot, 2022-07-18 2:48:52 PM]

@abujeda (Contributor, Author) commented Jul 19, 2022

Looking into the FilesIT.testAddTinyFile error...

@abujeda force-pushed the 8740-file-recognition-based-on-filename branch from 5cef645 to 5d1ae86, July 19, 2022 08:28

@abujeda (Contributor, Author) commented Jul 19, 2022

The problem was that the type recognized by JHove was being overridden by MIME_TYPE_MAP.getContentType(fileName) when the file had no extension. This was not happening before.

I updated the code to override the detected file type only when Dataverse has a content type mapped for the filename.

The failing test is now working locally.
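A minimal Java sketch of the fix described above, under stated assumptions: `DetectedTypeResolver`, `resolve`, and the map contents are hypothetical, and `detectedType` stands for whatever JHove reported. The filename lookup returns an empty `Optional` when it has no entry, so the JHove result survives. By contrast, if `MIME_TYPE_MAP` behaves like `javax.activation.MimetypesFileTypeMap`, whose `getContentType` falls back to `application/octet-stream` when no entry matches, using its result unconditionally would replace `text/plain` for a file like "1char".

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of "override only when there is a filename mapping".
public class DetectedTypeResolver {

    // Assumed filename-based mappings; stands in for the real MIME_TYPE_MAP lookup.
    private static final Map<String, String> BY_FULL_NAME = Map.of(
            "Makefile", "text/x-makefile",
            "Snakemake", "text/x-snakemake");

    // Returns empty when there is no explicit mapping, instead of a generic default.
    static Optional<String> contentTypeForName(String fileName) {
        return Optional.ofNullable(BY_FULL_NAME.get(fileName));
    }

    // Keep the content-based result (e.g. from JHove) unless the filename has an explicit mapping.
    static String resolve(String fileName, String detectedType) {
        return contentTypeForName(fileName).orElse(detectedType);
    }

    public static void main(String[] args) {
        System.out.println(resolve("1char", "text/plain"));    // prints text/plain
        System.out.println(resolve("Makefile", "text/plain")); // prints text/x-makefile
    }
}
```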

@pdurbin (Member) left a comment

Looks good. Thanks, @adaybujeda! I did make some small tweaks to the docs. Off to QA.

@pdurbin removed their assignment Jul 20, 2022

@abujeda (Contributor, Author) commented Jul 20, 2022

Thanks @pdurbin!! 🚀

@pdurbin (Member) commented Jul 20, 2022

@kcondon as of 017137d, the "Maven Unit Tests" job is marked as failed, but it looks like that's because the Coveralls API was down.

The Jenkins tests, which include the unit tests, ran fine, as did the API tests: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-8744/12/testReport/

@kcondon self-assigned this Jul 22, 2022
@kcondon merged commit b1809b1 into IQSS:develop Jul 22, 2022
@pdurbin added this to the 5.12 milestone Jul 25, 2022
@scolapasta scolapasta added HDC: 2 Harvard Data Commons Obj. 2 HDC Harvard Data Commons labels Aug 1, 2022
@mreekie mreekie added the NIH OTA: 1.3.1 3 | 1.3.1 | Support software metadata | 5 prdOwnThis is an item synched from the product planning... label Dec 15, 2022
@mreekie mreekie added pm.GREI-d-1.3.1 NIH, yr1, aim3, task1: Support software metadata pm.GREI-d-1.3.2 NIH, yr1, aim3, task2: R & D phase biomedical workflows support labels Mar 20, 2023
Labels
HDC Harvard Data Commons HDC: 2 Harvard Data Commons Obj. 2 NIH OTA: 1.3.1 3 | 1.3.1 | Support software metadata | 5 prdOwnThis is an item synched from the product planning... pm.GREI-d-1.3.1 NIH, yr1, aim3, task1: Support software metadata pm.GREI-d-1.3.2 NIH, yr1, aim3, task2: R & D phase biomedical workflows support
Development

Successfully merging this pull request may close these issues.

Feature Request/Idea: File type recognition based on filename
6 participants