Community Batch Import Request: arabooks #9726

Open
@avidseeker

Description

I have a repository with metadata for 12K+ Arabic books: https://github.com/avidseeker/arabooks

If there is a bot that can mass-upload them, that would be a great addition to OpenLibrary, which currently lacks a lot of Arabic book coverage. (There isn't even an Arabic translation of OpenLibrary: #9673.)

Thanks in advance.


Edit:
To complete this issue, one would need to parse the TSV files found at https://github.com/avidseeker/arabooks and create JSONL files that look similar to this:

{"identifiers": {"open_textbook_library": ["1581"]}, "source_records": ["open_textbook_library:1581"], "title": "Legal Fundamentals of Healthcare Law", "languages": ["eng"], "subjects": ["Medicine", "Law"], "publishers": ["University of West Florida Pressbooks"], "publish_date": "2024", "authors": [{"name": "Tiffany Jackman"}], "lc_classifications": ["RA440", "KF385.A4"]}
{"identifiers": {"open_textbook_library": ["1580"]}, "source_records": ["open_textbook_library:1580"], "title": "Introduction to Literature: Fairy Tales, Folk Tales, and How They Shape Us", "languages": ["eng"], "subjects": ["Humanities", "Literature, Rhetoric, and Poetry"], "publishers": ["University of West Florida Pressbooks"], "publish_date": "2023", "authors": [{"name": "Judy Young"}], "lc_classifications": ["PE1408"]}

The minimum required fields are: title, authors, publish_date, source_records, and publishers. The source_records value could come from the name of the source plus an identifier; for example, for loal-en.tsv from the Library of Arabic Literature, it might be "source_records": ["loal:9781479834129"] for the first item in the list.
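
As an illustration, here is a minimal sketch of deriving that value (the `loal` prefix and the use of the ISBN as the identifier are taken from the example above; the helper itself is hypothetical):

```python
# Hypothetical helper: build a source_records entry from a source prefix
# (derived from the TSV's source name) and a per-book identifier such as an ISBN.
def make_source_record(prefix: str, identifier: str) -> str:
    return f"{prefix}:{identifier}"

# make_source_record("loal", "9781479834129") -> "loal:9781479834129"
```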

Here are the publishers for each of the TSV files (a mapping sketch follows the list):

  1. awu-dam.tsv: Arab Writers Union
  2. lisanarb.tsv: contains a publisher entry
  3. loal-en.tsv and loal-ar.tsv: Library of Arabic Literature
  4. shamela.tsv: contains a publisher entry. Dates need to be merged from shamela-dates.tsv by matching on the same title entry.
  5. waqfeya.tsv: set as "publishers": ["????"], since publishers need to be determined on a case-by-case basis.
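
As a sketch, the per-file rules above could be encoded in a small mapping, with shamela.tsv's dates merged in via a title lookup. The column names `title` and `date` below are assumptions; check the actual headers in the repository:

```python
import csv

# Hypothetical mapping from TSV filename to a fixed publisher, or None when
# the publisher should be read from a column in the file itself.
PUBLISHERS = {
    "awu-dam.tsv": "Arab Writers Union",
    "lisanarb.tsv": None,            # publisher column in the file
    "loal-en.tsv": "Library of Arabic Literature",
    "loal-ar.tsv": "Library of Arabic Literature",
    "shamela.tsv": None,             # publisher column in the file
    "waqfeya.tsv": "????",           # unknown; must be resolved one by one
}

def load_shamela_dates(path: str = "shamela-dates.tsv") -> dict:
    """Build a title -> date lookup so shamela.tsv rows can be assigned dates.

    Assumes 'title' and 'date' columns; adjust to the real TSV headers.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return {row["title"]: row["date"] for row in csv.DictReader(f, delimiter="\t")}
```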

Specifically, the values taken from the TSVs and converted into JSONL would need to follow this schema. A script to do this would probably use Python's csv module to read each TSV file, put each row into the format specified by the import schema, call json.dumps() on each record, and write the result to a JSONL file; a sketch follows.
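
A minimal sketch of that conversion, assuming columns named `title`, `author`, `year`, and `isbn` (the real files may use different headers, and some sources may need an identifier other than an ISBN):

```python
import csv
import json

def tsv_to_jsonl(tsv_path, jsonl_path, source_prefix, publisher):
    """Convert one TSV of book metadata into OpenLibrary import JSONL."""
    with open(tsv_path, newline="", encoding="utf-8") as tsv, \
         open(jsonl_path, "w", encoding="utf-8") as out:
        for row in csv.DictReader(tsv, delimiter="\t"):
            record = {
                "title": row["title"],
                "authors": [{"name": row["author"]}],
                "publish_date": row["year"],
                "publishers": [publisher],
                "languages": ["ara"],  # use ["eng"] for loal-en.tsv
                "source_records": [f"{source_prefix}:{row['isbn']}"],
            }
            out.write(json.dumps(record, ensure_ascii=False) + "\n")

# Hypothetical usage:
# tsv_to_jsonl("loal-ar.tsv", "loal-ar.jsonl", "loal", "Library of Arabic Literature")
```

Passing ensure_ascii=False to json.dumps keeps the Arabic titles readable in the output instead of escaping them to \uXXXX sequences.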

The output JSONL file could be tested using the endpoint from #8122, though you'd probably want to test with only a few records at a time rather than the whole file.
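
For example, a smoke test could post just the first few records (the endpoint URL below is a placeholder, not the real path; #8122 documents the actual endpoint and any required authentication):

```python
import itertools
import requests

# Placeholder URL; substitute the batch-import endpoint described in #8122.
ENDPOINT = "https://openlibrary.org/REPLACE-WITH-ENDPOINT-FROM-8122"

with open("loal-ar.jsonl", encoding="utf-8") as f:
    sample = "".join(itertools.islice(f, 5))  # only the first five records

response = requests.post(ENDPOINT, data=sample.encode("utf-8"))
print(response.status_code, response.text)
```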

Labels

  - Good First Issue: Easy issue. Good for newcomers. [managed]
  - Lead: @scottbarnes: Issues overseen by Scott (Community Imports)
  - Module: Import: Issues related to the configuration or use of importbot and other bulk import systems. [managed]
  - Needs: Help: Issues, typically substantial ones, that need a dedicated developer to take them on. [managed]
  - Priority: 3: Issues that we can consider at our leisure. [managed]
  - Type: Question: This issue doesn't require code. A question needs an answer. [managed]
