No Deal: Investigating the Influence of Restricted Access to Elsevier Journals on German Researchers’ Publishing and Citing Behaviours
This repository contains the underlying code and data for the study:
Fraser, N., Hobert, A., Jahn, N., Mayr, P., and Peters, I. (2021). No Deal: Investigating the Influence of Restricted Access to Elsevier Journals on German Researchers’ Publishing and Citing Behaviours. arXiv:2105.12078 [cs]. https://arxiv.org/abs/2105.12078
The code and data contained in this repository are also archived via Zenodo: https://doi.org/10.5281/zenodo.4771576
An overview of the most important files and directories is provided below:
data_extraction.Rmd
documents the main processes for extracting article metadata from Dimensions. Note that using the Dimensions API requires an API key. Data from Dimensions were uploaded to Google BigQuery and matched to records from Crossref and Unpaywall. Documentation of the processing and storage of Crossref and Unpaywall datasets is contained in Jahn et al. (2021) and Hobert et al. (2021).
analysis.Rmd
contains the entire manuscript written in R Markdown format, including all code used for analyses and visualisation.
helpers.R
contains some additional functions that are imported into analysis.Rmd. The final compiled manuscript is contained in analysis.pdf.
figures/
contains all figures (in .png format) generated from analysis.Rmd. Figures are divided into two directories:
main/
contains all the figures that are displayed in the main manuscript.
supplement/
contains all supplemental figures.
queries/
contains all the SQL queries used for transforming and extracting data from Google BigQuery. The R packages DBI and bigrquery were used to interface directly with BigQuery from analysis.Rmd.
data/
contains all input and aggregated output data used throughout the analysis. As Dimensions is a proprietary data source, raw article metadata is not included. A description of the data in each sub-directory is as follows:
analysis/
contains all aggregated datasets generated and used for analysis in the context of the study. Note that each dataset corresponds to (and has the same name as) a query contained in the queries/ directory. Datasets with the prefix items are used for the analysis of publishing behaviour, and those with the prefix references for the analysis of citing behaviour.
deal/
contains a single file, deal_grid_mapping.csv, that documents the mapping of DEAL institution information from public webpages to their GRID identifiers, which were used for retrieving metadata from Dimensions.
dimensions/
contains some basic information relevant for the parsing and analysis of Dimensions data, namely a list of Dimensions Fields of Research categories (dimensions_fields_of_research.csv) and a dataset of annual global research output (i.e. the number of articles published per year; dimensions_items_per_year.csv).
scihub/
contains logs of SciHub downloads from Germany in 2017 (in .tab format). These logs are a subset of the existing dataset of Strecker (2018).
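The naming convention described for the analysis datasets (each dataset shares its name with a query, and the prefixes items and references indicate which analysis a dataset feeds) can be checked programmatically. The following is a minimal sketch in Python using only the standard library; it is not part of the repository, which is written in R. The function names, the example file names, and the .csv and .sql extensions are assumptions made for illustration only.

```python
from pathlib import PurePosixPath


def classify_dataset(filename: str) -> str:
    """Return the analysis a dataset supports, judged by its name prefix."""
    stem = PurePosixPath(filename).stem
    if stem.startswith("items"):
        return "publishing"
    if stem.startswith("references"):
        return "citing"
    return "other"


def pair_with_queries(dataset_files, query_files):
    """Pair each dataset with the query sharing its base name (None if missing)."""
    queries_by_stem = {PurePosixPath(q).stem: q for q in query_files}
    return {d: queries_by_stem.get(PurePosixPath(d).stem) for d in dataset_files}


# Hypothetical file names; the real names and extensions may differ.
datasets = ["items_example.csv", "references_example.csv"]
queries = ["items_example.sql", "references_example.sql"]

for dataset, query in pair_with_queries(datasets, queries).items():
    print(dataset, classify_dataset(dataset), query)
```

In practice the two lists would be built from the contents of the analysis/ sub-directory of data/ and of queries/, for example with os.listdir; any dataset paired with None would point to a missing or renamed query.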
Nicholas Fraser, Postdoctoral Researcher, ZBW - Leibniz Information Centre for Economics, Kiel, Germany.