Many institutions report that article deposit rates in their institutional repository (IR) remain low despite various outreach and engagement efforts. Even when deposit is mandated, the participation rate can still be quite low.
Once this hurdle is overcome, IR administrators face another challenge: ensuring that the version submitted by the researcher is the appropriate one. If it is not, they need to correspond with the researcher to obtain the appropriate version, which adds to their administrative workload.
Therefore, some institutions have taken the proactive step of completing the deposit on behalf of their researchers. This is certainly no small undertaking. However, there are openly available R packages (https://ropensci.org/) that can be used to automate some of the process. On this page, I summarize the steps to do that.
The following packages are required; please install them beforehand: roadoi, rromeo, rcrossref and fulltext for the API calls, dplyr, purrr, tidyr and tibble for data handling, and knitr and kableExtra for the tables. If you also need plyr, load it before dplyr; loading plyr afterwards masks dplyr functions such as arrange, mutate, rename and summarise and is likely to cause problems.
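A minimal sketch for checking which of the required packages still need installing. The package list is an assumption, inferred from the functions called on this page (oadoi_fetch from roadoi, rr_journal_issn from rromeo, cr_works from rcrossref, ft_get from fulltext), not an official list.

```r
# Package list inferred from the functions used on this page (an assumption)
pkgs <- c("roadoi", "rromeo", "rcrossref", "fulltext",
          "dplyr", "purrr", "tidyr", "tibble", "knitr", "kableExtra")

# requireNamespace() returns FALSE instead of failing when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install first: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
}
```

The check only reports what is missing; it does not install or attach anything.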
First, get a list of the DOIs of your institution’s works that you would like to deposit. This list might be very long, so it can be split into several CSV files so that we do not hit the limit when querying the Unpaywall API. Each CSV file is expected to hold a column of DOIs under a header row.
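A minimal sketch of one way to produce such CSV files and the `list.filenames` vector that the query loop further down iterates over. The column name `doi`, the folder, the file names and the chunk size are all assumptions for illustration; the three DOIs are taken from the results table on this page.

```r
# Hypothetical input: a character vector of DOIs
all_dois <- c("10.1145/903893.903896",
              "10.1145/1314215.1314231",
              "10.1145/3166068")

chunk_size <- 2                               # assumed; pick a size that suits the API limit
out_dir <- file.path(tempdir(), "doi_csv")    # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

# assign each DOI to a chunk (1, 1, 2, ...) and split the vector accordingly
chunk_id <- ceiling(seq_along(all_dois) / chunk_size)
chunks <- split(all_dois, chunk_id)

# write one CSV per chunk, each with a single column and a header row
for (k in seq_along(chunks)) {
  write.csv(data.frame(doi = chunks[[k]]),
            file.path(out_dir, sprintf("dois_%02d.csv", k)),
            row.names = FALSE)
}

# vector of file paths, one per CSV file
list.filenames <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)
```

Only base R is used here, so the sketch runs without any of the packages above.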
Unpaywall (https://unpaywall.org/) is a nonprofit organization that aims to make scholarly works more open. It maintains a database of links to full-text articles from open-access sources all over the world. The content is harvested from legal sources, including repositories run by universities, governments, and scholarly societies, as well as open content hosted by publishers themselves.
Unpaywall asks users to keep API requests below 100k per day and to include their email address in the request URL. Please replace the email address below with your own.
unpaywall <- tibble()
# loop over the CSV files so that the Unpaywall API is queried one batch of DOIs at a time
for (i in seq_along(list.filenames))
{
  # read the DOIs from the CSV file
  dois <- read.csv(list.filenames[i], header = TRUE, stringsAsFactors = FALSE)
  vec_doi <- as_tibble(dois)
  # query the Unpaywall API; safely() catches the error raised when the API
  # does not return valid JSON or is not available
  # (replace the email address with your own)
  df_data <- purrr::map(vec_doi, .f = safely(function(x) oadoi_fetch(x, email = "clbti@nus.edu.sg")))
  df <- purrr::map_df(df_data, "result")
  # get the values from best_oa_location
  best_oa_evidence <- df %>%
    dplyr::mutate(evidences = purrr::map(best_oa_location, "evidence") %>%
                    purrr::map_if(purrr::is_empty, ~ NA_character_) %>%
                    purrr::flatten_chr()) %>%
    .$evidences
  best_oa_host <- df %>%
    dplyr::mutate(hosts = purrr::map(best_oa_location, "host_type") %>%
                    purrr::map_if(purrr::is_empty, ~ NA_character_) %>%
                    purrr::flatten_chr()) %>%
    .$hosts
  best_oa_license <- df %>%
    dplyr::mutate(licenses = purrr::map(best_oa_location, "license") %>%
                    purrr::map_if(purrr::is_empty, ~ NA_character_) %>%
                    purrr::flatten_chr()) %>%
    .$licenses
  best_oa_url <- df %>%
    dplyr::mutate(urls = purrr::map(best_oa_location, "url") %>%
                    purrr::map_if(purrr::is_empty, ~ NA_character_) %>%
                    purrr::flatten_chr()) %>%
    .$urls
  best_oa_url_for_pdf <- df %>%
    dplyr::mutate(pdfs = purrr::map(best_oa_location, "url_for_pdf") %>%
                    purrr::map_if(purrr::is_empty, ~ NA_character_) %>%
                    purrr::flatten_chr()) %>%
    .$pdfs
  best_oa_version <- df %>%
    dplyr::mutate(versions = purrr::map(best_oa_location, "version") %>%
                    purrr::map_if(purrr::is_empty, ~ NA_character_) %>%
                    purrr::flatten_chr()) %>%
    .$versions
  # select specific columns from the results
  df_selection <- df %>%
    select(-c(oa_locations, updated, authors))
  # merge the columns together
  df_final <- add_column(df_selection, best_oa_evidence, best_oa_host, best_oa_license,
                         best_oa_url, best_oa_url_for_pdf, best_oa_version)
  # combine all the entries
  unpaywall <- bind_rows(unpaywall, df_final)
  # pause for a minute before the next batch
  Sys.sleep(60)
}
# limit to journal articles only and drop columns that are no longer needed
unpaywall <- unpaywall %>%
  filter(genre == "journal-article") %>%
  select(-c(best_oa_location, journal_issn_l))
kable(unpaywall) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%", height = "400px")
doi | data_standard | is_oa | genre | oa_status | has_repository_copy | journal_is_oa | journal_is_in_doaj | journal_issns | journal_name | publisher | title | year | best_oa_evidence | best_oa_host | best_oa_license | best_oa_url | best_oa_url_for_pdf | best_oa_version |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10.1145/903893.903896 | 2 | FALSE | journal-article | closed | FALSE | FALSE | FALSE | 0001-0782 | Communications of the ACM | Association for Computing Machinery (ACM) | The role of IT in successful knowledge management initiatives | 2003 | NA | NA | NA | NA | NA | NA |
10.1145/1314215.1314231 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-0782,1557-7317 | Communications of the ACM | Association for Computing Machinery (ACM) | Record matching in digital library metadata | 2008 | oa repository (via OAI-PMH doi match) | repository | NA | http://www.comp.nus.edu.sg/~kanmy/papers/2008-cacm.pdf | http://www.comp.nus.edu.sg/~kanmy/papers/2008-cacm.pdf | submittedVersion |
10.1145/3166068 | 2 | FALSE | journal-article | closed | FALSE | FALSE | FALSE | 0001-0782,1557-7317 | Communications of the ACM | Association for Computing Machinery (ACM) | Which is the fairest (rent division) of them all? | 2018 | NA | NA | NA | NA | NA | NA |
10.2514/1.j055575 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-1452,1533-385X | AIAA Journal | American Institute of Aeronautics and Astronautics (AIAA) | Establishment Times of Hypersonic Shock-Wave/Boundary-Layer Interactions in Intermittent Facilities | 2017 | oa repository (via OAI-PMH title and first author match) | repository | NA | https://eprints.soton.ac.uk/406505/1/Establishment_Times_of_Hypersonic_Shock_Wave.pdf | https://eprints.soton.ac.uk/406505/1/Establishment_Times_of_Hypersonic_Shock_Wave.pdf | acceptedVersion |
10.2514/3.12162 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-1452,1533-385X | AIAA Journal | American Institute of Aeronautics and Astronautics (AIAA) | Interlaminar stresses in composite laminates under out-of-plane shear/bending | 1994 | oa repository (semantic scholar lookup) | repository | NA | http://pdfs.semanticscholar.org/d33a/c33e4eff6fd5ece6c203e42d14a09f2e99bc.pdf | http://pdfs.semanticscholar.org/d33a/c33e4eff6fd5ece6c203e42d14a09f2e99bc.pdf | submittedVersion |
10.2514/1.14873 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-1452,1533-385X | AIAA Journal | American Institute of Aeronautics and Astronautics (AIAA) | Experimental Study of Linear Closed-Loop Control of Subsonic Cavity Flow | 2006 | oa repository (via OAI-PMH doi match) | repository | NA | http://repository.bilkent.edu.tr/bitstream/11693/23809/1/Experimental%20study%20of%20linear%20closed-loop%20control%20of%20subsonic%20cavity%20flow.pdf | http://repository.bilkent.edu.tr/bitstream/11693/23809/1/Experimental%20study%20of%20linear%20closed-loop%20control%20of%20subsonic%20cavity%20flow.pdf | submittedVersion |
10.2514/2.445 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-1452,1533-385X | AIAA Journal | American Institute of Aeronautics and Astronautics (AIAA) | Residual Strength of Aging Aircraft with Multiple Site Damage/Multiple Element Damage | 1998 | oa repository (semantic scholar lookup) | repository | NA | http://pdfs.semanticscholar.org/0707/c925d99c6275d0b722784ff8e1489fb8c512.pdf | http://pdfs.semanticscholar.org/0707/c925d99c6275d0b722784ff8e1489fb8c512.pdf | submittedVersion |
10.2514/1.j053565 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-1452,1533-385X | AIAA Journal | American Institute of Aeronautics and Astronautics (AIAA) | Prediction of Transonic Limit-Cycle Oscillations Using an Aeroelastic Harmonic Balance Method | 2015 | oa repository (via OAI-PMH doi match) | repository | NA | https://pureadmin.qub.ac.uk/ws/files/15386385/aiaa_a_hb.pdf | https://pureadmin.qub.ac.uk/ws/files/15386385/aiaa_a_hb.pdf | publishedVersion |
10.2514/1.2159 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-1452,1533-385X | AIAA Journal | American Institute of Aeronautics and Astronautics (AIAA) | Aerodynamic Data Reconstruction and Inverse Design Using Proper Orthogonal Decomposition | 2004 | oa repository (semantic scholar lookup) | repository | NA | http://pdfs.semanticscholar.org/2e93/a6d420b050d948c269fed2be8cbac5bc882d.pdf | http://pdfs.semanticscholar.org/2e93/a6d420b050d948c269fed2be8cbac5bc882d.pdf | submittedVersion |
10.2514/1.4799 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-1452,1533-385X | AIAA Journal | American Institute of Aeronautics and Astronautics (AIAA) | Logic-Based Active Control of Subsonic Cavity Flow Resonance | 2004 | oa repository (semantic scholar lookup) | repository | NA | http://pdfs.semanticscholar.org/bfdd/8ad6d04229eef6d012828589bbfd6e91adb1.pdf | http://pdfs.semanticscholar.org/bfdd/8ad6d04229eef6d012828589bbfd6e91adb1.pdf | submittedVersion |
10.2514/1.j055143 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-1452,1533-385X | AIAA Journal | American Institute of Aeronautics and Astronautics (AIAA) | Nonlinear Aerodynamic and Aeroelastic Model Reduction Using a Discrete Empirical Interpolation Method | 2017 | oa repository (via OAI-PMH doi match) | repository | NA | https://pureadmin.qub.ac.uk/ws/files/123540593/aiaa_journal_DEIM_submission.pdf | https://pureadmin.qub.ac.uk/ws/files/123540593/aiaa_journal_DEIM_submission.pdf | publishedVersion |
10.1002/aic.12373 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-1541 | AIChE Journal | Wiley | Acid-sensitive magnetic nanoparticles as potential drug depots | 2010 | oa repository (via OAI-PMH doi match) | repository | NA | http://europepmc.org/articles/pmc3134249?pdf=render | http://europepmc.org/articles/pmc3134249?pdf=render | acceptedVersion |
10.1111/j.1399-0039.2011.01796.x | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-2815 | Tissue Antigens | Wiley | Natural killer cell engineering for cellular therapy of cancer | 2011 | oa repository (via OAI-PMH doi match) | repository | NA | http://europepmc.org/articles/pmc3218564?pdf=render | http://europepmc.org/articles/pmc3218564?pdf=render | acceptedVersion |
10.1111/abac.12091 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-3072 | Abacus | Wiley | Comments on Shan and Walter: ‘Towards a Set of Design Principles for Executive Compensation Contracts’ | 2016 | oa repository (via OAI-PMH doi match) | repository | NA | https://strathprints.strath.ac.uk/55436/8/Hillier_etal_Abacus_2015_CEO_compensation_that_benefits.pdf | https://strathprints.strath.ac.uk/55436/8/Hillier_etal_Abacus_2015_CEO_compensation_that_benefits.pdf | acceptedVersion |
10.2307/30040617 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-4273,1948-0989 | Academy of Management Journal | Academy of Management | OWNERSHIP STRUCTURE, EXPROPRIATION, AND PERFORMANCE OF GROUP-AFFILIATED COMPANIES IN KOREA. | 2003 | oa repository (via OAI-PMH title match) | repository | cc-by-nc-nd | https://www.krm.or.kr/krmts/sdata/frbr/bizmap/2001/2001_041.pdf | https://www.krm.or.kr/krmts/sdata/frbr/bizmap/2001/2001_041.pdf | submittedVersion |
10.5465/amj.2011.0727 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-4273,1948-0989 | Academy of Management Journal | Academy of Management | “I Put in Effort, Therefore I Am Passionate”: Investigating the Path from Effort to Passion in Entrepreneurship | 2015 | oa repository (via OAI-PMH doi match) | repository | NA | http://fox.leuphana.de/portal/files/7377532/Gielnik_et_al._2015_Entrepreneurial_passion.pdf | http://fox.leuphana.de/portal/files/7377532/Gielnik_et_al._2015_Entrepreneurial_passion.pdf | publishedVersion |
10.5465/amj.2009.47084665 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-4273,1948-0989 | Academy of Management Journal | Academy of Management | Does Patent Strategy Shape the Long-Run Supply of Public Knowledge? Evidence from Human Genetics | 2009 | oa repository (via OAI-PMH doi match) | repository | NA | http://fmurray.scripts.mit.edu/docs/Huang.Murray_AMJ_09.16.2008_FINAL.pdf | http://fmurray.scripts.mit.edu/docs/Huang.Murray_AMJ_09.16.2008_FINAL.pdf | submittedVersion |
10.5465/amj.2013.1082 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-4273,1948-0989 | Academy of Management Journal | Academy of Management | Why and When Leaders’ Affective States Influence Employee Upward Voice | 2017 | oa repository (via OAI-PMH title and first author match) | repository | NA | http://ira.lib.polyu.edu.hk/bitstream/10397/67339/2/Liu_et_al._2017_AMJ.pdf | http://ira.lib.polyu.edu.hk/bitstream/10397/67339/2/Liu_et_al._2017_AMJ.pdf | submittedVersion |
10.1134/s0001434610090324 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-4346,1573-8876 | Mathematical Notes | Pleiades Publishing Ltd | A supercongruence motivated by the Legendre family of elliptic curves | 2010 | oa repository (via OAI-PMH doi match) | repository | NA | http://wain.mi.ras.ru/PS/supercong-MN2010.pdf | http://wain.mi.ras.ru/PS/supercong-MN2010.pdf | submittedVersion |
10.1021/ar900178k | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-4842,1520-4898 | Accounts of Chemical Research | American Chemical Society (ACS) | Cofabrication: A Strategy for Building Multicomponent Microsystems | 2010 | oa repository (via OAI-PMH doi match) | repository | NA | http://europepmc.org/articles/pmc2857577?pdf=render | http://europepmc.org/articles/pmc2857577?pdf=render | acceptedVersion |
10.1021/ar500303m | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-4842,1520-4898 | Accounts of Chemical Research | American Chemical Society (ACS) | Electronic Structure and Optical Signatures of Semiconducting Transition Metal Dichalcogenide Nanosheets | 2014 | oa repository (via OAI-PMH doi match) | repository | NA | http://repositorium.sdum.uminho.pt/bitstream/1822/39757/1/Optical%20signagures%20of%20SC%20TMD%20NSs.pdf | http://repositorium.sdum.uminho.pt/bitstream/1822/39757/1/Optical%20signagures%20of%20SC%20TMD%20NSs.pdf | publishedVersion |
10.1021/ar900183k | 2 | TRUE | journal-article | bronze | FALSE | FALSE | FALSE | 0001-4842,1520-4898 | Accounts of Chemical Research | American Chemical Society (ACS) | Attachment Chemistry of Organic Molecules on Si(111)-7 × 7 | 2009 | open (via free pdf) | publisher | NA | https://pubs.acs.org/doi/pdf/10.1021/ar900183k | https://pubs.acs.org/doi/pdf/10.1021/ar900183k | publishedVersion |
10.1121/1.4996860 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-4966 | The Journal of the Acoustical Society of America | Acoustical Society of America (ASA) | Space-time domain solutions of the wave equation by a non-singular boundary integral method and Fourier transform | 2017 | oa repository (via OAI-PMH doi match) | repository | NA | http://arxiv.org/pdf/1706.07919 | http://arxiv.org/pdf/1706.07919 | submittedVersion |
10.1121/1.3625257 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-4966 | The Journal of the Acoustical Society of America | Acoustical Society of America (ASA) | Passive acoustic survey of Yangtze finless porpoises using a cargo ship as a moving platform | 2011 | oa repository (semantic scholar lookup) | repository | NA | http://pdfs.semanticscholar.org/d504/f8ad0fd51d7f0cecd85fc6d2c07c77038fe9.pdf | http://pdfs.semanticscholar.org/d504/f8ad0fd51d7f0cecd85fc6d2c07c77038fe9.pdf | submittedVersion |
10.1121/1.2721658 | 2 | FALSE | journal-article | closed | FALSE | FALSE | FALSE | 0001-4966 | The Journal of the Acoustical Society of America | Acoustical Society of America (ASA) | Echolocation click sounds from wild inshore finless porpoise (Neophocaena phocaenoides sunameri) with comparisons to the sonar of riverine N. p. asiaeorientalis | 2007 | NA | NA | NA | NA | NA | NA |
10.1121/1.4929492 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-4966 | The Journal of the Acoustical Society of America | Acoustical Society of America (ASA) | Echolocation signals of free-ranging Indo-Pacific humpback dolphins (Sousa chinensis) in Sanniang Bay, China | 2015 | oa repository (semantic scholar lookup) | repository | NA | http://pdfs.semanticscholar.org/447a/9f5e8997ca658e6499c5130c57e8c8bb72d9.pdf | http://pdfs.semanticscholar.org/447a/9f5e8997ca658e6499c5130c57e8c8bb72d9.pdf | submittedVersion |
10.1121/1.3021302 | 2 | FALSE | journal-article | closed | FALSE | FALSE | FALSE | 0001-4966 | The Journal of the Acoustical Society of America | Acoustical Society of America (ASA) | Comparison of stationary acoustic monitoring and visual observation of finless porpoises | 2009 | NA | NA | NA | NA | NA | NA |
10.1121/1.3442574 | 2 | TRUE | journal-article | green | TRUE | FALSE | FALSE | 0001-4966 | The Journal of the Acoustical Society of America | Acoustical Society of America (ASA) | Density estimation of Yangtze finless porpoises using passive acoustic sensors and automated click train detection | 2010 | oa repository (semantic scholar lookup) | repository | NA | http://pdfs.semanticscholar.org/f06e/dc9e6ae7464ae20408f4b925d7eb184f63b3.pdf | http://pdfs.semanticscholar.org/f06e/dc9e6ae7464ae20408f4b925d7eb184f63b3.pdf | submittedVersion |
SHERPA RoMEO is an online resource that aggregates and analyses publisher open access policies from around the world and provides summaries of self-archiving permissions and conditions of rights given to authors on a journal-by-journal basis.
To apply for an API key, please refer to this page: http://www.sherpa.ac.uk/romeo/api.html
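A minimal sketch of one way to keep the API key out of the script itself, by reading it from an environment variable. The helper `read_api_key()` and the variable name `MY_ROMEO_KEY` are hypothetical names chosen for illustration; they are not part of rromeo.

```r
# Hypothetical helper: read an API key from an environment variable
read_api_key <- function(var = "MY_ROMEO_KEY") {
  key <- Sys.getenv(var, unset = "")
  # fail early with a clear message rather than sending an empty key
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.")
  }
  key
}

# Usage (after adding a line such as MY_ROMEO_KEY=... to ~/.Renviron):
# your_own_key <- read_api_key()
```

The resulting `your_own_key` can then be passed to the queries in the code that follows.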
# separate the 2 ISSNs (fill = "right" sets issn2 to NA when a journal has only one ISSN)
unpaywall2 <- unpaywall %>%
  separate(journal_issns, c("issn1", "issn2"), sep = "\\,", remove = TRUE, fill = "right")
# query the RoMEO API using issn1
# your_own_key: please use your own API key
romeo <- map(unpaywall2$issn1, .f = safely(function(x) rr_journal_issn(x, key = your_own_key)))
# pluck the data, get rid of duplicate ISSNs and the extra title column
romeo_df <- map_df(romeo, "result") %>%
  filter(!duplicated(issn)) %>%
  select(-title)
# join by ISSN
unpaywall_romeo1 <- unpaywall2 %>%
  left_join(romeo_df, by = c("issn1" = "issn"))
# use issn2 for the rows where issn1 returned no result
# add row names as identifiers so we can restore the original order later
unpaywall_romeo1 <- unpaywall_romeo1 %>%
  rownames_to_column()
# create table of items not found by issn1 (where romeocolour is NA)
unp.na <- unpaywall_romeo1 %>%
  filter(is.na(romeocolour))
# query SHERPA RoMEO with issn2
romeo2 <- map(unp.na$issn2, .f = safely(function(x) rr_journal_issn(x, key = your_own_key)))
# pluck data, remove duplicates and extra title column
romeo2_df <- map_df(romeo2, "result") %>%
  filter(!duplicated(issn)) %>%
  select(-title)
# drop the empty RoMEO columns and join the values we just retrieved
unp.na2 <- unp.na %>%
  select(-romeocolour, -preprint, -postprint, -pdf, -pre_embargo, -post_embargo, -pdf_embargo) %>%
  left_join(romeo2_df, by = c("issn2" = "issn"))
# join NA data back:
# keep the rows that issn1 resolved, append the new data, and rearrange by rowname
# (filtering on romeocolour also works when no rows were missing)
unpaywall_romeo2 <- unpaywall_romeo1 %>%
  filter(!is.na(romeocolour)) %>%
  bind_rows(unp.na2) %>%
  mutate(rowname = as.integer(rowname)) %>%
  arrange(rowname)
Create a summary table.
# reload the SHERPA RoMEO results from a local CSV file (adjust the path to your own copy)
unpaywall_romeo2 <- read.csv("~/Documents/nBox/R/full_text/sherpa.csv")
# create new column to give information about publisher permissions
`%notin%` <- negate(`%in%`)
# an embargo counts as absent when the value is missing (NA) or stored as the string "NA"
no_embargo <- function(x) is.na(x) | x == "NA"
unpaywall_romeo_final <- unpaywall_romeo2 %>%
  mutate(status = case_when(pdf == "can" & no_embargo(pdf_embargo) ~ "final_immediate",
                            pdf == "restricted" ~ "final_restricted",
                            postprint == "can" & no_embargo(post_embargo) &
                              pdf %notin% c("can", "restricted") ~
                              "postprint_immediate",
                            postprint == "restricted" & pdf %notin% c("can", "restricted") ~
                              "postprint_restricted",
                            preprint == "can" & no_embargo(pre_embargo) &
                              postprint %notin% c("can", "restricted") &
                              pdf %notin% c("can", "restricted") ~ "preprint_immediate",
                            preprint == "restricted" & postprint %notin% c("can", "restricted") &
                              pdf %notin% c("can", "restricted") ~ "preprint_restricted",
                            preprint == "cannot" & postprint == "cannot" & pdf == "cannot" ~
                              "fully_restricted"))
# create a summary table
unpaywall_romeo_summary <- unpaywall_romeo_final %>%
  group_by(status) %>%
  tally()
kable(unpaywall_romeo_summary) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
status | n |
---|---|
final_restricted | 4 |
fully_restricted | 1 |
postprint_restricted | 3 |
NA | 22 |
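The `_immediate` statuses depend on recognising that a journal has no embargo. A minimal sketch, using made-up values, of how a string comparison and `is.na()` differ on true missing values such as those `read.csv()` produces; when few or no rows receive an `_immediate` status, this difference is one possible cause worth checking.

```r
# Made-up embargo values: a true missing value, a real embargo, and the string "NA"
embargo <- c(NA, "12 months", "NA")

# comparing with the string "NA" yields NA (not TRUE) for a true missing value
by_string <- embargo == "NA"

# is.na() detects the true missing value but not the string "NA"
by_isna <- is.na(embargo)

# combining both treats either spelling as "no embargo"
no_embargo_flag <- is.na(embargo) | embargo == "NA"
```

Inside `case_when()`, a condition that evaluates to NA is treated as not matched, so such rows fall through to the later rules.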
For example, let’s say we only want to deposit the full texts for which the publisher allows the PDF version to be deposited immediately.
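The download code further down expects a data frame called `to_dl`, which this page does not define. A minimal sketch of how it could be built, assuming it is simply the rows whose status is `final_immediate`; the rows below are made-up stand-ins for `unpaywall_romeo_final`.

```r
# Toy stand-in for unpaywall_romeo_final (made-up rows, for illustration only)
unpaywall_romeo_final <- data.frame(
  doi = c("10.0000/a", "10.0000/b", "10.0000/c"),
  status = c("final_immediate", "postprint_restricted", NA),
  best_oa_url_for_pdf = c("https://example.org/a.pdf", NA, NA),
  stringsAsFactors = FALSE
)

# keep rows whose status is known and equal to "final_immediate"
keep <- !is.na(unpaywall_romeo_final$status) &
  unpaywall_romeo_final$status == "final_immediate"
to_dl <- unpaywall_romeo_final[keep, ]
```

With dplyr loaded, `filter(unpaywall_romeo_final, status == "final_immediate")` should select the same rows.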
There are 2 methods to download the full text:
1. Using another R package called fulltext
2. Using direct URL download
The fulltext package makes it easy to download the full text of a publication based on its DOI. It draws on several data sources, e.g. PLOS and arXiv.
# change the class of DOI from factor to character
df <- to_dl %>%
  mutate(doi = as.character(doi))
# split the data into smaller data frames, for example 100 DOIs per data frame
dt <- split(df, (seq_len(nrow(df)) - 1) %/% 100)
# s is the number of data frames created
s <- length(dt)
# create s data frames named doi1, doi2, ... up to doi(s)
for (i in seq_len(s)) {
  nam <- paste("doi", i, sep = "")
  assign(nam, dt[[i]])
}
# download the full text for every DOI in one data frame
# note: once you are done with the first data frame, doi1, change it to doi2 and so on
res <- purrr::map(doi1$doi, .f = safely(function(x) {
  cat(".")  # progress indicator, one dot per DOI
  ft_get(x, type = "pdf")
}))
# note:
# the downloaded full text can be found at /Users/xxx/Library/Caches/R/fulltext
Warnings or errors are displayed if a full text cannot be downloaded.
For full texts that cannot be downloaded using the fulltext package, we can use the direct URL approach, as Unpaywall provides a direct URL to the full text of the publication. Similarly, there will be a warning if the full text cannot be downloaded from the URL.
# convert the URL to character, as by default it is read in as a factor
urls <- to_dl %>%
  select(doi, best_oa_url_for_pdf) %>%
  mutate(doi = as.character(doi), url = as.character(best_oa_url_for_pdf))
# set location to download the files
setwd("~/Desktop/ft/files/")
# if a URL is not working, create an empty error file (e.g. error1.txt) instead
for (s in seq_len(nrow(urls))) {
  name <- paste("doi", s, ".pdf", sep = "")  # name for the file to be downloaded
  url_n <- urls$url[[s]]
  # mode = "wb" keeps binary PDF files intact on Windows
  tryCatch(download.file(url_n, destfile = name, method = "auto", mode = "wb"),
           error = function(e) {
             file.create(paste("error", s, ".txt", sep = ""))
           })  # on error, create an empty error file and skip to the next URL
}
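The loop above names each file by its row number. If the DOI should stay recognisable in the file name, a sketch of one possible convention follows; `doi_to_filename()` is a hypothetical helper, not part of the original script.

```r
# Hypothetical helper: turn a DOI into a file name that is safe on common file systems
doi_to_filename <- function(doi, ext = ".pdf") {
  # replace every character that is not a letter, digit, dot, underscore or hyphen
  paste0(gsub("[^A-Za-z0-9._-]", "_", doi), ext)
}

example_name <- doi_to_filename("10.1145/903893.903896")
```

In the loop, `name` could then be set with `doi_to_filename(urls$doi[[s]])` instead of the row number.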
Use Crossref as a source to create the metadata file.
# get the metadata from Crossref for each DOI in unpaywall_romeo_final
works_list <- purrr::map(as.character(unpaywall_romeo_final$doi),
                         function(x) {
                           cr_works(doi = x) %>%
                             purrr::pluck("data")
                         })
# bind the list of results into one data frame
works_df <- dplyr::bind_rows(works_list)
# unnest the authors field and combine the names into one string per DOI
authors <- works_df %>%
  filter(!map_lgl(author, is.null)) %>%
  unnest(author, .drop = TRUE) %>%
  unite(author, c(family, given), sep = " ", remove = FALSE) %>%
  dplyr::group_by(doi) %>%
  filter(!is.na(author)) %>%
  dplyr::summarise(all_authors = paste(author, collapse = " ; "))
# combine back with metadata
works_df1 <- works_df %>%
  select(c(title, container.title, doi, issn, issue, volume, page, publisher, published.print))
combined <- left_join(x = works_df1, y = authors, by = "doi")
# keep only the character columns for display
cls <- sapply(combined, class)
newCombined <- combined %>% select(which(cls == "character"))
kable(newCombined) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%", height = "400px")
title | container.title | doi | issn | issue | volume | page | publisher | published.print | all_authors |
---|---|---|---|---|---|---|---|---|---|
The role of IT in successful knowledge management initiatives | Communications of the ACM | 10.1145/903893.903896 | 0001-0782 | 9 | 46 | 69-73 | Association for Computing Machinery (ACM) | 2003-09-01 | Kankanhalli Atreyi ; Tanudidjaja Fransiska ; Sutanto Juliana ; Tan Bernard C. Y. |
Record matching in digital library metadata | Communications of the ACM | 10.1145/1314215.1314231 | 0001-0782,1557-7317 | 2 | 51 | 91-94 | Association for Computing Machinery (ACM) | 2008-02 | Kan Min-Yen ; Tan Yee Fan |
Which is the fairest (rent division) of them all? | Communications of the ACM | 10.1145/3166068 | 0001-0782,1557-7317 | 2 | 61 | 93-100 | Association for Computing Machinery (ACM) | 2018-01-23 | Gal Kobi ; Procaccia Ariel D. ; Mash Moshe ; Zick Yair |
Establishment Times of Hypersonic Shock-Wave/Boundary-Layer Interactions in Intermittent Facilities | AIAA Journal | 10.2514/1.j055575 | 0001-1452,1533-385X | 9 | 55 | 2875-2887 | American Institute of Aeronautics and Astronautics (AIAA) | 2017-09 | Vanstone Leon ; Estruch-Samper David ; Ganapathisubramani Bharathram |
Interlaminar stresses in composite laminates under out-of-plane shear/bending | AIAA Journal | 10.2514/3.12162 | 0001-1452,1533-385X | 8 | 32 | 1700-1708 | American Institute of Aeronautics and Astronautics (AIAA) | 1994-08 | Kim Taehyoun ; Atluri Satya N. |
Experimental Study of Linear Closed-Loop Control of Subsonic Cavity Flow | AIAA Journal | 10.2514/1.14873 | 0001-1452,1533-385X | 5 | 44 | 929-938 | American Institute of Aeronautics and Astronautics (AIAA) | 2006-05 | Yan P. ; Debiasi M. ; Yuan X. ; Little J. ; Ozbay H. ; Samimy M. |
Residual Strength of Aging Aircraft with Multiple Site Damage/Multiple Element Damage | AIAA Journal | 10.2514/2.445 | 0001-1452,1533-385X | 5 | 36 | 840-847 | American Institute of Aeronautics and Astronautics (AIAA) | 1998-05 | Wang L. ; Chow W. T. ; Kawai H. ; Atluri S. N. |
Prediction of Transonic Limit-Cycle Oscillations Using an Aeroelastic Harmonic Balance Method | AIAA Journal | 10.2514/1.j053565 | 0001-1452,1533-385X | 7 | 53 | 2040-2051 | American Institute of Aeronautics and Astronautics (AIAA) | 2015-07 | Yao W. ; Marques S. |
Aerodynamic Data Reconstruction and Inverse Design Using Proper Orthogonal Decomposition | AIAA Journal | 10.2514/1.2159 | 0001-1452,1533-385X | 8 | 42 | 1505-1516 | American Institute of Aeronautics and Astronautics (AIAA) | 2004-08 | Bui-Thanh T. ; Damodaran M. ; Willcox K. |
Logic-Based Active Control of Subsonic Cavity Flow Resonance | AIAA Journal | 10.2514/1.4799 | 0001-1452,1533-385X | 9 | 42 | 1901-1909 | American Institute of Aeronautics and Astronautics (AIAA) | 2004-09 | Debiasi M. ; Samimy M. |
Nonlinear Aerodynamic and Aeroelastic Model Reduction Using a Discrete Empirical Interpolation Method | AIAA Journal | 10.2514/1.j055143 | 0001-1452,1533-385X | 2 | 55 | 624-637 | American Institute of Aeronautics and Astronautics (AIAA) | 2017-02 | Yao W. ; Marques S. |
Acid-sensitive magnetic nanoparticles as potential drug depots | AIChE Journal | 10.1002/aic.12373 | 0001-1541 | 6 | 57 | 1638-1645 | Wiley | 2011-06 | Wuang Shy Chyi ; Neoh Koon Gee ; Kang En-Tang ; Leckband Deborah E. ; Pack Daniel W. |
Natural killer cell engineering for cellular therapy of cancer | Tissue Antigens | 10.1111/j.1399-0039.2011.01796.x | 0001-2815 | 6 | 78 | 409-415 | Wiley | 2011-12 | Shook D. R. ; Campana D. |
Comments on Shan and Walter: ‘Towards a Set of Design Principles for Executive Compensation Contracts’ | Abacus | 10.1111/abac.12091 | 0001-3072 | 4 | 52 | 685-771 | Wiley | 2016-12 | Beaumont Stacey ; Ratiu Raluca ; Reeb David ; Boyle Glenn ; Brown Philip ; Szimayer Alexander ; da Silva Rosa Raymond ; Hillier David ; McColgan Patrick ; Tsekeris Athanasios ; Howieson Bryan ; Matolcsy Zoltan ; Spiropoulos Helen ; Roberts John ; Smith Tom ; Zhou Qing ; Swan Peter L. ; Taylor Stephen ; Wright Sue ; Yermack David |
OWNERSHIP STRUCTURE, EXPROPRIATION, AND PERFORMANCE OF GROUP-AFFILIATED COMPANIES IN KOREA. | Academy of Management Journal | 10.2307/30040617 | 0001-4273,1948-0989 | 2 | 46 | 238-253 | Academy of Management | 2003-04-01 | Chang S. J. |
“I Put in Effort, Therefore I Am Passionate”: Investigating the Path from Effort to Passion in Entrepreneurship | Academy of Management Journal | 10.5465/amj.2011.0727 | 0001-4273,1948-0989 | 4 | 58 | 1012-1031 | Academy of Management | 2015-08 | Gielnik Michael M. ; Spitzmuller Matthias ; Schmitt Antje ; Klemann D. Katharina ; Frese Michael |
Does Patent Strategy Shape the Long-Run Supply of Public Knowledge? Evidence from Human Genetics | Academy of Management Journal | 10.5465/amj.2009.47084665 | 0001-4273,1948-0989 | 6 | 52 | 1193-1221 | Academy of Management | 2009-12 | Huang Kenneth G. ; Murray Fiona E. |
Why and When Leaders’ Affective States Influence Employee Upward Voice | Academy of Management Journal | 10.5465/amj.2013.1082 | 0001-4273,1948-0989 | 1 | 60 | 238-263 | Academy of Management | 2017-02 | Liu Wu ; Song Zhaoli ; Li Xian ; Liao Zhenyu |
A supercongruence motivated by the Legendre family of elliptic curves | Mathematical Notes | 10.1134/s0001434610090324 | 0001-4346,1573-8876 | 3-4 | 88 | 599-602 | Pleiades Publishing Ltd | 2010-10 | Chan Heng Huat ; Long Ling ; Zudilin V. V. |
Do Supplementary Sales Forecasts Increase the Credibility of Financial Analysts’ Earnings Forecasts? | The Accounting Review | 10.2308/accr.2010.85.6.2047 | 0001-4826,1558-7967 | 6 | 85 | 2047-2074 | American Accounting Association | 2010-11-01 | Keung Edmund C. |
811 Charnley hips followed for 3–17 years | Acta Orthopaedica Scandinavica | 10.3109/17453679308993619 | 0001-6470 | 3 | 64 | 252-256 | Informa UK Limited | 1993-01 | Dall Desmond M ; Learmonth Ian D ; Solomon Michael ; Davenport J Michael |
Detection and monitoring of progressive degeneration of osteoarthritic cartilage by MRI | Acta Orthopaedica Scandinavica | 10.3109/17453679509157668 | 0001-6470 | sup266 | 66 | 130-138 | Informa UK Limited | 1995-01 | Tyler Jenny A ; Watson Paul J ; Koh Hwee-Ling ; Herrod Nicholas J ; Robson Matthew ; Hall Laurance D |
A Comparison of In-Room and Video Ratings of Team Behaviors of Students in Interprofesional Teams | American Journal of Pharmaceutical Education | 10.5688/ajpe6487 | 0002-9459,1553-6467 | NA | NA | ajpe6487 | American Journal of Pharmaceutical Education | NA | Lie Désirée ; Richter-Lagha Regina ; (Sarah) Ma Sae Byul |
Comparison of Attitudes, Beliefs, and Resource-seeking Behavior for CAM Among First- and Third-Year Czech Pharmacy Students | American Journal of Pharmaceutical Education | 10.5688/aj720224 | 0002-9459,1553-6467 | 2 | 72 | 24 | American Journal of Pharmaceutical Education | 2008-09 | Pokladnikova Jitka ; Lie Desiree |
Using a Human Patient Simulation Mannequin to Teach Interdisciplinary Team Skills to Pharmacy Students | American Journal of Pharmaceutical Education | 10.5688/aj710351 | 0002-9459,1553-6467 | 3 | 71 | 51 | American Journal of Pharmaceutical Education | 2007-09 | Fernandez Rosemarie ; Parker Dennis ; Kalus James S. ; Miller Douglas ; Compton Scott |
Biology of the Bee Hoplitis (Hoplitis) monstrabilis Tkalců and Descriptions of Its Egg and Larva (Megachilidae: Megachilinae: Osmiini) | American Museum Novitates | 10.1206/646.1 | 0003-0082,1937-352X | NA | 3645 | 1-12 | American Museum of Natural History (BioOne sponsored) | 2009-07-25 | Rozen Jerome G. ; Özbek Hikmet ; Ascher John S. ; Rightmyer Molly G. |
Nests, Petal Usage, Floral Preferences, and Immatures of Osmia (Ozbekosmia) avosetta (Megachilidae: Megachilinae: Osmiini), Including Biological Comparisons with Other Osmiine Bees | American Museum Novitates | 10.1206/701.1 | 0003-0082 | NA | 3680 | 1-22 | American Museum of Natural History (BioOne sponsored) | 2010-03-04 | Rozen Jerome G. ; Özbek Hikmet ; Ascher John S. ; Sedivy Claudio ; Praz Christophe ; Monfared Alireza ; Müller Andreas |
Larval Diversity in the Bee GenusMegachile(Hymenoptera: Apoidea: Megachilidae) | American Museum Novitates | 10.1206/3863.1 | 0003-0082,1937-352X | 3863 | 3863 | 1-16 | American Museum of Natural History (BioOne sponsored) | 2016-09-23 | Rozen Jerome G. ; Ascher John S. ; Kamel Soliman M. ; Mohamed Kariman M. |
Influence of orthodontic adhesives and clean-up procedures on the stain susceptibility of enamel after debonding | The Angle Orthodontist | 10.2319/062610-350.1 | 0003-3219,1945-7103 | 2 | 81 | 334-340 | The Angle Orthodontist (EH Angle Education & Research Foundation) | 2011-03 | Joo Hyun-Jin ; Lee Yong-Keun ; Lee Dong-Yul ; Kim Yae-Jin ; Lim Yong-Kyu |
Test-retest reliability of smile tasks using three-dimensional facial topography | The Angle Orthodontist | 10.2319/062617-425.1 | 0003-3219,1945-7103 | 3 | 88 | 319-328 | The Angle Orthodontist (EH Angle Education & Research Foundation) | 2018-05-01 | Tanikawa Chihiro ; Takada Kenji |
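The table above is only displayed, not saved. A minimal sketch of writing it out as the metadata file, assuming CSV is an acceptable format for your repository's import; the file name is hypothetical and the single row is copied from the table above as a stand-in for `newCombined`.

```r
# Stand-in for newCombined: one row copied from the metadata table above
newCombined <- data.frame(
  title = "Record matching in digital library metadata",
  doi = "10.1145/1314215.1314231",
  all_authors = "Kan Min-Yen ; Tan Yee Fan",
  stringsAsFactors = FALSE
)

# hypothetical output location; replace tempdir() with your own folder
metadata_file <- file.path(tempdir(), "metadata.csv")
write.csv(newCombined, metadata_file, row.names = FALSE)

# read the file back to confirm what was written
check <- read.csv(metadata_file, stringsAsFactors = FALSE)
```

`row.names = FALSE` keeps R's row numbers out of the file, so the columns map directly onto metadata fields.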
Various resources and lots of Googling went into developing the above script. Please feel free to contact me if you have suggestions for improving it further.