1 Introduction

Fang and Casadevall (2011) defined a retracted article as one that is no longer considered trustworthy because of intentional misconduct or unintentional human error. In their study, they found a positive correlation between a journal's retraction index and its Journal Impact Factor (JIF). Steen (2011) conducted a similar study and concluded that retracted articles are more likely to be published in journals with a high JIF.

In this exploration, I would like to check whether a correlation can also be found between the retraction index and another journal metric, the SCImago Journal Rank (SJR). The main benefit of using SJR over JIF is that SJR scores are normalized to allow comparison across subject areas.

Here, the retraction index is calculated by multiplying the number of retraction notices for each journal during the study period (2009 to 2018) by 1,000, and dividing the result by the number of articles the journal published during the same period. This definition is adapted from Fang and Casadevall (2011).
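Written as a formula, with $R_j$ standing for the number of retraction notices for journal $j$ in 2009 to 2018 and $N_j$ for the number of articles it published in the same period (both symbols are introduced here only for illustration), the index is a rate per 1,000 articles:

```latex
\text{Retraction index}_j = \frac{1000 \times R_j}{N_j}
```

For example, a hypothetical journal with 5 retraction notices among 2,000 articles would have an index of 1000 × 5 / 2000 = 2.5, i.e. 2.5 retractions per 1,000 articles.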

2 Data sources

PubMed indexes retraction notices, which can be retrieved free of charge through its API. The dataset used for this exploration was retrieved in October 2019.

SJR scores are publicly available at https://www.scimagojr.com/; this exploration uses the 2018 edition.
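To join SJR scores to PubMed records, the ISSNs in both sources have to match. The sketch below assumes the downloaded SJR file is semicolon-separated, uses decimal commas, and lists one or more hyphen-less ISSNs in an `Issn` column next to `SJR` and `SJR Best Quartile`; these column names and the two-row excerpt are assumptions for illustration, not the exact file used here.

```r
# Hypothetical excerpt in the ASSUMED SJR export layout: semicolon-separated,
# decimal commas, and possibly several hyphen-less ISSNs in one "Issn" field
sjr_text <- 'Rank;Sourceid;Title;Type;Issn;SJR;SJR Best Quartile
1;111;Journal A;journal;"15424863, 00079235";61,786;Q1
2;222;Journal B;journal;"1234567X";2,500;Q2'

# Read every column as text so ISSNs keep their leading zeros
sjr <- read.csv2(text = sjr_text, colClasses = "character", check.names = FALSE)

# One row per ISSN, since a journal may list both a print and an electronic ISSN
issn_list <- strsplit(sjr$Issn, ",[[:space:]]*")
n_issn <- lengths(issn_list)
sjr_long <- data.frame(
  issn     = unlist(issn_list),
  # Convert decimal commas to points before turning the score into a number
  sjr      = rep(as.numeric(sub(",", ".", sjr$SJR, fixed = TRUE)), n_issn),
  quartile = rep(sjr[["SJR Best Quartile"]], n_issn),
  stringsAsFactors = FALSE
)

# Insert the hyphen so ISSNs match the NNNN-NNNN form used in PubMed records
sjr_long$issn <- sub("^([0-9]{4})([0-9]{3}[0-9Xx])$", "\\1-\\2", sjr_long$issn)
sjr_long
```

Reading everything as character first is a deliberate choice: a purely numeric ISSN such as `00079235` would otherwise lose its leading zeros.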

3 Data import and cleanup

3.1 PubMed

The retraction notices (in XML format) are retrieved from the PubMed database using the search term "retraction of publication"[PTYP], which restricts results to the "Retraction of Publication" publication type.

The following fields are extracted from the XML file:

  • Journal title

  • ISSN

  • Publication year
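A minimal sketch of this extraction with the xml2 package, assuming the records follow PubMed's `PubmedArticle/.../Journal` layout (`ISSN`, `Title`, `JournalIssue/PubDate/Year`). The two-record XML string is hypothetical; the second record carries a free-text `MedlineDate` instead of a `Year`, which this sketch simply records as `NA`.

```r
library(xml2)

# Hypothetical two-record excerpt mimicking the PubMed XML layout
pubmed_xml <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article><Journal>
    <ISSN IssnType="Electronic">1234-5678</ISSN>
    <JournalIssue><PubDate><Year>2012</Year></PubDate></JournalIssue>
    <Title>Journal A</Title>
  </Journal></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article><Journal>
    <ISSN IssnType="Print">8765-4321</ISSN>
    <JournalIssue><PubDate><MedlineDate>2011 Jan-Feb</MedlineDate></PubDate></JournalIssue>
    <Title>Journal B</Title>
  </Journal></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

doc <- read_xml(pubmed_xml)
articles <- xml_find_all(doc, ".//PubmedArticle")

# xml_find_first() returns one (possibly missing) node per article,
# so a record without a <Year> yields NA instead of shifting the columns
retractions <- data.frame(
  journal = xml_text(xml_find_first(articles, ".//Journal/Title")),
  issn    = xml_text(xml_find_first(articles, ".//Journal/ISSN")),
  year    = as.integer(xml_text(xml_find_first(articles, ".//Journal/JournalIssue/PubDate/Year"))),
  stringsAsFactors = FALSE
)
retractions
```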

Next, the number of retraction notices for each journal within the study period (2009 to 2018) is counted.
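The counting step can be sketched in base R as a filter on the publication year followed by a frequency table. The notices below are hypothetical, and dropping records with a missing year is an assumption of this sketch rather than a documented choice of the analysis.

```r
# Hypothetical parsed retraction notices (one row per notice)
retractions <- data.frame(
  issn = c("1234-5678", "1234-5678", "8765-4321", "8765-4321", "1111-2222"),
  year = c(2009L, 2018L, 2008L, 2015L, NA),
  stringsAsFactors = FALSE
)

# Keep notices from 2009 to 2018 inclusive; records without a year are dropped
in_period <- !is.na(retractions$year) &
  retractions$year >= 2009 & retractions$year <= 2018

# Count notices per ISSN
retraction_counts <- as.data.frame(
  table(issn = retractions$issn[in_period]),
  responseName = "n_retractions", stringsAsFactors = FALSE
)
retraction_counts
```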

Then, the total number of articles each of these journals published over the same period, 2009 to 2018, is retrieved from PubMed.
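One way to obtain these totals is to run one PubMed search per journal and read only the hit count. This sketch assumes the journal is identified by ISSN through the `[TA]` field tag with a `from:to[DP]` publication-date range; `build_count_query()` and `count_publications()` are hypothetical helper names, and the second one needs the rentrez package and network access.

```r
# Build a PubMed query for all records of one journal (by ISSN) in a year range.
# ASSUMPTION: [TA] accepts an ISSN and [DP] accepts a "from:to" range.
build_count_query <- function(issn, from = 2009, to = 2018) {
  sprintf('"%s"[TA] AND %d:%d[DP]', issn, from, to)
}

# Hypothetical helper: ask PubMed (via rentrez) how many records match.
# retmax = 0 requests only the hit count, not the record IDs.
count_publications <- function(issn, ...) {
  rentrez::entrez_search(db = "pubmed", term = build_count_query(issn, ...),
                         retmax = 0)$count
}

build_count_query("1234-5678")
```

Note that a plain journal search counts every record type PubMed holds for that journal, so the denominator is "PubMed records" rather than strictly research articles.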

With both counts available, the retraction index can be calculated for each journal.
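A minimal sketch of this step: join the two per-journal counts on ISSN and apply the definition above. The counts are hypothetical; the output column is named `rate` to mirror the column analysed later, while `combined` is an illustrative name.

```r
# Hypothetical per-journal counts from the two previous steps
retraction_counts  <- data.frame(issn = c("1234-5678", "8765-4321"),
                                 n_retractions = c(4L, 1L))
publication_counts <- data.frame(issn = c("1234-5678", "8765-4321"),
                                 n_articles = c(2000L, 250L))

# Inner join on ISSN: only journals present in both tables are kept
combined <- merge(retraction_counts, publication_counts, by = "issn")

# Retraction index = retraction notices per 1,000 articles
combined$rate <- combined$n_retractions * 1000 / combined$n_articles
combined
```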

4 Top 20 journals based on the retraction index
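Ranking journals by retraction index amounts to sorting on the index and keeping the first 20 rows. A base-R sketch on a hypothetical table of 25 journals:

```r
# Hypothetical table of 25 journals with their retraction index in `rate`
combined <- data.frame(journal = paste("Journal", 1:25), rate = (1:25) / 10)

# Sort by retraction index, highest first, and keep the top 20 rows
top20 <- head(combined[order(combined$rate, decreasing = TRUE), ], 20)
top20
```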

5 Relationship between SJR score and retraction index

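In this exploration, the SJR score and the retraction index are significantly correlated (Spearman's rank correlation, p = 0.0008, below the 0.05 threshold), with a correlation coefficient (rho) of -0.45. The correlation is negative: journals with higher SJR scores tend to have lower retraction indices, the opposite direction to the positive correlation with JIF reported by Fang and Casadevall (2011).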

## 
##  Spearman's rank correlation rho
## 
## data:  combined4$rate and combined4$sjr
## S = 35852, p-value = 0.0008317
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.4453986
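The output above is what R's `cor.test()` prints for a Spearman test; its data line points to a call of the form `cor.test(combined4$rate, combined4$sjr, method = "spearman")`. Since `combined4` itself is not shown, the sketch below runs the same call on hypothetical toy data without ties, so R can compute an exact p-value.

```r
# Hypothetical toy data standing in for combined4 (no tied values)
toy <- data.frame(rate = c(5.2, 3.1, 4.0, 1.2, 0.8, 2.5),
                  sjr  = c(0.4, 1.1, 0.9, 2.3, 3.0, 1.5))

# Spearman's rank correlation test (two-sided by default)
res <- cor.test(toy$rate, toy$sjr, method = "spearman")
res$estimate  # rho
res$p.value
```

In this toy example the ranks are perfectly reversed, so rho is -1; with real data ties are common, in which case R falls back to an approximate p-value and issues a warning.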

6 How about the SJR quartile?

As the plot below shows, the differences in median retraction index between SJR quartiles are not statistically significant. Because the SJR score is significantly correlated with the retraction index while the SJR quartiles do not differ significantly, the findings are inconclusive, and further studies are needed to clarify the relationship.

library(dplyr)
library(ggplot2)
library(ggsignif)

# Shared fill colours so each quartile keeps the same colour in every panel
quartile_fill <- c("Q1" = "deepskyblue", "Q2" = "tomato", "Q3" = "lightseagreen")

# Notched boxplot of the retraction index for two SJR quartiles, annotated with
# a Wilcoxon rank-sum test. geom_signif() calls `test` with the two groups as
# separate vectors, so it needs wilcox.test rather than pairwise.wilcox.test.
plot_quartile_pair <- function(data, pair, y_position) {
  data %>%
    filter(quartile %in% pair) %>%
    ggplot(aes(x = quartile, y = rate, fill = quartile)) +
    geom_boxplot(varwidth = TRUE, notch = TRUE) +
    geom_signif(comparisons = list(pair), map_signif_level = TRUE,
                test = "wilcox.test",
                y_position = y_position, tip_length = 0, vjust = 0.2) +
    labs(x = "", y = "Retraction index") +
    scale_fill_manual(values = quartile_fill[pair]) +
    theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
          panel.background = element_blank(),
          axis.line = element_line(colour = "black"))
}

p_q1_q2 <- plot_quartile_pair(combined4, c("Q1", "Q2"), y_position = 8)
p_q1_q3 <- plot_quartile_pair(combined4, c("Q1", "Q3"), y_position = 35)
p_q2_q3 <- plot_quartile_pair(combined4, c("Q2", "Q3"), y_position = 35)

gridExtra::grid.arrange(p_q1_q2, p_q1_q3, p_q2_q3, ncol = 3,
                        top = "The retraction index does not differ significantly between SJR quartiles")
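Each panel tests one pair of quartiles on its own. As a complement, base R's `pairwise.wilcox.test()` can run all pairwise Wilcoxon rank-sum tests in one call and adjust the p-values for multiple comparisons. A sketch on hypothetical toy data; Holm's adjustment is an illustrative choice, not one made in the original analysis.

```r
# Hypothetical toy data: retraction index for four journals per quartile
# (all values distinct, so exact p-values are computed without tie warnings)
toy <- data.frame(
  quartile = rep(c("Q1", "Q2", "Q3"), each = 4),
  rate = c(0.5, 1.0, 1.5, 2.0,
           1.2, 2.2, 3.1, 4.4,
           0.7, 2.6, 5.0, 6.3)
)

# All pairwise Wilcoxon rank-sum tests, with Holm-adjusted p-values
res <- pairwise.wilcox.test(toy$rate, toy$quartile, p.adjust.method = "holm")
res$p.value  # lower-triangular matrix: rows Q2, Q3; columns Q1, Q2
```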

7 References

  • Fang, F. C. & Casadevall, A. (2011). Retracted Science and the Retraction Index. Infection and Immunity, 79(10), 3855-3859. doi: 10.1128/IAI.05661-11

  • Saunders, N. (n.d.). PubMed retractions. Retrieved from https://github.com/neilfws/PubMed/tree/master/retractions/code/R

  • Steen, R. G. (2011). Retractions in the scientific literature: do authors deliberately commit research fraud? Journal of Medical Ethics, 37(2), 113-117. doi: 10.1136/jme.2010.038125