ArXiv Preprints by HKUST and Their Citation Advantage

ArXiv, the oldest and most well-established preprint server, turns 30 years old today. Ever wondered how HKUST researchers are making use of arXiv? Here we report our analysis and findings.

30 Years of ArXiv

On 14 Aug 1991, the first subject archive started off online as an electronic bulletin board. Through a central repository, a group of physicists specializing in high energy physics were able to share drafts of their unpublished manuscripts using an automated email distribution service. Within a few years of its launch, the site moved to the nascent World Wide Web as a web resource, and since then it has expanded in both subject coverage and content. Maintained and operated by Cornell University, arXiv now houses nearly two million scholarly preprints in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

A preprint is a version of a scholarly paper that precedes the formal peer review process. Unlike the final (i.e. published) version of an article, whose copyright is often held by the publisher, the copyright of a preprint usually remains with the authors, who can freely share and deposit it both before and after the paper is formally published in a journal. Preprint servers such as arXiv have thus become enormous resources for scientists and researchers to discover early research findings and to accelerate the dissemination of scientific information.

HKUST researchers publish over 2,500 papers every year. How many of them have a preprint version deposited in arXiv? Furthermore, do arXived papers receive more citations? With these questions in mind, we pulled from Scopus the HKUST journal articles, conference papers, reviews, book chapters, editorials, notes, and letters of the past five years. Then we used Unpaywall to check whether an arXiv version of each work is freely available online.

ArXiv Preprint Submissions by HKUST Authors

Between 1 Jan 2017 and 14 Jul 2021, we identified 13,926 scholarly outputs by HKUST members in Scopus. Using each record’s persistent identifier (i.e., DOI), we queried the Unpaywall API for the work’s metadata and checked whether an arXiv source is listed among the article’s “oa_locations” (i.e., open access locations). Sorting the results by year and by the availability of an arXiv preprint gave us Figure 1, which shows that:

  • HKUST research output has been rising in general.
  • The share of papers with an arXiv preprint varies from year to year. From the beginning of 2017 to date, about 11% of HKUST outputs overall have an arXiv preprint.



Figure 1: HKUST publications with and without arXiv preprints
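
The DOI lookup described above can be sketched in Python roughly as follows. This is a minimal illustration, not the script used for this analysis. The function names are made up for the example, and treating any open access location whose URL contains "arxiv.org" as an arXiv preprint is an assumption about how to read “oa_locations”. The Unpaywall endpoint takes a DOI in the path and a contact email address as a query parameter.

```python
import json
import urllib.parse
import urllib.request

# Unpaywall REST endpoint: one DOI per request, contact email required.
UNPAYWALL_URL = "https://api.unpaywall.org/v2/{doi}?email={email}"


def fetch_record(doi, email):
    """Fetch the Unpaywall JSON record for one DOI (needs network access)."""
    url = UNPAYWALL_URL.format(
        doi=urllib.parse.quote(doi),      # "/" in the DOI is left unescaped
        email=urllib.parse.quote(email),
    )
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def has_arxiv_location(record):
    """Return True if any open access location points at arxiv.org.

    Assumption: an arXiv copy is recognised by its URL; this is a
    heuristic chosen for illustration.
    """
    for loc in record.get("oa_locations") or []:
        urls = (
            loc.get("url"),
            loc.get("url_for_pdf"),
            loc.get("url_for_landing_page"),
        )
        if any(u and "arxiv.org" in u.lower() for u in urls):
            return True
    return False


def arxiv_share(records):
    """Percentage of records that have an arXiv location."""
    if not records:
        return 0.0
    hits = sum(has_arxiv_location(r) for r in records)
    return 100.0 * hits / len(records)
```

Calling fetch_record for each DOI exported from Scopus and passing the resulting records to arxiv_share gives an overall deposit rate; grouping the records by publication year first gives a year-by-year breakdown of the kind plotted in Figure 1.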

Since arXiv is a preprint repository mainly for the sciences while Scopus is a multidisciplinary citation database, we further analysed the subject coverage of HKUST scholarly outputs in Scopus and of the arXiv preprints identified through the Unpaywall API queries. A breakdown is shown in Figure 2, and it confirms that the proportion of papers by subject does differ between the two sources. We’d like to highlight the differences in subject coverage:

  • Physics and Astronomy (26.6%), Computer Science (22.2%), and Engineering (15.5%) are the top 3 subject areas of HKUST preprints in arXiv.
  • In Scopus, the top 3 subject areas of HKUST publications are Engineering (18.6%), Computer Science (13.5%) and Materials Science (10.8%).



Figure 2: HKUST publications in Scopus and arXiv by Subject Area


Citation Advantage of Preprints

Preprint repositories allow early discovery of research findings. Furthermore, for papers published in subscription-based journals (i.e. papers behind paywalls), preprint servers give readers without a subscription the option to read an early version with very similar content. As a result, papers with preprints online could receive earlier attention, citations and acknowledgements. To see whether papers with arXiv preprints have a citation advantage over those without, we examined the citation counts of the papers in Scopus and compared the average citation count of the two groups. Since citation patterns can vary greatly across disciplines, we compared arXived and non-arXived papers only within the same subject, looking at three major subject areas: mathematics, physics and astronomy, and computer science. Figure 3 shows the average citations per paper by year for the two groups (with and without an arXiv preprint).


Figure 3: Average citations per paper in three subject categories
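
The comparison behind Figure 3 amounts to a grouped average: papers are split by subject area, publication year and whether an arXiv preprint exists, and the mean citation count is taken within each group. A minimal sketch is given below. The field names ("subject", "year", "has_arxiv", "citations") are hypothetical and chosen only for this example, and a paper indexed under several subject areas is assumed to appear once per subject.

```python
from collections import defaultdict


def mean_citations(papers):
    """Average citations per paper for each (subject, year, has_arxiv) group.

    `papers` is an iterable of dicts with the hypothetical keys
    "subject", "year", "has_arxiv" and "citations".
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [citation sum, paper count]
    for paper in papers:
        key = (paper["subject"], paper["year"], paper["has_arxiv"])
        totals[key][0] += paper["citations"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

Comparing the value for (subject, year, True) with the one for (subject, year, False) gives the within-subject, within-year contrast between papers with and without an arXiv preprint.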

The comparison between the two groups shows clearly that papers with arXiv preprints get more citations than those without. Citation counts take time to accumulate, and making a preprint available helps researchers disseminate their findings more quickly than conventional journal publishing, which can take months. Over time, having multiple access channels to your research (e.g. the published version, the preprint and, not to forget, the author accepted manuscript, which may be deposited in an institutional repository, as well as pure “gold” open access papers, which are publicly accessible in journals) could bring more readers to your work and therefore raise your research visibility. As the figure above shows, papers with an arXiv preprint generally have higher average citations in the first couple of years after formal publication. What’s more, computer science papers from 2017 and 2018 with an arXiv preprint received over three times more citations than those without. This could be the far-reaching benefit of making publications open access.

If you have any questions or suggestions about the data analysis or open access, please contact the Library’s Research Support Services.

– By Jennifer Gu, Library

published August 14, 2021