Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting

This is an article by Emilio Delgado López-Cózar, Nicolás Robinson-García, and Daniel Torres-Salinas (Universidad de Granada, Universidad de Navarra).

Because it is published on their website under a Creative Commons license, I am sharing it here.

ABSTRACT

The launch of Google Scholar Citations and Google Scholar Metrics may provoke a revolution in the research evaluation field, as it places tools for bibliometric measurement within every researcher’s reach. To alert the research community to how easily one can manipulate the data and bibliometric indicators offered by Google’s products, we present an experiment in which we manipulate the Google Scholar Citations profiles of a research group by creating false documents that cite their documents and, consequently, the journals in which they have published, thereby modifying their H-index. For this purpose we created six documents authored by a fake author and uploaded them to a researcher’s personal website under the University of Granada’s domain. The experiment resulted in an increase of 774 citations across 129 papers (six citations per paper), raising the H-index of both authors and journals. We analyse the malicious effect this type of practice can have on Google Scholar Citations and Google Scholar Metrics. Finally, we conclude with several reflections on the effects these malpractices may have and on the lack of control mechanisms these tools offer.

KEYWORDS: Google Citations / Google Scholar Metrics / Scientific Journals / Scientific fraud / Citation analysis / Bibliometrics / H Index / Evaluation / Researchers

Recommended citation: Delgado López-Cózar, Emilio; Robinson-García, Nicolás; Torres Salinas, Daniel (2012). Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting. EC3 Working Papers 6: 29 May, 2012

 

1. INTRODUCTION

If the launch of Google Scholar in 2004 (a novel search engine focused on retrieving any type of academic material along with its citations) meant a revolution in the scientific information market by allowing universal and free access to all documents available on the web, the launch of Google Scholar Citations (hereafter GS Citations), a tool for measuring researchers’ output and impact (Cabezas-Clavijo and Torres-Salinas, 2012), and Google Scholar Metrics (hereafter GS Metrics), a scientific index of journals ranked according to their impact (Cabezas-Clavijo and Delgado López-Cózar, 2012), may well be a historical milestone for the globalization and democratisation of research evaluation (Butler, 2011). Besides challenging the traditional bibliographic databases and bibliometric indexes offered by Thomson Reuters (Web of Science and JCR) and Elsevier (Scopus and SJR), ending their monopoly and becoming a serious competitor, Google Scholar’s new products project a future landscape with ethical and sociological dilemmas that may entail serious consequences for the world of science and research evaluation.

Leaving aside the technical and methodological problems of the Google Scholar products, which are currently under study (Jacsó, 2008, 2011; Wouters and Costas, 2012; Aguillo, 2012; Cabezas-Clavijo and Delgado López-Cózar, 2012; Torres-Salinas, Ruiz-Pérez and Delgado López-Cózar, 2009) and which will presumably be solved in the near future, their arrival does away with all kinds of scientific control or filtering of researchers’ activity, posing a new challenge to the bibliometric community. Because Google Scholar automatically retrieves, indexes and stores any type of scientific material uploaded by an author without any prior external control (repositories are only a technical filter, as they do not review the content), it allows unprincipled people to manipulate their output, directly affecting their bibliometric performance.

Because this type of behaviour, by which one modifies one’s output and impact through intentional and unrestrained self-citation, is not uncommon, we consider it necessary to analyse thoroughly Google’s capacity to detect the manipulation of data.

This study continues the line of research started by Labbé (2010). In his paper he transformed a fake researcher called Ike Antkare (‘I can’t care’) into the most prolific researcher in history. In this case, however, we examine the most dangerous aspects of gaming tools aimed at evaluating researchers and the malicious effects they can have on researchers’ behaviour. Our aim is therefore to demonstrate how easily anyone can manipulate Google Scholar’s tools. Unlike Labbé, we will not emphasize the technical aspects of such gaming but its sociological dimension, focusing on the enormous temptation these tools can hold for researchers and journal editors eager to increase their impact. To do so, we show how the bibliometric profiles of researchers and journals can be modified simultaneously in the easiest way possible: by uploading fake documents to our personal website citing the whole production of a research group. No software is needed to create the fake documents: one only needs to copy and paste the same text over and over again and upload the resulting documents to a webpage under an institutional domain. We will also analyse Google’s capacity to detect retracted documents and delete their bibliographic records along with the citations they make.

This type of study, in which false documents are created in order to expose defects, biases or errors committed by authors, has been used many times in the scientific literature, especially in the research evaluation field. The reader is referred to the works of Peters & Ceci (1990), Epstein (1990), Sokal (1996, 1997) or Baxt et al. (1998), which demonstrate the deficiencies of peer review as an objective, reliable, valid, efficient and error-free quality control tool for content published in scientific journals. Another example is SCIgen, a programme created by three MIT students for generating random papers in the Computer Science field, including graphs, figures and references. All of these works raised an intense debate within the research community.

This paper is structured as follows. First we describe the methodology followed: how the false documents were created and where they were uploaded. Then we show the effect they had on the bibliometric profiles of the researchers who received the citations, and we emulate the effect these citations would have had on the affected journals if GS Metrics were updated regularly. We analyse the technical effects and the dangers these tools entail for evaluating research. Finally, we conclude by emphasizing their strengths and offering some concluding remarks.

 

2. MANIPULATING DATA: THE GOOGLE SCHOLAR EXPERIMENT

To analyse GS Citations’ capacity to distinguish academic works from non-academic ones, and to test how difficult it is to manipulate output and citations in Google Scholar and its bibliometric tools (GS Citations and GS Metrics), we created, in the easiest possible way, false documents referencing the whole research production of the EC3 research group (Science and Scientific Communication Evaluation), available at http://ec3.ugr.es. In this way we intend to show how anyone can manipulate their output and citations in GS Citations.

Figure 1. Fake documents authored by the non-existent researcher MA Pantani-Contador

Following the example set by Labbé (2010), we created a false researcher named Marco Alberto Pantani-Contador, a reference to the great fraud the Italian cyclist ultimately became and to the accidental causes that deprived the Spanish cyclist of winning the Tour. Pantani-Contador authored six documents (Figure 1), which were not intended to be research papers but working papers. In a process that took less than half a day’s work, we drafted a short text, copied and pasted more from the EC3 research group’s website, included several graphs and figures, translated it automatically into English using Google Translate and divided it into six documents. Each document referenced 129 papers authored by at least one member of the EC3 research group according to their website http://ec3.ugr.es. That is, with six documents each citing 129 papers, we expected a total increase of 774 citations.
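
The expected total is simple arithmetic: six citing documents times 129 cited papers. The short Python sketch below checks that figure and illustrates how six extra citations per paper can shift an H-index. The `h_index` helper and the `before` citation counts are hypothetical illustrations, not data from the experiment and not Google Scholar's actual procedure.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


FAKE_DOCUMENTS = 6   # number of fake citing documents (from the text)
CITED_PAPERS = 129   # papers referenced by each fake document (from the text)

# Each fake document cites every paper once.
expected_new_citations = FAKE_DOCUMENTS * CITED_PAPERS

# Hypothetical citation counts for a young researcher (illustrative only).
before = [5, 3, 2, 1, 0, 0, 0]
# Every paper gains one citation from each fake document.
after = [count + FAKE_DOCUMENTS for count in before]

print("Expected new citations:", expected_new_citations)
print("H-index before:", h_index(before))
print("H-index after:", h_index(after))
```

In this made-up profile the lightly cited papers benefit most, which is in line with the authors' observation that the youngest researchers saw the largest relative change.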

Afterwards, we created a simple webpage under the University of Granada domain that included references to the false papers and links to the full texts, so that Google Scholar could index the content. We excluded other services such as institutional or subject-based repositories, as they are not obliged to undertake any bibliographic control other than a formal one (Delgado López-Cózar, 2012) and they were not included in the aims of this study.

The false documents were uploaded on 17 April, 2012. Presumably because they were hosted on a personal website rather than a repository, Google indexed these documents nearly a month later, on 12 May, 2012. At that time the members of the research group used as a case study, along with the three co-authors of this paper, received an alert from GS Citations pointing out that a certain MA Pantani-Contador had cited their works. The citation explosion was thrilling, especially for the youngest researchers, whose citation rates were multiplied by six, notably increasing the size of their profiles.

 

To read the rest, please go to http://digibug.ugr.es/bitstream/10481/20469/2/scholar_en.pdf

 

 

Please leave your comments, questions, or rebuttals. God willing, I will reply as soon as possible.