publications | Faizhal Arif S.

2025

UKSG

Multilingual research dissemination: Current practices and implications for bibliometrics

Faizhal Arif Santosa and Barbara S. Lancho Barrantes

Insights: the UKSG Journal, Sep 2025

English is widely used as a lingua franca in scholarly communication, yet preserving local languages is vital to reaching a broader audience. Disseminating research in multiple languages can help ensure equitable access, a responsibility shared by both publishers and authors. This study examines the practices of both groups to identify any notable differences. Several academic social networks, preprint servers, and repositories are analysed to evaluate the resources currently available and their existing policies. Additionally, journals that actively promote multilingual dissemination are reviewed to understand their implementation strategies and how these align with the standards set by the DOI Registration Agency (DOI RA). From the author’s perspective, differing policies across platforms can heavily influence decisions, mainly because not all platforms provide relationship metadata. Publishers face similar challenges, underscoring the urgent need for standardisation. Moreover, the lack of consistency creates opportunities for unethical practices in academia, such as counting the total number of citations originating from the same article in different languages. This highlights the importance of a more comprehensive approach to evaluating research beyond citation and document counts. Collaboration among publishers, authors, and other stakeholders is essential to fostering greater understanding and preventing misconceptions in the academic landscape.

JLIS.it

Artificial Intelligence in Library Studies: A Textual Analysis

Faizhal Arif Santosa

JLIS.it, Jan 2025

Artificial intelligence has emerged as a promising technology in the post-pandemic era, significantly impacting the library ecosystem and the direction of library and information science studies. This study aims to map AI-related research in libraries to identify opportunities and discuss future directions. Using textual analysis of data from Scopus, the study analyzed article titles with the burst detection algorithm and abstracts with scattertext and lemmatization. Six burst words were detected among the twelve words that frequently appear in titles. The scattertext results compared the service side (red) with the development side (blue) in libraries. Research increasingly focuses on AI utilization for library services and natural language processing (NLP) to enhance services. On the development side, AI involves product creation and encompasses AI literacy frameworks, policies, and their impact on libraries. AI affects studies in libraries by changing application methods, such as machine learning and NLP. Future research will become more diverse, considering the unique characteristics of each library.

2024

Proc. Assoc. Inf. Sci. Technol.

Coconut Libtool: Bridging Textual Analysis Gaps for Non‐Programmers

Faizhal Arif Santosa, Manika Lamba, Crissandra George, and 1 more author

Proceedings of the Association for Information Science and Technology, Oct 2024

In the era of big and ubiquitous data, professionals and students alike are finding themselves needing to perform a number of textual analysis tasks. Historically, the general lack of statistical expertise and programming skills has stopped many with humanities or social sciences backgrounds from performing and fully benefiting from such analyses. Thus, we introduce Coconut Libtool (www.coconut-libtool.com/), an open-source, web-based application that utilizes state-of-the-art natural language processing (NLP) technologies. Coconut Libtool analyzes text data from customized files and bibliographic databases such as Web of Science, Scopus, and Lens. Users can verify which functions can be performed with the data they have. Coconut Libtool deploys multiple algorithmic NLP techniques at the backend, including topic modeling (LDA, Biterm, and BERTopic algorithms), network graph visualization, keyword lemmatization, and sunburst visualization. Coconut Libtool is a people-first web application designed to be used by professionals, researchers, and students in the information sciences, digital humanities, and computational social sciences domains to promote transparency, reproducibility, accessibility, reciprocity, and responsibility in research practices.

RLJ

Exploring topics of the female librarians: Topic modelling approach on research articles

Savira Arumdini, Ria Ariani, and Faizhal Arif Santosa

Record and Library Journal, Jun 2024

Female librarians often face limitations in their professional development and encounter various challenges. Previous studies have shown that while many articles focus on women librarians as a subject, few delve into the topics discussed. This research aims to find out which topics are developing in the world of libraries, with a specific focus on female librarians. This study uses topic modelling to explore abstracts from documents discussing female librarians, using BERTopic, scattertext, and VOSviewer to identify emerging topics from data obtained from Scopus. A total of 6 topics were determined, where Topic 0 and Topic 3 had the highest similarity. At the same time, keyword analysis did not reveal any particularly prominent keywords in the 2020s. The discussion on female librarians covers topics such as professional advancement, work-life balance, knowledge gaps in technology, stereotypes, and the correlation between these topics. This study provides an overview of text analysis that librarians can use to identify topics in a collection of texts, such as abstracts, and examine how different topics relate to each other, as a single document can reflect multiple topics.

2023

OPIS

Adding Perspective to the Bibliometric Mapping Using Bidirected Graph

Faizhal Arif Santosa

Open Information Science, Jan 2023

Bibliometric mapping makes it easy to analyze the relationships between publications through the network visualizations it produces. Several applications, such as VOSviewer, Bibliometrix, and CiteSpace, make conducting network analysis more convenient. However, the relationship they provide is usually in the form of an undirected graph, which ignores the two-way relationship between items. This study attempts to demonstrate the significance of considering two-way relationships by proposing a keyword network formed using bidirected graphs and association rules to examine the two-way relationship of two or more keywords. According to the proposed bidirected graph, a two-way graph can add value and insight by analyzing the correlation between a single keyword and several others. Two of the four metrics used, Confidence and Conviction, are sufficient to support directed graphs. In contrast, Support and Full Counting are related because both count the occurrences of a keyword, so they call for undirected graphs.
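The difference between the symmetric and the directional metrics can be made concrete with a small sketch. This is an illustration only, not the paper's implementation: the keyword sets are invented toy data, the function name `rule_metrics` is hypothetical, and the formulas are the standard textbook definitions of support, confidence, and conviction.

```python
def rule_metrics(docs, a, b):
    """Association-rule metrics for the directed rule a -> b.

    docs is a list of keyword sets, one set per publication (toy data here).
    """
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)

    support = n_ab / n                       # symmetric: same for a -> b and b -> a
    confidence = n_ab / n_a if n_a else 0.0  # directional: P(b | a)
    if confidence == 1.0:
        conviction = float("inf")            # the rule is never violated
    else:
        conviction = (1 - n_b / n) / (1 - confidence)
    return {"support": support, "confidence": confidence, "conviction": conviction}


# Hypothetical author-keyword sets for five publications.
docs = [
    {"bibliometrics", "vosviewer"},
    {"bibliometrics", "vosviewer"},
    {"bibliometrics", "citation"},
    {"bibliometrics"},
    {"citation"},
]

forward = rule_metrics(docs, "vosviewer", "bibliometrics")
backward = rule_metrics(docs, "bibliometrics", "vosviewer")
print(forward)
print(backward)
```

In this toy data, support is identical in both directions, while confidence and conviction differ, which is the asymmetry that a directed edge can carry and an undirected edge cannot.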

ITAL

Exploring Final Project Trends Utilizing Nuclear Knowledge Taxonomy: An Approach Using Text Mining

Faizhal Arif Santosa

Information Technology and Libraries, Mar 2023

The National Nuclear Energy Agency of Indonesia (BATAN) taxonomy organizes nuclear competence fields into six categories. The Polytechnic Institute of Nuclear Technology, as an institution of nuclear education, faces a challenge in organizing student publications according to the fields in the BATAN taxonomy, especially in the library. The goal of this research is to determine the most efficient automatic document classification model using text mining to categorize student final project documents in Indonesian and monitor the development of the nuclear field in each category. The kNN algorithm is used to classify documents, and the best model is identified by comparing Cosine Similarity, Correlation Similarity, and Dice Similarity, each combined with two vector-creation schemes, binary term occurrence and TF-IDF. A total of 99 documents labeled as reference data were obtained from the BATAN repository, and 536 unlabeled final project documents were prepared for prediction. Several text mining steps were applied: stemming, stop-word filtering, n-grams, and filtering by length. With k set to 4, Cosine-binary is the best model, with an accuracy of 97 percent, and kNN works better with binary term occurrence than with TF-IDF on Indonesian-language documents. Engineering of Nuclear Devices and Facilities is the most popular field among students, while Management is the least preferred. However, Isotopes and Radiation are the most prominent fields in Nuclear Technochemistry. Text mining can assist librarians in grouping documents based on specific criteria. It also makes it possible to observe the evolution of each existing category based on the growth in documents, and similar methods can be applied in other settings. Because of the curriculum and courses offered, each discipline of nuclear science in the study program grows differently.
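As a rough illustration of the "Cosine-binary" combination, the sketch below classifies a document with kNN using cosine similarity over binary term occurrence. It is a minimal stand-in written under stated assumptions, not the study's own workflow: the token sets and the short category labels are invented English toy data, and the function names `cosine_binary` and `knn_predict` are hypothetical. Only the value k = 4 is taken from the abstract.

```python
import math
from collections import Counter


def cosine_binary(a, b):
    """Cosine similarity between two documents under binary term occurrence.

    With 0/1 vectors, the dot product is the number of shared terms and each
    vector norm is the square root of the number of distinct terms.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def knn_predict(query, labeled, k=4):
    """Return the majority label among the k most similar labeled documents."""
    ranked = sorted(labeled, key=lambda item: cosine_binary(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical reference documents, already tokenized and cleaned.
labeled = [
    ({"reactor", "design", "shielding"}, "devices"),
    ({"reactor", "instrumentation", "control"}, "devices"),
    ({"reactor", "facility", "maintenance"}, "devices"),
    ({"isotope", "radiation", "tracer"}, "technochemistry"),
    ({"isotope", "production", "radiation"}, "technochemistry"),
    ({"management", "policy", "safety"}, "management"),
]

prediction = knn_predict({"reactor", "control", "design"}, labeled, k=4)
print(prediction)
```

Switching to TF-IDF would replace the set intersection with a weighted dot product; the binary version ignores how often a term occurs and only records whether it occurs.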

ISTL

Tips from the Experts Prior Steps into Knowledge Mapping: Text Mining Application and Comparison

Faizhal Arif Santosa

Issues in Science and Technology Librarianship, Mar 2023

Bibliometrics is increasingly being used by the knowledge community and librarians to easily analyze patterns in knowledge. In practice, data from databases that provide bibliometric information is not always completely clean, so pre-processing is required. Several previous studies have shown that bibliometric analysis begins with a simple pre-processing step. The goal of this research is to use text mining to perform pre-processing that finds the base terms of the keywords that appear, essentially constructing a controlled vocabulary for a bibliographic dataset. Keywords are cleaned by stemming them in RapidMiner, and Bibliometrix is used to compare the results. A total of 85 keywords were combined into base words. Using this process, the study finds that the network built from raw data differs from the network built from pre-processed data, which leads to differences in the resulting analysis. The process can also be reused in a variety of real-world situations.
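The idea of merging keyword variants into base terms can be sketched in a few lines. The study itself uses RapidMiner's stemming; the code below is only a crude stand-in under stated assumptions: `crude_stem` is a hypothetical two-rule suffix stripper (not a real stemming algorithm such as Porter's), and the keyword list is invented.

```python
def crude_stem(word):
    """Very rough plural stripping; a placeholder for a real stemmer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith(("ss", "is", "us")) and len(word) > 3:
        return word[:-1]
    return word


def normalize(keyword):
    """Lowercase, treat hyphens as spaces, and stem each token."""
    tokens = keyword.lower().replace("-", " ").split()
    return " ".join(crude_stem(t) for t in tokens)


def group_keywords(keywords):
    """Map each base term to the raw keyword variants that reduce to it."""
    groups = {}
    for kw in keywords:
        groups.setdefault(normalize(kw), []).append(kw)
    return groups


# Hypothetical raw author keywords from a bibliographic export.
raw = [
    "Academic Libraries", "academic library",
    "Bibliometrics", "bibliometric",
    "Text-Mining", "text mining",
]

groups = group_keywords(raw)
for base, variants in groups.items():
    print(base, "<-", variants)
```

Six raw keywords collapse into three base terms here, so a co-occurrence network built after this step has fewer, heavier nodes than one built from the raw strings.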

2022

RLJ

Clustering of Librarians’ Initial Knowledge on the Theme of Training

Faizhal Arif Santosa and Dedi Suprianto

Record and Library Journal, Dec 2022

Librarian competency development through training in Sidenreng Rappang Regency, South Sulawesi Province, was carried out by dividing librarians according to the location of their agency’s work area. In practice, the training faces barriers, namely differences in absorption of the material due to limited training time and differences in initial knowledge of the training material. This study attempts to determine the best grouping and number of participants for future training based on librarians’ prior knowledge. The methodology used is the Cross Industry Standard Process for Data Mining (CRISP-DM), which consists of 6 stages. Data were collected with a questionnaire using a linear numerical scale from 0 to 10, administered to 97 librarians in Sidenreng Rappang Regency. Data were analyzed using the K-Means algorithm to determine the number of groups and the number of librarians in each group, and evaluated using the Davies-Bouldin index (DBI) to determine the optimal group division. According to this study, the best number of groups for training in the processing of library materials is two, with a DBI value of 0.68983. For library promotion training, the best number of groups is two, with a DBI value of 0.69431. For library service training, the best number of groups is two, with a DBI value of 0.65698. For INLISLite-based automation training, the best number of groups is also two, with a DBI value of 0.65500.
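Choosing the number of groups by the lowest DBI can be sketched as follows. This is a simplified illustration, not the study's analysis: the scores are invented, the data is one-dimensional for brevity, the K-Means initialisation (evenly spaced points in the sorted data) is an assumption chosen to keep the run deterministic, and the names `kmeans_1d` and `davies_bouldin` are hypothetical.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's K-Means on 1-D data with a deterministic start."""
    data = sorted(values)
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return [c for c in clusters if c]  # drop empty clusters


def davies_bouldin(clusters):
    """Davies-Bouldin index: lower means tighter, better separated clusters.

    Assumes at least two clusters with distinct centroids.
    """
    centroids = [sum(c) / len(c) for c in clusters]
    scatter = [sum(abs(v - m) for v in c) / len(c) for c, m in zip(clusters, centroids)]
    worst = []
    for i in range(len(clusters)):
        ratios = [
            (scatter[i] + scatter[j]) / abs(centroids[i] - centroids[j])
            for j in range(len(clusters)) if j != i
        ]
        worst.append(max(ratios))
    return sum(worst) / len(worst)


# Hypothetical prior-knowledge scores on the 0 to 10 questionnaire scale.
scores = [1, 1, 2, 2, 3, 8, 8, 9, 9, 10]

dbi_by_k = {k: davies_bouldin(kmeans_1d(scores, k)) for k in range(2, 5)}
best_k = min(dbi_by_k, key=dbi_by_k.get)
print(dbi_by_k, best_k)
```

The same loop applies to multi-item questionnaires by replacing the absolute difference with a Euclidean distance between score vectors.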