
Artificial Intelligence Fundamental Research Laboratory

of the Institute of Computer Science of Polish Academy of Sciences


Our Team


Our Research

Our team at the Artificial Intelligence Fundamental Research Laboratory has been conducting intensive research on the leading challenges of Artificial Intelligence (also called Computational Intelligence) for four decades. Artificial Intelligence (AI) is a branch of computer science that deals with problems for which no algorithmic solutions are known, or the known ones are computationally too expensive. In this spirit, the Team has taken part in developing a system for analyzing data on the health effects of the Chernobyl disaster, a system supporting the diagnosis of hand injuries, a system for distributed knowledge extraction from medical data, a system for the pro-ecological optimization of power supply in the Polish power plant network, a system for assessing candidates for the pilot profession, the first large-scale Polish semantic Internet search engine, a system for evaluating consumer price developments, and many others.

Research on specific applications of AI went hand in hand with the development of theories of inference and learning under uncertain and incomplete information (including Bayesian networks and Dempster-Shafer theory), nature-inspired optimization methods (including immune networks, herd, genetic and extremal optimization algorithms), and methods of extracting knowledge from numerical data, text and hypertext (new algorithms for cluster analysis and classification, including methods based on graph spectral analysis, and new methods for extracting hierarchical concept relations and simple relations from natural language texts), as well as new semantic methods of plagiarism detection, and more.

Currently, the Team has taken up one of the most important challenges in the field: developing Explainable Artificial Intelligence (XAI) methods. XAI responds to the objection raised by industry that artificial intelligence methods such as deep neural networks, evolutionary algorithms and others operate as a "black box", whereas businesses trust only transparent methods. Our Team took on a particularly difficult task: achieving explainability in cluster analysis of text documents, especially documents clustered with spectral methods. The basic difficulty lies in the lack of a coherent axiomatic system for cluster analysis. Worse still, spectral methods detach the representation of clusters from the textual content of the documents. Our achievements in this area include:

A Couple of Publications

  1. M.A. Kłopotek: Spectral Clustering Versus Watanabe Theorem. Studia Informatica. Systems and Information Technology, Vol. 30, No. 1 (2025), pp. 21-34. DOI 10.34739/si.2024.30.02
  2. Bartłomiej Starosta, Mieczysław A. Kłopotek, Sławomir T. Wierzchoń: Approaches to Explainability of Output of Graph Spectral Clustering Methods. In: Dariusz Mikułowski, Artur Niewiadomski: Design and Implementation of Artificial Intelligence Systems. University of Siedlce, 2025, series: Intelligent Systems and Information Technology. ISBN 978-83-68355-16-1, pp. 5-31.
  3. Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, Bartłomiej Starosta, Dariusz Czerski, Piotr Borkowski: Dependence of Spectrogram from Graph Spectral Clustering in Text Document Domain on Word Distribution Models. Proceedings of the 2nd Conference on Intelligent Systems and Information Technologies (ISIT 2024), Siedlce, Poland, 23-25 September 2024, pp. 31-36.
  4. Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, Bartłomiej Starosta, Dariusz Czerski, Piotr Borkowski: Towards Explaining the Spectrogram of Graph Spectral Clustering in Text Document Domain. In: Saeed, K., Dvorský, J. (eds): Computer Information Systems and Industrial Management. 23rd International Conference, CISIM 2024, Białystok, Poland, September 27-29, 2024, Proceedings. Lecture Notes in Computer Science, vol. 14902. Springer, Cham, pp. 372-386. https://link.springer.com/chapter/10.1007/978-3-031-71115-2_26
  5. P. Borkowski, M.A. Kłopotek, B. Starosta, S.T. Wierzchoń, M. Sydow: Eigenvalue based spectral classification. PLoS ONE 18(4): e0283413 (2023). https://doi.org/10.1371/journal.pone.0283413
  6. Bartłomiej Starosta: Set-theoretic relations for metasets. Journal of Experimental & Theoretical Artificial Intelligence, 2022, online, pp. 1-15.
  7. Mieczysław A. Kłopotek, Robert A. Kłopotek: Towards Continuous Consistency Axiom. Applied Intelligence (2022), Springer Verlag. DOI https://doi.org/10.1007/s10489-022-03710-1. Earlier version: CoRR abs/2202.06015 (2022).
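To make the difficulty concrete, here is a minimal sketch of graph spectral bipartitioning of text documents. It is a textbook variant, not the Team's own method. Assumptions: documents are bags of lowercase words, edge weights are cosine similarities, and the split uses the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the unnormalized Laplacian L = D - W), computed by plain power iteration so that only the Python standard library is needed. Cluster membership is read off numeric eigenvector entries that no longer refer to any word, which is exactly the detachment from textual content described above.

```python
import math
from collections import Counter


def term_vectors(docs):
    # Bag-of-words term counts; tokenization is a plain lowercase split (an assumption).
    return [Counter(d.lower().split()) for d in docs]


def cosine(a, b):
    dot = sum(cnt * b[t] for t, cnt in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_graph(docs):
    vecs = term_vectors(docs)
    n = len(vecs)
    # Symmetric weight matrix W without self-loops.
    return [[0.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]


def fiedler_vector(W, iters=500):
    """Eigenvector of the second-smallest eigenvalue of L = D - W.

    Power iteration on M = c*I - L, where c = 2 * max degree bounds the largest
    eigenvalue of L. The constant vector (eigenvalue 0 of L) is projected out
    at every step, so the iteration converges to the Fiedler direction.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)
    v = [(-1) ** i + 0.1 * i for i in range(n)]  # arbitrary non-constant start
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]
        Lv = [deg[i] * v[i] - sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [c * v[i] - Lv[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def spectral_bipartition(docs):
    f = fiedler_vector(similarity_graph(docs))
    # Membership comes from the sign of an eigenvector entry: a number,
    # not a word of the document.
    return [0 if x >= 0 else 1 for x in f], f


docs = [
    "car tire winter tire shop",
    "tire pressure car wheel",
    "winter tire car shop",
    "hard work makes us tired",
    "tired after hard work day",
    "work day tired hard",
]
labels, fiedler = spectral_bipartition(docs)
print(labels)
```

The tire documents and the "tired" documents land in separate groups, yet nothing in `fiedler` says why in terms of words; recovering such an explanation is the XAI problem the publications above address.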

Our Search Engine for the Polish Internet

Development stopped due to financial problems.

The research group developed NEKST, a massively parallel search engine that works with Polish Internet resources in a novel way. Our specialty is systematizing online resources and making that systematization visible to the user. By systematization we mean the automatic grouping of online resources by topic, the identification of thematic channels within websites, and the labeling and categorization of documents and document groups. For the user, this means not only more precise identification of documents: systematization also enables contextual search for both individual documents and groups of documents, such as channels or services, and diversification of the search engine's responses.
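Labeling a thematic group can be illustrated with a minimal sketch that is not NEKST's actual method: each group is labeled with the terms whose relative frequency inside the group most exceeds their relative frequency in the whole collection, weighted by their in-group count. The function name `label_groups`, the scoring formula and the toy documents are hypothetical; the group assignment is assumed to come from an earlier clustering step.

```python
from collections import Counter


def label_groups(docs, groups, top=2):
    """Label each thematic group with its most characteristic terms.

    docs: list of token lists; groups: parallel list of group ids (assumed to
    come from an earlier clustering step). A term's score is its in-group count
    times its lift (relative frequency in the group / relative frequency overall),
    so terms that are common everywhere are pushed down.
    """
    overall = Counter(t for doc in docs for t in doc)
    total = sum(overall.values())
    labels = {}
    for g in sorted(set(groups)):
        inside = Counter(t for doc, gid in zip(docs, groups) if gid == g for t in doc)
        size = sum(inside.values())
        score = {t: c * (c / size) / (overall[t] / total) for t, c in inside.items()}
        # Highest score first; ties broken alphabetically for a stable result.
        labels[g] = sorted(score, key=lambda t: (-score[t], t))[:top]
    return labels


docs = [
    ["new", "winter", "tire", "car"],
    ["new", "tire", "shop", "car"],
    ["new", "hard", "work", "tired"],
    ["new", "work", "day", "tired"],
]
groups = [0, 0, 1, 1]
print(label_groups(docs, groups))  # {0: ['car', 'tire'], 1: ['tired', 'work']}
```

"new" occurs as often as "tire" inside the first group, but because it is spread over the whole collection its lift is lower and it does not become a label.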

By diversification we mean variety in the response, so that the user sees not only the best-matching documents but also the thematic variety and ambiguity of the query. A classic example is the query "game", which may refer to playing or be a term understood mainly by hunters.
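One simple way to diversify a ranked list, shown here as a hypothetical sketch rather than NEKST's algorithm, is to take the best remaining document from each thematic group in turn. The input format (doc id, score, topic) and the topic labels are assumptions; in practice they would come from the systematization step.

```python
from collections import defaultdict


def diversify(results, k):
    """Pick up to k doc ids, taking the best remaining document from each topic in turn.

    results: list of (doc_id, score, topic) tuples; topic labels are assumed to
    come from an earlier clustering step (hypothetical input format).
    """
    by_topic = defaultdict(list)
    # Highest score first; dicts keep insertion order, so topics end up ordered
    # by the score of their best document.
    for doc_id, score, topic in sorted(results, key=lambda r: r[1], reverse=True):
        by_topic[topic].append(doc_id)
    queues = list(by_topic.values())
    picked = []
    while len(picked) < k and any(queues):
        for queue in queues:
            if queue and len(picked) < k:
                picked.append(queue.pop(0))
    return picked


results = [
    ("board-games-shop", 0.95, "playing"),
    ("video-game-review", 0.93, "playing"),
    ("game-rules", 0.90, "playing"),
    ("hunting-season", 0.70, "hunting"),
    ("wild-game-recipes", 0.60, "hunting"),
]
print(diversify(results, 3))
# ['board-games-shop', 'hunting-season', 'video-game-review']
```

A plain top-3 by score would show only "playing" pages; the round-robin puts the hunting sense of "game" on the first screen at the cost of one slightly lower-scored result.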

Taking context into account matters when a document is comprehensible only in the context of other documents in a particular thematic channel. For example, when we ask a search engine about tires, which is common in the autumn and spring seasons, we expect links to the websites of tire manufacturers or tire shops rather than, e.g., to sites about hard work that makes us tired. Using the context of the thematic channel allows the search engine to return documents that contain the word "tire" even if the word "car" does not occur in them.
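A minimal sketch of how channel context could enter ranking, under loudly stated assumptions: each document belongs to one thematic channel described by a set of characteristic terms, and a query term that is absent from the document but typical of its channel earns partial credit. The function `contextual_score` and the weight 0.5 are hypothetical illustrations, not NEKST's scoring.

```python
def contextual_score(query_terms, doc_terms, channel_terms, channel_weight=0.5):
    """Score a document for a query, letting its thematic channel supply context.

    A query term found in the document counts 1; a query term missing from the
    document but typical of the document's channel counts channel_weight.
    """
    score = 0.0
    for term in query_terms:
        if term in doc_terms:
            score += 1.0
        elif term in channel_terms:
            score += channel_weight
    return score


query = {"car", "tire"}
# A tire-shop page that never says "car", published in an automotive channel.
shop_page = contextual_score(query, {"winter", "tire", "prices"}, {"car", "tire", "wheel"})
# A page about being tired after work, in an unrelated channel.
work_page = contextual_score(query, {"hard", "work", "tired"}, {"work", "office"})
print(shop_page, work_page)  # 1.5 0.0
```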

Systematization understood in this way will be a useful tool for many groups of users. Scientists and entrepreneurs will be able to look for potential partners or market competitors. Systematization will also help them identify interesting research areas or market gaps worth exploiting.

Our NEKST system, developed as part of the POIG.01.01.02-14-013/09 project, is an advanced technological solution enabling large-scale retrieval and semantic processing of data from the Polish Internet. There are approximately 2.5 million websites in Poland, storing over two billion documents.

This volume of documents is a challenge for data collection, indexing and retrieval alike, especially since NEKST goes beyond traditional text processing. It adds semantic indexing, categorization, classification, fact retrieval, automatic construction of knowledge graphs from online documents, detection of duplicate documents and websites, search for documents that are similar not only lexically but also semantically, innovative document analysis methods that identify the origin of individual document fragments through rapid comparison with all online resources, and the ability to answer queries in natural language (Polish). This required developing algorithms and methodologies that were novel worldwide.
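Identifying where a fragment came from by fast comparison against a large collection is commonly done with word shingles and hashed fingerprints (w-shingling, after Broder). The sketch below uses that textbook technique as a stand-in; it is not NEKST's method. The shingle width w = 4, the 64-bit BLAKE2b fingerprints and the in-memory dictionary index are illustrative assumptions; a web-scale deployment would presumably partition such an index across machines.

```python
import hashlib
import re
from collections import Counter


def shingles(text, w=4):
    """Set of w-word shingles (overlapping word sequences) of a text."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}


def fingerprint(shingle):
    """64-bit fingerprint of a shingle."""
    digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_index(corpus, w=4):
    """Map each shingle fingerprint to the ids of documents containing it."""
    index = {}
    for doc_id, text in corpus.items():
        for s in shingles(text, w):
            index.setdefault(fingerprint(s), set()).add(doc_id)
    return index


def trace_origin(fragment, index, w=4):
    """Fraction of the fragment's shingles found in each indexed document."""
    fps = {fingerprint(s) for s in shingles(fragment, w)}
    hits = Counter()
    for fp in fps:
        for doc_id in index.get(fp, ()):
            hits[doc_id] += 1
    return {doc_id: n / len(fps) for doc_id, n in hits.items()}


corpus = {
    "page-a": "Spectral clustering groups documents using eigenvectors of a similarity graph",
    "page-b": "Winter tires should be fitted when the temperature drops below seven degrees",
}
index = build_index(corpus)
print(trace_origin("The thesis says spectral clustering groups documents using eigenvectors", index))
# {'page-a': 0.5}
```

Three of the fragment's six 4-word shingles occur in `page-a`, so half of the fragment is attributed to it; the same shingle sets also give a Jaccard resemblance for near-duplicate detection.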

The system not only utilizes human-generated semantic resources but also discovers IS-A semantic relationships from its own document database using advanced text analysis methods and proprietary solutions complementary to those known from the literature.
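A classic way to discover IS-A relations from raw text is to match lexico-syntactic patterns such as "Y such as X1, X2 and X3" (Hearst patterns). The sketch below uses two such patterns as a swapped-in textbook technique, not the proprietary solutions mentioned above. It is English-only for illustration; Polish text would need its own patterns and morphological analysis, and the regular expressions and names here are hypothetical.

```python
import re

# "Y such as X1, X2 and X3"  ->  each Xi IS-A Y
SUCH_AS = re.compile(
    r"\b(\w+),? such as (\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?)"
)
# "X1, X2 and other Y"  ->  each Xi IS-A Y
AND_OTHER = re.compile(r"\b(\w+(?:, \w+)*),? (?:and|or) other (\w+)")
# Splits an enumeration such as "cars, vans and motorcycles".
ENUM_SEP = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def is_a_pairs(text):
    """Return a set of (hyponym, hypernym) pairs found by the two patterns."""
    pairs = set()
    for m in SUCH_AS.finditer(text):
        for hyponym in ENUM_SEP.split(m.group(2)):
            pairs.add((hyponym, m.group(1)))
    for m in AND_OTHER.finditer(text):
        for hyponym in ENUM_SEP.split(m.group(1)):
            pairs.add((hyponym, m.group(2)))
    return pairs


text = ("The shop sells vehicles such as cars, vans and motorcycles. "
        "It also offers tyres, rims and other accessories.")
print(sorted(is_a_pairs(text)))
```

Patterns like these have high precision but low recall, which is one reason why pattern mining is usually combined with other text analysis methods.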

The design of the search engine was closely linked to the development of new high-performance algorithms for syntactic analysis of Polish, cluster analysis methods for documents and websites, a proprietary high-performance document database, new fast document ranking methods that eliminate the practical shortcomings of classic PageRank, and engineering solutions for crawlers (spiders), indexes and other components.
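The text does not describe NEKST's own ranking methods, so the sketch below shows only the classic PageRank baseline they improve on: power iteration with damping factor d = 0.85 and uniform redistribution of rank held by dangling pages (pages without out-links). The toy graph, tolerance and iteration cap are illustrative. Each iteration touches every link in the graph, which is one well-known cost of classic PageRank at the scale of billions of documents.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Classic PageRank by power iteration.

    links: dict mapping a page to the list of pages it links to. Rank held by
    pages without out-links (dangling pages) is spread uniformly over all pages.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - d) / n + d * dangling / n for u in nodes}
        for u, outs in links.items():
            if outs:
                share = d * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []})
print(max(ranks, key=ranks.get))  # c
```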

The multi-scale design, coverage of the entire Polish Internet and semantic classification make the relevant components of the NEKST system a valuable source of reference data for the Uniform Anti-Plagiarism System (JSA), which since 2019 has been a mandatory tool for verifying the originality of all diploma theses (bachelor's, engineer's, master's) and doctoral dissertations in Poland. Building the reference set requires, on the one hand, crawling the entire Polish Internet and, on the other, filtering out irrelevant documents (e.g. online shops, among many others), which significantly affects the speed of JSA.

NEKST data is crucial for detecting plagiarism from Polish online sources, especially when material from such sources makes up a significant portion of a thesis. The system enables effective searches of Polish online resources, a goal previously difficult to achieve because of the limited performance of standard search engines. Between September 1, 2023, and August 31, 2024, the JSA system examined 319,656 works, of which 62,748 (19.6%) contained matches from NEKST sources, and 1.7% of the works had a degree of borrowing exceeding 70%.

Thanks to JSA with the NEKST component, the number of serious plagiarism cases has decreased by one-third in just three years. This will undoubtedly result in a general improvement in the level of education nationwide and, in the future, accelerate technological development and economic growth.

In summary, the use of the NEKST system has had:

  1. Impact on the Polish higher education system: Data from the NEKST system is used by the Uniform Anti-Plagiarism System (JSA), which has been a mandatory tool for verifying the originality of all bachelor's, master's and doctoral theses in Poland since 2019. Between September 1, 2023, and August 31, 2024, 319,656 theses were examined in the JSA system, of which 62,748 (19.6%) contained matches from NEKST sources, including 1.7% with a reprehensible level of borrowing. This means that almost one in five theses contained borrowings from Polish online sources, and fewer than one in fifty contained reprehensible borrowings, as detected thanks to NEKST data.
  2. Impact on education quality: The NEKST system enables detection of plagiarism from Polish internet sources, which is particularly important since plagiarism very often originates from Polish internet sources. Before the introduction of NEKST, the process of checking whether a thesis contains borrowings from Polish internet sources was computationally demanding and difficult to implement due to limited performance capabilities of standard full-text search engines.
  3. Impact on academic standards: The use of NEKST data in the JSA system contributes to maintaining high academic standards through effective plagiarism detection, which in turn affects the quality of education and the credibility of the Polish higher education system.
  4. Impact on legal compliance: The use of NEKST data in the JSA system enables better enforcement of the Higher Education Act, fulfilling its requirements in the field of combating plagiarism.
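The shares quoted above can be checked against the reported counts. The short calculation below assumes that the 1.7% figure is a share of all examined theses; the text can also be read as a share of theses with NEKST matches only, which would make the number of such theses smaller still.

```python
# Figures as reported in the text (JSA, 1 September 2023 - 31 August 2024).
EXAMINED = 319_656      # theses examined by JSA
WITH_NEKST = 62_748     # theses with matches from NEKST sources
OVER_70_SHARE = 0.017   # assumed here to be a share of all examined theses

nekst_share = WITH_NEKST / EXAMINED
over_70_count = round(OVER_70_SHARE * EXAMINED)
print(f"NEKST matches: {nekst_share:.1%}, about 1 in {1 / nekst_share:.1f}")
print(f"Above 70% borrowing: about {over_70_count} theses, about 1 in {round(1 / OVER_70_SHARE)}")
```

The first share is indeed almost one in five (about 1 in 5.1), while 1.7% corresponds to roughly one thesis in 59; under either reading of the 1.7% figure, fewer than one in fifty theses had reprehensible borrowing.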

Our Books