Journal influence
Higher Attestation Commission (VAK) - K1 quartile
Russian Science Citation Index (RSCI)
Next issue: No. 2, publication date 16 June 2024
Latest issue articles
1. Methods and means of term extraction from texts for terminological tasks [No. 1 of the year]
Authors: Bolshakova, E.I., Semak, V.V.
Visitors: 1045
The paper describes the current state of the field of automatic term extraction from specialized natural language texts, including scientific and technical documents. Practical applications of term extraction methods and tools include creating terminological dictionaries, thesauri, and glossaries for problem-oriented domains, as well as extracting keywords and constructing subject indexes for specialized documents. The paper overviews approaches to the automatic recognition and extraction of terminological words and phrases, covering traditional statistical methods and machine learning methods based either on term features or on modern neural network transformer-based language models. The approaches are compared, including quality assessments for term recognition and term extraction. The best-known software tools for automating term extraction within the statistical and feature-based learning approaches are indicated. The authors describe their studies on term recognition based on neural network language models, applied to Russian scientific texts on mathematics and programming. The terminologically annotated data set created for training term recognition models is briefly characterized; it covers data from seven related domains. The term recognition models were developed on the basis of the pre-trained neural network model BERT by additional training (fine-tuning) in two ways: as a binary classifier of candidate terms previously extracted from texts, and as a classifier for sequence labeling of words in texts. The quality of term recognition of the developed models is evaluated experimentally and compared with the statistical approach. The best quality is demonstrated by the binary classification models, which significantly surpass the other considered approaches. The experiments also show that the trained models are applicable to texts in close scientific domains.
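Below is a minimal sketch of the second fine-tuning setup the abstract mentions (a BERT-based classifier for sequence labeling of words), using the Hugging Face transformers API. The model name, the binary O/TERM tagging scheme, and the sample sentence are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch: fine-tuning a pre-trained BERT as a token classifier that marks
# term words. Model, labels, and data are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "TERM"]  # assumed binary tagging scheme; the paper's may differ

tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/rubert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "DeepPavlov/rubert-base-cased", num_labels=len(LABELS)
)

# One hypothetical annotated sentence: words and word-level term labels.
words = ["Рекуррентная", "нейронная", "сеть", "обучается", "быстро"]
word_labels = [1, 1, 1, 0, 0]  # the first three words form a term

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Propagate word-level labels to subword tokens; special tokens get -100.
token_labels = [
    -100 if wid is None else word_labels[wid] for wid in enc.word_ids()
]
labels = torch.tensor([token_labels])

outputs = model(**enc, labels=labels)
outputs.loss.backward()  # one training step; a real run wraps this in a Trainer loop
```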
2. Automated identification of hype technologies: Semantic analysis [No. 1 of the year]
Authors: Loginova, I.V., Piekalnits, A.S., Sokolov, A.V.
Visitors: 1177
The research focuses on inflated public expectations of new technologies, or hypes. The paper presents the results of developing and testing an automated methodology for identifying hypes among technological topics based on their textual trace in the digital technology field. The number of new technological developments in the world is constantly growing; however, their real potential for practical application varies greatly. It is therefore important to identify reliable factors that distinguish trends from hypes. Industry and technology experts typically suggest the following possible signs of a hype: the absence of a stable business model, an unformed or obviously limited consumer market, and a large number of more effective alternatives. Identifying hypes in the technology agenda remains a difficult analytical task due to terminological inconsistency, the expert nature of the task, insufficiently developed methodological approaches, and the lack of specific technical tools. The method described in this paper extracts terms referring to technologies using natural language processing and computational linguistics techniques. These terms are extracted from tens of millions of text documents of different types, such as scientific publications, patents, and market analytics. The method also includes calculating an objective measure of each technology's “hype” and constructing a visual map of the technology landscape that allows separating sustainable trends from potential hypes. Decision makers can use such hype maps in conjunction with other analytical results to identify priority development areas, analyze current trends, forecast future ones, and manage risks.
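The abstract mentions an objective "hype" measure but does not give its formula; the sketch below is a purely hypothetical illustration of such a measure, scoring topics whose publication buzz outpaces patent and market signals. All signal names and numbers are invented.

```python
# Hypothetical hype score: publication growth relative to patent/market
# growth, squashed into [0, 1). Not the paper's actual formula.
from dataclasses import dataclass

@dataclass
class TopicSignals:
    name: str
    pub_growth: float     # year-over-year growth of publication mentions
    patent_growth: float  # growth of patent mentions
    market_growth: float  # growth of market-analytics mentions

def hype_score(t: TopicSignals, eps: float = 1e-6) -> float:
    """Ratio of buzz growth to 'substance' growth, mapped into [0, 1)."""
    substance = max(t.patent_growth, 0.0) + max(t.market_growth, 0.0) + eps
    ratio = max(t.pub_growth, 0.0) / substance
    return ratio / (1.0 + ratio)  # monotone squashing to [0, 1)

topics = [
    TopicSignals("topic_a", pub_growth=3.0, patent_growth=0.1, market_growth=0.05),
    TopicSignals("topic_b", pub_growth=0.8, patent_growth=0.7, market_growth=0.9),
]
for t in sorted(topics, key=hype_score, reverse=True):
    print(f"{t.name}: hype score {hype_score(t):.2f}")
```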
3. Genetic algorithm for placing requirements in a flow-type production process planning problem [No. 1 of the year]
Authors: Kibzun, A.I., Rasskazova, V.A.
Visitors: 1056
The paper discusses the problem of planning flow-type production processes. In terms of a cascade scheme, the complete solution covers the stage of assigning preparatory units and the subsequent stage of forming detailed technological routes that fulfill a given set of requirements on time, taking into account the constraints on permissible processing durations at each processing stage. This scheme is a part of a problem-oriented computing complex. However, for a number of natural reasons, the problem may become inconsistent already at the stage of assigning preparatory units. One way to overcome these difficulties is to develop and implement penalty function algorithms that find maximal consistent subsystems of inconsistent optimization problems. The paper proposes a conceptually different approach: a preliminary requirement placement stage is introduced in such a way that the subsequent stages of the solution process are guaranteed to be solvable. Requirement placement is formalized as a search for an optimal mapping that minimizes the “potential” workload on preparatory units during the planning period. To solve this problem, the authors developed a genetic algorithm, which showed a significant speed advantage over classical mathematical programming approaches (for example, integer linear programming models). To reduce the risk of population extinction, at each iteration of the genetic algorithm the authors apply the rule of unconditional migration of the representative with the lowest criterion value. This approach also provides good convergence of the algorithm in terms of the number of iterations without significant improvement of the objective function. The developed genetic algorithm is implemented as a stand-alone module of a computing system for solving process manufacturing scheduling problems. The authors conducted a computational experiment with this module, comparing the solution quality of the initial complex problem.
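A minimal sketch of the migration rule described above: at every iteration the individual with the lowest criterion value passes to the next population unconditionally. The encoding, the toy "potential workload" criterion, and all sizes are illustrative assumptions, not the paper's model.

```python
# Toy GA for assigning requirements to preparatory units, with unconditional
# migration (elitism) of the best individual at every iteration.
import random

N_UNITS, N_REQS = 4, 12          # assumed sizes: preparatory units, requirements
random.seed(0)
LOAD = [[random.randint(1, 9) for _ in range(N_UNITS)] for _ in range(N_REQS)]

def criterion(ind):
    """Toy 'potential workload' criterion: peak load over preparatory units."""
    loads = [0] * N_UNITS
    for req, unit in enumerate(ind):
        loads[unit] += LOAD[req][unit]
    return max(loads)

def crossover(a, b):
    cut = random.randrange(1, N_REQS)
    return a[:cut] + b[cut:]

def mutate(ind, p=0.1):
    return [random.randrange(N_UNITS) if random.random() < p else g for g in ind]

pop = [[random.randrange(N_UNITS) for _ in range(N_REQS)] for _ in range(30)]
for _ in range(200):
    pop.sort(key=criterion)
    elite = pop[0]                      # unconditional migration of the best
    parents = pop[:10]
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(29)]
    pop = [elite] + children            # elite survives regardless of offspring
print("best criterion value:", criterion(min(pop, key=criterion)))
```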
4. Extracting structured data from the Chronicle of the Life and Work of A.S. Pushkin: Hybrid approach [No. 1 of the year]
Authors: Kokorin, P.P., Kotov, A.A., Kuleshov, S.V., Zaytseva, A.A.
Visitors: 977
The paper discusses the problem of creating a software infrastructure for the systematization, annotation, storage, search and publication of manuscripts and other digital materials. The research focuses on materials related to the life and work of A.S. Pushkin, which form an important part of the scientific and educational resource “Pushkin Digital”. The problem is relevant due to the need to preserve the Russian author's heritage under the digital transformation of philological, source and bibliographic studies, as part of the national projects of the Russian Federation “Education”, “Culture”, and “Science and Universities”. It is especially important to extract structured text from bitmap images of pages of A.S. Pushkin's Chronicle of Life and Work volumes, for use in systems under development for the storage, systematization and publication of library, archival, museum, phonographic and other funds and collections, and for the partial automation of philological, source and bibliographic research. The paper proposes a hybrid approach based on a priori data about the structure of page layout elements, OCR technologies (text recognition based on the Tesseract library) and verification methods. The peculiarity of the developed verification methods is the use of regular expressions for extracting structured data from pre-recognized text and an automated text processing pipeline in the GitLab build system. The paper demonstrates satisfactory results of the proposed hybrid approach, which reduces manual post-processing of the obtained data to proofreading the results posted on the research and educational resource. The results are useful not only for the Pushkin Digital resource under development, but also for other projects that require recognition and automated processing of large volumes of digitized author's texts, archival and other paper documents.
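A minimal sketch of the verification step described above: regular expressions pull structured records out of Tesseract's pre-recognized text, and lines that do not match are flagged for manual review. The record pattern is a hypothetical date-plus-event format, not the Chronicle's real layout.

```python
# Sketch: regex-based extraction of structured records from OCR output.
import re

ocr_text = """\
1825. Января 13. Михайловское. Пушкин работает над «Борисом Годуновым».
1825 Января 14 Тригорское Запись в альбоме.
"""

# Tolerant pattern: year, month name, day, then free-form event text.
ENTRY = re.compile(
    r"(?P<year>1[78]\d{2})\.?\s+(?P<month>[А-Яа-яЁё]+)\s+(?P<day>\d{1,2})\.?\s+(?P<event>.+)"
)

records = []
for line in ocr_text.splitlines():
    m = ENTRY.match(line.strip())
    if m:
        records.append(m.groupdict())        # structured record for the pipeline
    else:
        print("needs manual review:", line)  # verification flags bad lines

print(records)
```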
5. Modeling the reliability of software components of cyber-physical systems [No. 1 of the year]
Authors: Privalov, A.N., Larkin, E.V., Bogomolov, A.V.
Visitors: 905
The relevance of the research is due to the fact that the reliability of software components of cyber-physical systems is a key component of their effective functioning, and its appropriate mathematical modeling is essential for the digitalization of the economy. The paper aims to eliminate the disadvantages of known approaches to modeling the reliability of software components, in which reliability estimates are based on empirical data on the errors detected during program testing. In such approaches the results depend mostly on test duration and on how completely the sub-area of data generated during testing covers the processed data area, which reduces the efficiency of reliability estimation. The research focuses on reliability modeling methods for software components of cyber-physical systems, where reliability is characterized by the delay time in the feedback loop between components. The authors used methods of software engineering, reliability theory, probability theory and Markov processes. The main result is mathematical reliability models of software components of cyber-physical systems that combine semi-Markov models of the software components with models of fault and failure generation. The developed mathematical models are based on a structural-parametric semi-Markov model of software faults and failures, whose parameters are determined by the computational complexity of the software and the requirements to it, taking into account its functional purpose. The authors obtained formalized descriptions of Poisson flows of faults and failures of the software components of a cyber-physical system. The practical relevance of the paper lies in its application to determining the reliability of software components at all stages of the cyber-physical system life cycle, where elements interact, self-adjust and adapt to changes using standard software-implemented protocols.
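As a small worked illustration of a Poisson flow of faults (not the paper's semi-Markov model), the sketch below estimates by Monte Carlo the probability that a component survives a mission of length T without a fault and compares it with the closed form exp(-λT); the rate and horizon are assumed values.

```python
# Sketch: Poisson fault flow with rate LAMBDA; survival probability over T.
import math
import random

random.seed(1)
LAMBDA = 0.2   # assumed fault rate, faults per unit time
T = 5.0        # mission duration
N = 100_000    # Monte Carlo trials

survived = 0
for _ in range(N):
    first_fault = random.expovariate(LAMBDA)  # time of the first fault
    if first_fault > T:
        survived += 1

print("Monte Carlo P(no fault by T):", survived / N)
print("Analytical  P(no fault by T):", math.exp(-LAMBDA * T))
```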
6. Planning computations in real-time systems: Efficient algorithms for constructing optimal schedules [No. 1 of the year]
Authors: Kononov, D.A., Furugyan, M.G.
Visitors: 813
The paper discusses issues related to developing one of the main blocks of a real-time computing system, specifically the computation scheduling block. The authors propose algorithms for constructing optimal schedules for different cases depending on the number of processors and the characteristics of jobs and computing system resources. For the single-processor case with interruptions and directive intervals, they improved the relative urgency algorithm by using a heap for data storage, which lowered the algorithm's computational complexity. The authors also developed an algorithm for a problem with a partial order of job execution; it is based on pre-correcting ready times and directive deadlines and on reducing the original problem to one without precedence relations. For the multiprocessor case with interruptions and directive intervals, the authors proposed an approximate algorithm that generalizes the single-processor relative urgency algorithm to the multiprocessor case, and performed a comparative analysis with the exact flow algorithm. They proved that the problem is NP-hard when interruption and switching time costs are taken into account. For the multiprocessor case without interruptions and switches, with a common directive interval for all jobs and identical processors, the authors developed a pseudo-polynomial algorithm based on a limited enumeration of options. The authors also created an approximate algorithm for a system with renewable and non-renewable resources, as well as for a complex with a mixed set of jobs (both continuous and those allowing interruptions and switching). The algorithm is based on network modeling and on reducing the problem under study to finding a flow with certain properties in a special network.
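The relative urgency rule is, in essence, preemptive earliest-deadline-first scheduling; under that reading, a minimal heap-based single-processor sketch could look as follows, with hypothetical jobs given as (release, deadline, processing time) triples.

```python
# Sketch: preemptive EDF on one processor; the heap keeps released,
# unfinished jobs ordered by deadline ("most urgent" on top).
import heapq

jobs = [  # (release time, deadline, processing time): illustrative data
    (0, 10, 4),
    (1, 5, 2),
    (2, 12, 3),
]
jobs.sort()                      # by release time
ready, t, i, schedule = [], 0, 0, []

while i < len(jobs) or ready:
    if not ready and t < jobs[i][0]:
        t = jobs[i][0]           # idle until the next release
    while i < len(jobs) and jobs[i][0] <= t:
        r, d, w = jobs[i]
        heapq.heappush(ready, (d, i, w))
        i += 1
    d, j, w = heapq.heappop(ready)
    # Run the most urgent job until it finishes or a new job is released.
    next_release = jobs[i][0] if i < len(jobs) else float("inf")
    run = min(w, next_release - t)
    schedule.append((t, t + run, j))
    t += run
    if w - run > 0:
        heapq.heappush(ready, (d, j, w - run))  # preempted, back to the heap
    elif t > d:
        print(f"job {j} misses its deadline at t={t}")

print(schedule)  # list of (start, end, job index) segments
```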
7. A system of verifiable software component specifications with embedding and extraction [No. 1 of the year]
Author: Shapkin, P.A.
Visitors: 871
This paper focuses on the specification and verification of software systems and their components. It studies a unified specification language that integrates with both random testing systems and static verification tools based on type systems. The variety of programming languages, configuration systems, deployment and other tools requires developers to make an effort to integrate them; verifiable component specifications help to simplify this task. The paper proposes an approach to a unified specification representation integrated with systems for both static type checking and dynamic testing. The solution relies on methods of applicative computing and type theory and provides a conceptual framework for building specifications embedded in various software environments. The lack of static verification capabilities in limited type systems is compensated to some extent by dynamic testing; the author implements testing by interpreting specifications into definitions for property-based random testing systems. The practical significance of the proposed approach is the automation of constructing typed wrappers, or facades, which are essential for using components from less typed environments in programming languages with more expressive type systems. The approach automates both the verification of such wrappers and the methods of their construction by defining specification refinement operations. In practice, this allows detecting typing errors in third-party components at early development stages. The paper gives examples of specifications of programs with side effects; the specifications are grounded in category theory formalizations. The author also analyzes approaches to translating specifications into other representations and to iteratively improving specifications by transforming them.
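A minimal sketch of the idea of interpreting a specification into property-based random tests, here using the Hypothesis library as a stand-in (the abstract does not name the paper's target testing systems). The component and its two laws are illustrative assumptions.

```python
# Sketch: a component contract expressed as executable properties and
# checked on random inputs with Hypothesis.
from hypothesis import given, strategies as st

def normalize(s: str) -> str:
    """Component under test: collapse case and surrounding whitespace."""
    return s.strip().lower()

@given(st.text())
def test_idempotent(s):
    # Law 1: applying the component twice changes nothing.
    assert normalize(normalize(s)) == normalize(s)

@given(st.text())
def test_lowercase(s):
    # Law 2: the output is already lowercase.
    assert normalize(s) == normalize(s).lower()

if __name__ == "__main__":
    test_idempotent()   # Hypothesis runs each property on random inputs
    test_lowercase()
    print("specification holds on generated cases")
```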
8. Simulation modeling of physical protection systems in the AKIM environment [No. 1 of the year]
Authors: Senichenkov, Yu.B., Sharkov, I.K.
Visitors: 1001
The paper discusses a methodology for building simulation models in the domestic software package AKIM. The models address the problem of analyzing the security level of existing and designed physical protection systems of objects and use statistical experiments to estimate such systems' effectiveness. The authors review existing modern approaches to this problem; most of them apply Markov chains to search for vulnerable paths, as well as attack and defense graphs, to assess system effectiveness. The authors suggest instead building a simulation model without constructing an attack and defense graph, relying only on the plan of the physical protection system. A model in the AKIM environment consists of instances of base classes that model real elements of a physical protection system; intruder and guard agent models move over the plan, simulating real attacks. The approach allows describing in detail the functions, reactions and capabilities of the system at the level of its elements and specifying the actual parameters of intruders and guards, which ensures accuracy and completeness of the analysis without simplifying or excluding important details. Demonstration examples show that the protection effectiveness estimates obtained with the AKIM software package are close to those of models built using Markov chains. At the same time, the considered method of building simulation models overcomes the difficulties associated with Markov chains: the need for expert estimates of transition matrix coefficients, large matrix sizes, and the complexity of model modification.
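A toy statistical experiment in the spirit described above (not the AKIM model): an intruder crosses a sequence of protection zones with given detection probabilities and traversal times, and effectiveness is the estimated share of runs in which guards intercept before the path is completed. All parameters are invented.

```python
# Sketch: Monte Carlo estimate of protection effectiveness.
import random

random.seed(2)
# (detection probability, traversal time) per zone: illustrative parameters.
ZONES = [(0.5, 10.0), (0.3, 15.0), (0.7, 8.0)]
RESPONSE_TIME = 20.0   # assumed guard response time after an alarm
N = 50_000

intercepted = 0
for _ in range(N):
    t, detected_at = 0.0, None
    for p_detect, delay in ZONES:
        if detected_at is None and random.random() < p_detect:
            detected_at = t           # alarm raised at zone entry
        t += delay
    total = t                         # intruder reaches the target at this time
    if detected_at is not None and detected_at + RESPONSE_TIME <= total:
        intercepted += 1

print("estimated protection effectiveness:", intercepted / N)
```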
9. Author's metric for assessing proximity of programs: Application for vulnerability search using genetic de-evolution [No. 1 of the year]
Authors: Buynevich, M.V., Izrailov, K.E.
Visitors: 867
The paper is relevant due to information security tasks that require comparing programs in their different representations, for example, in textual assembly code (e.g., for vulnerability search or authorship verification). The paper presents a proximity metric for two texts in the form of lists of character strings, which develops the authors' previous version of the metric. The main result of the current study (a part of a larger study aimed at genetic de-evolution of programs) is the metric itself, as well as its characteristics and peculiarities revealed through experiments. The paper presents the metric in analytical form and as a Python implementation. The metric takes as input the two lists of character lines to compare, together with coefficients that account for an element's position from the beginning of the list and for the character sequence. The calculation result is a numeric value in the range from 0 to 1. The metric's novelty is a sufficiently accurate and sensitive assessment of the proximity of two texts regardless of data representation formats; the current version differs from the previous one by taking the mentioned coefficients into account. The theoretical significance lies in developing methods for comparing arbitrary texts represented as lists of character lines whose information appears sequentially according to a certain logic (and therefore requires position to be considered). Besides the general purpose of such comparison tools, the metric is practically relevant because it can determine the proximity of two programs given in a binary representation of machine code that is pre-transformed into a textual assembly code representation.
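The sketch below reproduces the interface the abstract describes (two lists of character lines plus coefficients for element position and character sequence, with a result in [0, 1]), but the combining formula is hypothetical; the authors' actual metric is not given in the abstract. Per-line character-sequence similarity here is difflib's ratio.

```python
# Hypothetical proximity metric over lists of character lines, in [0, 1].
from difflib import SequenceMatcher

def proximity(lines_a, lines_b, pos_coef=0.5, seq_coef=0.5):
    """Weighted average of pairwise line similarities, discounted by how far
    the matched lines sit from the start of the lists."""
    if not lines_a or not lines_b:
        return 0.0
    n = max(len(lines_a), len(lines_b))
    total = 0.0
    for i in range(n):
        a = lines_a[i] if i < len(lines_a) else ""
        b = lines_b[i] if i < len(lines_b) else ""
        seq_sim = SequenceMatcher(None, a, b).ratio()  # character order
        pos_weight = 1.0 - pos_coef * (i / n)          # earlier lines weigh more
        total += pos_weight * (seq_coef * seq_sim + (1 - seq_coef) * (a == b))
    max_total = sum(1.0 - pos_coef * (i / n) for i in range(n))
    return total / max_total                           # normalized to [0, 1]

asm_a = ["push ebp", "mov ebp, esp", "xor eax, eax"]
asm_b = ["push ebp", "mov ebp, esp", "mov eax, 1"]
print(round(proximity(asm_a, asm_b), 3))
```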
10. A framework for automating equipment remaining life prediction when building proactive decision support systems [No. 1 of the year]
Authors: Zadiran, K.S., Volkova, D.A., Shcherbakov, M.V.
Visitors: 961
The paper describes a framework for automating research in designing proactive decision support systems. In particular, it investigates the problem of time series analysis and prediction in order to create tools for automating the prediction of various processes in asset management systems, including maintenance and repair. The authors identify the role of automation in asset management within these systems and highlight the main factors influencing the choice of software for implementing a predictive analytics system. The authors propose an algorithm for predicting remaining useful life based on analyzing production asset data using artificial intelligence components. The proposed software solution is based on CRISP-DM; it is not a separate software product and can be embedded in existing software so that methods can be modified. The framework loads and preprocesses data, builds predictive models, forecasts time series and evaluates the forecasts. The developed framework has a flexible modular architecture that supports adding new methods of analysis and prediction. The ability to redefine and implement custom data sources, preprocessing stages, forecasting models and metrics on the basis of the existing base classes extends the framework's variability and increases the efficiency of its functioning. The paper gives an example of using the framework to analyze time series and determine equipment remaining useful life, demonstrating the efficiency of the developed product in the field of data exploration and artificial intelligence.
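A minimal sketch of the extension mechanism described above: base classes that users subclass to plug in their own data sources, forecasting models and metrics, wired into a load, fit, predict, evaluate pipeline. Class and method names are assumptions, not the framework's real API.

```python
# Sketch: pluggable base classes and a simple forecasting pipeline.
from abc import ABC, abstractmethod

class DataSource(ABC):
    @abstractmethod
    def load(self) -> list[float]: ...

class Forecaster(ABC):
    @abstractmethod
    def fit(self, series: list[float]) -> None: ...
    @abstractmethod
    def predict(self, horizon: int) -> list[float]: ...

class CsvColumn(DataSource):            # example user-defined source
    def __init__(self, values): self.values = values
    def load(self): return self.values

class NaiveForecaster(Forecaster):      # example model: repeat the last value
    def fit(self, series): self.last = series[-1]
    def predict(self, horizon): return [self.last] * horizon

def mae(y_true, y_pred):                # example pluggable metric
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Pipeline: load -> fit -> predict -> evaluate, as the abstract outlines.
series = CsvColumn([10.0, 11.0, 12.0, 13.0]).load()
model = NaiveForecaster()
model.fit(series[:-2])
forecast = model.predict(2)
print("forecast:", forecast, "MAE:", mae(series[-2:], forecast))
```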