This application enables developers to reverse-engineer a microservices architecture from a data perspective.
%%% Cite the paper
@inproceedings{andre2025a,
title = {Data Access-centered Understanding of Microservices Architectures},
author = {Andr{\'e}, Maxime and Rivière, Etienne and Cleve, Anthony},
booktitle = {Proceedings of the 22nd International Conference on Software Architecture (ICSA 2025): NEMI Track},
year = {2025},
organization = {IEEE Computer Society Press},
doi = {https://doi.org/10.1109/icsa-c65153.2025.00007}
}

Here is a summary of the features currently supported.
The static analysis by AST feature enables the developer to (1) retrieve one or more GitHub/GitLab microservices, (2) statically analyze them through an AST, (3) identify, through heuristics, data access code fragments that are linked to certain API or database technologies and likely to change during evolution, which then requires propagating changes to other components (e.g., other microservices or databases), (4) extract the data concepts of those data access code fragments using NLP, (5) link the data access code fragments to their related data concepts, (6) compare data concepts with the other ones in the same microservice, (7) associate identical data concepts, and (8) present the result as a report, in a defined model, designed to help developers understand microservices from a conceptual data perspective, so that they can pay attention to them when co-evolving API and database accesses. This report aims to provide developers with a valuable basis for software evolution tasks, such as re-documentation, visualization, quality assessment, improvement recommendations, impact analysis, or change propagation. This analysis is slower than the static analysis by NLP/TR, but it retrieves more information.
Here is a summary of languages and technologies currently supported:
| Language | Technology | Implementation status |
|---|---|---|
| JavaScript/TypeScript | MongoDB, Redis, Express | 🌕 🌕 🌕 |
INPUT
Invoke the static analysis through the POST /static/ast route. Attach the ZIP file to the request as form-data, with file as the key and the ZIP file as the value, and not as a binary file. This ZIP file can be generated from GitHub/GitLab repositories with DENIM Downloading.
Options, such as the language of the analysis, must be given in the field with the key options, as follows:
{
"language": "<language>" // The language of the analysis
}

Optional hints can be provided through the body to guide the analysis with in/out (i.e., include/exclude) keyword hints (e.g., the grammar of a library, a list of conceptual schema concepts, etc.). This body should be structured as follows:
{
"hints": {
"int": ["<int term 1>", "<int term 2>", "...", "<int term X>"], // Inclusion keywords
"out": ["<out term 1>", "<out term 2>", "...", "<out term X>"] // Exclusion keywords
}
}

OUTPUT
Consult the response object:
[
{
// A repository
"directories": [
{
// A directory
"location": "https://github.com/<user>/<repository>",
"directories": [
// ...
],
"files": [
{
// A file
"location": "https://github.com/<user>/<repository>/.../<file path>.js",
"linesOfCode": <LoC>,
"codeFragments": [
{
// A code fragment
"location": "https://github.com/<user>/<repository>/.../<file path>.js#Lx1Cx1-Lx2y2",
"technology": {
"name": "<technology>" // E.g., javascript-api-express-call, javascript-db-mongo-call, javascript-db-redis-call.
},
"operation": {
"name": "<operation>" // E.g., CREATE, READ, UPDATE, DELETE, OTHER
},
"method": {
"name": "<method>" // E.g., post, get, findOne, sadd, etc.
},
"sample": {
"content": "<sample>" // E.g., a Redis key, a MongoDB object, etc.
},
"concepts": [
{
"name": "<concept>" // E.g., a route resource concept name, a Redis key name, a MongoDB collection name, etc.
}
],
"heuristics": "<heuristics>", // The matching heuristics tracing.
"score": "<score>" // The computed likelihood score.
}
]
} // ...
]
} // ...
]
} // ...
]

NOTE: The response format is the same as the one produced by the static analysis by NLP/TR because it relies on the same model.
NOTE: This analysis is slower than the static analysis by NLP/TR, but it retrieves more information.
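
The nested response (repositories, directories, files, code fragments) can be flattened with a short recursive walk. The following is a minimal sketch, assuming the response has already been parsed into a JavaScript array; collectCodeFragments is a hypothetical helper name and is not part of the application.

```javascript
// Hypothetical helper: flattens a parsed analysis report into a list of code fragments.
// It relies only on the field names shown in the response object above.
function collectCodeFragments(report) {
  const fragments = [];

  // Visit a directory: first its own files, then its subdirectories.
  const visitDirectory = (directory) => {
    for (const file of directory.files ?? []) {
      fragments.push(...(file.codeFragments ?? []));
    }
    for (const child of directory.directories ?? []) {
      visitDirectory(child);
    }
  };

  // The top-level array holds one entry per repository.
  for (const repository of report) {
    for (const directory of repository.directories ?? []) {
      visitDirectory(directory);
    }
  }
  return fragments;
}
```

The walk tolerates missing directories, files, or codeFragments arrays, so it can be applied to partial reports as well.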
The static analysis by NLP/TR (Natural Language Processing and Text Retrieval) feature enables the developer to (1) retrieve one or more GitHub/GitLab microservices, (2) browse the entire architecture, treating each source file as text, (3) identify data access code fragments that are linked to certain API or database technologies and likely to change during evolution, which then requires propagating changes to other components (e.g., other microservices or databases), by performing lexical and statistical analysis to extract candidate code fragments and concepts from source files, (4) extract extra data, (5) filter and prioritize code fragments with statistical relevance metrics (e.g., TF-IDF, dominance, variation) and some hints, (6) compare data concepts with the other ones in the same microservice, (7) associate identical data concepts, and (8) present the result as a report, in a defined model, designed to help developers understand microservices from a conceptual data perspective, so that they can pay attention to them when co-evolving API and database accesses. This report aims to provide developers with a valuable basis for software evolution tasks, such as re-documentation, visualization, quality assessment, improvement recommendations, impact analysis, or change propagation. This analysis is faster than the AST analysis, but it retrieves less information.
Here is a summary of languages and technologies currently supported:
| Language | Technology | Implementation status |
|---|---|---|
| JavaScript/TypeScript | Any | 🌕 |
| Java | Any | 🌕 |
INPUT
Invoke the static analysis through the [POST /static/nlptr](http://localhost:3000/static/nlptr) route. Attach the ZIP file to the request as form-data, with file as the key and the ZIP file as the value, and not as a binary file. This ZIP file can be generated from GitHub/GitLab repositories with DENIM Downloading.
The language is automatically detected.
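
As an illustration of the request described above, here is a minimal sketch of a client call. It assumes Node.js 18 or later (global fetch, FormData, and Blob) and an application listening on http://localhost:3000; buildAnalysisForm, runNlpTrAnalysis, and the archive name microservices.zip are hypothetical names chosen for this example.

```javascript
// Hypothetical helper: builds the multipart form-data body for POST /static/nlptr.
// The ZIP archive is sent under the "file" key as a form-data part, not as a raw binary body.
function buildAnalysisForm(zipBytes, fileName = "microservices.zip") {
  const form = new FormData();
  form.append("file", new Blob([zipBytes], { type: "application/zip" }), fileName);
  return form;
}

// Hypothetical helper: sends the archive and returns the parsed JSON report.
// The base URL is an assumption; adapt it to where the application actually runs.
async function runNlpTrAnalysis(zipBytes, baseUrl = "http://localhost:3000") {
  const response = await fetch(`${baseUrl}/static/nlptr`, {
    method: "POST",
    body: buildAnalysisForm(zipBytes), // fetch sets the multipart boundary header itself
  });
  if (!response.ok) {
    throw new Error(`Analysis failed with HTTP status ${response.status}`);
  }
  return response.json();
}
```

The archive bytes can be read with, for instance, fs.readFileSync before being passed to runNlpTrAnalysis.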
Optional hints can be provided through the body to guide the analysis with in/out (i.e., include/exclude) keyword hints (e.g., the grammar of a library, a list of conceptual schema concepts, etc.). This body should be structured as follows:
{
"hints": {
"int": ["<int term 1>", "<int term 2>", "...", "<int term X>"], // Inclusion keywords
"out": ["<out term 1>", "<out term 2>", "...", "<out term X>"] // Exclusion keywords
}
}

OUTPUT
Consult the response object:
[
{
// A repository
"directories": [
{
// A directory
"location": "https://github.com/<user>/<repository>",
"directories": [
// ...
],
"files": [
{
// A file
"location": "https://github.com/<user>/<repository>/.../<file path>.js",
"linesOfCode": <LoC>,
"codeFragments": [
{
// A code fragment
"location": "https://github.com/<user>/<repository>/.../<file path>.js#Lx1",
"technology": {
"name": "<technology>" // E.g., javascript-any-any-any
},
"operation": {
"name": "?"
},
"method": {
"name": "?"
},
"sample": {
"content": "<sample>" // i.e., the line of code
},
"concepts": [
{
"name": "<concept>" // E.g., a route resource concept name, a Redis key name, a MongoDB collection name, etc.
}
],
"heuristics": "<heuristics>", // The matching heuristics tracing.
"score": "<score>" // The computed likelihood score.
}
]
} // ...
]
} // ...
]
} // ...
]

NOTE: This response format is the same as the one produced by the static analysis by AST because it relies on the same model.
NOTE: This analysis is faster than the static analysis by AST, but it retrieves less information (see the "?" placeholders).
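
Because both analyses share the same response model, the same post-processing applies to either report. As a sketch, the helper below groups code fragment locations by concept name, which gives a quick view of where each data concept is accessed. It assumes a flat array of code fragment objects with the location and concepts fields shown above; groupLocationsByConcept is a hypothetical name, and concept names are compared by exact string equality, which is a simplification.

```javascript
// Hypothetical helper: maps each concept name to the locations of the fragments mentioning it.
function groupLocationsByConcept(fragments) {
  const locationsByConcept = new Map();
  for (const fragment of fragments) {
    for (const concept of fragment.concepts ?? []) {
      const locations = locationsByConcept.get(concept.name) ?? [];
      locations.push(fragment.location);
      locationsByConcept.set(concept.name, locations);
    }
  }
  return locationsByConcept;
}

// Example with two made-up fragments sharing the "order" concept.
const exampleGroups = groupLocationsByConcept([
  { location: "orders.js#L10", concepts: [{ name: "order" }] },
  { location: "billing.js#L42", concepts: [{ name: "order" }, { name: "invoice" }] },
]);
console.log(exampleGroups.get("order")); // [ 'orders.js#L10', 'billing.js#L42' ]
```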
See the INSTALL file.
Manual test suites are set up with the Postman tool.
The tests are specified in the /test/manual directory and are named following the *.test.js pattern.
The ZIP file, given as the value of the file key of the form-data, must be downloaded again from the /test/integration/asset directory.
Unit test suites are set up thanks to the Jest framework.
The tests are specified in the /test/unit directory and are named following the *.test.js pattern.
The configuration of Jest is stated in the /package.json file.
Running the tests also computes the code coverage.
- Launch the unit tests: npm run test_unit
Integration test suites are set up thanks to the SuperTest framework.
The tests are specified in the /test/integration directory and are named following the *.test.js pattern.
The configuration of Jest is stated in the /package.json file.
- Launch the application on Docker (cf. Dockerize the application).
- Launch the integration tests: npm run test_integration
Autogenerated documentation is available through Swagger UI at http://localhost:3000/docs.
- Generate the documentation: npm run swagger
A CI/CD process is set up thanks to GitLab CI/CD. Learn more about GitLab CI/CD via this page.
The pipeline is described in the .gitlab-ci.yml file.
- Lint the application: npm run lint
- Format the application: npm run format
- JavaScript
- Docker
- CodeQL is used for static code analysis.
- Wink is used for concept extraction powered by NLP.
- Natural is used for concept extraction powered by NLP.
- @xenova/transformers is used for concept extraction powered by NLP and transformers.
- multer is used for handling file uploads.
- adm-zip is used for unzipping ZIP files.
- sloc is used for counting the number of lines of code.
- expressjs is the backend Node.js framework used.
- body-parser is used for parsing REST API request bodies.
- dotenv is used for retrieving environment variables.
- cors is used for managing CORS.
- swagger-autogen is used for generating the Swagger documentation.
- swagger-ui-express is used for serving the Swagger UI documentation.
- npm is the package manager used.
- GitLab CI/CD is the CI/CD tool used.
- Docker Desktop is the containerization technology used.
- Postman is the tool used for manually testing the API.
To find the locations of code fragments related to some API or database technologies, heuristics are defined based on pattern and rule matching, following the documentation of each technology.
API (Express) Likelihood Score Heuristics.
| ID | Description |
|---|---|
| E0 | The method call contains a data access concept. |
| E1 | According to the Express documentation, the method call has an Express-like method name (e.g., get, post, put, delete, ...). |
| E2 | According to the Express documentation, the method call has a string as first argument. |
| E3 | According to the Express documentation, the method call has an Express route-like string as first argument. |
| E4 | According to the Express documentation, the method call has a function as second argument. |
| E5 | According to the Express documentation, the method call has an Express-like receiver name (e.g., app). |
| E6 | According to the Express documentation, the method call has an Express-like import around (in the same file). |
| E7 | According to the Express documentation, the method call has an Express-like client assignment around (in the same file). |
| E8 | According to the Express documentation, the method call is linked to an Express-like client assignment around (in the same file). |
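
To make the table more concrete, the sketch below checks a few of these heuristics (E1 to E5) against a simplified, hand-written description of a method call. It is only an illustration in plain JavaScript and not the application's own detection logic. The call description shape ({ receiver, method, args }), the helper name matchExpressHeuristics, and the rule that a route-like string starts with "/" are all assumptions made for this example; how matched heuristics are combined into a likelihood score is not shown.

```javascript
// Method names taken from the examples listed for E1.
const EXPRESS_METHOD_NAMES = new Set(["get", "post", "put", "delete"]);

// Hypothetical helper: returns the IDs of the heuristics matched by a simplified call description,
// e.g. { receiver: "app", method: "get", args: ["/orders/:id", handler] }.
function matchExpressHeuristics(call) {
  const [firstArgument, secondArgument] = call.args;
  const matched = [];
  if (EXPRESS_METHOD_NAMES.has(call.method)) matched.push("E1");
  if (typeof firstArgument === "string") matched.push("E2");
  // Assumption: a route-like string is approximated as a string starting with "/".
  if (typeof firstArgument === "string" && firstArgument.startsWith("/")) matched.push("E3");
  if (typeof secondArgument === "function") matched.push("E4");
  // Receiver name taken from the example listed for E5.
  if (call.receiver === "app") matched.push("E5");
  return matched;
}

const expressCall = { receiver: "app", method: "get", args: ["/orders/:id", () => {}] };
console.log(matchExpressHeuristics(expressCall)); // [ 'E1', 'E2', 'E3', 'E4', 'E5' ]
```

Heuristics E6 to E8 depend on the surrounding file (imports and client assignments), so they cannot be evaluated from a single call description like this one.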
DB (Redis) Likelihood Score Heuristics.
| ID | Description |
|---|---|
| R0 | The method call contains a data access concept. |
| R1 | According to the Redis documentation, the method call has a Redis-like method name (e.g., get, set, del, scan, keys, sadd, rpush, setnx, ...). |
| R2 | According to the Redis documentation, the method call has a string as first argument. |
| R3 | According to the Redis documentation, the method call has a Redis-like receiver name (e.g., client, redisClient). |
| R4 | According to the Redis documentation, the method call has a Redis-like import around (in the same file). |
| R5 | According to the Redis documentation, the method call has a Redis-like client assignment around (in the same file). |
| R6 | According to the Redis documentation, the method call is linked to a Redis-like client assignment around (in the same file). |
DB (MongoDB) Likelihood Score Heuristics.
| ID | Description |
|---|---|
| M0 | The method call contains a data access concept. |
| M1 | According to the MongoDB documentation, the method call has a MongoDB-like method name (e.g., findOne, insertMany, updateOne, deleteMany, ...). |
| M2 | According to the MongoDB documentation, the method call has a string, an object or an array as first argument. |
| M3 | According to the MongoDB documentation, the method call has a MongoDB-like receiver name (e.g., db, collection). |
| M4 | According to the MongoDB documentation, the method call has a MongoDB-like import around (in the same file). |
| M5 | According to the MongoDB documentation, the method call has a MongoDB-like client assignment around (in the same file). |
| M6 | According to the MongoDB documentation, the method call is linked to a MongoDB-like client assignment around (in the same file). |
The resulting report follows that model:
To find the locations of code fragments related to some API or database technologies, NLP and Text Retrieval statistical computations (e.g., TF-IDF, dominance, variation, BERT sentence transformers, etc.) are applied to the extracted terms.
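
As an illustration of one of these metrics, here is a minimal sketch of a textbook TF-IDF computation, where each source file is represented as a list of already extracted terms. It uses the common tf × ln(N / df) variant, which is an assumption and may differ from the exact weighting applied by the application; computeTfIdf is a hypothetical helper name.

```javascript
// Hypothetical helper: computes TF-IDF scores for each term of each document.
// documents is an array of term arrays, one per source file.
function computeTfIdf(documents) {
  // Document frequency: the number of documents in which each term appears.
  const documentFrequency = new Map();
  for (const terms of documents) {
    for (const term of new Set(terms)) {
      documentFrequency.set(term, (documentFrequency.get(term) ?? 0) + 1);
    }
  }

  return documents.map((terms) => {
    // Start from raw term counts within the document.
    const scores = new Map();
    for (const term of terms) {
      scores.set(term, (scores.get(term) ?? 0) + 1);
    }
    // Replace each count with tf * idf.
    for (const [term, count] of scores) {
      const termFrequency = count / terms.length;
      const inverseDocumentFrequency = Math.log(documents.length / documentFrequency.get(term));
      scores.set(term, termFrequency * inverseDocumentFrequency);
    }
    return scores;
  });
}
```

With this variant, a term that occurs in every file gets a score of 0, while a term concentrated in a single file scores higher, which is why such a metric helps prioritize candidate concepts.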
The resulting report follows that model:
If you want to contribute to the project by supporting new technologies or heuristics, please consider the following instructions:
- Any query file must be added in the /query directory.
- Except for the /test and /evaluation directories, no other directory may be modified.
- The file qlpack.yml cannot be modified.
- Any query file must respect the naming convention <Type of detection><Technology name><Type of code fragment>.query.ql.
- Any helper method used for the queries must be added in the utils.qll file.
- Any helper method or class must be named clearly (no abbreviations), especially integrating the type of detection, technology, and type of code fragment.
- More generally, any contribution must follow the conventions and keep the shape of previous contributions.
- Any contribution must be tested (unit and integration tests) and evaluated (evaluation). See the /test and /evaluation directories.
- All the tests and the CI/CD pipeline must pass before definitively integrating the contribution.
- Any contribution must be documented, especially by updating the README.md file.
- Any contribution must be approved via the pull request mechanism.
The complete data of our evaluation is detailed in the /evaluation directory. The first folder, /ast, is dedicated to the AST-based approach. The second folder, /nlptr, is dedicated to the NLP & TR-based approach. The npm run evaluation_* commands run several evaluation scripts to compute various metrics such as precision, recall, F1, and frequency. Please consult the evaluation scripts and output files for further details.
