Features/refine knowledge service#1341
Conversation
…atures/refine-knowledge-service
…atures/refine-knowledge-service
…atures/refine-knowledge-service
…atures/refine-knowledge-service
Review Summary by QodoRefactor knowledge service architecture to support multiple knowledge types with improved separation of concerns
WalkthroughsDescription• **Refactored knowledge service architecture** to support multiple knowledge types (Document, QuestionAnswer, SemanticGraph, Taxonomy) through dynamic service resolution instead of static dependencies • **Renamed terminology** throughout codebase from "vector" to "collection" for consistency and clarity • **Extracted file handling logic** into dedicated IKnowledgeFileOrchestrator service for better separation of concerns • **Implemented type-specific knowledge services**: DocumentKnowledgeBase, QuestionAnswerKnowledgeBase, SementicGraphKnowledgeBase, and TaxonomyKnowledgeBase • **Reorganized vector knowledge base** into modular partial classes: Collection.cs, Data.cs, Index.cs, and Snapshot.cs • **Fixed naming inconsistencies** across repositories and storage services: corrected typos (VectorStroageProviders → VectorStorageProviders), renamed models (KnowledgeDocMetaData → KnowledgeFileMetaData), and standardized parameter names • **Created comprehensive options and request models** for knowledge operations with proper inheritance hierarchy • **Updated controller endpoints** to use dynamic knowledge service resolution based on knowledgeType parameter • **Simplified vector search parameters** from VectorSearchParamModel to `Dictionary<string, string?>` • **Added database provider support** throughout knowledge operations for multi-database scenarios Diagramflowchart LR
A["Old Monolithic<br/>KnowledgeService"] -->|"Refactored into"| B["IKnowledgeService<br/>Interface"]
B -->|"Implemented by"| C["DocumentKnowledgeBase"]
B -->|"Implemented by"| D["QuestionAnswerKnowledgeBase"]
B -->|"Implemented by"| E["SementicGraphKnowledgeBase"]
B -->|"Implemented by"| F["TaxonomyKnowledgeBase"]
C -->|"Extends"| G["VectorKnowledgeBase"]
D -->|"Extends"| G
H["IKnowledgeFileOrchestrator"] -->|"Extracted from"| A
I["KnowledgeBaseController"] -->|"Uses dynamic<br/>resolution"| B
J["KnowledgeHook"] -->|"Uses dynamic<br/>resolution"| B
File Changes1. src/Infrastructure/BotSharp.OpenAPI/Controllers/KnowledgeBase/KnowledgeBaseController.cs
|
Code Review by Qodo
1. BuildSearchOptions reuses request collections
|
| private KnowledgeExecuteOptions BuildSearchOptions(IKnowledgeService kg, KnowledgeExecuteRequest? request) | ||
| { | ||
| var searchParam = request?.SearchParam?.ToDictionary(x => x.Key, x => x.Value?.ConvertToString()); | ||
|
|
||
| if (kg.KnowledgeType.IsEqualTo(KnowledgeBaseType.SemanticGraph)) | ||
| { | ||
| return new GraphKnowledgeSearchOptions | ||
| { | ||
| DbProvider = request?.DbProvider, | ||
| SearchParam = searchParam, | ||
| SearchArguments = request?.SearchArguments, | ||
| GraphId = request?.GraphId | ||
| }; | ||
| } | ||
|
|
||
| if (kg.KnowledgeType.IsEqualTo(KnowledgeBaseType.Taxonomy)) | ||
| { | ||
| return new TaxonomyKnowledgeSearchOptions | ||
| { | ||
| DbProvider = request?.DbProvider, | ||
| Limit = request?.Limit ?? 5, | ||
| Confidence = request?.Confidence ?? 0.5f, | ||
| SearchParam = searchParam, | ||
| SearchArguments = request?.SearchArguments, | ||
| DataProviders = request?.DataProviders, | ||
| MaxNgram = request?.MaxNgram | ||
| }; | ||
| } | ||
|
|
||
| return new KnowledgeExecuteOptions | ||
| { | ||
| DbProvider = request?.DbProvider, | ||
| Fields = request?.Fields, | ||
| FilterGroups = request?.FilterGroups, | ||
| Limit = request?.Limit ?? 5, | ||
| Confidence = request?.Confidence ?? 0.5f, | ||
| WithVector = request?.WithVector ?? false, | ||
| SearchParam = searchParam, | ||
| SearchArguments = request?.SearchArguments | ||
| }; |
There was a problem hiding this comment.
1. buildsearchoptions reuses request collections 📘 Rule violation ⛨ Security
BuildSearchOptions assigns caller-provided collections/dictionaries (e.g., Fields, FilterGroups, SearchArguments, DataProviders) directly onto option objects, allowing shared mutable state between layers. This can let downstream mutations leak back into request objects and across components.
Agent Prompt
## Issue description
`BuildSearchOptions` propagates request-owned collections/dictionaries by reference, risking shared mutable state.
## Issue Context
The API request model (`KnowledgeExecuteRequest`) contains mutable collections (`Fields`, `FilterGroups`, `SearchArguments`, `DataProviders`). These should be copied before being stored/returned in options passed downstream.
## Fix Focus Areas
- src/Infrastructure/BotSharp.OpenAPI/Controllers/KnowledgeBase/KnowledgeBaseController.cs[395-434]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| public async Task<bool> RecoverCollectionFromSnapshot([FromRoute] string collection, IFormFile snapshotFile, [FromForm] string knowledgeType, [FromForm] string? dbProvider = null) | ||
| { | ||
| var kg = _services.GetServices<IKnowledgeService>() | ||
| .FirstOrDefault(x => x.KnowledgeType.IsEqualTo(knowledgeType)); | ||
|
|
||
| if (kg == null) | ||
| { | ||
| return false; | ||
| } | ||
|
|
||
| var fileName = snapshotFile.FileName; | ||
| var binary = FileUtility.BuildBinaryDataFromFile(snapshotFile); | ||
| var done = await _knowledgeService.RecoverVectorCollectionFromSnapshot(collection, fileName, binary); | ||
| var done = await kg.RecoverCollectionFromSnapshot(collection, fileName, binary, new KnowledgeSnapshotOptions { DbProvider = dbProvider }); |
There was a problem hiding this comment.
2. recovercollectionfromsnapshot missing file guard 📘 Rule violation ☼ Reliability
RecoverCollectionFromSnapshot uses snapshotFile.FileName and reads file content without guarding snapshotFile for null/empty, risking null dereference and unexpected exceptions at the API boundary. This should return a safe fallback (e.g., false) on invalid input.
Agent Prompt
## Issue description
The snapshot recovery endpoint assumes `snapshotFile` is non-null and non-empty.
## Issue Context
At API boundaries, missing/malformed multipart form data can result in `snapshotFile` being null or empty.
## Fix Focus Areas
- src/Infrastructure/BotSharp.OpenAPI/Controllers/KnowledgeBase/KnowledgeBaseController.cs[339-351]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| [HttpPost("/knowledge/collection/{collection}/file/upload")] | ||
| public async Task<UploadKnowledgeResponse> UploadKnowledgeFiles([FromRoute] string collection, [FromBody] KnowledgeUploadRequest request) | ||
| { | ||
| var fileOrchestrator = GetKnowledgeFileOrchestrator(request.FileOrchestrator); | ||
| var response = await fileOrchestrator.UploadFilesToKnowledge(collection, request.Files, request.Options); | ||
| return response; | ||
| } | ||
|
|
||
| [HttpPost("/knowledge/collection/{collection}/file/form")] | ||
| public async Task<UploadKnowledgeResponse> UploadKnowledgeFiles( | ||
| [FromRoute] string collection, | ||
| [FromForm] IEnumerable<IFormFile> files, | ||
| [FromForm] string? orchestrator = null, | ||
| [FromForm] KnowledgeFileHandleOptions? options = null) | ||
| { | ||
| if (files.IsNullOrEmpty()) | ||
| { | ||
| return new UploadKnowledgeResponse(); | ||
| } | ||
|
|
||
| var docs = new List<ExternalFileModel>(); | ||
| foreach (var file in files) | ||
| { | ||
| var data = FileUtility.BuildFileDataFromFile(file); | ||
| docs.Add(new ExternalFileModel | ||
| { | ||
| FileName = file.FileName, | ||
| FileData = data | ||
| }); | ||
| } | ||
|
|
||
| var fileOrchestrator = GetKnowledgeFileOrchestrator(orchestrator); | ||
| var response = await fileOrchestrator.UploadFilesToKnowledge(collection, docs, options); | ||
| return response; | ||
| } | ||
|
|
||
| [HttpDelete("/knowledge/collection/{collection}/file/{fileId}")] | ||
| public async Task<bool> DeleteKnowledgeFile([FromRoute] string collection, [FromRoute] Guid fileId, [FromQuery] KnowledgeFileRequest? request = null) | ||
| { | ||
| var fileOrchestrator = GetKnowledgeFileOrchestrator(request?.FileOrchestrator); | ||
| var options = !string.IsNullOrWhiteSpace(request?.DbProvider) ? new KnowledgeFileOptions { DbProvider = request.DbProvider } : null; | ||
| var response = await fileOrchestrator.DeleteKnowledgeFile(collection, fileId, options); | ||
| return response; | ||
| } | ||
|
|
||
| [HttpDelete("/knowledge/collection/{collection}/file")] | ||
| public async Task<bool> DeleteKnowledgeFiles([FromRoute] string collection, [FromBody] GetKnowledgeFilesRequest request) | ||
| { | ||
| var fileOrchestrator = GetKnowledgeFileOrchestrator(request.FileOrchestrator); | ||
| var response = await fileOrchestrator.DeleteKnowledgeFiles(collection, request); | ||
| return response; | ||
| } | ||
|
|
||
| [HttpPost("/knowledge/collection/{collection}/file/page")] | ||
| public async Task<PagedItems<KnowledgeFileViewModel>> GetPagedKnowledgeFiles([FromRoute] string collection, [FromBody] GetKnowledgeFilesRequest request) | ||
| { | ||
| var fileOrchestrator = GetKnowledgeFileOrchestrator(request.FileOrchestrator); | ||
| var data = await fileOrchestrator.GetPagedKnowledgeFiles(collection, request); | ||
|
|
There was a problem hiding this comment.
3. uploadknowledgefiles lacks null/empty guards 📘 Rule violation ☼ Reliability
Multiple file endpoints dereference [FromBody] request objects (and their collections like Files) without guarding for null/empty, risking null dereferences or empty-sequence failures at the API boundary. These should return safe fallbacks (e.g., empty response/false) when inputs are invalid.
Agent Prompt
## Issue description
File endpoints assume `[FromBody]` request and its properties are always non-null/non-empty.
## Issue Context
Model binding can produce null request bodies and null/empty collections, which should be handled explicitly with safe fallbacks.
## Fix Focus Areas
- src/Infrastructure/BotSharp.OpenAPI/Controllers/KnowledgeBase/KnowledgeBaseController.File.cs[17-75]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| private async Task<IEnumerable<string>> SaveToKnowledgebase(string collectionName, string knowledgebaseProvider, IEnumerable<string> contents, Dictionary<string, VectorPayloadValue>? payload = null) | ||
| { | ||
| if (contents.IsNullOrEmpty()) | ||
| { | ||
| return Enumerable.Empty<string>(); | ||
| } | ||
|
|
||
| var dataIds = new List<string>(); | ||
| var vectorDb = GetVectorDb(); | ||
| var vectorDb = GetVectorDb(knowledgebaseProvider); | ||
| var textEmbedding = await GetTextEmbedding(collectionName); | ||
|
|
||
| for (int i = 0; i < contents.Count(); i++) |
There was a problem hiding this comment.
4. savetoknowledgebase misses vectordb guard 📘 Rule violation ☼ Reliability
SaveToKnowledgebase calls GetVectorDb(knowledgebaseProvider) and then uses vectorDb.Upsert(...) without verifying the DB instance is non-null, risking runtime null dereference at an integration boundary. It should explicitly handle a missing/invalid provider (safe fallback or clear exception).
Agent Prompt
## Issue description
`vectorDb` may be null for an unknown/misconfigured `knowledgebaseProvider`, but the code uses it unconditionally.
## Issue Context
`GetVectorDb(provider)` can fail to resolve a provider; integration code should either return a safe fallback (e.g., no-op with error log) or fail fast with a clear exception.
## Fix Focus Areas
- src/Plugins/BotSharp.Plugin.KnowledgeBase/Services/File/KnowledgeFileOrchestrator.cs[436-456]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| created = await vectorDb.CreateCollection(collectionName, options: new() | ||
| { | ||
| Provider = options.LlmProvider, | ||
| Model = options.LlmProvider, | ||
| Dimension = options.EmbeddingDimension |
There was a problem hiding this comment.
5. Wrong embedding model used 🐞 Bug ≡ Correctness
VectorKnowledgeBase.CreateCollection assigns VectorCollectionCreateOptions.Model = options.LlmProvider instead of options.LlmModel, so collection creation can be configured with the wrong embedding model. This can lead to collections being created with incorrect embedding settings and downstream retrieval/embedding mismatches.
Agent Prompt
### Issue description
`VectorKnowledgeBase.CreateCollection` passes the wrong value for `VectorCollectionCreateOptions.Model` (`LlmProvider` instead of `LlmModel`). This can misconfigure collection creation for vector DB providers that rely on `Model`.
### Issue Context
`CollectionCreateOptions` explicitly separates `LlmProvider` and `LlmModel`, and the code already uses `LlmModel` correctly when persisting `TextEmbedding` config.
### Fix Focus Areas
- src/Plugins/BotSharp.Plugin.KnowledgeBase/Services/Base/Vector/VectorKnowledgeBase.Collection.cs[70-75]
- src/Infrastructure/BotSharp.Abstraction/Knowledges/Options/CollectionCreateOptions.cs[3-8]
### Expected change
Set `Model = options.LlmModel` in the `vectorDb.CreateCollection` call.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| private IGraphDb? GetGraphDb(string? provider = null) | ||
| { | ||
| var db = _services.GetServices<IGraphDb>().FirstOrDefault(x => x.Provider.IsEqualTo(provider)); | ||
| return db; |
There was a problem hiding this comment.
6. No default graph provider 🐞 Bug ≡ Correctness
SementicGraphKnowledgeBase.GetGraphDb only matches the passed provider and has no fallback when provider is null, so semantic-graph ExecuteQuery returns empty results when DbProvider is omitted. The controller builds GraphKnowledgeSearchOptions with DbProvider = request?.DbProvider (nullable), making this failure path reachable.
Agent Prompt
### Issue description
`SementicGraphKnowledgeBase.GetGraphDb` returns null when `provider` is null, but the API/controller allows `DbProvider` to be omitted, causing semantic-graph knowledge queries to return empty results.
### Issue Context
`GraphKnowledgeService` already demonstrates the correct behavior: default to `_settings.GraphDb.Provider` when provider is null.
### Fix Focus Areas
- src/Plugins/BotSharp.Plugin.KnowledgeBase/Services/Graph/SementicGraphKnowledgeBase.cs[56-61]
- src/Infrastructure/BotSharp.OpenAPI/Controllers/KnowledgeBase/KnowledgeBaseController.cs[395-407]
- src/Plugins/BotSharp.Plugin.KnowledgeBase/Graph/GraphKnowledgeService.cs[29-35]
### Expected change
Inject `KnowledgeBaseSettings` into `SementicGraphKnowledgeBase` and implement the same fallback logic as `GraphKnowledgeService` (or, alternatively, select the first registered `IGraphDb` when provider is null).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
…atures/refine-knowledge-service
No description provided.