The c_analyzer module is a specialized code analysis component for C language files within the CodeWiki dependency analysis system. It uses Tree-Sitter, a general-purpose incremental parsing library, to parse and extract structural information from C source code, including function definitions, struct declarations, global variables, and their relationships.
- Parse C Code: Uses Tree-Sitter to efficiently parse C syntax and build abstract syntax trees (ASTs)
- Extract Constructs: Identifies top-level C language constructs (functions, structs, global variables)
- Build Dependency Graph: Extracts relationships between identified constructs (function calls, variable usage)
- Enable Documentation: Provides the foundation for automated documentation generation by making code structure machine-readable
- Recursive AST traversal for comprehensive node extraction
- Multi-type relationship detection (function calls, variable access)
- System function filtering to exclude standard library calls
- Cross-file relationship support for call graph construction
- Component ID generation for unique node identification
High-Level Architecture:
CLI Layer (CLIDocumentationGenerator)
↓
Backend Services (DocumentationGenerator)
↓
AnalysisService (Orchestrator)
↓
Language Analyzers
├─ TreeSitterCAnalyzer (c_analyzer)
├─ TreeSitterCppAnalyzer (cpp_analyzer)
└─ Other Language Analyzers
↓
Produces: Node objects and CallRelationship objects
↓
Dependency Analysis Services
├─ RepoAnalyzer (Coordinates repository-wide analysis)
└─ CallGraphAnalyzer (Resolves relationships)
Module Dependencies:
- Consumes from:
AnalysisService - Produces:
NodeandCallRelationshipobjects (from models/core.py) - Feeds to:
CallGraphAnalyzerfor relationship resolution - Coordinates by:
RepoAnalyzerfor repository-wide analysis
The core component of the c_analyzer module, responsible for parsing individual C files and extracting their structural information.
Processing Pipeline:
Input Files
│
├─ File Path
├─ File Content (string)
└─ Repository Path (context)
│
↓
TreeSitterCAnalyzer.__init__()
│
├─ Store metadata (path, content, repo context)
├─ Initialize nodes list []
├─ Initialize relationships list []
└─ Call _analyze()
│
↓
_analyze() Method
│
├─ Create Tree-Sitter parser (C language)
├─ Parse file content → AST (Abstract Syntax Tree)
└─ Call _extract_nodes() and _extract_relationships()
│
↓
Output
├─ List[Node] - extracted constructs
└─ List[CallRelationship] - dependencies
Location: codewiki/src/be/dependency_analyzer/analyzers/c.py
def __init__(self, file_path: str, content: str, repo_path: str = None)Parameters:
file_path(str): Absolute or relative path to the C source filecontent(str): Complete file content as a stringrepo_path(str, optional): Root path of the repository for relative path calculation
Initialization Flow:
- Stores file metadata (path, content, repo context)
- Initializes empty lists for nodes and relationships
- Triggers
_analyze()to process the file
Main orchestration method that:
- Creates Tree-Sitter parser with C language configuration
- Parses file content into AST
- Calls
_extract_nodes()for initial node discovery - Calls
_extract_relationships()to establish dependencies
Recursion-based AST traversal that identifies top-level C constructs:
| Node Type | Extracted As | Example |
|---|---|---|
function_definition |
"function" | int parse_expr() |
struct_specifier |
"struct" | struct Point { ... } |
type_definition |
"struct" (typedef) | typedef struct { ... } Point; |
declaration (global scope) |
"variable" | int global_var = 0; |
Process:
- Recursively traverse AST from root
- Match node type patterns
- Extract name, line numbers, source code
- Create
Nodeobjects for functions and structs - Maintain
top_level_nodesdict for fast lookup - Store variables for relationship analysis
Node Type Matching:
function_definition
├─ function_declarator (contains function name)
│ └─ identifier (the function name)
└─ (function body)
struct_specifier
├─ type_identifier (struct name)
└─ (struct body)
type_definition (typedef struct)
├─ struct_specifier
└─ type_identifier (typedef name)
declaration (global variable)
└─ init_declarator or identifier
└─ identifier (variable name)
Filtering:
- Only functions and structs added to
self.nodes - Variables tracked internally for relationship analysis
Recursive relationship discovery that identifies:
-
Function Calls (unresolved, cross-file):
- Pattern:
call_expressionnodes - Process: Extract function name → Filter system functions → Create CallRelationship
- Example:
parse_statement()call creates relationship
- Pattern:
-
Global Variable Access (resolved, local file):
- Pattern:
identifiernodes referencing global variables - Process: Check if identifier is in
top_level_nodes→ Create resolved CallRelationship - Example:
global_configaccess creates relationship
- Pattern:
System Functions Excluded:
- I/O:
printf,scanf,fopen,fclose,fread,fwrite - Memory:
malloc,free,memcpy,memset - String:
strlen,strcpy,strcmp - Process:
exit,abort - Graphics (SDL):
SDL_Init,SDL_CreateWindow, etc.
Resolution Status:
- Resolved (
is_resolved=True): Local file relationships - Unresolved (
is_resolved=False): Cross-file functions (resolved later by CallGraphAnalyzer)
Walks up the AST parent chain to find the enclosing function_definition node.
_get_module_path():
- Converts absolute paths to repository-relative
- Removes file extensions (.c, .h)
- Converts path separators to dots
- Example:
src/utils/helpers.c→src.utils.helpers
_get_relative_path():
- Gets repository-relative file path
- Handles ValueError for paths outside repo
_get_component_id(name):
- Generates unique component identifier
- Format:
relative/path/to/file.c::component_name - Example:
src/parser.c::parse_expression
_is_global_variable(node):
- Checks if declaration is at file scope
- Walks parent chain up AST
- Returns false if inside function or struct
- Returns true if reaches file root
_is_system_function(func_name):
- Classifies function as system/library
- Uses hardcoded list of common C functions
Source: codewiki/src/be/dependency_analyzer/models/core.py
Represents a single code construct extracted from C source.
Fields:
id: Unique identifier (e.g.,src/parser.c::parse_expression)name: Simple component name (e.g.,parse_expression)component_type: Type classification (function,struct,variable)file_path: Absolute file pathrelative_path: Repository-relative pathsource_code: Extracted source code snippetstart_line/end_line: 1-indexed line numbershas_docstring: AlwaysFalsefor C (no docstrings)docstring: Empty for Cparameters:Nonefor C (not extracted)node_type: Same ascomponent_typebase_classes:Nonefor C (no inheritance)class_name:Nonefor Cdisplay_name: Human-readable name (e.g.,function parse_expression)component_id: Duplicate ofid
Example Node for C function:
Node(
id="src/parser.c::parse_expression",
name="parse_expression",
component_type="function",
file_path="/home/user/project/src/parser.c",
relative_path="src/parser.c",
source_code="int parse_expression(...) {\n ...\n}",
start_line=42,
end_line=156,
has_docstring=False,
docstring="",
node_type="function",
display_name="function parse_expression"
)Source: codewiki/src/be/dependency_analyzer/models/core.py
Represents a dependency between two code constructs.
Fields:
caller: Component ID of calling entity (fully qualified)callee: Component ID or simple name of called entitycall_line: Line number where dependency occursis_resolved: Whether callee is a fully qualified ID
Examples:
Unresolved function call:
CallRelationship(
caller="src/parser.c::parse_expression",
callee="parse_statement", # Simple name
call_line=87,
is_resolved=False # Resolved later by CallGraphAnalyzer
)Resolved global variable usage:
CallRelationship(
caller="src/parser.c::parse_expression",
callee="src/parser.c::global_config", # Fully qualified
call_line=95,
is_resolved=True # Local file relationship
)Goal: Identify all top-level C constructs
Process:
- Recursively traverse AST from root node
- For each node, check
node.typeagainst known patterns - Extract node name and metadata
- Build
Nodeobject with complete information - Add to
self.nodesif it's a function or struct - Store all types in
top_level_nodesdict for relationship analysis
Constructs Identified:
- Functions: All
function_definitionnodes at file scope - Structs: All
struct_specifierand typedef'd structs - Variables: Global
declarationnodes (for relationship tracking)
Goal: Identify dependencies between extracted nodes
Process:
- Recursively traverse AST again
- Identify relationship patterns:
- Function calls:
call_expressionnodes - Variable access:
identifiernodes in function scope
- Function calls:
- For each relationship:
- Find containing function
- Check if target is system function (filter if yes)
- Determine resolution status (local vs cross-file)
- Create
CallRelationshipobject
- Add to
self.call_relationships
Resolution Determination:
- Resolved: Target found in
top_level_nodes(same file) - Unresolved: Target is simple function name (need cross-file resolution)
Module: dependency_analyzer/analysis/analysis_service.py
The AnalysisService:
- Detects C files (.c, .h extensions)
- Instantiates TreeSitterCAnalyzer with file content
- Collects
nodesandcall_relationshipsfrom analyzer - Aggregates results from all language analyzers
Module: dependency_analyzer/analysis/call_graph_analyzer.py
Processes extracted relationships:
- Takes all unresolved CallRelationships
- Matches callee names to actual Node IDs
- Resolves cross-file and cross-module dependencies
- Builds complete call graphs
Module: dependency_analyzer/analysis/repo_analyzer.py
Orchestrates file-level analysis:
- Discovers all C files in repository
- Reads file contents
- Instantiates TreeSitterCAnalyzer for each file
- Aggregates results into repository-wide analysis
-
Macro Analysis:
- Preprocessor directives (#define, #include) not analyzed
- No macro expansion or dependency tracking
-
Type Inference:
- Parameter and return types not extracted
- Function signatures available via source code only
-
Struct/Union Members:
- Internal structure not decomposed
- Members not extracted as separate nodes
-
Pointer Resolution:
- Function pointers not tracked
- Indirect calls not identified
-
Inline Assembly:
- ASM blocks ignored
- Low-level dependencies not visible
Extracted ✅:
- Top-level function definitions
- Struct/union definitions
- Global variable declarations
- Function-to-function calls
- Global variable usage
Not Extracted ❌:
- Local variables
- Function parameters
- Return types
- Type information
- Macro definitions
- Typedef names (except for struct typedefs)
- Function pointers
- Nested functions
-
System Function List:
- Hardcoded list may be incomplete
- New library functions not automatically filtered
-
Global Variable Detection:
- Based on scope analysis
- May miss function-local static variables
-
Cross-File References:
- Depend on CallGraphAnalyzer resolution
- Unmatched calls create incomplete graphs
File: src/calculator.c
#include <stdio.h>
int result = 0;
int add(int a, int b) {
result = a + b;
return result;
}
void print_result() {
printf("Result: %d\n", result);
}
int main() {
int sum = add(5, 3);
print_result();
return 0;
}Nodes Extracted (3 functions):
| Component ID | Name | Type | Line Range |
|---|---|---|---|
src/calculator.c::add |
add | function | 5-8 |
src/calculator.c::print_result |
print_result | function | 10-12 |
src/calculator.c::main |
main | function | 14-18 |
Relationships Extracted:
| Caller | Callee | Line | Resolved |
|---|---|---|---|
src/calculator.c::add |
src/calculator.c::result |
7 | ✓ Yes |
src/calculator.c::main |
add |
16 | ✗ No |
src/calculator.c::main |
print_result |
17 | ✗ No |
src/calculator.c::print_result |
printf |
11 | Filtered |
Notes:
printfexcluded (system function filter)resultreference resolved immediately (same file)- Function calls unresolved (will be resolved by CallGraphAnalyzer)
- language_analyzers: Parent module containing all language-specific analyzers
- cpp_analyzer: Similar C++ analyzer following same patterns
- dependency_analysis_services: Services that coordinate analyzer usage
- call_graph_analyzer: Processes and resolves call relationships
- dependency_analyzer_models: Data models (Node, CallRelationship, AnalysisResult)
from codewiki.src.be.dependency_analyzer.analyzers.c import TreeSitterCAnalyzer
# Read C file
with open("src/parser.c", "r") as f:
content = f.read()
# Create analyzer
analyzer = TreeSitterCAnalyzer(
file_path="src/parser.c",
content=content,
repo_path="/home/user/project"
)
# Access results
print(f"Found {len(analyzer.nodes)} top-level constructs")
print(f"Found {len(analyzer.call_relationships)} relationships")
# Print nodes
for node in analyzer.nodes:
print(f" {node.display_name} at {node.relative_path}:{node.start_line}")
# Print relationships
for rel in analyzer.call_relationships:
status = "resolved" if rel.is_resolved else "unresolved"
print(f" {rel.caller} → {rel.callee} ({status})")from codewiki.src.be.dependency_analyzer.analyzers.c import analyze_c_file
# Convenience function wrapper
nodes, relationships = analyze_c_file(
file_path="src/parser.c",
content=content,
repo_path="/home/user/project"
)
print(f"Analyzed {len(nodes)} constructs")
print(f"Found {len(relationships)} dependencies")- Simple Functions: Single file with multiple functions
- Structs: Struct definitions and usage
- Global Variables: Declaration and usage across functions
- System Calls: Verify filtering of printf, malloc, etc.
- Cross-File References: Unresolved relationships
- Edge Cases: Empty files, single-line functions, comments in code
- Compare extracted nodes against source file
- Verify component IDs are unique and properly formatted
- Check line numbers match actual source
- Validate that relationships reference existing nodes
- Confirm system functions are filtered
- Verify cross-file relationships are marked as unresolved
- Incremental Parsing: Tree-Sitter supports incremental updates for faster re-analysis
- Single-Pass Extraction: Node and relationship extraction in one traversal
- Memory Usage: AST and source code stored in memory; suitable for files <1MB
- Scalability: Tested with files up to several MB; linear performance with file size
- Analyzer Framework: Tree-Sitter v0.20+
- Language Binding:
tree_sitter_cPython binding - C Standard: C99 and later
- Last Updated: 2024