Commit 344504d

Lisa Cao authored and committed
add gravitino RFC
Signed-off-by: Lisa Cao <lnc@Lisas-MacBook-Pro.local>
1 parent 31d0851 commit 344504d

1 file changed: 39 additions & 0 deletions

@@ -0,0 +1,39 @@
# RFC: Data Catalog Service

This RFC proposes an Apache Gravitino microservice that acts as a lakehouse connector for OPEA applications.

## Author(s)

[Lisa N. Cao](https://github.com/lisancao/)

## Status

`Under Review`

## Objective

OPEA applications, even mature ones such as ChatQnA, are difficult to set up, partly because of the complexity of the project and partly because enterprise organizations have a hard time consolidating the data they want to train on in one place. Currently, data sources have to be connected to OPEA applications manually and managed separately, which creates a barrier to governance. Apache Gravitino can provide data management, tagging, and access controls for OPEA applications and agentic workflows. These controls act as guardrails on source data, making OPEA applications appropriate for enterprise and production use cases, and Gravitino also provides a standard set of APIs for lineage, lifecycle management, and data discovery.
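
To make the guardrail idea concrete, here is a minimal sketch of tag-based filtering applied before any data reaches an application. It does not use the Gravitino API: `DataSource`, `allowed_sources`, and the tag names are hypothetical stand-ins for whatever tags and policies a real catalog would supply.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataSource:
    """Hypothetical catalog entry: a named source with governance tags."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


def allowed_sources(sources, blocked_tags):
    """Return the names of sources that carry none of the blocked tags."""
    blocked = frozenset(blocked_tags)
    return [s.name for s in sources if not (s.tags & blocked)]


# Hypothetical catalog contents, for illustration only.
catalog = [
    DataSource("public_docs", frozenset({"public"})),
    DataSource("hr_records", frozenset({"pii", "restricted"})),
    DataSource("product_faq"),
]

print(allowed_sources(catalog, {"pii"}))  # ['public_docs', 'product_faq']
```

In a real deployment the tags would presumably come from the catalog rather than being hard-coded, so that the policy is managed in one place instead of per application.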

## Motivation

Apache Gravitino's metastore model is designed for enterprise and AI use cases, which makes it a strong fit for the project. Coupled with Gravitino's MCP server and flexible deployment options, it is a good choice of interface through which OPEA applications can access source data safely.

## Design Proposal

Deploy Gravitino as a catalog microservice that serves the Generative AI Components (GenAIComps), with a focus on LangChain and LlamaIndex framework support. This can generally be done with a Gravitino server and the Python client, but it requires users to set up Gravitino on their end. It could potentially also be integrated with the Dataprep and Guardrails components.
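
As a rough illustration of how a GenAIComps microservice might talk to such a catalog service, the sketch below wraps a single catalog-listing call. It deliberately avoids the Gravitino Python client so that nothing here is mistaken for that client's API. The `CatalogService` class, the metalake name `opea`, the port `8090`, the `/api/metalakes/{metalake}/catalogs` path, the `Accept` header, and the `identifiers` response shape are all assumptions to be checked against the Gravitino documentation.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def http_get_json(url):
    """Fetch a URL and decode its JSON body (default fetcher)."""
    # The Accept header value is an assumption, not a confirmed requirement.
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


class CatalogService:
    """Hypothetical thin wrapper a GenAIComps microservice could call."""

    def __init__(self, base_url, metalake, fetch=http_get_json):
        self.base_url = base_url.rstrip("/")
        self.metalake = metalake
        self.fetch = fetch  # injectable so the wrapper can be tested offline

    def catalogs_url(self):
        # Assumed endpoint layout: /api/metalakes/{metalake}/catalogs
        name = quote(self.metalake, safe="")
        return f"{self.base_url}/api/metalakes/{name}/catalogs"

    def list_catalog_names(self):
        # Assumed response shape: {"identifiers": [{"name": ...}, ...]}
        payload = self.fetch(self.catalogs_url())
        return [item["name"] for item in payload.get("identifiers", [])]


def fake_fetch(url):
    """Stubbed fetcher with made-up data, so no server is required."""
    return {"identifiers": [{"name": "lakehouse"}, {"name": "docs"}]}


service = CatalogService("http://localhost:8090/", "opea", fetch=fake_fetch)
print(service.catalogs_url())
print(service.list_catalog_names())
```

Injecting the fetcher keeps the wrapper testable without a running Gravitino server, which matters here because the proposal leaves server setup to the user.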

## Alternatives Considered

N/A

## Compatibility

List any incompatible interface or workflow changes, if they exist.

## Miscellaneous

- Performance Impact, such as speed, memory, accuracy.
- Engineering Impact, such as binary size, startup time, build time, test times.
- Security Impact, such as code vulnerability.
- TODO List or staging plan.
