Rather than relying on statistical indexing and training alone, first take the knowledge-base documents and source texts and analyze them against public knowledge.
This means the fact-checking agent does not run on just any LLM.
Instead, it works on a Fact-Retrieval LLM built from the original texts,
rather than on a regular full LLM with pre-trained data and RAG augmented with our extra knowledge.
Here's how it works.
Create a Fact-Retrieval LLM: first, extract the fields of knowledge from the analyzed texts into knowledge graphs (or knowledge trees and networks).
Summarize the knowledge into "knowledge components".
Pair each component with symbolic reasoning, grammatical coherence, contextual validity, analogies from other fields, and links back into the original text for deeper delving.
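A knowledge component like the one described above could be sketched as a small record. This is only an illustration of the idea, not a spec: all names here (`KnowledgeComponent`, `SourceSpan`, the field names) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SourceSpan:
    # Hypothetical link back into the original material, for deeper delving.
    document: str   # identifier of the source text
    start: int      # character offset where the supporting passage begins
    end: int        # character offset where it ends

@dataclass
class KnowledgeComponent:
    topic: str
    summary: str                                        # condensed statement of the fact
    relations: list = field(default_factory=list)       # symbolic triples for reasoning
    analogies: list = field(default_factory=list)       # pointers to components in other fields
    sources: list = field(default_factory=list)         # SourceSpan links into the originals
    open_questions: list = field(default_factory=list)  # explicitly marked unknowns

# Example component built from an (invented) dental-anatomy text.
enamel = KnowledgeComponent(
    topic="enamel",
    summary="Enamel is the hard outer layer of a tooth.",
    relations=[("enamel", "part_of", "tooth")],
    sources=[SourceSpan("dental_anatomy.txt", 120, 180)],
)
```

The point of the structure is that every claim carries both a symbolic form (the triples) and a pointer back to where it came from.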
These new knowledge components are what gets searched, rather than querying raw LLM models.
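Searching the components might look like this. The scoring here is a deliberately naive word-overlap stand-in (a real system would use embeddings or graph traversal); `search_components` and the dict fields are hypothetical.

```python
def search_components(query, components):
    """Return components ranked by naive word overlap with the query."""
    query_words = set(query.lower().split())
    scored = []
    for comp in components:
        text = (comp["topic"] + " " + comp["summary"]).lower()
        score = len(query_words & set(text.split()))
        if score:
            scored.append((score, comp))
    # Highest-overlap components first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [comp for _, comp in scored]

components = [
    {"topic": "enamel", "summary": "Enamel is the hard outer layer of a tooth."},
    {"topic": "myelin", "summary": "Myelin insulates axons in the nervous system."},
]
results = search_components("what is the outer layer of a tooth", components)
# results[0] is the enamel component
```

Because the search runs over structured summaries rather than model weights, every hit comes with its relations and source links attached.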
A refresh and revision pass runs constantly (or perhaps once a day).
If things are not understood, it knows to quote them and say what it did not understand. If there are open questions, those are marked. It learns along with experts and with you.
It checks the validity, usefulness, and context of each knowledge component and gathers more information around it.
It knows to ask for help if it (the AI) needs it. When it receives help, it shares the sources or access paths with its (AI) colleagues, or at least stores the knowledge.
The model is then trained with what it has studied and knows (and with what you have studied and know). This includes remembering mistakes and fixing the knowledge base after encountering errors or after receiving personalized feedback.
This new type of model is a model of comprehension: it stores and indexes structured information. Rather than using words it doesn't understand, it is created by connecting many previously acquired small models, with rules of logic and coherence embedded in them. Logic and coherence can then be inferred directly from the models and verified programmatically, using logic, semantics, coherence, pragmatics, and generalities, and by comparing with contextual information, all pointed to directly in the model itself.
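To make "verified programmatically" concrete, here is a toy sketch of inferring and checking logic directly from stored relations. It assumes the components carry simple symbolic triples and uses one invented rule: `part_of` is transitive, and two things cannot each be part of the other. Everything here is illustrative, not a proposed implementation.

```python
def transitive_closure(triples):
    """Infer new part_of facts by transitivity until nothing changes."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(facts):
            for (c, r2, d) in list(facts):
                if r1 == r2 == "part_of" and b == c and (a, "part_of", d) not in facts:
                    facts.add((a, "part_of", d))
                    changed = True
    return facts

def find_contradictions(triples):
    """Flag mutual part_of claims, which cannot both hold."""
    facts = transitive_closure(triples)
    return [(a, b) for (a, r, b) in facts
            if r == "part_of" and (b, "part_of", a) in facts and a < b]

base = [("enamel", "part_of", "tooth"), ("tooth", "part_of", "jaw")]
inferred = transitive_closure(base)  # also contains ("enamel", "part_of", "jaw")
conflicts = find_contradictions(base + [("jaw", "part_of", "enamel")])
```

The check is cheap precisely because the knowledge is stored symbolically; a raw LLM offers no such handle to verify against.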
The original model (the raw LLM) is not stored or used. Instead, we use a specialized model that has links back into the sources of the original material. These can be accessed ad hoc if needed, albeit more slowly.
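The ad hoc lookup could be as simple as resolving a stored span against the original corpus. Here the corpus is modeled as an in-memory dict; in practice these would be file or database reads, which is why the access is slower. `resolve_span` and the document name are hypothetical.

```python
corpus = {
    "dental_anatomy.txt": "Chapter 1. Enamel is the hard outer layer of a tooth...",
}

def resolve_span(corpus, document, start, end):
    """Fetch the exact supporting passage from the original material."""
    text = corpus.get(document)
    if text is None:
        return None  # source unavailable; the model should say so rather than guess
    return text[start:end]

quote = resolve_span(corpus, "dental_anatomy.txt", 11, 52)
# quote == "Enamel is the hard outer layer of a tooth"
```

Notice the failure mode: if the source is gone, the function returns nothing instead of fabricating a passage, matching the "quote what you don't understand" behavior above.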
Some direct quotes of text (phrases) are left in the model, but not all. This (I think) is what we do in our brains. We don't remember the whole Bible, but some quotes we can take out as is, and even those we don't always use in our immediate assessment. We can always open the book to "remind ourselves".
This creates a "personal experience" with your "AI persona" that includes opinions that may change over time but, importantly, is also aware of who you are. This way, instead of prompting "you are a dentist who believes in vegetarianism", she knows who she is already. She is your friend. And she knows who you are and has already explored what you appreciate and what you don't.
She can always tell you: "I never learned dentistry, but if I have access to the following, I may be able to give you a comprehensive answer, since I specialized in brain research and have deep knowledge of biology."
If this already exists, I'll delete the idea.