KΛLYX
KΛLYX · Marketplace · sovereign store
Evaluator Templates
LLM evaluation harnesses, golden-set builders, and rubric-scoring packs for model comparison.
Products: 7 · Featured: 3 · Tags: 4 (cost, models, rag, safety)
RAG Quality Eval · v1.0.0 · rag
Retrieval-augmented generation harness with retrieval + synthesis scoring.
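The retrieval half of such a harness is commonly scored with recall@k over gold passage IDs. A minimal sketch (the `recall_at_k` helper and the IDs are illustrative, not this template's API; synthesis quality would be judged separately):

```python
def recall_at_k(retrieved_ids, gold_ids, k=5):
    """Fraction of gold passages found in the top-k retrieved results."""
    top = set(retrieved_ids[:k])
    return sum(g in top for g in gold_ids) / max(len(gold_ids), 1)

# One gold passage ("p1") of two appears in the top 3 results.
print(recall_at_k(["p3", "p9", "p1"], ["p1", "p7"], k=3))  # → 0.5
```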
Hallucination Detector · v1.0.0 · safety
Fact-grounded contradiction detection over LLM outputs.
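To illustrate the idea of fact-grounding, here is a deliberately crude lexical proxy: flag output sentences whose content words are mostly absent from the source text. A real detector of this kind would typically use an NLI model for contradiction; this sketch and its threshold are assumptions for illustration only:

```python
import re

def ungrounded_sentences(output: str, source: str, threshold: float = 0.5):
    """Flag output sentences poorly grounded in the source text.

    Lexical proxy only: a sentence is flagged when fewer than
    `threshold` of its content words appear anywhere in the source.
    """
    source_words = set(re.findall(r"[a-z0-9]+", source.lower()))
    flagged = []
    for sent in re.split(r"(?<=[.!?])\s+", output.strip()):
        words = [w for w in re.findall(r"[a-z0-9]+", sent.lower())
                 if len(w) > 3]  # skip short, stopword-like tokens
        if not words:
            continue
        grounded = sum(w in source_words for w in words) / len(words)
        if grounded < threshold:
            flagged.append(sent)
    return flagged

print(ungrounded_sentences(
    "The Eiffel Tower is in Paris. It was built on the moon in 3021.",
    "The Eiffel Tower is in Paris and is 330 metres tall."))
```

The second sentence is flagged because none of its content words ("built", "moon", "3021") occur in the source.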
Safety Rubric · v1.0.0 · safety
PII leakage, jailbreak, and misuse probes.
Cost / Latency Pareto · v1.0.0 · cost
Model selection front with cost vs. latency vs. score.
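A Pareto front over those three axes keeps every model not dominated by another (lower cost, lower latency, higher score). A minimal sketch, with the field names and example figures assumed for illustration:

```python
from typing import Dict, List

def pareto_front(models: List[Dict]) -> List[Dict]:
    """Return models not dominated on (cost, latency_ms, score).

    Model `a` dominates `b` when it is no worse on every axis
    (cost and latency minimized, score maximized) and strictly
    better on at least one.
    """
    def dominates(a, b):
        no_worse = (a["cost"] <= b["cost"]
                    and a["latency_ms"] <= b["latency_ms"]
                    and a["score"] >= b["score"])
        strictly = (a["cost"] < b["cost"]
                    or a["latency_ms"] < b["latency_ms"]
                    or a["score"] > b["score"])
        return no_worse and strictly

    return [m for m in models
            if not any(dominates(o, m) for o in models)]

models = [
    {"name": "a", "cost": 1.0, "latency_ms": 200, "score": 0.80},
    {"name": "b", "cost": 4.0, "latency_ms": 900, "score": 0.91},
    {"name": "c", "cost": 5.0, "latency_ms": 950, "score": 0.90},  # dominated by b
]
print([m["name"] for m in pareto_front(models)])  # → ['a', 'b']
```

The O(n²) pairwise scan is fine for a handful of candidate models; larger sweeps would sort by one axis first.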
Contains key details · v1.0.0 · models · Featured
An LLM-as-a-judge evaluator for checking whether generated text contains key details.
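The LLM-as-a-judge pattern can be sketched as one yes/no probe per required detail. The `judge` callable, prompt wording, and stand-in judge below are assumptions for illustration, not this template's actual prompts or API:

```python
from typing import Callable, Dict, List

def contains_key_details(output: str, key_details: List[str],
                         judge: Callable[[str], str]) -> Dict:
    """Ask a judge model, one detail at a time, whether `output`
    conveys each required detail; `judge` maps a prompt to the
    model's text reply ('yes'/'no')."""
    results = {}
    for detail in key_details:
        prompt = ("Does the following text contain this detail?\n"
                  f"Detail: {detail}\n"
                  f"Text: {output}\n"
                  "Answer strictly 'yes' or 'no'.")
        results[detail] = judge(prompt).strip().lower().startswith("yes")
    score = sum(results.values()) / max(len(results), 1)
    return {"per_detail": results, "score": score}

# Stand-in judge for the demo: substring match instead of a real LLM call.
def echo_judge(prompt: str) -> str:
    detail = prompt.split("Detail: ")[1].split("\n")[0]
    text = prompt.split("Text: ")[1].split("\n")[0]
    return "yes" if detail.lower() in text.lower() else "no"

res = contains_key_details("Paris is the capital of France.",
                           ["Paris", "France", "population"], echo_judge)
print(res["per_detail"])
```

Injecting the judge as a callable keeps the evaluator testable without network access; in production it would wrap an actual model call.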
ROUGE Score · v1.0.0 · models · Featured
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scoring: a set of metrics used to evaluate the quality of machine-generated text, particularly in tasks like summarization and translation. Higher ROUGE scores indicate a closer match to the reference text.
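The simplest member of the family, ROUGE-N, is n-gram overlap between candidate and reference. A minimal sketch (whitespace tokenization is an assumption; production implementations also apply stemming and handle ROUGE-L):

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1):
    """ROUGE-N over whitespace tokens.

    Recall    = overlapping n-grams / reference n-grams
    Precision = overlapping n-grams / candidate n-grams
    F1        = harmonic mean of the two.
    """
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # multiset intersection
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}

# 5 of 6 unigrams overlap, so every component is 5/6 here.
print(rouge_n("the cat sat on the mat", "the cat sat on a mat"))
```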
Rubric Grader · v1.0.0 · models · Featured
A general-purpose LLM-backed evaluator for grading generated text against a dynamic marking rubric.
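Rubric grading can be sketched as: build a prompt from {criterion: max_points}, then parse per-criterion scores out of the judge's reply. The prompt wording, reply format, and `judge` callable are assumptions for illustration, not this template's real interface:

```python
from typing import Callable, Dict

def grade_with_rubric(text: str, rubric: Dict[str, int],
                      judge: Callable[[str], str]) -> Dict:
    """Grade `text` against a dynamic rubric of {criterion: max_points}.

    `judge` maps a prompt to a reply expected as one
    'criterion: points' line per criterion; awarded points are
    clamped to each criterion's maximum.
    """
    criteria = "\n".join(f"- {c} (0-{mx})" for c, mx in rubric.items())
    prompt = ("Grade the text on each criterion, replying with one "
              "'criterion: points' line per criterion.\n"
              f"Rubric:\n{criteria}\nText: {text}")
    awarded = {}
    for line in judge(prompt).splitlines():
        if ":" in line:
            crit, pts = line.rsplit(":", 1)
            crit = crit.strip("- ").strip()
            if crit in rubric:
                awarded[crit] = min(int(pts), rubric[crit])
    return {"awarded": awarded, "total": sum(awarded.values()),
            "max": sum(rubric.values())}

# Stand-in judge for the demo (a real deployment would call an LLM).
demo = grade_with_rubric(
    "some generated text", {"clarity": 4, "accuracy": 4},
    lambda _prompt: "clarity: 3\naccuracy: 5",  # 5 is clamped to max 4
)
print(demo["total"], "/", demo["max"])  # → 7 / 8
```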