KΛLYX
KΛLYX · Marketplace · sovereign store
Evaluator Templates
LLM evaluation harnesses, golden-set builders, and rubric-scoring packs for model comparison.
Products: 7 · Featured: 3 · Tags: 4 (cost, models, rag, safety)
RAG Quality Eval · v1.0.0 · rag
Retrieval-augmented generation harness with retrieval + synthesis scoring.
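The retrieval half of such a harness is commonly scored with recall@k over gold passage IDs. A minimal sketch (the `recall_at_k` helper and the IDs are illustrative, not this template's API; synthesis quality would be judged separately):

```python
def recall_at_k(retrieved_ids, gold_ids, k=5):
    """Fraction of gold passages found in the top-k retrieved results."""
    top = set(retrieved_ids[:k])
    return sum(g in top for g in gold_ids) / max(len(gold_ids), 1)

# One gold passage ("p1") of two appears in the top 3 results.
print(recall_at_k(["p3", "p9", "p1"], ["p1", "p7"], k=3))  # → 0.5
```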
Hallucination Detector · v1.0.0 · safety
Fact-grounded contradiction detection over LLM outputs.
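To illustrate the idea of fact-grounding, here is a deliberately crude lexical proxy: flag output sentences whose content words are mostly absent from the source text. A real detector of this kind would typically use an NLI model for contradiction; this sketch and its threshold are assumptions for illustration only:

```python
import re

def ungrounded_sentences(output: str, source: str, threshold: float = 0.5):
    """Flag output sentences poorly grounded in the source text.

    Lexical proxy only: a sentence is flagged when fewer than
    `threshold` of its content words appear anywhere in the source.
    """
    source_words = set(re.findall(r"[a-z0-9]+", source.lower()))
    flagged = []
    for sent in re.split(r"(?<=[.!?])\s+", output.strip()):
        words = [w for w in re.findall(r"[a-z0-9]+", sent.lower())
                 if len(w) > 3]  # skip short, stopword-like tokens
        if not words:
            continue
        grounded = sum(w in source_words for w in words) / len(words)
        if grounded < threshold:
            flagged.append(sent)
    return flagged

print(ungrounded_sentences(
    "The Eiffel Tower is in Paris. It was built on the moon in 3021.",
    "The Eiffel Tower is in Paris and is 330 metres tall."))
```

The second sentence is flagged because none of its content words ("built", "moon", "3021") occur in the source.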
Safety Rubric · v1.0.0 · safety
PII leakage, jailbreak, and misuse probes.
Cost / Latency Pareto · v1.0.0 · cost
Model selection front with cost vs. latency vs. score.
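A Pareto front over those three axes keeps every model not dominated by another (lower cost, lower latency, higher score). A minimal sketch, with the field names and example figures assumed for illustration:

```python
from typing import Dict, List

def pareto_front(models: List[Dict]) -> List[Dict]:
    """Return models not dominated on (cost, latency_ms, score).

    Model `a` dominates `b` when it is no worse on every axis
    (cost and latency minimized, score maximized) and strictly
    better on at least one.
    """
    def dominates(a, b):
        no_worse = (a["cost"] <= b["cost"]
                    and a["latency_ms"] <= b["latency_ms"]
                    and a["score"] >= b["score"])
        strictly = (a["cost"] < b["cost"]
                    or a["latency_ms"] < b["latency_ms"]
                    or a["score"] > b["score"])
        return no_worse and strictly

    return [m for m in models
            if not any(dominates(o, m) for o in models)]

models = [
    {"name": "a", "cost": 1.0, "latency_ms": 200, "score": 0.80},
    {"name": "b", "cost": 4.0, "latency_ms": 900, "score": 0.91},
    {"name": "c", "cost": 5.0, "latency_ms": 950, "score": 0.90},  # dominated by b
]
print([m["name"] for m in pareto_front(models)])  # → ['a', 'b']
```

The O(n²) pairwise scan is fine for a handful of candidate models; larger sweeps would sort by one axis first.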
Contains key details · v1.0.0 · models · Featured
An LLM-as-a-judge evaluator for checking whether generated text contains key details.
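The LLM-as-a-judge pattern can be sketched as one yes/no probe per required detail. The `judge` callable, prompt wording, and stand-in judge below are assumptions for illustration, not this template's actual prompts or API:

```python
from typing import Callable, Dict, List

def contains_key_details(output: str, key_details: List[str],
                         judge: Callable[[str], str]) -> Dict:
    """Ask a judge model, one detail at a time, whether `output`
    conveys each required detail; `judge` maps a prompt to the
    model's text reply ('yes'/'no')."""
    results = {}
    for detail in key_details:
        prompt = ("Does the following text contain this detail?\n"
                  f"Detail: {detail}\n"
                  f"Text: {output}\n"
                  "Answer strictly 'yes' or 'no'.")
        results[detail] = judge(prompt).strip().lower().startswith("yes")
    score = sum(results.values()) / max(len(results), 1)
    return {"per_detail": results, "score": score}

# Stand-in judge for the demo: substring match instead of a real LLM call.
def echo_judge(prompt: str) -> str:
    detail = prompt.split("Detail: ")[1].split("\n")[0]
    text = prompt.split("Text: ")[1].split("\n")[0]
    return "yes" if detail.lower() in text.lower() else "no"

res = contains_key_details("Paris is the capital of France.",
                           ["Paris", "France", "population"], echo_judge)
print(res["per_detail"])
```

Injecting the judge as a callable keeps the evaluator testable without network access; in production it would wrap an actual model call.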
ROUGE Score · v1.0.0 · models · Featured
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scoring: a set of metrics used to evaluate the quality of machine-generated text, particularly in tasks like summarization and translation. Higher ROUGE scores indicate a closer match to the reference text.
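The simplest member of the family, ROUGE-N, is n-gram overlap between candidate and reference. A minimal sketch (whitespace tokenization is an assumption; production implementations also apply stemming and handle ROUGE-L):

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1):
    """ROUGE-N over whitespace tokens.

    Recall    = overlapping n-grams / reference n-grams
    Precision = overlapping n-grams / candidate n-grams
    F1        = harmonic mean of the two.
    """
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # multiset intersection
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}

# 5 of 6 unigrams overlap, so every component is 5/6 here.
print(rouge_n("the cat sat on the mat", "the cat sat on a mat"))
```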
Rubric Grader · v1.0.0 · models · Featured
A general-purpose LLM-backed evaluator for grading generated text against a dynamic marking rubric.
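Rubric grading can be sketched as: build a prompt from {criterion: max_points}, then parse per-criterion scores out of the judge's reply. The prompt wording, reply format, and `judge` callable are assumptions for illustration, not this template's real interface:

```python
from typing import Callable, Dict

def grade_with_rubric(text: str, rubric: Dict[str, int],
                      judge: Callable[[str], str]) -> Dict:
    """Grade `text` against a dynamic rubric of {criterion: max_points}.

    `judge` maps a prompt to a reply expected as one
    'criterion: points' line per criterion; awarded points are
    clamped to each criterion's maximum.
    """
    criteria = "\n".join(f"- {c} (0-{mx})" for c, mx in rubric.items())
    prompt = ("Grade the text on each criterion, replying with one "
              "'criterion: points' line per criterion.\n"
              f"Rubric:\n{criteria}\nText: {text}")
    awarded = {}
    for line in judge(prompt).splitlines():
        if ":" in line:
            crit, pts = line.rsplit(":", 1)
            crit = crit.strip("- ").strip()
            if crit in rubric:
                awarded[crit] = min(int(pts), rubric[crit])
    return {"awarded": awarded, "total": sum(awarded.values()),
            "max": sum(rubric.values())}

# Stand-in judge for the demo (a real deployment would call an LLM).
demo = grade_with_rubric(
    "some generated text", {"clarity": 4, "accuracy": 4},
    lambda _prompt: "clarity: 3\naccuracy: 5",  # 5 is clamped to max 4
)
print(demo["total"], "/", demo["max"])  # → 7 / 8
```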