Rubrica
Trusted AI grading for education

Grading your faculty can trust, running on your own campus.

General-purpose language models can give the same answer different grades, and using them means sending student work to an outside company. Rubrica is a grading model that scores the same way every time and runs inside your institution, so student data stays where it belongs.

  • Runs on your own servers
  • Keeps student data on campus
  • Same answer, same score

A research collaboration across five universities

University of Arizona
Stanford University
Carnegie Mellon University
Northeastern University
Boise State University

The problem

Why universities still don't trust AI to grade

AI grading already exists inside the tools schools use. Faculty still don't rely on it, because today's tools fall short in several ways at once.

📉

The scores aren't consistent

Move a step around, change the wording, write more or less, or add a harmless extra sentence, and the model gives the same correct answer a different grade. A grade that changes with phrasing is hard to stand behind.

[Figure: one answer and four equivalent rewrites (position, length, style, distractor), scored by a general model on a 0 to 10 scale; expected score ≈ 8]
🔒

Student data can't leave campus

Sending coursework to a third-party API runs straight into FERPA and student privacy obligations. After recent breaches took whole campuses offline, keeping data inside the institution stopped being optional.

  • No student work leaves your network
  • Runs on infrastructure you already control
  • Every score is logged and reviewable
💬

It's only a number

Existing tools hand back a grade with no reasoning and no feedback. Students learn nothing from it, and faculty can't see how the score was reached.

⚖️

It can be unfair

Models often reward polished phrasing and penalize non-native writing, even when the reasoning is the same. That becomes an equity problem the moment it touches a grade.

📑

Grades are hard to defend

When a student appeals, a number from a black box is not an answer. Faculty need to see the evidence behind every score.

📈

Cost grows with scale

Per-call API pricing climbs fast across thousands of students and assignments. A model you host yourself has no per-call fee, so cost stays roughly flat as usage grows.

The solution

A grading model trained to stay consistent

We adapt open-weight models with a training method built for scoring, then run the result where your data already lives.

01

Consistent

We train the model so that answers meaning the same thing receive the same score, no matter how they are written.

02

Grounded in evidence

The model is guided to read the parts of an answer that actually matter, instead of being swayed by length, style, or filler.

03

Private

It runs on your own GPUs. Student work never leaves campus, which helps keep you on the right side of FERPA.

More than a score

A grade should explain itself

Other tools return a number and stop there. Rubrica gives the grade, the reasoning behind it, and feedback the student can act on.

The score

A consistent grade against your rubric, the same for any answer that means the same thing.

🔎

The rationale

Which rubric criteria were met, and the exact part of the answer each judgment is based on. Faculty can check it and defend it.

The feedback

Specific, usable notes on what to fix next, so the student actually learns from the grade instead of just receiving it.

Why it's different

Not another prompt, a real fix

We fix the cause, not the prompt

Most tools try to patch inconsistent grading with cleverer prompts. We trace it to how the model attends to an answer, and change that through training.

A first benchmark for grading consistency

We built a way to measure how much a grader drifts across equivalent answers, so schools can compare models on something that finally matters to them.
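To make "drift across equivalent answers" concrete, here is a minimal Python sketch of one way such a measure could be computed. It is an illustrative assumption, not Rubrica's benchmark: the function name `score_drift`, its two summary numbers, and the example scores are all hypothetical.

```python
from statistics import mean


def score_drift(original: float, rewrites: list[float]) -> dict[str, float]:
    """Hypothetical drift summary for one answer; not Rubrica's actual benchmark.

    `original` is the score a grader gave the answer as written; `rewrites`
    are the scores it gave equivalent rewrites of that same answer.
    """
    if not rewrites:
        raise ValueError("need at least one rewrite score")
    all_scores = [original, *rewrites]
    return {
        # Average distance between each rewrite's score and the original's score.
        "mean_abs_drift": mean(abs(s - original) for s in rewrites),
        # Gap between the highest and lowest score the same answer received.
        "spread": max(all_scores) - min(all_scores),
    }


# Made-up scores for illustration: an answer scored 8, and four rewrites of it.
print(score_drift(8.0, [6.0, 9.0, 7.0, 8.0]))
```

A perfectly consistent grader would return 0 for both numbers; the larger they are, the more the grade depends on phrasing rather than meaning.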

Built for institutions, not consumers

Universities are where we start. The same engine scores open-ended answers anywhere they are graded at scale.

The vision

Grading is the start. Trust is the platform.

Every place AI touches education needs to be reliable and private. We begin with grading because that is where inconsistency and privacy risk cost the most, then bring the same trusted model to the rest of the institution.

Now

Reliable, private grading

Consistent scoring for STEM short answers and assignments.

Next

Assessment integrity

Consistency checks, appeals support, and records you can audit.

Later

Private tutoring and feedback

Self-hosted feedback that guides students without sending their work away.

Built on research

A research team across five universities

Rubrica is built by researchers from five universities studying rubric drift: the tendency of language models to give the same answer different scores. The work, which is active and ongoing, includes a new benchmark for grading robustness and a training method that reduces drift.

University of Arizona
Stanford University
Carnegie Mellon University
Northeastern University
Boise State University

Bring trusted grading to your campus

We are working with a small group of universities and programs on early pilots, built together with their faculty.

Request a demo

or email jingshao@arizona.edu