You upload student work. Hawkings grades it against your rubric. You get a numeric score, per-criterion breakdown, markdown feedback, inline annotations, and next-step insights — with terminal guarantees and full reproducibility.
Documentation Index
Fetch the complete documentation index at: https://docs.hawkings.education/llms.txt
Use this file to discover all available pages before exploring further.
Four resources
| Resource | ID prefix | Mutable | Owns |
|---|---|---|---|
| Assignment | asg_ | yes | What is being graded against. References a Rubric and a model. |
| Rubric | rub_ | versioned | Criteria, weights, scale, guidance. Validatable on its own. |
| Submission | sub_ | no | The student’s work. Immutable once accepted. |
| Evaluation | eval_ | state machine | An async job: one submission, one rubric, one model. |
A Submission may have many Evaluations — re-grades, model
upgrades, calibration runs, human-override cycles. The Evaluation
is what you wait on. The Submission is what you keep.
Why split Submission from Evaluation? Re-grading is the most
common real operation: rubrics evolve, models improve, teachers ask
for second opinions. With one combined record the second grade
overwrites the first and you lose history. With separate records you
keep both, diff them, and roll back.
End-to-end in five calls
The complete grading flow, top to bottom, nothing hidden:
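A minimal sketch of those five calls, assuming a fetch-capable TypeScript runtime. The endpoint paths come from this page; the base URL, the request body field names (name, criteria, text, and so on), and the status field name are illustrative assumptions.

```ts
// Sketch only: endpoint paths are from this page; base URL, body field names,
// and the "status" field name are assumptions for illustration.
const BASE = "https://api.hawkings.education"; // assumed base URL
const headers = {
  Authorization: `Bearer ${process.env.HAWKINGS_API_KEY}`,
  "Content-Type": "application/json",
};

async function post(path: string, body: unknown) {
  const res = await fetch(`${BASE}${path}`, { method: "POST", headers, body: JSON.stringify(body) });
  if (!res.ok) throw new Error(`${path} -> ${res.status}`);
  return res.json();
}

// 1. Create a rubric.
const rubric = await post("/v1/rubrics", { name: "Essay rubric", criteria: [/* … */] });

// 2. Create an assignment that references the rubric and a pinned model.
const assignment = await post("/v1/assignments", { rubric: rubric.id, model: "claude-sonnet-4-6-20260301" });

// 3. Submit the student's work (text form shown; files are also accepted).
const submission = await post("/v1/submissions", { assignment: assignment.id, text: "…student essay…" });

// 4. Enqueue the evaluation.
const evaluation = await post("/v1/evaluations", { submission: submission.id });

// 5. Poll until a terminal state (production integrations should use webhooks instead).
let current = evaluation;
while (!["succeeded", "failed", "canceled"].includes(current.status)) {
  await new Promise((r) => setTimeout(r, 2000));
  current = await (await fetch(`${BASE}/v1/evaluations/${evaluation.id}`, { headers })).json();
}
console.log(current.status, current.result?.score);
```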
The Evaluation lifecycle
Every Evaluation lives on this state machine. All three terminal
states are guaranteed: the platform itself transitions stuck jobs to
failed within 30 minutes. Nothing stays pending forever.
Result fields are null until the evaluation reaches a terminal state. Each terminal state always populates the following:
| State | Always populates |
|---|---|
| succeeded | result.score, result.feedback_markdown, result.breakdown |
| failed | failure_reason.type, failure_reason.message, failure_reason.request_id |
| canceled | canceled_at, canceled_by |
failure_reason.type is a closed enum. Switch on it:
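A sketch of switching on the failed state. The specific case labels shown (content_unreadable, model_error) are hypothetical placeholders, not values confirmed on this page; consult the API reference for the actual closed enum.

```ts
// Sketch: the case labels below are hypothetical examples of failure types.
function handleFailure(failureReason: { type: string; message: string; request_id: string }) {
  switch (failureReason.type) {
    case "content_unreadable": // hypothetical: ask for a cleaner file or enable OCR
      return { retryable: false, action: "ask_student_to_resubmit" };
    case "model_error": // hypothetical: transient upstream issue
      return { retryable: true, action: "re_enqueue_evaluation" };
    default:
      // An enum member you don't handle yet: log it with the request_id for support.
      console.error(`evaluation failed: ${failureReason.type}`, failureReason.request_id);
      return { retryable: false, action: "escalate" };
  }
}
```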
Assignment
The long-lived configuration of what is being graded.
Operations
POST /v1/assignments — create.
GET /v1/assignments/{id} — retrieve.
PATCH /v1/assignments/{id} — update (rubric, model, human_review, ocr).
GET /v1/assignments?external_id=… — list, filterable.
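A minimal sketch of creating an assignment, reusing the post helper from the end-to-end sketch above. The field names external_id, rubric, model, human_review, and ocr all appear elsewhere on this page; the values shown and any other structure are assumptions.

```ts
// Sketch only: field names are drawn from this page; accepted values are assumptions.
const assignment = await post("/v1/assignments", {
  external_id: "moodle-assignment-29",   // your LMS reference, queryable later
  rubric: "rub_123",                     // which Rubric to grade against
  model: "claude-sonnet-4-6-20260301",   // pin the exact model version
  human_review: "required",              // gate final_score behind a teacher review
  ocr: true,                             // allow image-only documents through preflight
});
```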
Course context
Assignments may reference courseMaterials (textbook chapters, lecture
slides, the syllabus). The grader uses these as authoritative context.
A claim that contradicts the course materials is flagged in the
breakdown — even if it’d score well in isolation.
Rubric
A first-class, validatable, versioned resource.
Operations
POST /v1/rubrics — create.
POST /v1/rubrics/validate — dry-run lint, returns warnings without saving. Use in CI.
POST /v1/rubrics/{id}/versions — replace contents; bumps version.
POST /v1/rubrics/{id}/calibrate — attach golden examples.
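One way to run the dry-run linter in CI, reusing the post helper from the end-to-end sketch. The endpoint path is documented above; the response shape (a warnings array with a severity field) is an assumption based on the severities described in the next section.

```ts
// Sketch: POST /v1/rubrics/validate is documented above; the response field
// names (warnings, severity) are assumptions.
import { readFileSync } from "node:fs";

const rubricBody = JSON.parse(readFileSync("rubrics/essay.json", "utf8"));
const report = await post("/v1/rubrics/validate", rubricBody);

const blocking = (report.warnings ?? []).filter((w: { severity: string }) => w.severity === "blocking");
if (blocking.length > 0) {
  console.error("Blocking rubric lint warnings:", blocking);
  process.exit(1); // fail the CI job before the rubric ships
}
```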
Lint warnings
The validation pass catches the problems we see in the wild. Warnings with
severity blocking prevent creation. Warnings with severity warning persist and
surface in the dashboard, the SDK response, and assignments.preview().
Calibration
Upload 5–20 hand-graded examples. The platform uses them as few-shot anchors so the AI scores like your institution scores, not like a generic model would. Calibration returns an agreement_score (0–1, the rank correlation
between AI scores and your hand scores on a held-out subset). Below
0.7 means the rubric is too subjective; below 0.5 means the rubric is
not measuring what you think it’s measuring.
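A sketch of attaching golden examples, again reusing the post helper. The calibrate path and agreement_score come from this page; the request body shape (an examples array with text and score) and the exact position of agreement_score in the response are assumptions.

```ts
// Sketch: endpoint path and agreement_score are from this page; the request
// body shape (examples: [{ text, score }]) is an assumed illustration.
const handGraded = [
  { text: "…a student answer the teacher already graded…", score: 4 },
  // …5–20 golden examples in total
];

const calibration = await post("/v1/rubrics/rub_123/calibrate", { examples: handGraded });

if (calibration.agreement_score < 0.7) {
  // Below 0.7 the rubric is too subjective; tighten criterion guidance before grading at scale.
  console.warn(`agreement_score ${calibration.agreement_score}: revise the rubric first`);
}
```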
Submission
Immutable, multi-modal, preflight-checked at the door.
Operations
POST /v1/submissions — create. Synchronous preflight: parses every file,
counts extracted text, runs OCR if enabled, detects language. Blocking
issues fail the request with a 422 before the submission is persisted.
GET /v1/submissions/{id} — retrieve.
GET /v1/submissions?assignment=asg_… — list.
Multi-modal natively
Submissions accept any of:
- text
- document files (pdf, docx, md, html)
- audio (mp3, wav, m4a — transcribed)
- video (mp4 — transcribed + keyframe analysis)
- code (zip or single file — language-aware extraction)
- images (jpg, png — OCR if enabled)
Preflight at the door
Preflight is synchronous and blocking; a sketch of handling a rejection follows this list. The submission is rejected before persistence if:
- The file is encrypted or password-protected.
- A PDF contains only images and OCR is disabled.
- Extracted text is below the assignment’s minimum length.
- Total payload exceeds the configured ceiling.
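A sketch of catching a preflight rejection, reusing BASE and headers from the end-to-end sketch. The 422 status is documented above; the envelope fields match the Errors section later on, and the specific code value handled here is hypothetical.

```ts
// Sketch: 422-on-preflight is documented above; error.code and error.fields come
// from the Errors section; the exact code string is not specified here.
const res = await fetch(`${BASE}/v1/submissions`, {
  method: "POST",
  headers,
  body: JSON.stringify({ assignment: "asg_123", text: "…student essay…" }),
});

if (res.status === 422) {
  const { error } = await res.json();
  // Nothing was persisted; fix the file or the assignment settings and resubmit.
  console.warn(`preflight rejected: ${error.code}`, error.fields);
} else {
  const submission = await res.json();
  console.log("accepted", submission.id); // sub_…
}
```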
Evaluation
The async grading job. The artifact you wait on.
Operations
POST /v1/evaluations — enqueue.
GET /v1/evaluations/{id} — retrieve.
POST /v1/evaluations/{id}/cancel — cancel queued or running.
Result envelope (on success)
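A sketch of the success envelope as a TypeScript type. The field names are the ones mentioned on this page; the types, the status field name, and the nesting of usage are assumptions.

```ts
// Sketch: field names are the ones documented on this page; types are assumptions.
interface EvaluationSucceeded {
  id: string;                  // eval_…
  status: "succeeded";         // field name "status" is an assumption
  model: string;               // exact pinned model version
  rubric_version: number;      // the rubric as it was at evaluation time
  seed: number;                // for deterministic re-runs
  flags: string[];             // e.g. "likely_ai_generated", "off_topic"; absent means not detected
  result: {
    score: number;
    feedback_markdown: string;
    breakdown: unknown;        // per-criterion detail; shape not specified here
  };
  usage: { cost_usd: number };
}
```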
Result envelope (on failure)
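The corresponding failure shape, under the same assumptions; the three failure_reason fields are documented above.

```ts
// Sketch: failure_reason fields come from this page; other fields are assumptions.
interface EvaluationFailed {
  id: string;
  status: "failed";            // field name "status" is an assumption
  failure_reason: {
    type: string;              // closed enum; switch on it
    message: string;
    request_id: string;        // include in support tickets
  };
}
```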
Reproducibility
Every evaluation pins three things:
- model — the exact model version (claude-sonnet-4-6-20260301, not the floating alias).
- rubric_version — the rubric as it was at the moment of evaluation.
- seed — deterministic re-runs return identical output (within model determinism limits).
Re-running the same (submission, rubric_version, model, seed) returns
the same result. This is what makes grading auditable.
Two flags worth knowing about
flags is an array; absent flags mean “not detected”. Today we ship:
"likely_ai_generated"— the submission was probably written by an LLM. We surface this for the teacher’s awareness; we don’t act on it."off_topic"— the response doesn’t engage with the prompt. The score is still computed but is meaningless; the teacher should look.
Batch
Grade a whole class at once. Optimal for the teacher’s “submit all” moment after a class deadline. The evaluation_batch.completed webhook fires when the batch reaches its
terminal state, rather than one event per evaluation.
Preview
Before processing 100 real submissions, run one through the rubric with a sample answer. No persistence, no cost charged to the production ledger.
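A sketch of a preview call. assignments.preview() is referenced in the Lint warnings section, but the SDK package name, client constructor, and method signature shown here are all assumptions.

```ts
// Sketch: assignments.preview() is named elsewhere on this page, but the package
// name, client constructor, and signature below are assumptions.
import Hawkings from "@hawkings/sdk"; // hypothetical package name

const hawkings = new Hawkings(process.env.HAWKINGS_API_KEY!);

const preview = await hawkings.assignments.preview("asg_123", {
  sample_answer: "…a model answer the teacher trusts…", // parameter name is an assumption
});
console.log(preview.result?.score, preview.result?.breakdown);
```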
Teacher review
The post-AI workflow is a first-class resource, not a side-effect.
Operations
POST /v1/evaluations/{id}/reviews — accept, override, or reject.
A Submission has a derived final_score that follows this precedence:
latest accepted/overridden review → evaluation result → null.
When the assignment sets human_review: "required", final_score is
null until a review exists, regardless of the evaluation status.
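A sketch of a teacher override, reusing the post helper. The endpoint and the accept/override/reject actions are documented above; the body field names (action, score, comment) are assumptions.

```ts
// Sketch: endpoint and actions come from this page; body field names are assumptions.
const review = await post("/v1/evaluations/eval_123/reviews", {
  action: "override",
  score: 17,                               // the teacher's score replaces the AI score
  comment: "Credited the alternative proof in question 3.",
});
// final_score on the Submission now resolves to the overridden value.
```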
Webhooks
Production integrators subscribe instead of polling. Every event is signed with HMAC-SHA256.
| Event | When |
|---|---|
| submission.received | Submission persisted, preflight passed. |
| evaluation.queued | Job accepted. |
| evaluation.succeeded | Terminal: scored, feedback ready. |
| evaluation.failed | Terminal: see failure_reason.type. |
| evaluation.canceled | Terminal: caller invoked evaluations.cancel(). |
| evaluation_review.created | Teacher reviewed an evaluation. |
| evaluation_batch.completed | Batch reached its terminal state. |
| rubric.warning | A lint warning surfaced post-grading. |
Signature verification
The signature header is Hawkings-Signature: t=<timestamp>,v1=<hmac>.
verifyWebhook throws on signature mismatch or timestamp older than
5 minutes (replay protection).
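For integrators not using the SDK's verifyWebhook, a sketch of manual verification. The header format, HMAC-SHA256, and the 5-minute tolerance are documented above; the exact signed payload (the timestamp, a dot, then the raw body) is an assumption modeled on common signing schemes.

```ts
// Sketch: header format and 5-minute tolerance are documented above; the signed
// payload ("<t>.<raw body>") is an assumption, so confirm it against the SDK.
import { createHmac, timingSafeEqual } from "node:crypto";

function verifySignature(header: string, rawBody: string, secret: string): boolean {
  const parts = Object.fromEntries(header.split(",").map((p) => p.split("=") as [string, string]));
  const t = Number(parts.t);
  if (!parts.v1 || Math.abs(Date.now() / 1000 - t) > 5 * 60) return false; // replay protection

  const expected = createHmac("sha256", secret).update(`${t}.${rawBody}`).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(parts.v1);
  return a.length === b.length && timingSafeEqual(a, b);
}
```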
Delivery & retry
Non-2xx responses retry on exponential backoff for 24 hours. Dedupe on your end by event.id. The dashboard at
app.hawkings.education/webhooks shows every delivery, replayable
with one click.
Idempotency
Every POST accepts an Idempotency-Key header. Same key + same body
returns the same response — even on the 100th retry.
Same key + a different body raises an IdempotencyError. Records live
24 hours.
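A sketch of sending the header, reusing BASE and headers from the end-to-end sketch; the key format is your choice.

```ts
// Sketch: the Idempotency-Key header is documented above; the key format here
// (stable per logical operation) is an illustrative choice, not a requirement.
async function createEvaluationOnce(submissionId: string) {
  const key = `grade-${submissionId}`;
  const res = await fetch(`${BASE}/v1/evaluations`, {
    method: "POST",
    headers: { ...headers, "Idempotency-Key": key },
    body: JSON.stringify({ submission: submissionId }),
  });
  return res.json(); // retries with the same key + body return the same response
}
```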
External IDs
You have IDs in your LMS. We have ours. The rules:
- URL keys are always Hawkings IDs: asg_…, sub_…, eval_…, rub_…. Globally unique, prefixed, never reused.
- external_id is a queryable field on every resource. Your reference, scoped to your workspace.
- Lookup by external_id: GET /v1/assignments?external_id=moodle-assignment-29 returns a list (always; even size 1) so duplicates are detectable.
Errors
Every error response uses the same envelope.
type is a closed enum: authentication_error, permission_error,
not_found, invalid_request_error, rate_limit_error,
idempotency_error, api_error, service_unavailable.
code is finer-grained and stable. Switch on code for app logic;
switch on type for retry decisions.
fields is present on every validation error, keyed by JSON-pointer path.
request_id is always present, on every response (success and
failure). Include it in every support ticket.
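A sketch of the retry decision keyed on type. The enum values are documented above; which of them you treat as retryable is a policy choice, shown here as one reasonable default.

```ts
// Sketch: the type values come from this page; the retry classification is a
// policy default, not a platform recommendation.
type ErrorType =
  | "authentication_error" | "permission_error" | "not_found"
  | "invalid_request_error" | "rate_limit_error" | "idempotency_error"
  | "api_error" | "service_unavailable";

function shouldRetry(type: ErrorType): boolean {
  switch (type) {
    case "rate_limit_error":
    case "api_error":
    case "service_unavailable":
      return true;   // likely transient; retry with backoff
    default:
      return false;  // fix the request or credentials instead of retrying
  }
}
```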
Test mode
Use a sandbox API key (prefix hk_test_) to integrate without
spending real LLM tokens. Sandbox evaluations:
- Return deterministic mock results computed from a hash of the submission.
- Skip the LLM entirely (usage.cost_usd is 0).
- Complete in under 100 ms.
- Honor the full state machine, including failed outcomes, so you can exercise every branch of your integration.