Moderation

Automatic, configurable content moderation and anti-abuse — non-AI, in-core, free.

4 min read · Updated May 31, 2026

Cariosan moderates message content automatically — no manual report/review queue. Filtering is non-AI (wordlists + patterns + counters), runs in the normal server at zero extra cost, and is off by default (a workspace opts in). AI toxicity classification is a separate, optional upgrade on the roadmap; it never gates the core.

How it works

Every send is evaluated before it persists. If a rule matches, the workspace's configured action is applied:

Action	Effect
`flag`	Message is delivered; the action is recorded and a `message.moderated` webhook fires.
`mask`	Matched spans are replaced with `***`; the masked text is what's stored and broadcast.
`reject`	Send fails with `422 MESSAGE_REJECTED`; no message is created.
`auto_mute`	The send is rejected and the sender is muted for a time window (the mute mechanism ships with the block/mute feature).

A clean message, or a workspace that hasn't enabled moderation, passes straight through unchanged.
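
The action table above can be sketched as a single dispatch function. This is a minimal illustration, not Cariosan's actual implementation: the `Match`, `Outcome`, `maskSpans`, and `applyAction` names are hypothetical, and matched spans are assumed not to overlap.

```typescript
type Action = "flag" | "mask" | "reject" | "auto_mute";

// Hypothetical shape: a matched region of the message, end exclusive.
interface Match { rule: string; start: number; end: number }

type Outcome =
  | { kind: "deliver"; text: string; moderated: boolean }
  | { kind: "reject"; status: 422; code: "MESSAGE_REJECTED"; muteSender: boolean };

// Replace each matched span with "***". Spans are processed right to left so
// that earlier offsets stay valid. Assumes the spans do not overlap.
function maskSpans(text: string, matches: Match[]): string {
  const sorted = matches.slice().sort((a, b) => b.start - a.start);
  let out = text;
  for (const m of sorted) {
    out = out.slice(0, m.start) + "***" + out.slice(m.end);
  }
  return out;
}

function applyAction(text: string, matches: Match[], action: Action, enabled: boolean): Outcome {
  // Clean message or moderation not enabled: pass through unchanged.
  if (!enabled || matches.length === 0) {
    return { kind: "deliver", text, moderated: false };
  }
  switch (action) {
    case "flag":
      return { kind: "deliver", text, moderated: true };
    case "mask":
      return { kind: "deliver", text: maskSpans(text, matches), moderated: true };
    case "reject":
      return { kind: "reject", status: 422, code: "MESSAGE_REJECTED", muteSender: false };
    case "auto_mute":
      return { kind: "reject", status: 422, code: "MESSAGE_REJECTED", muteSender: true };
  }
}
```

Returning a tagged outcome instead of throwing keeps the decision separate from persistence and delivery, which makes each action easy to test in isolation.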

Rules

profanity — a built-in Indonesian + English wordlist, matched as whole words (so "pantai" never trips "tai"). Extend it per workspace with a custom blocklist.
link — explicit URLs (http(s)://, www.).
phone — contact-number leakage (Indonesian 08…/+62… mobiles and generic long digit runs) — the classic marketplace "DM me at 08…" pattern.
flood — the same message repeated by one user in a channel more than N times within a short window.
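
As a rough illustration of the three pattern-based rules, the sketch below builds a whole-word wordlist matcher plus link and phone patterns. The regular expressions, the digit-length thresholds, and the `wordlistPattern` and `findSpans` helpers are assumptions made for this example; the real wordlists and patterns are not documented here.

```typescript
interface Span { rule: "profanity" | "link" | "phone"; start: number; end: number }

// Escape regex metacharacters so wordlist entries are matched literally.
const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Whole-word, case-insensitive matcher: \b on both sides means "tai" does not
// match inside "pantai".
function wordlistPattern(words: string[]): RegExp {
  return new RegExp(`\\b(?:${words.map(escapeRegExp).join("|")})\\b`, "gi");
}

// Explicit URLs only: http://, https://, or www.
const LINK = /\b(?:https?:\/\/|www\.)\S+/gi;

// Assumed thresholds: +62 or 08 followed by 7 to 11 digits, or any run of 9+ digits.
// Numbers written with spaces or dashes are not handled in this sketch.
const PHONE = /(?:\+62|\b08)\d{7,11}\b|\b\d{9,}\b/g;

function findSpans(text: string, rule: Span["rule"], pattern: RegExp): Span[] {
  // Work on a fresh copy so lastIndex state is never shared between calls.
  const re = new RegExp(pattern.source, pattern.flags);
  const spans: Span[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    spans.push({ rule, start: m.index, end: m.index + m[0].length });
    if (m[0].length === 0) re.lastIndex++; // guard against zero-length matches
  }
  return spans;
}
```

The returned spans carry offsets rather than just a boolean, so the same result can drive either a `mask` replacement or a plain reject.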

profanity and flood are on by default once moderation is enabled; link and phone are opt-in (they're higher-false-positive and use-case dependent).
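
The flood rule is counter-based rather than pattern-based. One way to picture it is a sliding-window counter keyed by user, channel, and message text, as in the sketch below. The `FloodDetector` class, its option names, and the trim/lowercase normalization are assumptions for illustration; the actual thresholds and matching rules are set per workspace.

```typescript
interface FloodOptions {
  maxRepeats: number; // "N" in the rule description
  windowMs: number;   // length of the short window, in milliseconds
}

class FloodDetector {
  // Timestamps of recent identical sends, keyed by (user, channel, text).
  private seen = new Map<string, number[]>();

  constructor(private opts: FloodOptions) {}

  // Returns true when this send is repeated more than maxRepeats times
  // within the window. `now` is a millisecond timestamp.
  check(userId: string, channelId: string, text: string, now: number): boolean {
    // Assumption: repeats are compared after trimming and lowercasing.
    const key = JSON.stringify([userId, channelId, text.trim().toLowerCase()]);
    const cutoff = now - this.opts.windowMs;
    const recent = (this.seen.get(key) ?? []).filter((t) => t > cutoff);
    recent.push(now);
    this.seen.set(key, recent);
    return recent.length > this.opts.maxRepeats;
  }
}
```

This sketch never evicts idle keys, so a long-running process would need periodic cleanup or a store with expiry.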

Blocking & muting

Two complementary controls sit alongside the automatic filter:

Block is an end-user capability. client.blockUser(externalId) hides that user's messages from your channel history and search, workspace-wide; unblockUser reverses it. listBlocked() returns your block list. History and search are filtered server-side, so use the list to also filter the live WebSocket stream client-side. Blocking yourself is rejected.
Mute is a moderation/admin control. A user is silenced in a channel: their sends are rejected (403 MUTED) until the mute expires. Mutes are created automatically by the auto_mute action above (and, later, by channel admins). Muting isn't an end-user action; to stop seeing someone, use block.
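
From the sender's side, a moderated send can fail in two distinct ways: `422 MESSAGE_REJECTED` from the filter and `403 MUTED` from an active mute. The sketch below shows one way a client might tell them apart. The `ApiErrorLike` shape and the `classifySendFailure` helper are hypothetical; how the SDK actually surfaces these errors is not specified on this page.

```typescript
// Hypothetical error shape: an HTTP status plus a machine-readable code.
interface ApiErrorLike { status: number; code: string }

type SendFailure = "rejected_by_moderation" | "muted" | "other";

function classifySendFailure(err: ApiErrorLike): SendFailure {
  // The filter refused the message; no message was created.
  if (err.status === 422 && err.code === "MESSAGE_REJECTED") return "rejected_by_moderation";
  // The sender is muted in this channel until the mute expires.
  if (err.status === 403 && err.code === "MUTED") return "muted";
  return "other";
}
```

Distinguishing the two lets a UI ask the user to edit the message in the first case and explain that sending is temporarily unavailable in the second.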

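block.ts

// `client` is an already-initialized Cariosan client.
await client.blockUser("user_spammer"); // hide this user's messages workspace-wide
const blocked = await client.listBlocked(); // [{ user_id, external_id, name }]
await client.unblockUser("user_spammer"); // reverse the block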
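
Because the live WebSocket stream is filtered client-side, the result of listBlocked() can be turned into a predicate and applied to incoming messages. The sketch below assumes a hypothetical `IncomingMessage` event shape with a `sender_external_id` field; only the `{ user_id, external_id, name }` entry shape comes from the example above.

```typescript
interface BlockedUser { user_id: string; external_id: string; name: string }

// Hypothetical shape of a live message event; the real payload may differ.
interface IncomingMessage { sender_external_id: string; text: string }

// Build a predicate that returns true for messages that should be shown.
function makeBlockFilter(blocked: BlockedUser[]): (msg: IncomingMessage) => boolean {
  const blockedIds = new Set(blocked.map((b) => b.external_id));
  return (msg) => !blockedIds.has(msg.sender_external_id);
}

// Usage sketch: const isVisible = makeBlockFilter(await client.listBlocked());
// then skip rendering any live message for which isVisible(msg) is false.
```

The predicate should be rebuilt after each blockUser or unblockUser call so the live view stays consistent with history and search.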

Audit trail

Every automatic action is recorded in a purpose-built moderation_actions log — rule, action, offending user, the message (when one was created), and a redacted excerpt — powering the dashboard feed and the message.moderated webhook. Nothing requires a human in the loop.
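
To make the log fields concrete, here is a sketch of what one moderation_actions entry and its redacted excerpt could look like. The field names, the `ModerationActionRecord` type, and the `redactedExcerpt` helper are assumptions; the page lists which pieces of information are recorded but not the schema or the redaction method.

```typescript
// Hypothetical record shape based on the fields listed above.
interface ModerationActionRecord {
  rule: string;
  action: "flag" | "mask" | "reject" | "auto_mute";
  userId: string;            // the offending user
  messageId: string | null;  // null when the send was rejected and no message exists
  excerpt: string;           // redacted excerpt, never the raw match
}

// Build a short excerpt around the first matched span, with the match itself
// replaced by "***". `context` is how many characters to keep on each side.
function redactedExcerpt(
  text: string,
  spans: { start: number; end: number }[],
  context = 10,
): string {
  if (spans.length === 0) return "";
  const first = spans[0];
  const from = Math.max(0, first.start - context);
  const to = Math.min(text.length, first.end + context);
  return text.slice(from, first.start) + "***" + text.slice(first.end, to);
}

const example: ModerationActionRecord = {
  rule: "phone",
  action: "reject",
  userId: "user_spammer",
  messageId: null,
  excerpt: redactedExcerpt("DM me at 081234567890 ok", [{ start: 9, end: 21 }], 5),
};
```

Storing only a redacted excerpt lets the dashboard feed and webhook consumers see the context of a match without re-exposing the offending content.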

Configuration

Moderation settings are per-workspace (enabled rules, action, custom blocklist, flood thresholds) and will be managed from the Cariosan Cloud dashboard. Until that ships, the settings default to disabled, so existing deployments are unaffected unless moderation is turned on.
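
The per-workspace settings described above might be modeled roughly as follows. Every name here (`ModerationSettings`, `resolveSettings`, `activeRules`), the default action, and the flood numbers are placeholders for illustration; only the list of settings and the defaults for which rules are on come from this page.

```typescript
type RuleName = "profanity" | "link" | "phone" | "flood";
type Action = "flag" | "mask" | "reject" | "auto_mute";

interface ModerationSettings {
  enabled: boolean;
  rules: Record<RuleName, boolean>;
  action: Action;
  customBlocklist: string[];
  flood: { maxRepeats: number; windowMs: number };
}

interface ModerationOverrides {
  enabled?: boolean;
  rules?: Partial<Record<RuleName, boolean>>;
  action?: Action;
  customBlocklist?: string[];
  flood?: Partial<ModerationSettings["flood"]>;
}

const DEFAULT_SETTINGS: ModerationSettings = {
  enabled: false, // moderation is off until a workspace opts in
  rules: { profanity: true, flood: true, link: false, phone: false },
  action: "flag", // assumption: the default action is not documented
  customBlocklist: [],
  flood: { maxRepeats: 3, windowMs: 10000 }, // placeholder thresholds
};

// Merge workspace overrides over the defaults, including nested objects.
function resolveSettings(overrides: ModerationOverrides): ModerationSettings {
  return {
    enabled: overrides.enabled ?? DEFAULT_SETTINGS.enabled,
    rules: { ...DEFAULT_SETTINGS.rules, ...overrides.rules },
    action: overrides.action ?? DEFAULT_SETTINGS.action,
    customBlocklist: overrides.customBlocklist ?? DEFAULT_SETTINGS.customBlocklist,
    flood: { ...DEFAULT_SETTINGS.flood, ...overrides.flood },
  };
}

// Rules that actually run: none unless moderation is enabled.
function activeRules(settings: ModerationSettings): RuleName[] {
  if (!settings.enabled) return [];
  return (Object.keys(settings.rules) as RuleName[]).filter((r) => settings.rules[r]);
}
```

Resolving overrides against a single defaults object keeps the "off unless enabled" behavior in one place.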

Not AI (yet)

These filters are deterministic and free. Toxicity/AI moderation is an optional, self-hostable add-on planned later — it extends, and never replaces, this in-core layer.
