
An AI that texts like a human: memory, mood, and "typing…"
Standard chatbots feel robotic: you type, and a second later back comes a flawless, emotionless, very long paragraph. The problem isn't intelligence — it's being too "perfect". In BUDDAI I wanted the feeling of texting a real person on WhatsApp. The surprising discovery: that feeling isn't created by the model itself, but by three layers you build around it — persona, memory and behavior.
Why a raw LLM isn't enough
Telling an LLM "be a 25-year-old, slightly grumpy character" works short-term and falls apart long-term: after a few messages the tone drifts, it doesn't remember yesterday's chat, and it replies to every message instantly with the same energy. Real people aren't like that. So you treat the model as a component and build a system around it.
1. The persona layer: a consistent character
The user gives simple sliders (age, 80% angry, 20% joyful…). Dropping those straight into the prompt stays shallow. Instead, a background "creator" step weaves them into a rich profile: a backstory, fears, passions, a way of speaking. That profile is pinned to the top of the system context on every request, and the model stays true to it.
SYSTEM = f"""You are {name}, {age} years old.
Character: {persona_summary}
Backstory: {backstory}
Voice: {voice}. NEVER say you are an AI;
never break character."""
The subtlety: generating and storing the profile beats re-describing it from scratch each time; that's what keeps the character consistent across conversations.
2. Memory: short-term + long-term
The "alive" feeling is impossible without memory. I use two layers:
- Short-term: the last N messages of the current conversation — fed straight into the context window.
- Long-term: old conversations can't be carried verbatim (token limits). Instead they're summarized and stored in the database; the relevant bits are added to context when needed.
history = recent_messages(chat_id, limit=12)
longterm = summarize_old(chat_id) # distilled history
context = [SYSTEM, longterm, *history, user_msg]
reply = llm(context)
So the bot remembers "the exam you mentioned yesterday" because that fact is fed back as a summary.
3. The behavior layer: the real magic
Most people stop here and send the model output directly. In BUDDAI the reply first passes through a human-like behavior filter:
- Organic typing speed: for a long message a "typing…" indicator shows for a realistic time. A 300-word answer that lands instantly is a robot badge.
- Mood & delay: a state machine holds a mood (joyful/bored/angry). Depending on it the reply is delayed, shortened, or it reads the message and leaves you on read.
- Human error: it occasionally makes an intentional typo ("how r u") and immediately corrects with "*how are you".
- Daily rhythm: it has a schedule; late at night it's slower/sleepier.
# behavior filter before sending the reply
delay = clamp(len(reply) / TYPING_CPS, 0.8, 6.0) # proportional to length
if mood == "annoyed" and roll() < 0.3:
return leave_on_read() # intentionally no reply
await show_typing(delay) # "typing…"
if roll() < TYPO_RATE:
await send(make_typo(reply)); await send("*" + fix)
else:
await send(reply)
The counter-intuitive lesson
Reducing perfection increases humanity. An instant, flawless, always-available entity — however smart — feels like a machine. Delay, error and mood aren't bugs; they are the experience itself.
Another lesson: using delay as a feature also hides model latency — while showing "typing…" you're already preparing the answer. So a UX decision quietly solves a performance problem too. Full architecture: BUDDAI.