Beautiful TUI Chat for Local AI Models
Talk to Yuuki models in your terminal.
Streaming responses. Conversation history. Zero cloud required.
A full chat experience in your terminal. Select models interactively.
What is yuy-chat?
yuy-chat is a terminal user interface (TUI) application for chatting with local AI models. Built with Rust and powered by ratatui, it provides a polished, keyboard-driven interface for real-time conversations with Yuuki models -- without ever leaving the terminal.
It connects to proven inference backends (llama.cpp and llamafile) and optionally to the HuggingFace Inference API for cloud-based generation. Model discovery, conversation management, preset switching, and streaming are all handled out of the box.
yuy-chat is the companion tool to yuy (the CLI for downloading and managing models). Together they form the complete local inference toolkit for the Yuuki project.
Features
- Interactive Chat -- Real-time streaming responses displayed word by word. Multi-line input with Shift+Enter. Scrollable message history with keyboard navigation.
- Model Selector -- Auto-discovers local GGUF and Llamafile models, plus optional HuggingFace cloud models.
- Conversation History -- Save conversations as JSON files. Load previous chats from a built-in conversation browser. Delete old sessions you no longer need.
- HuggingFace Cloud -- Optional API integration for cloud-based inference. Configure your HF token in the settings screen. Local and cloud models appear side by side in the selector.
- Generation Presets -- Three built-in modes: Creative (0.8 temp), Balanced (0.6 temp), and Precise (0.3 temp). Cycle between them with a single keypress. Custom presets planned for v0.2.
- Settings Screen -- Configure models directory, HuggingFace token, default preset, history saving, and UI theme, all from within the TUI.
- Cross-Platform -- Runs on Termux (Android), Linux, macOS, and Windows. Same binary, same interface, same experience. Mobile-first defaults for constrained hardware.
- Lightweight -- ~8 MB binary, ~20 MB idle RAM, ~50 ms startup. Built with Rust for zero-overhead performance and memory safety.
Installation
Prerequisites
- Rust 1.70 or later (1.75+ recommended)
- An inference runtime: llama.cpp or a .llamafile model
- AI models in GGUF or Llamafile format (use yuy to download them)
From Source
git clone https://github.com/YuuKi-OS/yuy-chat
cd yuy-chat
cargo build --release
cargo install --path .
Termux (Android)
pkg install rust git
git clone https://github.com/YuuKi-OS/yuy-chat
cd yuy-chat
cargo build --release -j 1
cargo install --path .
Note: First compilation takes longer on ARM due to CPU constraints. Use -j 1 to avoid thermal throttling. Incremental builds are fast (~10 sec).
Verify
yuy-chat --version
Quick Start
# 1. Get a model (using yuy CLI)
yuy download Yuuki-best
# 2. Install a runtime
pkg install llama-cpp # Termux
brew install llama.cpp # macOS
# 3. Launch the chat
yuy-chat
The interface opens in the model selector. Pick a model with arrow keys and press Enter. Start typing and hit Enter to send messages. That's it.
Keyboard Reference
Model Selector
| Key | Action |
|---|---|
| Up / k | Previous model |
| Down / j | Next model |
| Enter | Select model |
| R | Refresh model list |
| Q | Quit |
Chat
| Key | Action |
|---|---|
| Enter | Send message |
| Shift+Enter | New line |
| Ctrl+Enter | Force send |
| Up / Down | Scroll history (when input is empty) |
| Ctrl+C | Open menu |
| Ctrl+L | Clear chat |
| Ctrl+S | Save conversation |
| Backspace | Delete character |
Menu
| Key | Action |
|---|---|
| 1 | Change model |
| 2 | Cycle preset |
| 3 | Save conversation |
| 4 | Load conversation |
| 5 | Clear chat |
| 6 | Settings |
| Q / Esc | Back to chat |
Settings
| Key | Action |
|---|---|
| Up / Down | Navigate settings |
| Enter | Edit setting |
| Esc | Back to menu |
Generation Presets
| Preset | Temperature | Top P | Best For |
|---|---|---|---|
| Creative | 0.8 | 0.9 | Stories, brainstorming, creative writing |
| Balanced | 0.6 | 0.7 | General chat, explanations (default) |
| Precise | 0.3 | 0.5 | Code, math, factual answers |
Cycle presets during a chat session via Ctrl+C then 2, or set a default in the configuration file.
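For illustration, a minimal Rust sketch of how the three presets could be modeled and cycled; the struct and function names here are hypothetical, not the actual config.rs definitions:

```rust
// Illustrative sketch of the three built-in presets (names are not the
// actual config.rs definitions).
#[derive(Clone, Copy, Debug)]
struct GenerationPreset {
    name: &'static str,
    temperature: f32,
    top_p: f32,
}

const PRESETS: [GenerationPreset; 3] = [
    GenerationPreset { name: "Creative", temperature: 0.8, top_p: 0.9 },
    GenerationPreset { name: "Balanced", temperature: 0.6, top_p: 0.7 },
    GenerationPreset { name: "Precise",  temperature: 0.3, top_p: 0.5 },
];

/// Advance to the next preset, wrapping around (what Ctrl+C then 2 does in the TUI).
fn next_preset(current: usize) -> usize {
    (current + 1) % PRESETS.len()
}
```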
Supported Formats and Runtimes
Model Formats
- GGUF (.gguf) -- the standard llama.cpp model format
- Llamafile (.llamafile) -- self-contained executable models
Inference Runtimes
- llama.cpp -- local inference for GGUF models
- llamafile -- local inference for .llamafile models
- HuggingFace Inference API -- optional cloud backend
yuy-chat auto-detects the appropriate runtime based on the selected model's format. For GGUF models, it searches for llama-cli, llama, or main binaries in PATH.
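As a rough sketch of that detection step (the real logic lives in models/runtime.rs; the function name here is illustrative), the which crate can probe PATH for each candidate binary:

```rust
use std::path::PathBuf;

/// Try each known llama.cpp binary name in PATH and return the first hit.
fn find_llama_binary() -> Option<PathBuf> {
    ["llama-cli", "llama", "main"]
        .iter()
        .find_map(|name| which::which(name).ok())
}
```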
HuggingFace Integration
Cloud-based inference is optional. To enable it:
- Get a token from huggingface.co/settings/tokens
- Open Settings in yuy-chat (Ctrl+C then 6)
- Navigate to "HuggingFace Token" and paste it
Cloud models then appear in the selector alongside local models:
> Yuuki-best-q5.gguf 2.3 GB [Local]
Yuuki-3.7-q4.gguf 1.8 GB [Local]
Yuuki-best (via API) Cloud [HuggingFace]
| | Local | Cloud |
|---|---|---|
| Speed | Faster (no network) | Depends on connection |
| Privacy | 100% offline | Data sent to HF API |
| Storage | Requires disk space | None |
| Availability | Always | Requires internet |
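The cloud path boils down to an authenticated HTTPS POST against the public HuggingFace Inference API. A hedged sketch, assuming reqwest with its json feature enabled (the function name and error handling are illustrative, not the hf_api.rs implementation):

```rust
use serde_json::json;

/// Send a prompt to the HuggingFace Inference API and return the raw response body.
async fn query_hf(token: &str, model_id: &str, prompt: &str) -> anyhow::Result<String> {
    let url = format!("https://api-inference.huggingface.co/models/{model_id}");
    let client = reqwest::Client::new();
    let resp = client
        .post(&url)
        .bearer_auth(token)                  // TLS + bearer-token auth
        .json(&json!({ "inputs": prompt }))  // standard HF inference payload
        .send()
        .await?
        .error_for_status()?;
    Ok(resp.text().await?)
}
```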
Configuration
Config File
Location: ~/.config/yuy-chat/config.toml
models_dir = "/home/user/.yuuki/models"
default_preset = "Balanced"
save_history = true
theme = "Dark"
# Optional
# hf_token = "hf_xxxxxxxxxxxxx"
Priority Order
- TUI settings -- changes made in the settings screen
- Config file -- ~/.config/yuy-chat/config.toml
- Defaults -- sensible defaults based on platform detection
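A minimal sketch of the config-file and defaults layers of that chain (settings changed in the TUI would be saved back through the same file); the struct and helper names are illustrative, not the actual config.rs code:

```rust
use serde::Deserialize;

/// Field names mirror config.toml; every field is optional so missing keys
/// fall through to platform defaults.
#[derive(Deserialize, Default)]
#[serde(default)]
struct Config {
    models_dir: Option<String>,
    default_preset: Option<String>,
    save_history: Option<bool>,
    theme: Option<String>,
    hf_token: Option<String>,
}

fn load_config() -> Config {
    // Read ~/.config/yuy-chat/config.toml; fall back to built-in defaults
    // when the file is missing or malformed.
    dirs::config_dir()
        .map(|d| d.join("yuy-chat/config.toml"))
        .and_then(|p| std::fs::read_to_string(p).ok())
        .and_then(|s| toml::from_str(&s).ok())
        .unwrap_or_default()
}
```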
Directory Layout
~/.config/yuy-chat/
  config.toml                           # configuration
  conversations/                        # saved chats
    conversation-20260206-143022.json
    conversation-20260206-150133.json
~/.yuuki/models/                        # models (shared with yuy CLI)
  Yuuki-best/
    yuuki-best-q4_0.gguf
  Yuuki-3.7/
    yuuki-3.7-q5_k_m.gguf
Platform-specific paths
| Platform | Config path |
|---|---|
| Linux | ~/.config/yuy-chat/config.toml |
| macOS | ~/.config/yuy-chat/config.toml |
| Windows | C:\Users\{user}\AppData\Roaming\yuy-chat\config.toml |
| Termux | /data/data/com.termux/files/home/.config/yuy-chat/config.toml |
Architecture
User
|
v
+-------------------------------------------------------------+
| yuy-chat (Rust) |
| |
| main.rs Entry point + event loop |
| | crossterm polling (100ms) |
| v |
| app.rs State machine |
| | ModelSelector | Chat | Menu | |
| | Settings | ConversationList |
| v |
| ui/ Rendering layer (ratatui) |
| | selector.rs, chat.rs, menu.rs, |
| | settings.rs, conversations.rs |
| v |
| models/ Business logic |
| scanner.rs, runtime.rs, hf_api.rs |
+-------+----------------------------+-----------------------+
| |
v v
+------------------+ +-------------------+
| External APIs | | Local Storage |
| HuggingFace | | ~/.config/ |
| Inference API | | ~/.yuuki/models/ |
+--------+---------+ +-------------------+
|
v
+--------------------------------+
| Inference Runtimes |
| llama.cpp | llamafile |
+--------------------------------+
Source Layout
yuy-chat/
  Cargo.toml            # manifest and dependencies
  README.md
  TECHNICAL.md          # full technical specification
  src/
    main.rs             # entry point, event loop, terminal setup
    app.rs              # application state, message handling
    config.rs           # config load/save, presets, themes
    conversation.rs     # message storage, JSON persistence
    models/
      mod.rs            # module declarations
      scanner.rs        # auto-discovery of local + HF models
      runtime.rs        # subprocess management, streaming
      hf_api.rs         # HuggingFace Inference API client
    ui/
      mod.rs            # module declarations
      selector.rs       # model selection screen
      chat.rs           # main chat interface
      menu.rs           # options menu
      settings.rs       # configuration screen
      conversations.rs  # saved conversations browser
Data Flow
User Input --> Event Loop --> App State --> Business Logic --> UI Render
     ^                                                             |
     +-------------------------------------------------------------+
                         (state mutation loop)
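The loop itself follows the usual immediate-mode pattern: poll for input on a short timeout, mutate state, redraw. A stripped-down sketch, where the App type and its methods stand in for what app.rs and the ui/ modules actually provide:

```rust
use std::time::Duration;
use crossterm::event::{self, Event, KeyCode, KeyEvent};

struct App; // placeholder for the real application state

impl App {
    fn handle_key(&mut self, _key: KeyEvent) { /* state mutation */ }
    fn draw(&mut self) -> anyhow::Result<()> { Ok(()) } // ratatui render
}

fn run_loop(app: &mut App) -> anyhow::Result<()> {
    loop {
        // Non-blocking 100 ms poll so streamed tokens keep rendering.
        if event::poll(Duration::from_millis(100))? {
            if let Event::Key(key) = event::read()? {
                if key.code == KeyCode::Char('q') {
                    break;
                }
                app.handle_key(key);
            }
        }
        app.draw()?;
    }
    Ok(())
}
```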
Design Patterns
| Pattern | Implementation |
|---|---|
| State machine | AppState enum drives which screen is active and how events are routed |
| Async streaming | Tokio channels (mpsc) pipe inference output chunk-by-chunk to the UI |
| Subprocess isolation | llama.cpp runs in a spawned Child process with piped stdout |
| Double buffering | ratatui handles minimal redraws automatically |
| Lazy loading | Models and conversations are loaded on-demand, not at startup |
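A sketch combining the subprocess-isolation and async-streaming rows above: spawn llama.cpp with piped stdout under Tokio and forward each chunk to the UI over an mpsc channel. The binary name, flags, and channel payload are illustrative; models/runtime.rs is the authoritative version:

```rust
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::Command;
use tokio::sync::mpsc;

/// Stream inference output line by line to the UI task.
async fn stream_inference(
    model: &str,
    prompt: &str,
    tx: mpsc::Sender<String>,
) -> anyhow::Result<()> {
    let mut child = Command::new("llama-cli")
        .arg("-m").arg(model)   // Command::arg(), never shell string interpolation
        .arg("-p").arg(prompt)
        .stdout(std::process::Stdio::piped())
        .spawn()?;

    let stdout = child.stdout.take().expect("stdout was piped");
    let mut lines = BufReader::new(stdout).lines();

    // Each chunk is pushed to the UI as soon as it is read.
    while let Some(line) = lines.next_line().await? {
        if tx.send(line).await.is_err() {
            break; // receiver dropped (e.g. chat cleared)
        }
    }
    child.wait().await?;
    Ok(())
}
```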
Technical Specifications
Project Metrics
Language: Rust 2021 Edition
Lines of code: ~1,453
Rust source files: 15
Modules: 5
Public functions: ~45
Data structures: 12
Enums: 8
Direct dependencies: 16
Binary size (release): ~8 MB
Performance
| Operation | Time |
|---|---|
| Startup (no models) | ~50 ms |
| Startup (10 models) | ~200 ms |
| Model scan (10 models) | ~100 ms |
| Render frame | ~1-2 ms |
| Send message (pre-inference) | ~5 ms |
| Save conversation | ~10 ms |
Memory Usage
| State | RAM |
|---|---|
| Idle (no model) | ~20 MB |
| Model loaded | ~50 MB |
| Active inference | ~100-500 MB |
| Peak (large models) | ~1 GB |
System Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| Rust | 1.70+ | 1.75+ |
| RAM | 512 MB | 2 GB |
| Disk | 50 MB (binary) | 100 MB |
| CPU | ARM/x86 (32/64-bit) | x86_64 or ARM64 |
| Terminal | Unicode support | Modern terminal emulator |
Dependencies
| Crate | Purpose |
|---|---|
| ratatui | Terminal UI framework |
| crossterm | Cross-platform terminal control |
| tokio | Async runtime |
| reqwest | HTTP client (HuggingFace API) |
| serde + serde_json + toml | Serialization |
| chrono | Timestamps for conversations |
| walkdir | Recursive model directory scanning |
| dirs | Cross-platform home directory |
| anyhow + thiserror | Error handling |
| colored | Terminal colors |
| tracing | Logging |
| which | Binary detection in PATH |
Platform Support
| Platform | Status | Notes |
|---|---|---|
| Termux (Android) | Full support | Primary target. ARM64 tested |
| Linux x86_64 | Full support | Ubuntu 22.04+ tested |
| Linux ARM64 | Full support | Raspberry Pi 4 tested |
| macOS Intel | Full support | Catalina+ tested |
| macOS Apple Silicon | Full support | M1/M2 tested |
| Windows 10/11 | Full support | Windows 11 tested |
| FreeBSD | Untested | Should work |
Termux (Android) -- Primary Target
Optimizations applied automatically when Termux is detected:
- Single-threaded compilation (-j 1) to prevent thermal throttling
- Conservative I/O patterns for mobile storage
- Simplified progress indicators for narrow terminal widths
Detection method:
std::env::var("PREFIX")
.map(|p| p.contains("com.termux"))
.unwrap_or(false)
macOS
- Metal GPU acceleration available through llama.cpp
- Homebrew for runtime installation (brew install llama.cpp)
- Full keyboard support in Terminal.app and iTerm2
Windows
- Windows Terminal recommended for best rendering
- Backslash path handling automatic
- CUDA acceleration via llama.cpp for NVIDIA GPUs
Conversation Format
Conversations are saved as JSON files in ~/.config/yuy-chat/conversations/.
Filename convention: conversation-{YYYYMMDD}-{HHMMSS}.json
{
"messages": [
{
"role": "user",
"content": "Explain async/await in Rust",
"timestamp": "2026-02-06T14:30:22.123Z"
},
{
"role": "assistant",
"content": "async/await in Rust allows you to write...",
"timestamp": "2026-02-06T14:30:25.456Z"
}
],
"created_at": "2026-02-06T14:30:22.123Z",
"updated_at": "2026-02-06T14:35:10.789Z"
}
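For reference, the format maps onto plain serde structs. A sketch assuming chrono with its serde feature enabled (struct names are illustrative, not conversation.rs verbatim):

```rust
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Message {
    role: String, // "user" or "assistant"
    content: String,
    timestamp: DateTime<Utc>,
}

#[derive(Serialize, Deserialize)]
struct Conversation {
    messages: Vec<Message>,
    created_at: DateTime<Utc>,
    updated_at: DateTime<Utc>,
}

/// Build the timestamped filename used by the convention above.
fn conversation_filename(now: DateTime<Utc>) -> String {
    format!("conversation-{}.json", now.format("%Y%m%d-%H%M%S"))
}
```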
Security
Current
- HTTPS only -- all HuggingFace API calls use TLS (rustls, no OpenSSL)
- No shell injection -- subprocesses use Command::arg(), never string interpolation
- Scoped file access -- all reads/writes within ~/.config/yuy-chat/ and ~/.yuuki/
- Process isolation -- llama.cpp runs as a separate subprocess with piped I/O
Known Limitations
- HuggingFace tokens are stored in plaintext in config.toml
- File permissions are not enforced on config files
Planned (v0.2+)
- System keyring integration for token storage
- File permission enforcement (0o600 for sensitive files)
- Encrypted token storage on Termux via libsodium
- Input size limits and sanitization
Troubleshooting
No models found
# Check if models exist
ls ~/.yuuki/models/
# Download a model using yuy CLI
yuy download Yuuki-best
# Or place a .gguf file manually
cp your-model.gguf ~/.yuuki/models/
llama.cpp not found
# Termux
pkg install llama-cpp
# macOS
brew install llama.cpp
# Verify
which llama-cli
Permission denied on llamafile
chmod +x ~/.yuuki/models/*.llamafile
Slow responses
- Use a smaller quantization (q4_0 instead of q8_0)
- Check available RAM (free -h or top)
- Switch to the Precise preset (shorter outputs)
- Ensure no other heavy processes are running
UI rendering issues
- Use a terminal with Unicode support
- On Windows, use Windows Terminal (not CMD)
- On Termux, ensure terminal encoding is UTF-8
- Try resizing the terminal window
Roadmap
v0.2 -- Enhanced UX
- Syntax highlighting for code blocks
- Copy/paste support
- Export conversations to Markdown
- Custom system prompts
- Vim keybindings mode
- Custom user-defined presets
v0.3 -- Power Features
- Multiple chat tabs
- Search in conversation history
- Token usage statistics
- Model comparison mode
- Template system for prompts
v1.0 -- Ecosystem
- Plugin system
- Custom themes (user-defined color schemes)
- Conversation branching
- Multi-modal support (images)
- REST API server mode
Contributing
Development Setup
git clone https://github.com/YuuKi-OS/yuy-chat
cd yuy-chat
# install Rust if needed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# build and verify
cargo build
cargo test
cargo fmt -- --check
cargo clippy
Commit Convention
<type>(<scope>): <subject>
Types: feat | fix | docs | style | refactor | test | chore
feat(chat): add multi-line input support
- Detect Shift+Enter for newlines
- Update input rendering for wrapped text
- Add cursor position tracking
Closes #12
Pull Request Checklist
- Tests pass (cargo test)
- Code is formatted (cargo fmt)
- No clippy warnings (cargo clippy)
- Documentation updated if needed
- Commits follow the convention above
Coding Standards
- snake_case for functions, CamelCase for types
- Document all public functions with /// comments
- Use Result<T> and the ? operator for error handling
- Prefer async/await over callbacks
- Justify any new dependency in the PR description
Design Decisions
Why a TUI instead of a GUI or web UI?
The primary target is Termux on Android. A TUI requires no display server, no browser, and minimal resources. It also works over SSH, inside tmux, and in any terminal emulator on any platform. A GUI may be added as an optional feature later.
Why ratatui?
ratatui is the most actively maintained TUI framework in the Rust ecosystem. It provides immediate-mode rendering, a rich widget library, and cross-platform terminal support through crossterm. The API is well-documented and the community is responsive.
Why subprocess spawning instead of library linking?
Linking llama.cpp as a C library adds significant build complexity, especially for cross-compilation and Termux. Spawning a subprocess is simpler, isolates crashes, and allows the user to update llama.cpp independently. Library integration is planned for v1.0.
Why Tokio for a TUI?
Inference is slow. Without async, the UI would freeze during response generation. Tokio enables non-blocking subprocess reads, smooth streaming display, and sets the foundation for future parallel features like multi-tab chat.
Why JSON for conversations instead of SQLite?
JSON files are human-readable, trivially portable, and require no additional dependency. Each conversation is self-contained. SQLite may be introduced in v1.0 if search and indexing become necessary.
Build Configuration
Release Profile
[profile.release]
opt-level = "z" # optimize for binary size
lto = true # link-time optimization
codegen-units = 1 # single codegen unit for better optimization
strip = true # strip debug symbols
Environment Variables
RUST_LOG=debug yuy-chat # enable debug logging
RUST_LOG=info yuy-chat # info-level logging
YUY_MODELS_DIR=/path yuy-chat # custom models directory
XDG_CONFIG_HOME=/custom yuy-chat # custom config directory
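A sketch of how the YUY_MODELS_DIR override might be resolved ahead of the config value and the shared default directory; the helper name is hypothetical:

```rust
use std::path::PathBuf;

/// Resolve the models directory: env var, then config value, then ~/.yuuki/models.
fn resolve_models_dir(config_value: Option<&str>) -> PathBuf {
    std::env::var("YUY_MODELS_DIR")
        .ok()
        .or_else(|| config_value.map(str::to_owned))
        .map(PathBuf::from)
        .unwrap_or_else(|| dirs::home_dir().unwrap_or_default().join(".yuuki/models"))
}
```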
Cross-Compilation
# ARM64 (Raspberry Pi, Termux native)
rustup target add aarch64-unknown-linux-gnu
cargo build --release --target aarch64-unknown-linux-gnu
# Windows (from Linux)
rustup target add x86_64-pc-windows-gnu
cargo build --release --target x86_64-pc-windows-gnu
# macOS Apple Silicon (from Linux)
rustup target add aarch64-apple-darwin
cargo build --release --target aarch64-apple-darwin
About the Yuuki Project
yuy-chat exists to serve the Yuuki project -- a code-generation LLM being trained entirely on a smartphone with zero cloud budget.
Training Details
Quality Scores (Checkpoint 2000)
A fully native model (trained from scratch, not fine-tuned) is planned for v1.0. A research paper documenting the mobile training methodology is in preparation.
Related Projects
| Project | Description |
|---|---|
| yuy | CLI for downloading, managing, and running Yuuki models |
| Yuuki-best | Best checkpoint model weights |
| Yuuki Space | Web-based interactive demo |
| yuuki-training | Training code and scripts |
License
Copyright 2026 Yuuki Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.