Collecting Logs with Loki and Grafana Alloy (Automated)
Previous: Prometheus Metrics Setup
You've got metrics, but what happens when something breaks? Metrics tell you CPU spiked, but logs tell you why - which process crashed, what error it threw, which user triggered it. In this guide, we'll add Loki (log aggregation) and Grafana Alloy (log collection) using automated Ansible deployment to complete your observability stack.
Think of it this way: Prometheus stores metrics (numbers over time), Loki stores logs (text over time). Both query similarly, both visualize in Grafana.
What this tutorial covers:
- Automated Loki + Alloy deployment using Ansible
- Auto-configuring Loki as Grafana data source
- Auto-importing log dashboards
- Collecting systemd journal, Docker containers, and /var/log files
- Querying logs in Grafana
- Correlating logs with metrics
Time to complete: 5-10 minutes (automated deployment)
GitHub Repository
All the configuration and deployment scripts from this guide are available in https://github.com/IaC-Toolbox/iac-toolbox-raspberrypi. Clone it and follow along!
What Are We Adding?
Loki: A log aggregation system designed to be lightweight and cost-effective. It's like Prometheus, but for logs instead of metrics.
Grafana Alloy: A modern log and metrics collector that ships logs to Loki. Replaces the older Promtail tool with better performance and automatic service discovery.
Together they give you:
- Centralized log storage from all your services
- Text search across all your logs (Loki indexes labels only and filters log lines at query time)
- Correlation between logs and metrics
- Log-based alerting
Why Not Just Use Docker Logs or journalctl?
Docker logs are scattered: Each container has its own logs. Want to search across all containers? Good luck.
journalctl is local-only: SSH into your Pi every time you want to check logs? That's tedious.
No retention control: Docker logs grow unbounded unless you configure limits per-container.
Loki centralizes everything: Query all logs from Grafana's web UI, correlate with metrics, and set up alerts.
The Complete Stack
Here's how logs flow in your observability setup:
┌────────────────────────────────────────────────────────────────┐
│                      LOGS + METRICS STACK                      │
└────────────────────────────────────────────────────────────────┘
           🌍 You → https://grafana.iac-toolbox.com
                               │
                               ▼
      ┌─────────────────────────────────────────────────┐
      │  Grafana (Port 3000)                            │
      │   • Queries Prometheus (metrics)                │
      │   • Queries Loki (logs)                         │
      │   • Dashboards with metrics + logs              │
      │   • Auto-configured via Ansible                 │
      └────────┬────────────────────────┬───────────────┘
               │                        │
               │ Metrics                │ Logs
               ▼                        ▼
      ┌─────────────────┐    ┌─────────────────────┐
      │  Prometheus     │    │  Loki (Port 3100)   │
      │  (Port 9090)    │    │   • Stores logs     │
      │                 │    │   • Configurable    │
      │                 │    │     retention       │
      └─────────────────┘    └──────────▲──────────┘
                                        │
                                        │ Ships logs
                             ┌──────────┴──────────┐
                             │  Grafana Alloy      │
                             │  (Port 12345)       │
                             │   • Collects logs   │
                             │   • Tags & filters  │
                             └──────────┬──────────┘
                                        │
                    ┌───────────────────┼───────────────────┐
                    │                   │                   │
             systemd journal    Docker containers    /var/log files
             • Service logs     • stdout/stderr      • auth.log
             • Boot events      • All containers     • syslog
             • OOM kills
Data flows: Sources → Alloy → Loki → Grafana
All connected via shared 'monitoring' Docker network
What Logs We're Collecting
We'll collect from three sources to cover all failure modes:
1. Systemd Journal
- Service crashes and restarts
- Out-of-memory (OOM) kills
- Boot events
- Any systemd-managed service on your Pi
2. Docker Container Logs
- stdout/stderr from all containers
- Grafana, Prometheus, Vault, your apps
- Automatically tags each container
3. System Logs (/var/log)
- Authentication attempts (auth.log)
- Kernel messages (syslog)
- Cron job output
- Network events
This covers OS-level, container-level, and application-level logs.
What You Need
Before starting:
- Grafana and Prometheus already running (previous tutorials)
- SSH access to your Raspberry Pi
- The iac-toolbox-raspberrypi repository
- About 2-3GB free disk space for log storage
Game Plan
Here's what Ansible will do automatically:
- Deploy Loki and Grafana Alloy containers
- Join them to the monitoring Docker network (same as Grafana)
- Configure Loki for 7-day retention (configurable)
- Configure Alloy to collect logs from systemd, Docker, and /var/log
- Create Loki data source in Grafana via API
- Import log dashboards automatically
- Set up systemd service for auto-start on reboot
- Optionally expose Alloy UI via Cloudflare tunnel
All with one command!
How Ansible Deploys This
Behind the scenes, the Ansible loki role:
- Creates directory structure - ~/loki/ on Raspberry Pi
- Generates configuration files from templates
  - loki-config.yml with TSDB storage and retention
  - alloy-config.alloy with three log sources
  - docker-compose.yml with both services
- Creates monitoring network (if not exists) - shared with Grafana/Prometheus
- Pulls container images - grafana/loki:latest and grafana/alloy:latest
- Deploys systemd service - loki.service for auto-start
- Waits for Loki to be ready - polls the /ready endpoint
- Configures Grafana data source via API
  - Creates Loki data source at http://loki:3100
  - Uses Docker DNS (containers on same network)
- Imports log dashboards via Grafana API
  - Dashboard ID 13639: "Loki Dashboard - simple log viewer"
The entire deployment is idempotent - you can re-run it safely.
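To make the "wait for Loki" step concrete, here is a minimal Python sketch of polling the /ready endpoint with retries. It is not the role's actual task: the function names, the retry count, the delay, and the assumption that port 3100 is reachable on localhost are all illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def loki_is_ready(url: str = "http://localhost:3100/ready") -> bool:
    """Return True when Loki's /ready endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status such as 503
        return False


def wait_until_ready(check: Callable[[], bool], attempts: int = 30, delay: float = 2.0) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

On the Pi you would call wait_until_ready(loki_is_ready). Passing the check in as a function keeps the retry loop easy to try out without a running Loki.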
Why the monitoring network?
All observability components (Grafana, Prometheus, Loki, Alloy) join the monitoring Docker network. This enables:
- Docker DNS resolution - Grafana reaches Loki at http://loki:3100
- Container-to-container communication - No need to expose ports to host
- Isolation - Monitoring stack is separated from other containers
Step 1: Optional Configuration
The default Loki retention is 7 days (168 hours), which is perfect for most Raspberry Pi setups. If you want to change it, edit the configuration:
# Edit Ansible configuration
nano ansible-configurations/inventory/group_vars/all.yml
Find the loki section and adjust retention:
# Loki Log Aggregation Configuration
loki:
  enabled: true
  version: "latest"
  base_dir: "/home/{{ ansible_user }}/loki"
  port: 3100
  retention_hours: 168 # 7 days (72=3d, 168=7d, 336=14d, 720=30d)
Retention options:
- 72 - 3 days (minimal storage)
- 168 - 7 days (default, recommended)
- 336 - 14 days (more history)
- 720 - 30 days (high storage)
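The retention values are simply days multiplied by 24. If you want a value that isn't in the list, the arithmetic can be sketched in Python like this (both function names are hypothetical helpers, not part of the repository):

```python
def retention_hours(days: int) -> int:
    """Convert a retention period in days to the value used for retention_hours."""
    if days <= 0:
        raise ValueError("retention must be at least one day")
    return days * 24


def loki_duration(hours: int) -> str:
    """Format hours the way Loki configs write durations, e.g. 168h."""
    return f"{hours}h"


for days in (3, 7, 14, 30):
    hours = retention_hours(days)
    print(f"{days} days -> retention_hours: {hours} -> {loki_duration(hours)}")
```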
If you want to expose the Alloy monitoring UI (optional), add it to the Cloudflare domains list:
cloudflare:
  domains:
    # ... existing domains ...
    - hostname: alloy.iac-toolbox.com
      service_port: 12345
This gives you a web UI to monitor log collection status at https://alloy.iac-toolbox.com.
Advanced: Configuring Alloy collection sources
The default configuration collects from three sources. If you need to add more sources or modify collection, edit:
nano ansible-configurations/playbooks/roles/loki/templates/alloy-config.alloy.j2
For example, to add a custom application log file, add a new local.file_match and loki.source.file block (see "Next Steps" section for examples).
Step 2: Deploy Loki and Alloy
Run the Ansible playbook to deploy the entire log collection stack:
cd ansible-configurations
# Deploy Loki + Alloy
./scripts/setup.sh --tags loki
This single command:
- Deploys Loki container for log storage
- Deploys Grafana Alloy container for log collection
- Configures Alloy to collect from systemd, Docker, and /var/log
- Creates the Loki data source in Grafana automatically
- Imports log dashboards
- Sets up systemd service for auto-start
What Ansible deploys behind the scenes:
Loki Configuration
Ansible creates a modern Loki config with TSDB (Time Series Database) storage:
auth_enabled: false
server:
  http_listen_port: 3100
common:
  ring:
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /loki
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb # Modern TSDB storage
      object_store: filesystem
      schema: v13 # Latest schema
      index:
        prefix: index_
        period: 24h
storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
  filesystem:
    directory: /loki/chunks
ingester:
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_chunk_age: 1h
  wal:
    dir: /loki/wal # Write-Ahead Log for durability
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h # 7 days default
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  retention_period: 168h # Configurable via group_vars
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  delete_request_store: filesystem
Key improvements over older configs:
- TSDB storage - More efficient than boltdb-shipper (older storage backend)
- Schema v13 - Latest Loki schema version (older: v11)
- WAL (Write-Ahead Log) - Prevents log loss on crashes or restarts
- Compactor with retention - Automatically deletes old logs based on retention_hours
- delete_request_store - Required for retention to work properly
If you've seen older Loki tutorials, they might use boltdb-shipper storage and schema v11. This automated deployment uses the modern TSDB approach which is:
- Faster for queries (better indexing)
- More space-efficient (better compression)
- Simpler configuration (fewer moving parts)
Alloy Configuration
Alloy collects logs from three sources and ships them to Loki:
// Scrape systemd journal (service logs, boot events, OOM kills)
loki.source.journal "systemd" {
  max_age = "24h"
  forward_to = [loki.write.local.receiver]
  labels = {
    job = "systemd",
    host = "{{ ansible_hostname }}",
  }
}
// Discover running containers through the Docker socket
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}
// Copy each container's name (minus Docker's leading slash) into container_name
discovery.relabel "containers" {
  targets = discovery.docker.containers.targets
  rule {
    source_labels = ["__meta_docker_container_name"]
    regex = "/(.*)"
    target_label = "container_name"
  }
}
// Scrape Docker container logs (an empty targets list would collect nothing)
loki.source.docker "containers" {
  host = "unix:///var/run/docker.sock"
  targets = discovery.relabel.containers.output
  forward_to = [loki.write.local.receiver]
  labels = {
    job = "docker",
    host = "{{ ansible_hostname }}",
  }
}
// Scrape /var/log files (auth, syslog, etc.)
local.file_match "system_logs" {
  path_targets = [
    {
      __address__ = "localhost",
      __path__ = "/var/log/syslog",
      job = "syslog",
      host = "{{ ansible_hostname }}",
    },
    {
      __address__ = "localhost",
      __path__ = "/var/log/auth.log",
      job = "auth",
      host = "{{ ansible_hostname }}",
    },
  ]
}
loki.source.file "system_logs" {
  targets = local.file_match.system_logs.targets
  forward_to = [loki.write.local.receiver]
}
// Send all logs to Loki
loki.write "local" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
What each component does:
- loki.source.journal - Reads systemd journal (service crashes, OOM kills, boot events)
- discovery.docker and discovery.relabel - Find running containers via the Docker socket and label each with container_name
- loki.source.docker - Tails the logs of every discovered container, picking up new containers as they start
- loki.source.file - Tails /var/log files (authentication attempts, system logs)
- loki.write - Ships all collected logs to Loki via HTTP
Each source tags logs with job and host labels for easy filtering in Grafana.
Step 3: Verify Deployment
After Ansible completes, verify everything is running:
# SSH to your Raspberry Pi
ssh <your-user>@<raspberry-pi>
# Check containers are running
docker ps | grep -E 'loki|alloy'
You should see:
loki grafana/loki:latest Up X minutes
alloy grafana/alloy:latest Up X minutes
Check the logs for successful startup:
# Check Loki is ready
docker logs --tail 20 loki
# Check Alloy is collecting logs
docker logs --tail 20 alloy
Verify the systemd service is enabled:
systemctl status loki.service
The Loaded line should include enabled, and the Active line should read active.
What got deployed:
- Loki container on port 3100 (log storage)
- Alloy container on port 12345 (log collector)
- Both joined the monitoring network
- Systemd service loki.service for auto-start
- Configuration files in ~/loki/
File structure on Raspberry Pi:
/home/<your-user>/loki/
├── docker-compose.yml # Service definitions
├── loki-config.yml # Loki configuration
├── alloy-config.alloy # Alloy collection config
└── loki_data/ # Volume for log storage (created by Docker)
/etc/systemd/system/
└── loki.service # Systemd service for auto-start
Step 4: Verify Grafana Integration
Ansible automatically configured Loki as a Grafana data source and imported log dashboards. Let's verify:
Open Grafana:
https://grafana.iac-toolbox.com
Log in with your admin credentials.
Verify Data Source
- Click the menu (☰) → Connections → Data sources
- You should see Loki in the list
- Click on it to verify the connection
- Should show: "Data source connected and labels found"
What Ansible configured:
- Data source name: Loki
- URL: http://loki:3100
- Access mode: proxy (Grafana queries Loki on your behalf)
- Max lines: 1000 (prevents overwhelming the UI)
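For reference, those settings map onto the JSON body that Grafana's POST /api/datasources endpoint accepts. The sketch below builds such a body in Python. Treat it as an illustration of the settings, not as the exact request the Ansible role sends; the function name is hypothetical.

```python
import json


def loki_datasource_payload(url: str = "http://loki:3100", max_lines: int = 1000) -> dict:
    """Build a request body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": "Loki",
        "type": "loki",
        "url": url,
        "access": "proxy",  # Grafana's backend makes the request, not your browser
        "jsonData": {"maxLines": max_lines},
    }


print(json.dumps(loki_datasource_payload(), indent=2))
```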
Verify Dashboards
- Click Dashboards (left sidebar)
- Search for "Logs"
- You should see imported log dashboards
Ansible imported the community dashboard "Loki Dashboard - simple log viewer" (Dashboard ID 13639) which provides:
- Log search interface
- Time range selector
- Log level filters
- Multi-source view (systemd, Docker, auth, syslog)
Using the dashboard:
- Go to Dashboards → Search for "Logs"
- Click to open the log viewer dashboard
- Use the job dropdown to filter:
  - systemd - Service logs, OOM kills, boot events
  - docker - All container logs
  - auth - SSH login attempts
  - syslog - System logs
- Adjust time range (top right)
- Use search box to filter log lines
This dashboard is great for quick log browsing. For more advanced queries (regex, rate calculations), use the Explore view.
Note: Loki has no web UI of its own. All log querying and visualization happens in Grafana through Explore and Dashboards. This is intentional - Grafana is your single pane of glass for metrics AND logs.
Optional: Accessing Alloy UI
If you configured the Alloy domain in Cloudflare (Step 1), you can monitor log collection status:
https://alloy.iac-toolbox.com
The Alloy UI shows:
- Active log sources (systemd, Docker, /var/log)
- Logs per second being collected
- Pipeline health
- Component status
This is useful for debugging collection issues, but not required for normal operation.
Step 5: Query Your Logs
Let's verify logs are flowing. In Grafana:
- Click Explore (compass icon) in the left sidebar
- Select Loki as the data source (top dropdown)
- Try these queries:
All logs from systemd journal:
{job="systemd"}
All Docker container logs:
{job="docker"}
Logs from a specific container (e.g., grafana):
{job="docker", container_name="grafana"}
Authentication logs:
{job="auth"}
Syslog:
{job="syslog"}
Click Run query and you should see logs streaming in!
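The same queries can be run outside Grafana through Loki's HTTP API, which is handy for scripts. Here is a small sketch that builds a URL for the /loki/api/v1/query_range endpoint. The function name is hypothetical, and the base URL assumes Loki's port 3100 is reachable on localhost.

```python
from urllib.parse import urlencode


def query_range_url(logql: str, start_ns: int, end_ns: int,
                    limit: int = 100, base_url: str = "http://localhost:3100") -> str:
    """Build a URL for Loki's /loki/api/v1/query_range HTTP endpoint."""
    params = {
        "query": logql,
        "start": start_ns,  # Unix time in nanoseconds
        "end": end_ns,
        "limit": limit,
    }
    # urlencode percent-encodes the braces and quotes in the LogQL selector
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


print(query_range_url('{job="systemd"}', 1704067200000000000, 1704070800000000000))
```

Paste the printed URL into curl on the Pi to get the matching log lines back as JSON.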
Search Within Logs
Want to find specific text? Use filters:
Find all errors:
{job="systemd"} |= "error"
Find OOM kills:
{job="systemd"} |= "Out of memory"
Find failed SSH attempts:
{job="auth"} |= "Failed password"
The |= operator keeps only the log lines that contain the given text. The match is case-sensitive.
Common use cases:
# Find service restart events
{job="systemd"} |= "Started" or "Stopped"
# Find container crashes
{job="docker"} |= "Exited"
# Monitor authentication attempts
{job="auth"} |= "Accepted" or "Failed"
# Count failed login attempts in last hour
count_over_time({job="auth"} |= "Failed password" [1h])
The last query shows the number of failed SSH attempts - useful for alerting!
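If you run a count_over_time query through Loki's HTTP API instead of Grafana, the result comes back as a "vector": one entry per log stream, each with a timestamp and the count as a string. A sketch of totalling those counts, using a hand-written sample response (not real data) and a hypothetical function name:

```python
def total_from_vector(response: dict) -> int:
    """Sum the sample values of an instant-query response with resultType "vector"."""
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample is [unix_timestamp, "value-as-string"]
    return sum(int(float(series["value"][1])) for series in data["result"])


# Hand-written sample in the shape of Loki's API response
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "auth", "host": "pi"}, "value": [1704067200, "7"]},
        ],
    },
}
print(total_from_vector(sample))
```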
Debugging Your Application Containers
Want to check logs from your own app? Here's how:
First, see which containers are being monitored:
In Grafana Explore with Loki selected, query:
{job="docker"}
Click on a log line, then expand the "Labels" section. You'll see all available labels including:
- container_name - The actual container name (e.g., "my-app", "grafana", "vault")
- container_id - Docker container ID
- container_image - Image name
List all unique container names:
Use the label browser in Grafana (click the label icon) or query specific labels to see what's available.
View logs from a specific app (e.g., my-app):
{job="docker", container_name="my-app"}
Find errors in your app:
{job="docker", container_name="my-app"} |= "error"
Find exceptions:
{job="docker", container_name="my-app"} |~ "exception|Exception|ERROR"
The |~ operator does regex matching - this finds "exception", "Exception", or "ERROR".
Check app startup logs:
{job="docker", container_name="my-app"} |= "started"
Tail logs in real-time:
Set the time range to "Last 5 minutes" and enable "Live" mode (top right). Now you're watching logs stream in real-time, just like docker logs -f my-app!
Common Debugging Scenarios
App crashed, what happened?
{job="docker", container_name="my-app"} |~ "crash|killed|exit"
Look at logs right before the container stopped.
Performance issues - find slow requests:
{job="docker", container_name="my-app"} |= "slow" or "timeout"
Database connection issues:
{job="docker", container_name="my-app"} |= "database" |= "connection"
API errors:
{job="docker", container_name="my-app"} |= "status" |~ "5[0-9][0-9]"
This finds HTTP 5xx errors in logs.
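One thing to watch with regex line filters: the pattern can match anywhere in the line, so "5[0-9][0-9]" also hits any other three-digit run starting with 5. The Python sketch below imitates the filter on made-up log lines to show the effect. Loki uses RE2 syntax rather than Python's re module, but simple patterns like these behave the same; the status= log format is an assumption about your app.

```python
import re


def matching_lines(lines: list[str], pattern: str) -> list[str]:
    """Keep lines where the pattern matches anywhere, like a |~ line filter."""
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]


# Made-up log lines for illustration
lines = [
    "GET /api/users status=200 duration=12ms",
    "GET /api/orders status=503 duration=3001ms",
    "POST /api/login status=500 duration=40ms",
    "served 5120 bytes status=204",
]

# Equivalent of: |= "status" |~ "5[0-9][0-9]" (also matches the 5120-byte line)
print(matching_lines(matching_lines(lines, "status"), r"5[0-9][0-9]"))
# Tighter pattern tied to the status field
print(matching_lines(lines, r"status=5[0-9][0-9]"))
```

If your app logs the status code next to a fixed key, anchoring the pattern to that key cuts down on false positives.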
Multiple Containers
Running multiple instances of your app? Query them all:
{job="docker", container_name=~"my-app.*"}
The =~ operator matches regex, so this gets my-app, my-app-worker, my-app-api, etc.
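Unlike line filters, regex label matchers have to match the whole label value, which is why the trailing .* matters. Python's re.fullmatch behaves the same way for a simple pattern like this, so it makes a quick illustration (the container names are made up):

```python
import re


def label_matches(value: str, pattern: str) -> bool:
    """Mimic a =~ label matcher, which must match the entire label value."""
    return re.fullmatch(pattern, value) is not None


names = ["my-app", "my-app-worker", "my-app-api", "not-my-app"]
print([n for n in names if label_matches(n, "my-app.*")])
print([n for n in names if label_matches(n, "my-app")])
```

The first list contains the three my-app containers; the second contains only my-app itself, because without .* the pattern no longer covers the longer names.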
Pro Tip: Debugging with Time Windows
When your app breaks, compare logs before and after:
- Set time range to when the issue happened (e.g., "Last 15 minutes")
- Query your app logs
- Look for errors or exceptions
- Adjust time range to just before the issue
- See what changed
Example workflow:
# Current errors (app is broken)
{job="docker", container_name="my-app"} |= "error"
# Check what happened 5 minutes before
{job="docker", container_name="my-app"}
Adjust the time range to see the pattern. This helps identify what triggered the issue.
Step 6: Correlate Logs with Metrics
The real power is combining logs and metrics. When CPU spikes, what logs appeared at that time?
Split View in Explore
- In Grafana Explore, click Split (top right)
- Left panel: Select Prometheus, query CPU:
  100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
- Right panel: Select Loki, query systemd logs:
  {job="systemd"}
- Sync the time ranges
Now you can see CPU usage graph on the left and logs on the right. When CPU spikes, check what logs appeared at that moment!
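If you'd rather pull the logs around a spike from a script, you need the time window as Unix nanoseconds, which is what Loki's HTTP API accepts for start and end. A minimal sketch, assuming a spike time you read off the CPU graph (the function name and the five-minute margins are illustrative):

```python
from datetime import datetime, timedelta, timezone


def window_ns(center: datetime, before_min: int = 5, after_min: int = 5) -> tuple[int, int]:
    """Return (start, end) around `center` as Unix nanoseconds."""
    def to_ns(moment: datetime) -> int:
        # Whole seconds are precise enough here and avoid float rounding
        return int(moment.timestamp()) * 1_000_000_000

    start = center - timedelta(minutes=before_min)
    end = center + timedelta(minutes=after_min)
    return to_ns(start), to_ns(end)


# Hypothetical spike time read off the CPU graph
spike = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(window_ns(spike))
```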
Link from Dashboards
When building dashboards, you can add log panels alongside metric panels:
- Metrics panel shows CPU over time
- Logs panel shows recent errors
- Both update in real-time
This helps diagnose issues faster - metrics show the symptom, logs show the cause.
Understanding Loki Query Language (LogQL)
Loki uses a query language similar to Prometheus:
Label matchers:
{job="docker"} # Exact match
{job=~"docker|systemd"} # Regex match
{job!="auth"} # Not equal
Log filters:
{job="systemd"} |= "error" # Contains "error"
{job="systemd"} != "debug" # Does not contain "debug"
{job="systemd"} |~ "error|fail" # Regex match
Rate queries (like Prometheus):
rate({job="docker"}[5m]) # Log lines per second
Count occurrences:
count_over_time({job="auth"} |= "Failed password" [1h])
This counts failed login attempts in the last hour. You can alert on this!
When Things Break
No logs showing up in Grafana?
Check Alloy is collecting:
docker logs alloy 2>&1 | grep "push request"
You should see lines like:
level=info msg="successful push" bytes=1234
If not, Alloy isn't shipping logs to Loki.
Loki connection fails?
Test from Alloy container:
docker exec alloy curl http://loki:3100/ready
Should return ready. If not, Loki isn't accessible from Alloy.
"No labels found" error in Grafana?
Loki isn't receiving any logs. Check:
# Check Loki received anything
curl http://localhost:3100/loki/api/v1/labels
Should return JSON listing label names such as job and host. If the list is empty, no logs have reached Loki yet.
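To turn that check into something scriptable, you can compare the label names Loki reports against the ones the Alloy config is supposed to set. The sketch below works on hand-written responses in the shape the labels endpoint returns; the function name is hypothetical, and it treats a missing data field the same as an empty list.

```python
def missing_labels(response: dict, expected: tuple[str, ...] = ("job", "host")) -> list[str]:
    """Return the expected label names that Loki has not seen yet."""
    known = set(response.get("data") or [])
    return [name for name in expected if name not in known]


# Hand-written samples, not real output
healthy = {"status": "success", "data": ["container_name", "host", "job"]}
empty = {"status": "success"}

print(missing_labels(healthy))
print(missing_labels(empty))
```

An empty result means both labels have arrived; anything else tells you which source to look at first.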
Docker logs not appearing?
Check Alloy has access to Docker socket:
docker exec alloy ls -la /var/run/docker.sock
Should show the socket file. If "permission denied", Alloy user needs Docker group access.
Systemd journal logs missing?
Check journal is accessible:
docker exec alloy ls -la /run/log/journal
Should show journal files. If empty, journal might not be enabled on your Pi.
Disk space filling up?
Check Loki data size:
du -sh ~/loki/loki_data
If it's huge, reduce retention period in Ansible configuration:
# Edit the config
nano ansible-configurations/inventory/group_vars/all.yml
# Change retention_hours
loki:
  retention_hours: 72 # 3 days instead of 7
Re-run Ansible to apply changes:
cd ansible-configurations
./scripts/setup.sh --tags loki
Data source not showing in Grafana?
Ansible creates it automatically. If missing, check Ansible output:
# Check if data source creation succeeded
grep -i "loki.*datasource" ansible-configurations/logs/*.log
You can manually verify via Grafana API:
curl -u admin:your-password http://localhost:3000/api/datasources/name/Loki
Dashboards not imported?
Check Ansible task output. Dashboard import happens after data source creation. If it failed:
# Re-run just the Loki tasks
cd ansible-configurations
./scripts/setup.sh --tags loki
Storage and Retention
Logs can grow fast. Here's what you should know:
Default retention: 7 days
Our config keeps logs for 7 days, then deletes them. For a typical home setup with a few containers, this uses 1-2GB.
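For a rough feel of how retention drives disk usage, multiply your daily log volume by the number of days kept. The sketch below does that arithmetic. The 200 MB/day figure is an assumption for illustration, so measure your own with du after a day or two, and note that real usage also depends on compression and the compactor's delete delay.

```python
def estimated_disk_mb(daily_mb: float, retention_hours: int) -> float:
    """Estimate steady-state log storage: daily volume times days retained."""
    return daily_mb * retention_hours / 24


# 200 MB/day is an assumed figure, not a measurement
for hours in (72, 168, 336, 720):
    print(f"{hours}h retention -> about {estimated_disk_mb(200, hours):.0f} MB")
```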
Check disk usage:
# SSH to Raspberry Pi
ssh <your-user>@<raspberry-pi>
# Check Loki data size
du -sh ~/loki/loki_data
# Check total logging stack
du -sh ~/loki
Adjust retention:
Edit Ansible group_vars:
# On your local machine
cd ansible-configurations
nano inventory/group_vars/all.yml
Find the loki section:
loki:
  retention_hours: 168 # Change to desired value
Options:
- 72 - 3 days (minimal)
- 168 - 7 days (default)
- 336 - 14 days
- 720 - 30 days
Re-deploy to apply changes:
./scripts/setup.sh --tags loki
What uses the most space:
Docker container logs usually dominate. If one noisy container floods your logs, you can exclude it by modifying the Alloy template in your Ansible role:
# Edit the Alloy template
nano ansible-configurations/playbooks/roles/loki/templates/alloy-config.alloy.j2
Add filtering to the Docker source:
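// Discover running containers through the Docker socket
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}
discovery.relabel "containers" {
  targets = discovery.docker.containers.targets
  // Exclude noisy container (Docker reports names with a leading slash)
  rule {
    source_labels = ["__meta_docker_container_name"]
    regex = "/noisy-container"
    action = "drop"
  }
  // Keep the container_name label for the remaining containers
  rule {
    source_labels = ["__meta_docker_container_name"]
    regex = "/(.*)"
    target_label = "container_name"
  }
}
loki.source.docker "containers" {
  host = "unix:///var/run/docker.sock"
  targets = discovery.relabel.containers.output
  forward_to = [loki.write.local.receiver]
  labels = {
    job = "docker",
    host = "{{ ansible_hostname }}",
  }
}
Then re-deploy:
cd ansible-configurations
./scripts/setup.sh --tags loki
Next Steps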
You now have complete observability! Here's what to do next:
Build log dashboards:
- Create panels showing recent errors
- Display container restart counts
- Track failed login attempts
- Monitor disk space warnings
Set up log-based alerts:
Alert when specific patterns appear:
count_over_time({job="auth"} |= "Failed password" [5m]) > 5
This alerts on more than 5 failed logins in 5 minutes (possible brute-force attack).
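The logic behind that alert expression is a sliding window: count the matching events in the last five minutes and compare against a threshold. Here is the same idea as a Python sketch with made-up timestamps. It only illustrates the arithmetic; it is not how Grafana evaluates alert rules internally.

```python
def should_alert(event_times: list[float], now: float,
                 window_seconds: int = 300, threshold: int = 5) -> bool:
    """Count events inside the window ending at `now` and compare to the threshold."""
    recent = [t for t in event_times if now - window_seconds < t <= now]
    return len(recent) > threshold


# Made-up timestamps in seconds: six failures in the last five minutes, one older
failures = [100, 710, 750, 800, 850, 900, 950]
print(should_alert(failures, now=1000))
```

Note the strict "more than": exactly five failures in the window does not fire, six does.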
Add application logs:
Once you deploy applications, add their log files to Alloy. Edit the Alloy template:
nano ansible-configurations/playbooks/roles/loki/templates/alloy-config.alloy.j2
Add new file sources:
local.file_match "myapp" {
  path_targets = [{
    __address__ = "localhost",
    __path__ = "/var/log/myapp/*.log",
    job = "myapp",
    host = "{{ ansible_hostname }}",
  }]
}
loki.source.file "myapp" {
  targets = local.file_match.myapp.targets
  forward_to = [loki.write.local.receiver]
}
Re-deploy:
cd ansible-configurations
./scripts/setup.sh --tags loki
Parse structured logs:
If your app logs JSON, Alloy can parse it:
loki.source.file "myapp" {
  targets = local.file_match.myapp.targets
  forward_to = [loki.process.json.receiver]
}
loki.process "json" {
  forward_to = [loki.write.local.receiver]
  stage.json {
    expressions = {
      level = "level",
      msg = "message",
    }
  }
}
Now you can query by JSON fields: {job="myapp"} | json | level="error"
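To see what the JSON stage is doing conceptually, here is the same extraction in Python: parse the line, then pull out the level and message fields under the names level and msg. The log line is made up, and the function name is hypothetical.

```python
import json


def extract_fields(line: str) -> dict:
    """Pull level and msg out of a JSON log line, mirroring the stage.json expressions."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return {}  # not JSON: nothing to extract
    if not isinstance(record, dict):
        return {}
    return {"level": record.get("level"), "msg": record.get("message")}


# Made-up log line for illustration
print(extract_fields('{"level": "error", "message": "db connection refused", "ts": 1704067200}'))
print(extract_fields("plain text line"))
```

Lines that aren't valid JSON simply yield no fields, so mixing plain-text and JSON logs in one file doesn't break anything in this sketch.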
Summary
And that's a wrap! You've added automated centralized logging to your Raspberry Pi:
What you deployed:
- Loki for log storage (configurable retention, default 7 days)
- Grafana Alloy for log collection
- Three log sources: systemd, Docker, /var/log
- Loki data source auto-configured in Grafana
- Log dashboards auto-imported
- Systemd service for auto-start
Deployment method:
cd ansible-configurations
./scripts/setup.sh --tags loki
Files created on Raspberry Pi:
- ~/loki/loki-config.yml - Loki configuration (TSDB, schema v13)
- ~/loki/alloy-config.alloy - Alloy collection config
- ~/loki/docker-compose.yml - Both services
- /etc/systemd/system/loki.service - Systemd service
What you can do now:
- Search all logs from Grafana UI
- Correlate logs with metrics using split view
- Track down why services crashed
- Find authentication failures
- Monitor container restarts
- Set up log-based alerts (next tutorial)
Configuration management:
All settings in ansible-configurations/inventory/group_vars/all.yml:
- Retention period (72h, 168h, 336h, 720h)
- Port configuration
- Cloudflare tunnel for Alloy UI (optional)
Your observability stack is complete: metrics (Prometheus) + logs (Loki) + visualization (Grafana) + alerting (PagerDuty). When something breaks, you'll know what happened and why!
Next steps:
- Set up log-based alerts (OOM kills, failed logins, container crashes)
- Add custom application logs
- Create custom log dashboards
The complete Ansible role is in the iac-toolbox-raspberrypi repository under playbooks/roles/loki/.
Previous: Prometheus Metrics Setup | Next: Grafana Alerts