Production Deployment ​
Best practices and recommendations for deploying MeshMonitor in production environments.
Production Checklist ​
Before deploying to production, ensure:
- [ ] HTTPS is configured and working
- [ ] SSL/TLS certificates are valid and auto-renewing
- [ ] Strong `SESSION_SECRET` is set
- [ ] `ALLOWED_ORIGINS` is set to your HTTPS domain (REQUIRED!)
- [ ] `TRUST_PROXY=true` is set (for reverse proxy)
- [ ] `COOKIE_SECURE=true` is set (for HTTPS)
- [ ] Database backups are configured
- [ ] Monitoring and alerting are set up
- [ ] Log aggregation is configured
- [ ] Reverse proxy is configured with security headers
- [ ] Firewall rules are properly configured
- [ ] SSO/OIDC is configured (if using)
- [ ] Resource limits are set appropriately
- [ ] High availability is configured (if required)
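Several of these items can be spot-checked from a shell before go-live. A minimal sketch, assuming your deployment is reachable at https://meshmonitor.example.com (substitute your own domain):
# Confirm HTTPS responds and security headers are present
curl -sI https://meshmonitor.example.com | head -n 20
# Check certificate validity dates
echo | openssl s_client -connect meshmonitor.example.com:443 -servername meshmonitor.example.com 2>/dev/null | openssl x509 -noout -dates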
Deployment Options ​
Docker Compose (Small Scale) ​
For single-server deployments:
version: '3.8'
services:
  meshmonitor:
    image: meshmonitor:latest
    restart: unless-stopped
    environment:
      - MESHTASTIC_NODE_IP=192.168.1.100
      - SESSION_SECRET=${SESSION_SECRET}
      - NODE_ENV=production
      - TRUST_PROXY=true
      - COOKIE_SECURE=true
      - ALLOWED_ORIGINS=https://meshmonitor.example.com
      - OIDC_ISSUER=${OIDC_ISSUER}
      - OIDC_CLIENT_ID=${OIDC_CLIENT_ID}
      - OIDC_CLIENT_SECRET=${OIDC_CLIENT_SECRET}
      - OIDC_REDIRECT_URI=https://meshmonitor.example.com/api/auth/oidc/callback
    volumes:
      - meshmonitor_data:/app/data
    ports:
      - "127.0.0.1:8080:3001"  # Only bind to localhost
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3001/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
volumes:
  meshmonitor_data:
    driver: local
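One way to bring this stack up and confirm the health check passes (docker compose reads SESSION_SECRET and the OIDC_* values from a .env file next to the compose file):
# Start the stack
docker compose up -d
# The meshmonitor service should report "healthy" once the healthcheck passes
docker compose ps
docker compose logs -f meshmonitor
# Verify from the host (matches the 127.0.0.1:8080 binding above)
curl -f http://127.0.0.1:8080/api/health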
Kubernetes with Helm (Enterprise Scale) ​
MeshMonitor includes Helm charts for Kubernetes deployment.
Install with Helm ​
# Add repository (if published)
helm repo add meshmonitor https://meshmonitor.org/charts
helm repo update
# Install
helm install meshmonitor meshmonitor/meshmonitor \
--namespace meshmonitor \
--create-namespace \
--set meshmonitor.nodeIp=192.168.1.100 \
--set ingress.enabled=true \
--set ingress.host=meshmonitor.example.com \
--set oidc.enabled=true \
--set oidc.issuer=https://your-idp.com \
--set oidc.clientId=your-client-id \
--set oidc.clientSecret=your-client-secret
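A quick way to confirm the release came up in the meshmonitor namespace created above (the deployment name is an assumption based on the release name):
helm status meshmonitor -n meshmonitor
kubectl get pods,svc,ingress -n meshmonitor
kubectl logs -n meshmonitor deploy/meshmonitor --tail=50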
Custom values.yaml ​
# values.yaml
meshmonitor:
  nodeIp: "192.168.1.100"
  sessionSecret: "generate-secure-random-string"
resources:
  requests:
    memory: "256Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"
replicas: 2  # For high availability
persistence:
  enabled: true
  size: 10Gi
  storageClass: "standard"
oidc:
  enabled: true
  issuer: "https://your-idp.com"
  clientId: "your-client-id"
  clientSecret: "your-client-secret"
  redirectUri: "https://meshmonitor.example.com/api/auth/oidc/callback"
ingress:
  enabled: true
  className: "nginx"
  host: "meshmonitor.example.com"
  tls:
    enabled: true
    secretName: "meshmonitor-tls"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
Deploy:
helm install meshmonitor ./helm/meshmonitor -f values.yaml
High Availability ​
Load Balancing ​
Run multiple instances behind a load balancer:
Docker Compose:
version: '3.8'
services:
  meshmonitor-1:
    image: meshmonitor:latest
    environment:
      - MESHTASTIC_NODE_IP=192.168.1.100
    volumes:
      - meshmonitor_data:/app/data
    expose:
      - "8080"
    networks:
      - app-network
  meshmonitor-2:
    image: meshmonitor:latest
    environment:
      - MESHTASTIC_NODE_IP=192.168.1.100
    volumes:
      - meshmonitor_data:/app/data
    expose:
      - "8080"
    networks:
      - app-network
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx-lb.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - meshmonitor-1
      - meshmonitor-2
    networks:
      - app-network
volumes:
  meshmonitor_data:
networks:
  app-network:
NGINX Load Balancer Config:
upstream meshmonitor_backend {
    ip_hash;  # Session affinity: keep each client on the same instance (see Session Management below)
    server meshmonitor-1:8080 max_fails=3 fail_timeout=30s;
    server meshmonitor-2:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name meshmonitor.example.com;

    # SSL config...

    location / {
        proxy_pass http://meshmonitor_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Use HTTP/1.1 with keepalive to the upstream
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
Session Management ​
For multiple instances, sessions must be shared:
Options:
- Sticky sessions (session affinity) - Route users to same instance
- Shared session store - Redis, Memcached, or database
- JWT tokens - Stateless authentication
MeshMonitor uses SQLite for sessions by default, which works with sticky sessions.
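If traffic terminates at the ingress-nginx controller on Kubernetes instead, cookie-based session affinity can stand in for the ip_hash approach above. A sketch using ingress-nginx annotations (annotation names are specific to ingress-nginx; other controllers differ, and the ingress name is an assumption):
kubectl -n meshmonitor annotate ingress meshmonitor \
  nginx.ingress.kubernetes.io/affinity=cookie \
  nginx.ingress.kubernetes.io/session-cookie-name=meshmonitor-affinity \
  --overwrite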
Security Hardening ​
Environment Variables ​
Never hardcode secrets:
# Generate secure session secret
openssl rand -base64 32
# Store in .env file (never commit!)
echo "SESSION_SECRET=$(openssl rand -base64 32)" >> .env
Firewall Configuration ​
UFW (Ubuntu):
# Allow SSH
sudo ufw allow 22/tcp
# Allow HTTP/HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Deny direct access to app port
sudo ufw deny 8080/tcp
# Enable firewall
sudo ufw enable
iptables:
# Allow established connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow SSH
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Allow HTTP/HTTPS
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# Deny app port from external
iptables -A INPUT -p tcp --dport 8080 -j DROP
# Allow loopback
iptables -A INPUT -i lo -j ACCEPT
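Plain iptables rules do not survive a reboot on their own. On Debian/Ubuntu one common approach is the iptables-persistent package (an assumption about your distribution; other distros use different mechanisms):
sudo apt-get install -y iptables-persistent
sudo netfilter-persistent save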
Docker Security ​
Run as non-root user:
# In Dockerfile
USER node
Limit container capabilities:
services:
  meshmonitor:
    image: meshmonitor:latest
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
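To confirm the hardening actually applies to the running container (the container name meshmonitor is an assumption; docker compose ps shows the real one):
docker exec meshmonitor id            # should report a non-root user such as uid=1000(node)
docker inspect --format '{{.HostConfig.CapDrop}} {{.HostConfig.CapAdd}} {{.HostConfig.SecurityOpt}}' meshmonitor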
Kubernetes Security ​
Pod Security Policy (note: PSPs were removed in Kubernetes 1.25; for newer clusters see the Pod Security Standards sketch below):
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: meshmonitor-psp
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'persistentVolumeClaim'
    - 'secret'
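On clusters where PodSecurityPolicy no longer exists (Kubernetes 1.25 and later), a roughly equivalent baseline can be enforced with Pod Security Standards labels on the namespace; a minimal sketch:
kubectl label namespace meshmonitor \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted \
  --overwrite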
Backups ​
Database Backups ​
Automated backup script:
#!/bin/bash
# backup-meshmonitor.sh
BACKUP_DIR="/backups/meshmonitor"
DATE=$(date +%Y%m%d-%H%M%S)
CONTAINER="meshmonitor"
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Backup database
# Note: copying the raw file while MeshMonitor is writing can capture a mid-transaction state;
# the sqlite3 ".backup" command (as in the Kubernetes CronJob below) is safer when available
docker cp "$CONTAINER:/app/data/meshmonitor.db" \
  "$BACKUP_DIR/meshmonitor-$DATE.db"
# Compress
gzip "$BACKUP_DIR/meshmonitor-$DATE.db"
# Keep only last 30 days
find "$BACKUP_DIR" -name "*.db.gz" -mtime +30 -delete
echo "Backup completed: meshmonitor-$DATE.db.gz"
Cron job:
# Run daily at 2 AM
0 2 * * * /usr/local/bin/backup-meshmonitor.sh
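Before trusting the schedule, spot-check that an archive actually restores cleanly (requires the sqlite3 CLI on the host; the dated filename is just an example):
gunzip -c /backups/meshmonitor/meshmonitor-20241012-020000.db.gz > /tmp/meshmonitor-verify.db
sqlite3 /tmp/meshmonitor-verify.db "PRAGMA integrity_check;"   # should print "ok"
rm /tmp/meshmonitor-verify.db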
Kubernetes CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: meshmonitor-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: alpine:latest
              command:
                - /bin/sh
                - -c
                - |
                  apk add --no-cache sqlite
                  sqlite3 /data/meshmonitor.db ".backup '/backup/meshmonitor-$(date +%Y%m%d).db'"
                  gzip /backup/meshmonitor-$(date +%Y%m%d).db
              volumeMounts:
                - name: data
                  mountPath: /data
                - name: backup
                  mountPath: /backup
          restartPolicy: OnFailure
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: meshmonitor-data
            - name: backup
              persistentVolumeClaim:
                claimName: meshmonitor-backup
Restore from Backup ​
# Stop MeshMonitor
docker compose down
# Restore database
gunzip -c /backups/meshmonitor-20241012.db.gz > data/meshmonitor.db
# Start MeshMonitor
docker compose up -d
Monitoring ​
Health Checks ​
MeshMonitor provides health check endpoints:
# Basic health check
curl http://localhost:8080/api/health
# Detailed status with statistics
curl http://localhost:8080/api/status
Health check response:
{
  "status": "ok",
  "timestamp": "2025-10-15T12:00:00.000Z",
  "nodeEnv": "production"
}
Status endpoint response:
{
  "status": "ok",
  "timestamp": "2025-10-15T12:00:00.000Z",
  "version": "2.6.0",
  "nodeEnv": "production",
  "connection": {
    "connected": true,
    "localNode": {
      "nodeNum": 123456789,
      "nodeId": "!075bcd15",
      "longName": "My Node",
      "shortName": "NODE"
    }
  },
  "statistics": {
    "nodes": 42,
    "messages": 1337,
    "channels": 3
  },
  "uptime": 86400
}
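For shell-based monitoring or dashboards, jq can pull out just the fields you care about from the status response (assumes jq is installed and the localhost port mapping from the compose example above):
curl -s http://localhost:8080/api/status | \
  jq '{connected: .connection.connected, nodes: .statistics.nodes, messages: .statistics.messages, uptime: .uptime}'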
Log Aggregation ​
ELK Stack:
services:
  meshmonitor:
    logging:
      driver: "fluentd"
      options:
        fluentd-address: "localhost:24224"
        tag: "meshmonitor"
Loki:
services:
  meshmonitor:
    logging:
      driver: "loki"
      options:
        loki-url: "http://loki:3100/loki/api/v1/push"
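The loki logging driver is a Docker plugin that must be installed on each host before the snippet above will start; per Grafana's documentation the install looks roughly like this (verify the current plugin tag for your architecture):
docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
docker plugin ls   # the "loki" plugin should show as enabled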
Alerting ​
Configure alerting based on the health check endpoints:
# Example monitoring script
#!/bin/bash
HEALTH_URL="https://meshmonitor.example.com/api/health"
STATUS_URL="https://meshmonitor.example.com/api/status"
# Check health endpoint
if ! curl -sf "$HEALTH_URL" > /dev/null; then
echo "ALERT: MeshMonitor health check failed"
# Send alert via your preferred method (email, Slack, PagerDuty, etc.)
fi
# Check detailed status
STATUS=$(curl -sf "$STATUS_URL")
if [ $? -eq 0 ]; then
# Parse JSON and check connection status
CONNECTED=$(echo "$STATUS" | jq -r '.connection.connected')
if [ "$CONNECTED" != "true" ]; then
echo "WARNING: MeshMonitor not connected to node"
fi
fi
Add this to cron for periodic monitoring:
# Check every 5 minutes
*/5 * * * * /usr/local/bin/check-meshmonitor.sh
Performance Optimization ​
Resource Limits ​
Set appropriate resource limits:
Docker:
services:
  meshmonitor:
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
        reservations:
          cpus: '0.5'
          memory: 256M
Kubernetes:
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "1000m"
Database Optimization ​
Enable WAL mode for better performance:
sqlite3 data/meshmonitor.db "PRAGMA journal_mode=WAL;"
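The pragma echoes the active journal mode, so the change is easy to confirm:
sqlite3 data/meshmonitor.db "PRAGMA journal_mode;"   # should print: wal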
Caching ​
Enable caching at the reverse proxy level (see Reverse Proxy guide).
Updates and Maintenance ​
Rolling Updates ​
Docker Compose:
# Pull the new image
docker compose pull
# Recreate the container (a single instance restarts briefly; run multiple
# instances behind a load balancer if you need true zero downtime)
docker compose up -d --no-deps meshmonitor
Kubernetes:
# Update deployment
helm upgrade meshmonitor ./helm/meshmonitor -f values.yaml
# Or with kubectl
kubectl set image deployment/meshmonitor meshmonitor=meshmonitor:v2.0.0
# Monitor rollout
kubectl rollout status deployment/meshmonitor
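If the new version misbehaves, both tools can roll back to the previous revision:
# Roll back the Helm release to its previous revision
helm rollback meshmonitor
# Or, if the image was changed with kubectl
kubectl rollout undo deployment/meshmonitor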
Maintenance Windows ​
For major updates:
- Notify users of maintenance window
- Enable maintenance mode (if available)
- Backup database
- Perform update
- Test functionality
- Restore service
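For a Docker Compose deployment, the middle steps map onto commands already shown on this page; a minimal sketch (paths and the backup script name follow the earlier examples):
/usr/local/bin/backup-meshmonitor.sh           # backup database
docker compose pull && docker compose up -d    # perform update
curl -sf http://127.0.0.1:8080/api/health      # smoke-test before announcing the service is back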
Disaster Recovery ​
Backup Strategy ​
Follow the 3-2-1 rule:
- 3 copies of data
- 2 different storage media
- 1 off-site backup
Recovery Time Objective (RTO) ​
Target: < 1 hour
- Deploy fresh instance
- Restore database from backup
- Verify functionality
- Update DNS if needed
Testing Recovery ​
Regularly test your recovery procedure:
# Test restoration in a separate environment
docker compose -f docker-compose.test.yml up -d
docker cp backup.db meshmonitor-test:/app/data/meshmonitor.db
docker restart meshmonitor-test
# Verify data integrity (assumes the sqlite3 CLI is available in the image; otherwise check a host-side copy)
docker exec meshmonitor-test sqlite3 /app/data/meshmonitor.db "PRAGMA integrity_check;"
Compliance ​
GDPR Considerations ​
- Implement data retention policies
- Provide user data export
- Enable account deletion
- Log access to personal data
Audit Logging ​
Review authentication and API activity in the application logs:
# View authentication logs
docker logs meshmonitor | grep "auth"
# View all API access
docker logs meshmonitor | grep "api"
Troubleshooting ​
High CPU Usage ​
Check for:
- Long-running queries
- Memory leaks
- Excessive logging
# Docker stats
docker stats meshmonitor
# Top processes in container
docker exec meshmonitor top
Database Locked ​
If you see SQLite "database is locked" errors:
# Enable WAL mode
sqlite3 data/meshmonitor.db "PRAGMA journal_mode=WAL;"
# Increase busy timeout
# Note: busy_timeout only applies to the connection that sets it, so running it from
# the CLI does not change the application's own connections
sqlite3 data/meshmonitor.db "PRAGMA busy_timeout=30000;"
Out of Memory ​
Increase memory limits or optimize queries.
Next Steps ​
- Configure monitoring and alerting
- Set up automated backups
- Review security hardening
- Test disaster recovery procedures