Monitoring & Alerting
What to Monitor
| Metric | Why | Alert Threshold |
|---|---|---|
| Block height | Detect sync stalls | No increase in 30 minutes |
| Peer count | Network connectivity | Below 3 peers |
| Disk usage | Prevent full disk | Above 85% capacity |
| RAM usage | Detect memory leaks | Above 90% capacity |
| Process status | Detect crashes | Process not running |
| BPoS status | Detect penalties | Status changed from "Active" |
| Mempool size | Detect spam/congestion | Above 10,000 transactions |
| Block timestamp | Detect clock drift | More than 5 minutes behind |
| Chain data growth rate | Capacity planning | N/A (informational) |
Built-In Status Commands
# Full status of all chains
node.sh status
# Per-chain status (shows 18 metrics for ELA)
node.sh ela status
# Output includes:
# Version, disk usage, address, public key, balance
# PID, RAM, uptime, file descriptors, TCP ports/connections
# Peers, height
# BPoS state (name, status, staked, votes, unclaimed rewards)
# Elastos Council state (name, status)
RPC Health Checks
ELA main chain:
# Current block height
node.sh ela jsonrpc getcurrentheight
# Best block hash
node.sh ela jsonrpc getbestblockhash
# Connection count
node.sh ela jsonrpc getconnectioncount
# Node state (comprehensive)
node.sh ela jsonrpc nodestate
# Memory pool contents (pending transactions)
node.sh ela jsonrpc getrawmempool
ESC sidechain:
# Block number
curl -s -X POST http://127.0.0.1:20636 \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
# Peer count
curl -s -X POST http://127.0.0.1:20636 \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}'
# Syncing status (returns false once the node is fully synced)
curl -s -X POST http://127.0.0.1:20636 \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
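The ESC endpoints follow the Ethereum JSON-RPC convention of returning quantities as hex strings (for example `"result":"0x10"`), so a value has to be converted to decimal before it can be compared against a threshold. A minimal sketch of that conversion; `hex_to_dec` is a hypothetical helper name, not part of node.sh or ESC:

```shell
#!/bin/bash
# hex_to_dec: convert an Ethereum-style hex quantity (e.g. 0x1a) to decimal.
# Hypothetical helper for illustration; not part of node.sh or ESC.
hex_to_dec() {
    local hex="$1" digits
    # Require the 0x prefix used by Ethereum JSON-RPC quantities.
    case "$hex" in
        0x*) digits="${hex#0x}" ;;
        *) echo "invalid hex quantity: $hex" >&2; return 1 ;;
    esac
    # Reject an empty value or any non-hex character after the prefix.
    case "$digits" in
        ''|*[!0-9a-fA-F]*) echo "invalid hex quantity: $hex" >&2; return 1 ;;
    esac
    # printf understands the 0x prefix and prints the decimal value.
    printf '%d\n' "$hex"
}
```

The `result` field of the `net_peerCount` reply, once extracted with `jq -r '.result'`, could be passed through `hex_to_dec` and then compared against the 3-peer threshold from the table.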
Alerting with Elastos.ELA.Monitor
The official monitoring tool, available at github.com/elastos/Elastos.ELA.Monitor, is a Python script that runs as a cron job.
Checks performed:
- Block height stalls (no new blocks in N minutes)
- Producer status changes
- Mempool size thresholds
- Peer count minimums
Alerts: Sent via SMTP email.
Setup:
git clone https://github.com/elastos/Elastos.ELA.Monitor.git
cd Elastos.ELA.Monitor
# Configure monitoring targets and SMTP settings
cp config.example.json config.json
# Edit config.json with your ELA RPC details and email settings
# Install as cron job (run every 5 minutes)
crontab -e
# Add: */5 * * * * cd /path/to/Elastos.ELA.Monitor && python3 monitor.py
Custom Monitoring Scripts
Block height monitor (bash):
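#!/bin/bash
# Alert when the ELA block height has not changed since the previous run.
PREV_HEIGHT_FILE="/tmp/ela_height"
ALERT_EMAIL="ops@yourdomain.com"

CURRENT_HEIGHT=$(node.sh ela jsonrpc getcurrentheight 2>/dev/null | jq -r '.result')

# If the RPC call or jq failed, alert and exit so a bad reading never overwrites the stored height.
if ! [[ "$CURRENT_HEIGHT" =~ ^[0-9]+$ ]]; then
    echo "Could not read ELA block height on $(hostname)" | \
        mail -s "ALERT: ELA height unavailable" "$ALERT_EMAIL"
    exit 1
fi

if [ -f "$PREV_HEIGHT_FILE" ]; then
    PREV_HEIGHT=$(cat "$PREV_HEIGHT_FILE")
    if [ "$CURRENT_HEIGHT" = "$PREV_HEIGHT" ]; then
        echo "ELA block height stalled at $CURRENT_HEIGHT" | \
            mail -s "ALERT: ELA Sync Stalled" "$ALERT_EMAIL"
    fi
fi

echo "$CURRENT_HEIGHT" > "$PREV_HEIGHT_FILE"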
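The script above alerts as soon as two consecutive runs see the same height, whereas the threshold table calls for 30 minutes without a new block. One way to reconcile the two is to store the time of the last height change and alert only once that time is older than the window. A sketch under that assumption; `check_stall`, the state-file layout, and the 1800-second default are illustrative choices, not part of node.sh:

```shell
#!/bin/bash
# check_stall HEIGHT NOW STATE_FILE [MAX_AGE_SECONDS]
# Stores "height timestamp" in STATE_FILE whenever the height changes.
# Returns 1 (stalled) when the height has been unchanged for longer than
# MAX_AGE_SECONDS, and 0 otherwise. Hypothetical helper for illustration.
check_stall() {
    local height="$1" now="$2" state_file="$3" max_age="${4:-1800}"
    local prev_height="" prev_time=""

    if [ -f "$state_file" ]; then
        read -r prev_height prev_time < "$state_file"
    fi

    if [ "$height" != "$prev_height" ]; then
        # Height changed (or first run): record it and reset the clock.
        echo "$height $now" > "$state_file"
        return 0
    fi

    # Same height as before: stalled only once the window has elapsed.
    prev_time="${prev_time:-$now}"
    if [ $(( now - prev_time )) -gt "$max_age" ]; then
        return 1
    fi
    return 0
}
```

In a cron script this could be called as `check_stall "$CURRENT_HEIGHT" "$(date +%s)" /tmp/ela_height_state`, sending the alert email only when the function returns non-zero.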
Process watchdog (bash):
#!/bin/bash
# Alert on, then restart, any chain component whose process is missing.
COMPONENTS=("ela" "esc" "eid" "arbiter")
ALERT_EMAIL="ops@yourdomain.com"

for comp in "${COMPONENTS[@]}"; do
    # pgrep -x matches the exact process name, so each binary must be named after its component.
    if ! pgrep -x "$comp" > /dev/null 2>&1; then
        echo "$comp process is not running on $(hostname)" | \
            mail -s "ALERT: $comp DOWN on $(hostname)" "$ALERT_EMAIL"
        # Attempt auto-restart
        node.sh "$comp" start
    fi
done
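Run every two minutes, the watchdog sends a fresh email on each run for as long as a component stays down. A per-component cooldown file is one way to limit that. A minimal sketch; `should_alert`, the stamp directory, and the one-hour default are assumptions made for illustration:

```shell
#!/bin/bash
# should_alert KEY NOW STAMP_DIR [COOLDOWN_SECONDS]
# Returns 0 if an alert for KEY may be sent now (and records the time),
# or 1 if one was already sent within COOLDOWN_SECONDS.
# Hypothetical helper for illustration; not part of node.sh.
should_alert() {
    local key="$1" now="$2" stamp_dir="$3" cooldown="${4:-3600}"
    local stamp="$stamp_dir/$key.last_alert"
    local last=0

    mkdir -p "$stamp_dir"
    if [ -f "$stamp" ]; then
        read -r last < "$stamp"
    fi
    last="${last:-0}"

    # Still inside the cooldown window: suppress the alert.
    if [ $(( now - last )) -lt "$cooldown" ]; then
        return 1
    fi

    # Outside the window: remember when this alert went out.
    echo "$now" > "$stamp"
    return 0
}
```

Inside the watchdog loop, the `mail` call could be wrapped as `if should_alert "$comp" "$(date +%s)" /tmp/ela_alerts; then ... fi`, while the restart attempt still runs on every pass.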
Disk usage monitor (bash):
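#!/bin/bash
# Alert when the filesystem holding the node directory passes the threshold.
THRESHOLD=85
ALERT_EMAIL="ops@yourdomain.com"

# --output=pcent is a GNU df option; keep only the digits of the percentage.
USAGE=$(df --output=pcent ~/node | tail -n 1 | tr -dc '0-9')

if [ "$USAGE" -gt "$THRESHOLD" ]; then
    echo "Disk usage at ${USAGE}% on $(hostname). Node directory: ~/node" | \
        mail -s "ALERT: Disk ${USAGE}% on $(hostname)" "$ALERT_EMAIL"
fi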
Install all three as cron jobs:
crontab -e
# Add (cron runs with a minimal PATH, so make sure the scripts can find node.sh):
*/5 * * * * /home/elastos/scripts/monitor_height.sh
*/2 * * * * /home/elastos/scripts/monitor_process.sh
*/30 * * * * /home/elastos/scripts/monitor_disk.sh
Prometheus & Grafana Integration
For production environments, export metrics to Prometheus. ESC/EID (geth forks) have built-in metrics support:
# Start ESC with metrics enabled
./esc --datadir data --metrics --metrics.addr 127.0.0.1 --metrics.port 6060 ...
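For the ELA main chain, create a custom exporter that polls the RPC interface and exposes Prometheus metrics. The scrape config below assumes that exporter listens on localhost:9101 and that ESC serves metrics on port 6060, as started above:
# prometheus.yml scrape config
scrape_configs:
  - job_name: 'elastos-ela'
    static_configs:
      - targets: ['localhost:9101']
    scrape_interval: 30s
  - job_name: 'elastos-esc'
    # Upstream geth serves Prometheus-format metrics at this path, not /metrics;
    # verify that the ESC fork does the same.
    metrics_path: /debug/metrics/prometheus
    static_configs:
      - targets: ['localhost:6060']
    scrape_interval: 15s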
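The custom ELA exporter itself is left open above. As a starting point, the following sketch only renders two values in the Prometheus text exposition format. The metric names `ela_block_height` and `ela_peer_count` and the function `render_ela_metrics` are made up for illustration, and serving the output on port 9101 (or handing it to node_exporter's textfile collector) is not shown:

```shell
#!/bin/bash
# render_ela_metrics HEIGHT PEERS
# Prints both values in the Prometheus text exposition format.
# Metric names are hypothetical; choose names that fit your own conventions.
render_ela_metrics() {
    local height="$1" peers="$2"
    printf '# HELP ela_block_height Current ELA main chain block height.\n'
    printf '# TYPE ela_block_height gauge\n'
    printf 'ela_block_height %s\n' "$height"
    printf '# HELP ela_peer_count Number of connected ELA peers.\n'
    printf '# TYPE ela_peer_count gauge\n'
    printf 'ela_peer_count %s\n' "$peers"
}
```

The two inputs could come from `node.sh ela jsonrpc getcurrentheight` and `node.sh ela jsonrpc getconnectioncount`, each piped through `jq -r '.result'` as in the block height monitor.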