Monitoring & Alerting

What to Monitor

Metric	Why	Alert Threshold
Block height	Detect sync stalls	No increase in 30 minutes
Peer count	Network connectivity	Below 3 peers
Disk usage	Prevent full disk	Above 85% capacity
RAM usage	Detect memory leaks	Above 90% capacity
Process status	Detect crashes	Process not running
BPoS status	Detect penalties	Status changed from "Active"
Mempool size	Detect spam/congestion	Above 10,000 transactions
Block timestamp	Detect clock drift	More than 5 minutes behind system time
Chain data growth rate	Capacity planning	N/A (informational)

Built-In Status Commands

# Full status of all chains
node.sh status

# Per-chain status (shows 18 metrics for ELA)
node.sh ela status

# Output includes:
#   Version, disk usage, address, public key, balance
#   PID, RAM, uptime, file descriptors, TCP ports/connections
#   Peers, height
#   BPoS state (name, status, staked, votes, unclaimed rewards)
#   Elastos Council state (name, status)

RPC Health Checks

ELA main chain:

# Current block height
node.sh ela jsonrpc getcurrentheight

# Best block hash
node.sh ela jsonrpc getbestblockhash

# Connection count
node.sh ela jsonrpc getconnectioncount

# Node state (comprehensive)
node.sh ela jsonrpc nodestate

# Memory pool contents (pending transactions)
node.sh ela jsonrpc getrawmempool

ESC sidechain:

# Block number
curl -s -X POST http://127.0.0.1:20636 \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'

# Peer count
curl -s -X POST http://127.0.0.1:20636 \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}'

# Syncing status
curl -s -X POST http://127.0.0.1:20636 \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
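The ESC endpoints above return standard Ethereum JSON-RPC responses: `eth_blockNumber` and `net_peerCount` encode their results as hex strings (for example `"0x1b4"`), and `eth_syncing` returns `false` once the node is in sync. The following is a minimal sketch of decoding those values in bash. The helper names `rpc_result` and `hex_to_dec` are illustrative, and the `sed` extraction assumes a single-line response with a string result; a JSON parser such as jq is more robust.

```shell
#!/bin/bash
# Hypothetical helpers for reading ESC JSON-RPC responses.

# Extract a string "result" field from a single-line JSON-RPC response on stdin.
rpc_result() {
    sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Convert a 0x-prefixed hex quantity to decimal.
hex_to_dec() {
    local hex="${1#0x}"
    echo $((16#$hex))
}

# Example with a canned response (no network call):
response='{"jsonrpc":"2.0","id":1,"result":"0x1b4"}'
height=$(hex_to_dec "$(echo "$response" | rpc_result)")
echo "ESC block height: $height"
```

In live use, the `curl` output from the commands above would be piped into `rpc_result` in place of the canned response.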

Alerting with Elastos.ELA.Monitor

The official monitoring tool is Elastos.ELA.Monitor, a Python script run as a cron job and available from github.com/elastos/Elastos.ELA.Monitor.

Checks performed:

Block height stalls (no new blocks in N minutes)
Producer status changes
Mempool size thresholds
Peer count minimums

Alerts: Sent via SMTP email.

Setup:

git clone https://github.com/elastos/Elastos.ELA.Monitor.git
cd Elastos.ELA.Monitor

# Configure monitoring targets and SMTP settings
cp config.example.json config.json
# Edit config.json with your ELA RPC details and email settings

# Install as cron job (run every 5 minutes)
crontab -e
# Add: */5 * * * * cd /path/to/Elastos.ELA.Monitor && python3 monitor.py

Custom Monitoring Scripts

Block height monitor (bash):

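#!/bin/bash
# Alert when the ELA block height has not changed since the previous run.
PREV_HEIGHT_FILE="/tmp/ela_height"
ALERT_EMAIL="ops@yourdomain.com"

CURRENT_HEIGHT=$(node.sh ela jsonrpc getcurrentheight 2>/dev/null | jq -r '.result')

# If the RPC call failed or returned a non-numeric value, alert and exit
# so that a bad reading never overwrites the stored height.
if ! [[ "$CURRENT_HEIGHT" =~ ^[0-9]+$ ]]; then
    echo "Could not read ELA block height on $(hostname)" | \
        mail -s "ALERT: ELA RPC Unreachable" "$ALERT_EMAIL"
    exit 1
fi

if [ -f "$PREV_HEIGHT_FILE" ]; then
    PREV_HEIGHT=$(cat "$PREV_HEIGHT_FILE")
    if [ "$CURRENT_HEIGHT" = "$PREV_HEIGHT" ]; then
        echo "ELA block height stalled at $CURRENT_HEIGHT" | \
            mail -s "ALERT: ELA Sync Stalled" "$ALERT_EMAIL"
    fi
fi

echo "$CURRENT_HEIGHT" > "$PREV_HEIGHT_FILE"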
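The script above alerts as soon as two consecutive runs see the same height. To match the 30-minute threshold listed in the table, the state file can also record when the height last changed. The following is a sketch under stated assumptions: the function name `check_stall` and the one-line `height timestamp` state format are hypothetical, not part of any Elastos tooling.

```shell
#!/bin/bash
# Sketch: report a stall only after the height has been unchanged for STALL_SECONDS.
# check_stall and the "height timestamp" state format are illustrative assumptions.
STALL_SECONDS=$((30 * 60))

# Usage: check_stall <state_file> <current_height> <now_epoch>
# Prints "stalled" or "ok"; updates the state file when the height changes.
check_stall() {
    local state_file="$1" height="$2" now="$3"
    local prev_height="" last_change="$now"

    if [ -f "$state_file" ]; then
        read -r prev_height last_change < "$state_file"
    fi

    if [ "$height" != "$prev_height" ]; then
        # Height moved (or first run): record the new height and time.
        echo "$height $now" > "$state_file"
        echo "ok"
    elif [ $((now - last_change)) -ge "$STALL_SECONDS" ]; then
        echo "stalled"
    else
        echo "ok"
    fi
}
```

A cron-driven script could call `check_stall /tmp/ela_height_state "$CURRENT_HEIGHT" "$(date +%s)"` and send mail only when the output is `stalled`.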

Process watchdog (bash):

#!/bin/bash
COMPONENTS=("ela" "esc" "eid" "arbiter")
ALERT_EMAIL="ops@yourdomain.com"

for comp in "${COMPONENTS[@]}"; do
    if ! pgrep -x "$comp" > /dev/null 2>&1; then
        echo "$comp process is not running on $(hostname)" | \
            mail -s "ALERT: $comp DOWN on $(hostname)" "$ALERT_EMAIL"

        # Attempt auto-restart
        node.sh "$comp" start
    fi
done

Disk usage monitor (bash):

#!/bin/bash
THRESHOLD=85
ALERT_EMAIL="ops@yourdomain.com"

USAGE=$(df ~/node --output=pcent | tail -1 | tr -d ' %')

if [ "$USAGE" -gt "$THRESHOLD" ]; then
    echo "Disk usage at ${USAGE}% on $(hostname). Node directory: ~/node" | \
        mail -s "ALERT: Disk ${USAGE}% on $(hostname)" "$ALERT_EMAIL"
fi

Install all three as cron jobs:

crontab -e
# Add:
*/5 * * * *  /home/elastos/scripts/monitor_height.sh
*/2 * * * *  /home/elastos/scripts/monitor_process.sh
*/30 * * * * /home/elastos/scripts/monitor_disk.sh
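The table at the top also lists RAM usage, which the three scripts above do not cover. The following is a sketch of a memory check for Linux that computes used memory from the `MemTotal` and `MemAvailable` fields of `/proc/meminfo`. The function name `mem_used_pct` and its optional file argument (handy for testing) are illustrative.

```shell
#!/bin/bash
# Sketch: percentage of RAM in use, from /proc/meminfo (Linux only).
# mem_used_pct is a hypothetical helper; the file argument defaults to /proc/meminfo.
mem_used_pct() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/     {total=$2}
         /^MemAvailable:/ {avail=$2}
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "$meminfo"
}

THRESHOLD=90
if [ -r /proc/meminfo ]; then
    USAGE=$(mem_used_pct)
    if [ -n "$USAGE" ] && [ "$USAGE" -gt "$THRESHOLD" ]; then
        echo "RAM usage at ${USAGE}% on $(hostname)"
    fi
fi
```

The final `echo` can be piped to `mail` in the same way as the disk usage monitor.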

Prometheus & Grafana Integration

For production environments, export metrics to Prometheus. ESC/EID (geth forks) have built-in metrics support:

# Start ESC with metrics enabled
./esc --datadir data --metrics --metrics.addr 127.0.0.1 --metrics.port 6060 ...

For ELA main chain, create a custom exporter that polls RPC and exposes Prometheus metrics:

# prometheus.yml scrape config
scrape_configs:
  - job_name: 'elastos-ela'
    static_configs:
      - targets: ['localhost:9101']
    scrape_interval: 30s

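  - job_name: 'elastos-esc'
    # Upstream geth serves Prometheus metrics at this path rather than /metrics;
    # verify it against your ESC build.
    metrics_path: /debug/metrics/prometheus
    static_configs:
      - targets: ['localhost:6060']
    scrape_interval: 15s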
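
The `elastos-ela` job above expects something to be listening on port 9101. The source does not name a specific exporter, so the following is only a sketch of one half of such a tool: formatting RPC readings in the Prometheus text exposition format. The metric names `ela_block_height` and `ela_peer_count` and the function `render_ela_metrics` are hypothetical.

```shell
#!/bin/bash
# Sketch: render ELA readings in Prometheus text exposition format.
# Metric names and render_ela_metrics are illustrative, not an official exporter.

# Usage: render_ela_metrics <height> <peer_count>
render_ela_metrics() {
    local height="$1" peers="$2"
    cat <<EOF
# HELP ela_block_height Current ELA main chain block height.
# TYPE ela_block_height gauge
ela_block_height $height
# HELP ela_peer_count Number of connected ELA peers.
# TYPE ela_peer_count gauge
ela_peer_count $peers
EOF
}

# Example with fixed values; in practice these would come from
# `node.sh ela jsonrpc getcurrentheight` and `getconnectioncount`.
render_ela_metrics 1500000 8
```

The output could be served by a small HTTP server on port 9101, or written to a `.prom` file for node_exporter's textfile collector, in which case Prometheus would scrape node_exporter instead.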
