server-management

Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands.

name: server-management
description: Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands.
allowed-tools: Read, Write, Edit, Glob, Grep, Bash

Server Management

> Server management principles for production operations.
> Learn to THINK, not memorize commands.


1. Process Management Principles

Tool Selection

| Scenario | Tool |
|----------|------|
| Node.js app | PM2 (clustering, reload) |
| Any app | systemd (Linux native) |
| Containers | Docker/Podman |
| Orchestration | Kubernetes, Docker Swarm |

Process Management Goals

| Goal | What It Means |
|------|---------------|
| Restart on crash | Auto-recovery |
| Zero-downtime reload | No service interruption |
| Clustering | Use all CPU cores |
| Persistence | Survive server reboot |
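
To make the "restart on crash" goal concrete, here is a minimal sketch of what a process manager does internally. It is an illustration only, not a replacement for PM2 or systemd; the function name `supervise` and its parameters are hypothetical.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, base_delay=0.5):
    """Run cmd and restart it after a crash (non-zero exit).

    Returns the number of restarts needed before a clean exit.
    Raises RuntimeError once max_restarts is exhausted.
    """
    restarts = 0
    while True:
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return restarts  # clean exit: nothing left to supervise
        if restarts >= max_restarts:
            raise RuntimeError(
                f"gave up after {restarts} restarts (exit code {returncode})"
            )
        # Exponential backoff so a crash loop does not hammer the host.
        time.sleep(base_delay * 2 ** restarts)
        restarts += 1
```

Calling `supervise([sys.executable, "worker.py"])` (with a hypothetical `worker.py`) keeps the worker alive through a bounded number of crashes. The restart cap and the backoff are the design points worth noticing: unlimited instant restarts hide a crash loop instead of surfacing it.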


2. Monitoring Principles

What to Monitor

| Category | Key Metrics |
|----------|-------------|
| Availability | Uptime, health checks |
| Performance | Response time, throughput |
| Errors | Error rate, types |
| Resources | CPU, memory, disk |

Alert Severity Strategy

| Level | Response |
|-------|----------|
| Critical | Immediate action |
| Warning | Investigate soon |
| Info | Review daily |
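
The severity levels above can be sketched as a small classifier. The thresholds are assumptions you would tune per metric, and `classify` is a hypothetical helper, not part of any monitoring tool.

```python
from enum import Enum


class Severity(Enum):
    INFO = "review daily"
    WARNING = "investigate soon"
    CRITICAL = "immediate action"


def classify(value, warning_at, critical_at):
    """Map a metric where higher is worse (e.g. error rate) to a severity."""
    if value >= critical_at:
        return Severity.CRITICAL
    if value >= warning_at:
        return Severity.WARNING
    return Severity.INFO
```

For example, `classify(0.07, warning_at=0.01, critical_at=0.05)` returns `Severity.CRITICAL` for a 7% error rate under those assumed thresholds.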

Monitoring Tool Selection

| Need | Options |
|------|---------|
| Simple/Free | PM2 metrics, htop |
| Full observability | Grafana, Datadog |
| Error tracking | Sentry |
| Uptime | UptimeRobot, Pingdom |


3. Log Management Principles

Log Strategy

| Log Type | Purpose |
|----------|---------|
| Application logs | Debug, audit |
| Access logs | Traffic analysis |
| Error logs | Issue detection |

Log Principles

  • Rotate logs to prevent disk fill

  • Structured logging (JSON) for parsing

  • Appropriate levels (error/warn/info/debug)

  • No sensitive data in logs
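
The four log principles above can be sketched with Python's standard `logging` module. This is one possible shape under stated assumptions: the key list in `SENSITIVE_KEYS`, the `context` attribute, the logger name `app`, and the size limits are all hypothetical choices.

```python
import json
import logging
from logging.handlers import RotatingFileHandler

SENSITIVE_KEYS = {"password", "token", "authorization"}  # hypothetical list


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        # "context" is an optional dict passed via logger.info(..., extra=...).
        context = getattr(record, "context", {})
        safe = {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
            for key, value in context.items()
        }
        return json.dumps(
            {
                "time": self.formatTime(record),
                "level": record.levelname,
                "message": record.getMessage(),
                "context": safe,
            },
            default=str,  # fall back to str() for non-JSON values
        )


def build_logger(path, max_bytes=10_000_000, backups=5):
    """Logger that rotates the file at max_bytes and keeps `backups` old files."""
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

A call such as `logger.info("login", extra={"context": {"user": "alice", "password": "hunter2"}})` would then write the user name but replace the password with `[REDACTED]`.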

4. Scaling Decisions

When to Scale

| Symptom | Solution |
|---------|----------|
| High CPU | Add instances (horizontal) |
| High memory | Increase RAM or fix leak |
| Slow response | Profile first, then scale |
| Traffic spikes | Auto-scaling |

Scaling Strategy

| Type | When to Use |
|------|-------------|
| Vertical | Quick fix, single instance |
| Horizontal | Sustainable, distributed |
| Auto | Variable traffic |
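
For auto-scaling, a common starting point is a proportional rule: scale the instance count by the ratio of the observed metric to its target. The Kubernetes Horizontal Pod Autoscaler documents the same basic formula. The sketch below is illustrative; `desired_replicas` and its bounds are hypothetical names and defaults.

```python
import math


def desired_replicas(current, metric, target, min_replicas=1, max_replicas=10):
    """Proportional scaling: size the fleet so the per-instance metric nears target."""
    if target <= 0:
        raise ValueError("target must be positive")
    wanted = math.ceil(current * metric / target)
    # Clamp so a metric glitch cannot scale to zero or without limit.
    return max(min_replicas, min(max_replicas, wanted))
```

With 4 instances at 90% CPU and a 60% target, `desired_replicas(4, 90, 60)` returns 6. The clamp matters as much as the formula: an upper bound caps cost during a spike, and a lower bound keeps the service available when traffic drops.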


5. Health Check Principles

What Constitutes Healthy

| Check | Meaning |
|-------|---------|
| HTTP 200 | Service responding |
| Database connected | Data accessible |
| Dependencies OK | External services reachable |
| Resources OK | CPU/memory not exhausted |

Health Check Implementation

  • Simple: return 200 as soon as the process can answer

  • Deep: check every dependency before reporting healthy

  • Choose based on what the load balancer needs to decide
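
The difference between a simple and a deep check can be sketched as two plain functions that return a status code and a body. They are framework-agnostic on purpose; `simple_health`, `deep_health`, and the 503 response for failures are assumptions, not a prescribed API.

```python
def simple_health():
    """Simple check: the process is up and can answer."""
    return 200, {"status": "ok"}


def deep_health(checks):
    """Deep check: run every dependency check; any failure yields 503.

    `checks` maps a dependency name to a zero-argument callable returning bool.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # an exception counts as unhealthy
    healthy = all(results.values())
    return (200 if healthy else 503), {
        "status": "ok" if healthy else "degraded",
        "checks": results,
    }
```

A deep check tells the load balancer more, but it also takes an instance out of rotation whenever a shared dependency blips, so it is worth keeping the individual checks cheap and giving them short timeouts.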

6. Security Principles

| Area | Principle |
|------|-----------|
| Access | SSH keys only, no passwords |
| Firewall | Only needed ports open |
| Updates | Regular security patches |
| Secrets | Environment vars, not files |
| Audit | Log access and changes |
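
For the secrets principle, a small fail-fast reader keeps configuration out of the codebase and makes a missing secret obvious at startup. `require_env` is a hypothetical helper name.

```python
import os


def require_env(name):
    """Return environment variable `name`, failing fast if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # Report only the variable name, never a secret value.
        raise RuntimeError(f"missing required environment variable: {name}")
    return value
```

Reading every required secret once at startup, for example `require_env("DATABASE_URL")` with a hypothetical variable name, turns a misconfigured deploy into an immediate, clearly labeled failure.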


7. Troubleshooting Priority

When something's wrong, work through these checks in order:

  • Check if running (process status)

  • Check logs (error messages)

  • Check resources (disk, memory, CPU)

  • Check network (ports, DNS)

  • Check dependencies (database, APIs)
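
The ordered checks above can be sketched as a runner that stops at the first failure, so attention goes to the earliest broken layer. The sketch covers only the resource and network steps; `disk_ok`, `port_open`, `first_failure`, and the 10% free-space threshold are hypothetical.

```python
import shutil
import socket


def disk_ok(path="/", min_free_ratio=0.10):
    """True if at least min_free_ratio of the filesystem holding path is free."""
    usage = shutil.disk_usage(path)
    return usage.total > 0 and usage.free / usage.total >= min_free_ratio


def port_open(host, port, timeout=1.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failure(checks):
    """Run (label, callable) pairs in order; return the first failing label, or None."""
    for label, check in checks:
        if not check():
            return label
    return None
```

A call such as `first_failure([("disk", disk_ok), ("database port", lambda: port_open("127.0.0.1", 5432))])` returns the first failing label, or `None` when everything passes. The host and port here are placeholders.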

8. Anti-Patterns

| ❌ Don't | ✅ Do |
|----------|-------|
| Run as root | Use non-root user |
| Ignore logs | Set up log rotation |
| Skip monitoring | Monitor from day one |
| Manual restarts | Auto-restart config |
| No backups | Regular backup schedule |


> Remember: A well-managed server is boring. That's the goal.
