
WebSocket Error Handling

SDD Classification: L3-Technical | Authority: Engineering Team | Review Cycle: Quarterly
This guide covers error handling for WebSocket connections, including error codes, recovery strategies, and best practices for building resilient real-time applications.

Error Message Format

All WebSocket errors follow a consistent structure:
{
  "type": "system",
  "event": "error",
  "data": {
    "error_code": "INSUFFICIENT_PERMISSIONS",
    "message": "You do not have write permissions for this document",
    "details": {
      "required_permission": "document:write",
      "user_permissions": ["document:read"]
    },
    "recoverable": false,
    "timestamp": "2025-01-07T10:30:00Z"
  }
}

Error Fields

| Field | Type | Description |
| --- | --- | --- |
| error_code | string | Machine-readable error identifier |
| message | string | Human-readable description |
| details | object | Additional context (optional) |
| recoverable | boolean | Whether the error can be recovered from |
| timestamp | string | When the error occurred |
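
As a minimal sketch of how a client might validate this envelope before acting on it, the helper below checks the fields listed above and returns `null` for anything that is not a well-formed error message. `parseErrorEnvelope` is a hypothetical name, not part of the documented API, and the sketch assumes `details` may be omitted, as the table indicates.

```javascript
// Hypothetical helper: returns the `data` payload of a well-formed error
// message, or null if the frame is not an error or is malformed.
function parseErrorEnvelope(raw) {
  let msg;
  try {
    msg = typeof raw === 'string' ? JSON.parse(raw) : raw;
  } catch {
    return null; // not valid JSON
  }

  // Only frames with type "system" and event "error" are error messages
  if (!msg || msg.type !== 'system' || msg.event !== 'error') return null;

  const data = msg.data;
  if (!data || typeof data !== 'object') return null;
  if (typeof data.error_code !== 'string') return null;
  if (typeof data.message !== 'string') return null;
  if (typeof data.recoverable !== 'boolean') return null;

  return {
    error_code: data.error_code,
    message: data.message,
    details: data.details ?? {},        // optional field, default to empty object
    recoverable: data.recoverable,
    timestamp: data.timestamp ?? null
  };
}
```

Returning `null` rather than throwing keeps the message loop simple: callers can ignore frames that are not errors and only branch on a validated payload.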

Error Code Reference

Connection Errors

| Code | Description | Recovery |
| --- | --- | --- |
| CONNECTION_FAILED | Failed to establish connection | Retry with backoff |
| CONNECTION_TIMEOUT | Connection timed out | Retry with backoff |
| CONNECTION_LIMIT_EXCEEDED | Max connections reached | Wait or close other connections |
| PROTOCOL_ERROR | Invalid WebSocket protocol | Check client implementation |

Authentication Errors

| Code | Description | Recovery |
| --- | --- | --- |
| AUTHENTICATION_FAILED | Invalid or missing token | Re-authenticate |
| TOKEN_EXPIRED | JWT token has expired | Refresh token |
| TOKEN_REVOKED | Token was revoked | Re-authenticate |
| INSUFFICIENT_PERMISSIONS | User lacks required access | Contact admin |

Document Errors

| Code | Description | Recovery |
| --- | --- | --- |
| DOCUMENT_NOT_FOUND | Document doesn’t exist | Verify document ID |
| DOCUMENT_LOCKED | Document is locked | Wait and retry |
| DOCUMENT_DELETED | Document was deleted | Close connection |
| VERSION_CONFLICT | Operation version mismatch | Request sync |

Operation Errors

| Code | Description | Recovery |
| --- | --- | --- |
| INVALID_OPERATION | Malformed operation | Fix operation data |
| OPERATION_REJECTED | Server rejected operation | Check permissions |
| OPERATION_TIMEOUT | Operation timed out | Retry operation |
| RATE_LIMIT_EXCEEDED | Too many operations | Wait and retry |

System Errors

| Code | Description | Recovery |
| --- | --- | --- |
| INTERNAL_ERROR | Server error | Retry with backoff |
| SERVICE_UNAVAILABLE | Service temporarily down | Retry later |
| MAINTENANCE_MODE | Server in maintenance | Wait for service |
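
The tables above can be condensed into a lookup that client code consults before deciding what to do. The sketch below is one possible encoding; the action labels (`retry_with_backoff`, `wait`, and so on) and the names `RECOVERY_BY_CODE` and `recoveryFor` are illustrative, not part of the protocol.

```javascript
// Illustrative grouping of the documented error codes by recovery action.
const RECOVERY_BY_CODE = {
  // Connection errors
  CONNECTION_FAILED: 'retry_with_backoff',
  CONNECTION_TIMEOUT: 'retry_with_backoff',
  CONNECTION_LIMIT_EXCEEDED: 'wait',
  PROTOCOL_ERROR: 'fix_client',
  // Authentication errors
  AUTHENTICATION_FAILED: 'reauthenticate',
  TOKEN_EXPIRED: 'refresh_token',
  TOKEN_REVOKED: 'reauthenticate',
  INSUFFICIENT_PERMISSIONS: 'contact_admin',
  // Document errors
  DOCUMENT_NOT_FOUND: 'verify_document_id',
  DOCUMENT_LOCKED: 'wait',
  DOCUMENT_DELETED: 'close',
  VERSION_CONFLICT: 'resync',
  // Operation errors
  INVALID_OPERATION: 'fix_client',
  OPERATION_REJECTED: 'check_permissions',
  OPERATION_TIMEOUT: 'retry',
  RATE_LIMIT_EXCEEDED: 'wait',
  // System errors
  INTERNAL_ERROR: 'retry_with_backoff',
  SERVICE_UNAVAILABLE: 'wait',
  MAINTENANCE_MODE: 'wait'
};

// Unknown codes map to 'unknown' so callers can surface them instead of
// silently retrying.
function recoveryFor(errorCode) {
  return Object.prototype.hasOwnProperty.call(RECOVERY_BY_CODE, errorCode)
    ? RECOVERY_BY_CODE[errorCode]
    : 'unknown';
}
```

Using an own-property check avoids accidental matches on inherited keys such as `toString` when the code comes from untrusted input.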

Error Handler Implementation

class WebSocketErrorHandler {
  constructor(websocket, options = {}) {
    this.ws = websocket;
    this.maxRetries = options.maxRetries || 5;
    this.errorCallbacks = new Map();
    this.errorCount = 0;
    this.errorWindow = 60000; // 1 minute
    this.errorTimestamps = [];
  }

  handleError(errorMessage) {
    const { error_code, message, recoverable, details } = errorMessage.data;

    // Track error rate
    this.trackError();

    // Check for error storm
    if (this.isErrorStorm()) {
      this.handleErrorStorm();
      return;
    }

    // Log error
    console.error(`WebSocket Error [${error_code}]: ${message}`, details);

    // Call registered handler
    const handler = this.errorCallbacks.get(error_code);
    if (handler) {
      handler(errorMessage.data);
      return;
    }

    // Default handling based on recoverability
    if (recoverable) {
      this.handleRecoverableError(errorMessage.data);
    } else {
      this.handleFatalError(errorMessage.data);
    }
  }

  trackError() {
    const now = Date.now();
    this.errorTimestamps.push(now);
    this.errorTimestamps = this.errorTimestamps.filter(
      ts => now - ts < this.errorWindow
    );
    this.errorCount = this.errorTimestamps.length;
  }

  isErrorStorm() {
    return this.errorCount > 10;
  }

  handleErrorStorm() {
    console.error('Error storm detected, backing off');
    this.ws.close(4999, 'Error storm');
    this.emit('error_storm');
  }

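  // Register a handler that overrides default handling for one error code
  on(errorCode, handler) {
    this.errorCallbacks.set(errorCode, handler);
  }

  // Subscribe to handler events such as 'error_storm' or 'auth_error'
  // (onEvent is an illustrative name)
  onEvent(eventName, listener) {
    if (!this.eventListeners) this.eventListeners = new Map();
    const listeners = this.eventListeners.get(eventName) || [];
    listeners.push(listener);
    this.eventListeners.set(eventName, listeners);
  }

  // Notify subscribers registered through onEvent()
  emit(eventName, payload) {
    const listeners = this.eventListeners?.get(eventName) || [];
    listeners.forEach(listener => listener(payload));
  }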

  handleRecoverableError(error) {
    switch (error.error_code) {
      case 'VERSION_CONFLICT':
        this.requestDocumentSync();
        break;

      case 'RATE_LIMIT_EXCEEDED':
        this.backoffAndRetry(error.details?.retry_after || 1000);
        break;

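      case 'OPERATION_TIMEOUT':
        // The handler does not track sent operations, so let the
        // application resend the one that timed out
        this.emit('retry_operation', error);
        break;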

      default:
        this.emit('recoverable_error', error);
    }
  }

  handleFatalError(error) {
    switch (error.error_code) {
      case 'AUTHENTICATION_FAILED':
      case 'TOKEN_EXPIRED':
      case 'TOKEN_REVOKED':
        this.emit('auth_error', error);
        break;

      case 'INSUFFICIENT_PERMISSIONS':
        this.emit('permission_error', error);
        break;

      case 'DOCUMENT_DELETED':
        this.emit('document_deleted', error);
        this.ws.close(1000, 'Document deleted');
        break;

      default:
        this.emit('fatal_error', error);
    }
  }

  requestDocumentSync() {
    this.ws.send(JSON.stringify({
      type: 'system',
      event: 'sync_request',
      data: {}
    }));
  }

  backoffAndRetry(delay) {
    setTimeout(() => {
      this.emit('retry_ready');
    }, delay);
  }
}
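
One way to feed server messages into a handler like the one above is sketched here. It assumes a browser-style WebSocket, where `addEventListener('message', ...)` delivers each text frame as `event.data`; `attachErrorRouting` and `onMessage` are hypothetical names introduced for illustration.

```javascript
// Hypothetical wiring: forwards error frames to `errorHandler.handleError`
// and passes every other parsed message to `onMessage`.
function attachErrorRouting(ws, errorHandler, onMessage = () => {}) {
  ws.addEventListener('message', (event) => {
    let msg;
    try {
      msg = JSON.parse(event.data);
    } catch (parseError) {
      console.warn('Ignoring non-JSON frame', parseError);
      return;
    }

    // Error frames use type "system" and event "error" (see Error Message Format)
    if (msg && msg.type === 'system' && msg.event === 'error' && msg.data) {
      errorHandler.handleError(msg);
    } else {
      onMessage(msg);
    }
  });
}

// Usage (assuming a WebSocketErrorHandler instance as defined above):
// attachErrorRouting(socket, new WebSocketErrorHandler(socket), handleDocumentMessage);
```

Checking `msg.data` before dispatching matters because `handleError` destructures `errorMessage.data` and would throw on a frame without it.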

Specific Error Handling

Authentication Errors

class AuthErrorHandler {
  constructor(websocket, authService) {
    this.ws = websocket;
    this.authService = authService;
    this.refreshAttempts = 0;
    this.maxRefreshAttempts = 3;
  }

  async handleAuthError(error) {
    if (error.error_code === 'TOKEN_EXPIRED' && this.refreshAttempts < this.maxRefreshAttempts) {
      await this.attemptTokenRefresh();
    } else {
      this.redirectToLogin(error);
    }
  }

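  async attemptTokenRefresh() {
    this.refreshAttempts++;

    try {
      const newToken = await this.authService.refreshToken();
      // reconnectWithToken is assumed to be provided by a wrapper around the
      // socket; the native WebSocket API has no such method
      this.ws.reconnectWithToken(newToken);
      this.refreshAttempts = 0;
    } catch (refreshError) {
      if (this.refreshAttempts < this.maxRefreshAttempts) {
        // Wait and retry with a linearly increasing delay (1s, 2s, ...)
        await this.delay(1000 * this.refreshAttempts);
        await this.attemptTokenRefresh();
      } else {
        this.redirectToLogin({ error_code: 'REFRESH_FAILED' });
      }
    }
  }

  // Resolve after `ms` milliseconds
  delay(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }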

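  redirectToLogin(error) {
    // Store current document for post-login redirect
    sessionStorage.setItem('returnTo', window.location.href);

    // Notify user through the app-level showNotification helper
    // (the same one used under User-Facing Error Messages)
    showNotification({
      type: 'error',
      title: 'Session Expired',
      message: 'Session expired. Please log in again.',
      action: 'Log In'
    });

    // Redirect
    window.location.href = '/login';
  }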
}

Version Conflict Recovery

class ConflictRecoveryHandler {
  constructor(websocket, documentManager) {
    this.ws = websocket;
    this.documentManager = documentManager;
  }

  async handleVersionConflict(error) {
    const { client_version, server_version } = error.details;

    console.warn(`Version conflict: client=${client_version}, server=${server_version}`);

    // Request full document sync
    const syncData = await this.requestSync();

    // Apply server state
    this.documentManager.applyServerState(syncData.document);

    // Re-apply pending operations
    const pendingOps = this.documentManager.getPendingOperations();

    for (const op of pendingOps) {
      // Transform against new state
      const transformed = this.documentManager.transformOperation(op, syncData.operations);
      this.documentManager.sendOperation(transformed);
    }

    this.emit('conflict_resolved', {
      operationsReplayed: pendingOps.length
    });
  }

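  async requestSync() {
    // Assumes `ws` is an emitter-style wrapper that exposes once()/off() for
    // parsed server events; the native WebSocket API has neither
    return new Promise((resolve, reject) => {
      const onSync = (data) => {
        clearTimeout(timeout);
        resolve(data);
      };

      const timeout = setTimeout(() => {
        // Remove the listener so a late reply does not fire a stale callback
        this.ws.off('document_sync', onSync);
        reject(new Error('Sync request timeout'));
      }, 5000);

      this.ws.once('document_sync', onSync);

      this.ws.send(JSON.stringify({
        type: 'system',
        event: 'sync_request',
        data: {
          last_known_version: this.documentManager.getVersion()
        }
      }));
    });
  }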
}

Rate Limit Handling

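class RateLimitHandler {
  // `sendOperation` is the function that actually transmits an operation
  constructor(sendOperation) {
    this.sendOperation = sendOperation;
    this.operationQueue = [];
    this.isThrottled = false;
    this.throttleUntil = 0;
    this.throttleTimer = null;
  }

  handleRateLimit(error) {
    const retryAfter = error.details?.retry_after || 1000;

    this.isThrottled = true;
    this.throttleUntil = Date.now() + retryAfter;

    console.warn(`Rate limited. Retry after ${retryAfter}ms`);

    // Restart the timer so a repeated rate-limit error extends the pause
    // instead of leaving two timers racing
    clearTimeout(this.throttleTimer);
    this.throttleTimer = setTimeout(() => {
      this.isThrottled = false;
      this.flushQueue();
    }, retryAfter);
  }

  // Returns true if the caller may send now, false if the operation was queued
  queueOperation(operation) {
    if (this.isThrottled) {
      this.operationQueue.push(operation);
      return false;
    }
    return true;
  }

  // Send everything that was queued while throttled, in original order
  flushQueue() {
    const ops = this.operationQueue.splice(0);
    ops.forEach(op => this.sendOperation(op));
  }
}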

Connection Close Codes

Standard Codes

| Code | Name | Description |
| --- | --- | --- |
| 1000 | Normal Closure | Clean close |
| 1001 | Going Away | Navigation or server shutdown |
| 1002 | Protocol Error | Protocol violation |
| 1003 | Unsupported Data | Unexpected data type |
| 1006 | Abnormal Closure | No close frame received |
| 1011 | Internal Error | Server error |

Custom Codes

| Code | Name | Description |
| --- | --- | --- |
| 4000 | Heartbeat Timeout | No pong received |
| 4001 | Auth Expired | Token expired |
| 4002 | Permission Denied | Lost access |
| 4003 | Document Locked | Document unavailable |
| 4004 | Kicked | Removed by admin |
| 4999 | Error Storm | Too many errors |

Close Code Handler

function handleCloseCode(code, reason) {
  switch (code) {
    case 1000:
      // Normal closure, no action needed
      break;

    case 1001:
      // Server going away, attempt reconnect
      scheduleReconnect();
      break;

    case 1006:
      // Abnormal closure, network issue
      scheduleReconnect();
      break;

    case 4000:
      // Heartbeat timeout, reconnect immediately
      reconnect();
      break;

    case 4001:
      // Auth expired, refresh token first
      refreshTokenAndReconnect();
      break;

    case 4002:
      // Permission denied, notify user
      showPermissionDeniedMessage();
      break;

    case 4003:
      // Document locked, show status
      showDocumentLockedMessage();
      scheduleReconnect(30000); // Try again in 30s
      break;

    case 4004:
      // Kicked by admin
      showKickedMessage();
      // Don't reconnect
      break;

    default:
      console.warn(`Unknown close code: ${code} - ${reason}`);
      scheduleReconnect();
  }
}
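
In a browser-style WebSocket, the close code arrives on the `close` event, whose `code`, `reason`, and `wasClean` properties are standard. The sketch below shows one way to connect that event to a function like `handleCloseCode`; `watchClose` is a hypothetical name, and the handler is passed in as a parameter so the snippet stands alone.

```javascript
// Hypothetical wiring for a browser-style WebSocket.
function watchClose(ws, onClose) {
  ws.addEventListener('close', (event) => {
    // 1006 is never sent on the wire; the runtime reports it locally when
    // the connection drops without a close frame.
    if (!event.wasClean) {
      console.warn(`Unclean close: code=${event.code}`);
    }
    onClose(event.code, event.reason);
  });
}

// Usage (assuming handleCloseCode from above is in scope):
// watchClose(socket, handleCloseCode);
```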

Error Recovery Strategies

Exponential Backoff

class BackoffStrategy {
  constructor(options = {}) {
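    // Use ?? so explicit values such as jitter: false are respected
    // (with ||, jitter could never be turned off)
    this.baseDelay = options.baseDelay ?? 1000;
    this.maxDelay = options.maxDelay ?? 30000;
    this.maxAttempts = options.maxAttempts ?? 10;
    this.jitter = options.jitter ?? true;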
    this.attempts = 0;
  }

  getNextDelay() {
    const exponentialDelay = Math.min(
      this.baseDelay * Math.pow(2, this.attempts),
      this.maxDelay
    );

    const delay = this.jitter
      ? exponentialDelay + Math.random() * 1000
      : exponentialDelay;

    this.attempts++;
    return delay;
  }

  shouldRetry() {
    return this.attempts < this.maxAttempts;
  }

  reset() {
    this.attempts = 0;
  }
}
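
To make the arithmetic concrete, the sketch below computes the delay schedule that the formula `min(baseDelay * 2^n, maxDelay)` produces with the default values above, leaving jitter out so the numbers are deterministic. `backoffSchedule` is a hypothetical helper for illustration only.

```javascript
// Delay before attempt n (0-based), without jitter: min(base * 2^n, max).
function backoffSchedule(attempts, baseDelay = 1000, maxDelay = 30000) {
  const delays = [];
  for (let n = 0; n < attempts; n++) {
    delays.push(Math.min(baseDelay * 2 ** n, maxDelay));
  }
  return delays;
}

backoffSchedule(7);
// returns [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

With the default of 10 attempts the delays sum to 181 seconds before jitter, so a client that exhausts its retries has been waiting about three minutes in total.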

Circuit Breaker

class CircuitBreaker {
  constructor(options = {}) {
    this.failureThreshold = options.failureThreshold || 5;
    this.resetTimeout = options.resetTimeout || 60000;
    this.failures = 0;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.lastFailure = null;
  }

  recordFailure() {
    this.failures++;
    this.lastFailure = Date.now();

    if (this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.scheduleReset();
    }
  }

  recordSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

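  scheduleReset() {
    // Cancel any earlier timer so repeated failures do not stack resets
    clearTimeout(this.resetTimer);
    this.resetTimer = setTimeout(() => {
      this.state = 'HALF_OPEN';
    }, this.resetTimeout);
  }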

  canExecute() {
    if (this.state === 'CLOSED') return true;
    if (this.state === 'OPEN') return false;
    if (this.state === 'HALF_OPEN') return true; // Allow test request
    return false;
  }
}
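
A breaker only helps if every guarded call reports its outcome. The sketch below shows one way to wrap a synchronous action around an object with the `canExecute`, `recordSuccess`, and `recordFailure` methods defined above; `runGuarded` is a hypothetical name.

```javascript
// Hypothetical wrapper: runs `action` only when the breaker allows it and
// reports the outcome back to the breaker.
function runGuarded(breaker, action) {
  if (!breaker.canExecute()) {
    // Circuit is open: skip the call entirely
    return { executed: false, value: undefined };
  }

  try {
    const value = action();
    breaker.recordSuccess();
    return { executed: true, value };
  } catch (error) {
    breaker.recordFailure();
    throw error; // let the caller decide how to surface the failure
  }
}
```

For asynchronous actions the same pattern applies, but success or failure should be recorded when the promise settles rather than when the call returns.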

User-Facing Error Messages

Error Message Mapping

const errorMessages = {
  AUTHENTICATION_FAILED: {
    title: 'Session Expired',
    message: 'Please log in again to continue editing.',
    action: 'Log In'
  },
  INSUFFICIENT_PERMISSIONS: {
    title: 'Access Denied',
    message: 'You no longer have permission to edit this document.',
    action: 'Request Access'
  },
  DOCUMENT_LOCKED: {
    title: 'Document Locked',
    message: 'This document is temporarily unavailable. Please try again shortly.',
    action: 'Retry'
  },
  RATE_LIMIT_EXCEEDED: {
    title: 'Slow Down',
    message: 'You\'re making changes too quickly. Please wait a moment.',
    action: null
  },
  CONNECTION_FAILED: {
    title: 'Connection Lost',
    message: 'Unable to connect to the server. Retrying...',
    action: 'Retry Now'
  },
  INTERNAL_ERROR: {
    title: 'Something Went Wrong',
    message: 'An unexpected error occurred. Our team has been notified.',
    action: 'Reload'
  }
};

function showUserError(errorCode) {
  const errorInfo = errorMessages[errorCode] || errorMessages.INTERNAL_ERROR;

  showNotification({
    type: 'error',
    title: errorInfo.title,
    message: errorInfo.message,
    action: errorInfo.action
  });
}

Best Practices

Do

  1. Log all errors - For debugging and monitoring
  2. Show user-friendly messages - Don’t expose technical details
  3. Implement retry logic - Connection, timeout, and rate-limit errors are usually transient
  4. Use circuit breakers - Prevent cascade failures
  5. Track error metrics - Monitor error rates

Don’t

  1. Don’t ignore errors - Always handle them appropriately
  2. Don’t retry indefinitely - Set maximum attempts
  3. Don’t expose internal errors - Hide technical details from users
  4. Don’t block the UI - Handle errors asynchronously
  5. Don’t lose user data - Queue operations during errors
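
As a minimal sketch of the "track error metrics" practice, the class below counts errors per code in memory so that the most frequent ones can be reported. `ErrorMetrics` and its methods are hypothetical; a real deployment would more likely forward these counts to whatever monitoring system is already in use.

```javascript
// Hypothetical in-memory counter keyed by error code.
class ErrorMetrics {
  constructor() {
    this.counts = new Map();
  }

  // Increment the counter for one error code
  record(errorCode) {
    this.counts.set(errorCode, (this.counts.get(errorCode) ?? 0) + 1);
  }

  // Return the n most frequent codes as [code, count] pairs, highest first
  top(n = 3) {
    return [...this.counts.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, n);
  }
}
```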


Document Status: Complete | Version: 2.0