system design

System Design: WebSocket Protocol Deep Dive

#System Design#Networking#WebSocket#Real-Time#Golang

A deep dive into the WebSocket protocol, covering its handshake, framing, and how it enables persistent, full-duplex communication between clients and servers.

System Design: WebSocket Protocol Deep Dive

Traditional web communication, based on the HTTP request-response model, is client-initiated. The client asks, and the server answers. This works perfectly for browsing documents but falls short for real-time applications like chat apps, live sports updates, or collaborative editing tools, which require the server to push data to the client without being asked.

Techniques like long polling were created as workarounds, but they are inefficient. The WebSocket protocol was designed specifically to solve this problem, providing a persistent, full-duplex communication channel over a single TCP connection.

What is Full-Duplex Communication?

Full-duplex means that both the client and the server can send messages to each other independently and simultaneously, just like a natural phone conversation. This is a major departure from the half-duplex nature of HTTP's request-response cycle.

graph TD subgraph HTTP Request-Response (Half-Duplex) Client -- 1. Request --> Server Server -- 2. Response --> Client Client -- 3. New Request --> Server Server -- 4. New Response --> Client end subgraph WebSocket (Full-Duplex) Client -- "Connection" --> Server Client -- "Messages" --> Server Server -- "Messages" --> Client end

The WebSocket Handshake: Upgrading from HTTP

A WebSocket connection starts its life as a standard HTTP request. This clever design allows WebSocket traffic to pass through existing web infrastructure (like proxies and firewalls) on standard ports (80 and 443).

The client sends an HTTP GET request with two special headers:

  1. Upgrade: websocket: This signals the client's intent to switch protocols.
  2. Connection: Upgrade: This informs the server that the Upgrade header is present.
  3. Sec-WebSocket-Key: A randomly generated value that is used to prove that the server understands the WebSocket protocol.

Client Request:

GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

If the server supports WebSockets, it responds with a special 101 Switching Protocols status code. It also computes a response key based on the client's key to confirm the handshake.

Server Response:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

The Sec-WebSocket-Accept value is derived by concatenating the client's Sec-WebSocket-Key with a standard, globally unique ID ("258EAFA5-E914-47DA-95CA-C5AB0DC85B11"), taking the SHA-1 hash of the result, and then Base64 encoding it. This prevents caching proxies from misinterpreting the handshake.

Once this handshake is complete, the underlying TCP connection is no longer used for HTTP. It becomes a persistent, bidirectional WebSocket connection.

WebSocket Framing: How Data is Sent

After the handshake, all data is transmitted as frames. A frame is a small chunk of data with a header that describes it. This framing mechanism allows the client and server to send messages of various types and sizes over the same connection without confusion.

A simplified WebSocket frame consists of:

  • FIN bit: Is this the final frame of a message? (A single message can be split across multiple frames).
  • Opcode: What type of data is this? (e.g., text, binary, ping, pong, close).
  • Payload Length: The size of the data in this frame.
  • Payload Data: The actual application data.
graph LR Frame --> FIN[FIN Bit] Frame --> Opcode[Opcode] Frame --> Mask[Mask Bit] Frame --> Len[Payload Length] Frame --> Key[Masking Key (if Mask=1)] Frame --> Data[Payload Data] style FIN fill:#f9f style Opcode fill:#ccf
  • Text vs. Binary: The opcode specifies whether the payload is UTF-8 text or raw binary data.
  • Control Frames: Opcodes are also used for control signals. Ping and Pong frames act as a heartbeat to keep the connection alive and check its status. The Close frame initiates a graceful shutdown of the connection.
  • Masking: All frames sent from the client to the server must be masked. This is a security measure to prevent cache poisoning attacks on intermediate proxies. The server does not mask its frames.

Go Example: A Simple Echo Server and Client

Let's build a WebSocket server that echoes back any message it receives, and a client to connect to it. We'll use the popular gorilla/websocket library.

1. Install the library: go get github.com/gorilla/websocket

2. The WebSocket Server (server.go):

package main

import (
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

// The Upgrader converts an HTTP connection to a WebSocket connection.
var upgrader = websocket.Upgrader{
	ReadBufferSize:  1024,
	WriteBufferSize: 1024,
	// We can add a CheckOrigin function for security in production.
	CheckOrigin: func(r *http.Request) bool { return true },
}

// echo handles WebSocket requests from the peer.
func echo(w http.ResponseWriter, r *http.Request) {
	// Upgrade the HTTP connection to a WebSocket connection.
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Println("Upgrade error:", err)
		return
	}
	defer conn.Close()
	log.Println("Client connected")

	// Infinite loop to read messages from the client.
	for {
		// ReadMessage blocks until a message is received.
		messageType, p, err := conn.ReadMessage()
		if err != nil {
			log.Println("Read error:", err)
			break // Exit loop on error (e.g., client disconnects).
		}

		log.Printf("Received message: %s", p)

		// Write the message back to the client.
		if err := conn.WriteMessage(messageType, p); err != nil {
			log.Println("Write error:", err)
			break
		}
	}
	log.Println("Client disconnected")
}

func main() {
	http.HandleFunc("/echo", echo)
	log.Println("Server starting on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}

3. The WebSocket Client (client.go):

package main

import (
	"log"
	"os"
	"os/signal"
	"time"

	"github.com/gorilla/websocket"
)

func main() {
	// Handle interrupt signal to gracefully close the connection.
	interrupt := make(chan os.Signal, 1)
	signal.Notify(interrupt, os.Interrupt)

	// Connect to the server.
	url := "ws://localhost:8080/echo"
	log.Printf("Connecting to %s", url)

	conn, _, err := websocket.DefaultDialer.Dial(url, nil)
	if err != nil {
		log.Fatal("Dial error:", err)
	}
	defer conn.Close()

	done := make(chan struct{})

	// Goroutine to read messages from the server.
	go func() {
		defer close(done)
		for {
			_, message, err := conn.ReadMessage()
			if err != nil {
				log.Println("Read error:", err)
				return
			}
			log.Printf("Received from server: %s", message)
		}
	}()

	// Send a message every second.
	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-done:
			return
		case t := <-ticker.C:
			// Send a text message.
			err := conn.WriteMessage(websocket.TextMessage, []byte(t.String()))
			if err != nil {
				log.Println("Write error:", err)
				return
			}
		case <-interrupt:
			log.Println("Interrupt received, closing connection.")
			// Send a Close frame and wait for the server to close the connection.
			err := conn.WriteMessage(websocket.CloseMessage, websocket.FormatCloseMessage(websocket.CloseNormalClosure, ""))
			if err != nil {
				log.Println("Write close error:", err)
				return
			}
			select {
			case <-done:
			case <-time.After(time.Second):
			}
			return
		}
	}
}

To run this:

  1. Run the server: go run server.go
  2. In a new terminal, run the client: go run client.go
  3. You will see the client sending timestamps and the server echoing them back.

When to Use WebSockets

WebSockets are ideal for applications that require low-latency, bidirectional communication.

  • Real-Time Applications: Chat rooms, multiplayer games, live score tickers.
  • Collaborative Tools: Collaborative document editing (like Google Docs), whiteboarding applications.
  • Financial Data: Real-time stock market data feeds.
  • Monitoring: Live dashboards and server monitoring tools.

Conclusion

The WebSocket protocol fundamentally changes the client-server communication model for the web. By "upgrading" an HTTP connection, it provides a persistent, low-latency, full-duplex channel that enables a new class of interactive and real-time applications. Its standardized handshake and framing make it a robust and efficient solution for any scenario where the server needs to push data to the client proactively.