Structured Logging Explained: Levels, Examples, and Best Practices

Debugging with bad logs is like investigating a crime without CCTV. Structured logs give you the clarity and evidence you need when systems misbehave.

Logging is often treated as a mundane detail. Engineers throw in a few console.log statements, hope for the best, and move on. Until, of course, something breaks in production at 3 AM and everyone scrambles through a haystack of logs to find the needle of truth.

Logs are the primary evidence of what happened in your system. Done right, they help reconstruct events, diagnose issues, and even prevent failures from recurring. Done poorly, they're just noise.

What is Logging?

Logging, at its simplest, is the practice of recording information about what a system is doing at a given point in time. Think of it as a diary entry for your application: every significant event, action, or error is written down.

Logs can capture:

  • API requests and responses
  • Errors or exceptions
  • Performance metrics like execution time
  • Security events like authentication or access failures

Without logging, operating your system is like wandering in the dark. When something goes wrong, you have no breadcrumbs to follow and no way to retrace what happened.

What is Structured Logging?

Traditional or unstructured logs are just lines of free text:

User logged in successfully.
Error: Connection timeout at 10:42pm.

These lines might be easy for humans to read, but they are difficult for machines to parse, search, or analyze. Imagine trying to filter every error log across thousands of servers using only grep; it quickly becomes a nightmare.

Structured logs, on the other hand, use a defined format, usually JSON, with consistent fields. Example:

{
  "timestamp": "2025-08-18T12:42:00Z",
  "level": "error",
  "service": "auth-service",
  "message": "Connection timeout",
  "userId": "12345",
  "traceId": "abc-xyz-789"
}

The advantages are clear:

  • Machine-readable and easy to index
  • Searchable by fields like userId or traceId
  • Correlatable with metrics and traces

Structured logging is the foundation of serious observability.
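To make this concrete, here is a minimal, dependency-free sketch of a function that emits structured logs like the JSON example above. The field names (service, userId, traceId) are illustrative, not a fixed standard:

```javascript
// Minimal structured logger sketch (no dependencies).
// Emits one JSON object per line ("ndjson"), which log shippers parse easily.
function logEvent(level, message, fields = {}) {
  const entry = {
    timestamp: new Date().toISOString(),
    level,
    message,
    ...fields,
  };
  console.log(JSON.stringify(entry));
  return entry;
}

logEvent('error', 'Connection timeout', {
  service: 'auth-service',
  userId: '12345',
  traceId: 'abc-xyz-789',
});
```

In practice you would use a library rather than rolling your own, but the core idea is exactly this: every log line is a parseable object with consistent fields.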

Log Levels Explained

A log level indicates the severity or importance of an event. Here's how to think about them using the Tokyo Metro as an analogy:

1. Debug
The most detailed logs, used mainly by developers to see what’s happening inside the system. Too noisy for everyday use.
Example: Before departure, Metro staff run brake pressure tests and sensor diagnostics. Passengers never hear about it, but engineers rely on it.

2. Info
Regular updates about normal operations. Gives context but doesn’t signal problems.
Example: “Train will arrive at Shibuya in 2 minutes.” A standard update, nothing unusual.

3. Warning
Alerts that something may go wrong. Not a failure yet, but it needs attention.
Example: “Due to heavy rain, trains may be delayed.” It’s a heads-up about possible disruption.

4. Error
A failure that affects operations but doesn’t bring the entire system down. Users feel the impact.
Example: “Train delayed due to signal failure.” The service is disrupted, and people are waiting.

5. Fatal
A severe failure that halts the system entirely. Demands immediate response.
Example: “Entire line suspended due to earthquake.” This isn’t a delay, it’s a complete shutdown.
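Under the hood, levels work as numeric severities: the logger drops anything below its configured threshold. A small sketch, using the short names most libraries use (warn for Warning):

```javascript
// Each level maps to a numeric severity; the logger filters by threshold.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40, fatal: 50 };

function makeLogger(threshold) {
  return (level, message) => {
    if (LEVELS[level] < LEVELS[threshold]) return null; // below threshold: dropped
    const line = `[${level.toUpperCase()}] ${message}`;
    console.log(line);
    return line;
  };
}

const log = makeLogger('warn');
log('info', 'Train will arrive at Shibuya in 2 minutes'); // filtered out
log('error', 'Train delayed due to signal failure');      // printed
```

This is why setting the level to warn silences the routine info chatter while still surfacing every disruption.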

Dev vs Prod Logs

Not all environments need the same level of verbosity.

  • Development Logs: Can be noisy. More details, stack traces, experimental logging statements. The goal is to give developers visibility into every corner of the system.
  • Production Logs: Must be leaner and cleaner. They should provide enough detail for troubleshooting but avoid overwhelming storage or alert systems.

A good rule: log everything in dev, log what matters in prod.
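One common way to apply that rule (a convention, not a requirement) is to derive the log level from the environment, for example from NODE_ENV:

```javascript
// Verbose in development, leaner in production.
// The mapping here is an illustrative convention, not a standard.
function levelForEnv(env) {
  return env === 'production' ? 'info' : 'debug';
}

// e.g. winston.createLogger({ level: levelForEnv(process.env.NODE_ENV) })
console.log(levelForEnv(process.env.NODE_ENV));
```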

Logging in Node.js: Winston and Pino

Logging in modern applications isn’t just about printing to the console. As systems scale, logs need to be structured, contextual, and often streamed to external systems. In the Node.js ecosystem, two libraries stand out:

  • Winston: A mature and flexible logging library. It supports multiple transports (console, files, databases, cloud services) and can format logs in different ways. Perfect when you want fine-grained control over how and where logs are written.
  • Pino: A high-performance logger designed for speed. It outputs structured logs by default in JSON, making it ideal for production-grade APIs where throughput matters more than fancy configuration.

Both libraries enforce the discipline of structured logging and integrate easily with centralized log systems.

Setting up Winston

const winston = require('winston');

const logger = winston.createLogger({
  level: 'info', // minimum level this logger will emit
  format: winston.format.json(), // structured JSON output
  transports: [
    new winston.transports.Console(),
    // Errors get their own file for faster triage
    new winston.transports.File({ filename: 'errors.log', level: 'error' }),
    // Everything at 'info' and above goes to the combined log
    new winston.transports.File({ filename: 'combined.log' })
  ]
});

logger.info('Server started on port 3000');
logger.error('Database connection failed', { traceId: 'abc-xyz-789' });

This simple setup writes structured logs to both console and files, separates errors for easier debugging, and creates a combined log for everything else. With just a few lines of configuration, you move from scattered console.log chaos to production-ready logging.

Best Practices for Logging

  1. Always Use Structured Logs: Plain text is human-friendly but useless at scale. Adopt JSON or another structured format from the start.
  2. Include Contextual Data: Add request IDs, user IDs, and trace IDs. These become essential when debugging distributed systems.
  3. Log at the Right Level: Avoid logging everything at error. Use levels correctly so alerts and dashboards aren't polluted.
  4. Don't Log Sensitive Data: Be mindful of privacy and compliance. Never log passwords, tokens, or personally identifiable information.
  5. Standardize Formats Across Services: Consistency ensures that centralized log queries work seamlessly.
  6. Rotate and Retain: Implement log rotation and retention policies. Keep error logs longer, but clear out debug logs quickly.
  7. Integrate with Alerts: Pipe critical logs into your alerting system. Logs should not just sit idle; they should trigger action when necessary.
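Practice 2 (contextual data) is usually implemented with child loggers: a child inherits its parent's fields, so every entry in a request automatically carries the same IDs. A dependency-free sketch of the idea, with illustrative field names:

```javascript
// Tiny logger whose child() inherits and extends the parent's context.
function createLogger(context = {}) {
  const log = (level, message, fields = {}) =>
    JSON.stringify({
      timestamp: new Date().toISOString(),
      level,
      message,
      ...context,
      ...fields,
    });
  // child() copies the context; the parent is never mutated
  log.child = (extra) => createLogger({ ...context, ...extra });
  return log;
}

const root = createLogger({ service: 'auth-service' });
const reqLog = root.child({ requestId: 'req-42', traceId: 'abc-xyz-789' });
console.log(reqLog('info', 'Login attempt')); // carries service + request IDs
```

Both Winston and Pino expose a child() API that does exactly this, so you rarely need to pass IDs by hand.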

Common Pitfalls in Logging

  1. Over-Logging: Dumping every detail floods the system and makes the important events harder to find. This also drives up storage and cloud costs.
  2. Under-Logging: Too few logs make incidents impossible to diagnose. A missing log statement at the right place can add hours to debugging.
  3. Inconsistent Log Levels: Different teams logging similar events at different levels creates confusion.
  4. Lack of Correlation: Without trace IDs, logs are isolated lines of text. Distributed systems require correlation.
  5. Ignoring Performance Impact: Synchronous logging can block performance-critical threads. Always consider async or buffered logging for production.
  6. No Review Process: Logging should be periodically reviewed, just like code. Remove outdated or redundant log lines.
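The remedy for pitfall 5 is to decouple logging from the hot path. A simplified sketch of buffered logging; production loggers (for example Pino's asynchronous destinations) do this far more carefully:

```javascript
// Buffer entries in memory and flush in batches, so request handling
// never waits on log I/O. The in-memory "flushed" array stands in for
// the real sink (file, socket, log shipper).
class BufferedLogger {
  constructor(flushSize = 3) {
    this.buffer = [];
    this.flushSize = flushSize;
    this.flushed = [];
  }
  log(level, message) {
    this.buffer.push({ level, message });
    if (this.buffer.length >= this.flushSize) this.flush();
  }
  flush() {
    this.flushed.push(...this.buffer);
    this.buffer = [];
  }
}

const buf = new BufferedLogger(2);
buf.log('info', 'a');
buf.log('info', 'b'); // reaching flushSize triggers a flush
```

The tradeoff: a crash can lose whatever is still in the buffer, which is why real implementations also flush on a timer and on process exit.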

What's Next: Centralizing Logs

So far, I've covered the basics of logging, the importance of structured logs, log levels, and how to implement them with tools like Winston and Pino. But as systems grow, logs scattered across servers and services lose their value unless they're aggregated in one place.

That's where centralizing logs comes in. In the next article, I'll explore solutions like ELK, Loki + Grafana, and cloud-native logging (AWS CloudWatch, GCP Cloud Logging) in depth, covering their advantages, tradeoffs, and best practices.

The Takeaway

Logging is more than a debugging tool; it's the foundation for reliability and observability. Without structured logs, metrics and traces lose meaning. By adopting structured logging, setting clear log levels, avoiding common pitfalls, and preparing for centralized log management, you can transform raw data into actionable insights.

Together with API Reliability and API Observability, logging forms the third pillar of the Reliability Stack. It's what ensures that when something breaks, you don't just know that it broke; you know exactly why and where.