Azure Monitor Architecture and Core Components
Azure Monitor is the foundational service for monitoring in Azure. It collects two primary types of data: metrics and logs.
Metrics vs. Logs
Metrics are numerical values collected at regular intervals. They measure specific aspects of resources like CPU percentage, memory usage, or disk I/O. Logs are detailed records of events and activities within your Azure environment, stored in Log Analytics workspaces.
Azure Monitor automatically collects platform metrics for most Azure services without extra configuration. You can chart these metrics in near real time through the Azure portal's metrics explorer, so administrators can check current performance without writing a query.
Data Routing and Retention
Diagnostic settings let you route metrics and logs to different destinations:
- Log Analytics workspaces for detailed analysis
- Storage accounts for long-term retention
- Event hubs for streaming to external tools
The Data Collection Rules (DCR) framework provides granular control over which data gets collected and where it goes.
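To make the routing options above concrete, here is a minimal Python sketch that assembles a diagnostic setting as a plain dictionary. The property names are assumed to mirror the shape of the diagnostic settings resource, the helper name build_diagnostic_setting is hypothetical, and the resource IDs passed in are placeholders rather than real resources.

```python
from typing import Optional


def build_diagnostic_setting(
    workspace_id: Optional[str] = None,
    storage_account_id: Optional[str] = None,
    event_hub_rule_id: Optional[str] = None,
) -> dict:
    """Return a dict describing where one resource's logs and metrics go."""
    # Assumed property names; send every log category and all metrics.
    properties = {
        "logs": [{"categoryGroup": "allLogs", "enabled": True}],
        "metrics": [{"category": "AllMetrics", "enabled": True}],
    }
    if workspace_id:
        properties["workspaceId"] = workspace_id  # detailed analysis
    if storage_account_id:
        properties["storageAccountId"] = storage_account_id  # long-term retention
    if event_hub_rule_id:
        properties["eventHubAuthorizationRuleId"] = event_hub_rule_id  # streaming
    # A setting with no destination would route data nowhere, so reject it.
    if not any(key.endswith("Id") for key in properties):
        raise ValueError("a diagnostic setting needs at least one destination")
    return {"properties": properties}
```

One setting can name several destinations at once, which is why each destination is an independent optional argument here.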
Understanding Data Lifecycle
Platform metrics are retained for 93 days. Logs in a Log Analytics workspace are kept for the retention period you configure on the workspace or on individual tables, which defaults to 30 days. Activity logs automatically capture subscription-level events like resource creation, modification, and deletion.
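A retention period is simply a time window measured back from the present. The sketch below shows that arithmetic in Python; the function name is_retained is hypothetical, and the retention value is passed in by the caller rather than assumed.

```python
from datetime import datetime, timedelta, timezone


def is_retained(recorded_at: datetime, retention_days: int, now: datetime) -> bool:
    """True if a record is still inside the retention window at time `now`."""
    return now - recorded_at <= timedelta(days=retention_days)


# Example: a record written 45 days ago, checked against a 30-day window.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old_record = now - timedelta(days=45)
print(is_retained(old_record, retention_days=30, now=now))
```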
The integration between these components creates a comprehensive ecosystem. Administrators can correlate data from multiple sources to understand system behavior and troubleshoot issues effectively.
Log Analytics Queries and Kusto Query Language (KQL)
Log Analytics is the tool in Azure Monitor for querying collected log data. You write queries using Kusto Query Language (KQL) to analyze logs and extract meaningful insights from massive datasets.
KQL Query Structure
A basic KQL query starts with a table name, followed by operators that filter, transform, and aggregate data. The pipe character (|) chains operators together. Each operator processes the output of the previous one.
Common operators include:
- where for filtering data based on conditions
- summarize for aggregating data with functions like sum, count, or avg
- extend for creating calculated columns
- render for visualizing results as charts or graphs
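The table-then-pipe structure described above can be sketched as a small Python helper that assembles a query string. The helper name build_kql is hypothetical, and the Heartbeat table with its Computer column is assumed here only as a familiar example.

```python
def build_kql(table: str, *operators: str) -> str:
    """Join a table name and operator clauses with the KQL pipe character."""
    return "\n| ".join([table, *operators])


query = build_kql(
    "Heartbeat",
    "where TimeGenerated > ago(1h)",         # filter rows
    "summarize Beats = count() by Computer",  # aggregate
    "extend BeatsPerMinute = Beats / 60.0",   # calculated column
    "render barchart",                        # visualize
)
print(query)
```

Each clause only sees the output of the one before it, so the order matters: filtering with where before summarize reduces the rows that have to be aggregated.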
Practical Query Examples
A basic query might use where TimeGenerated > ago(1d) to restrict results to the last day, then pipe the output to summarize to count events by computer. Another example finds failed Windows sign-in attempts with where EventID == 4625.
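Written out in full, the two examples above might look like the following, stored as Python string constants so they can be pasted into Log Analytics or passed to a client library. The table names Event and SecurityEvent and the Account column are assumptions, since the text does not name the tables.

```python
# Events per computer over the last day (assumes the Event table).
EVENTS_PER_COMPUTER = """
Event
| where TimeGenerated > ago(1d)
| summarize EventCount = count() by Computer
| sort by EventCount desc
"""

# Failed Windows sign-ins (event ID 4625) over the last day
# (assumes the SecurityEvent table and its Account column).
FAILED_LOGINS = """
SecurityEvent
| where TimeGenerated > ago(1d)
| where EventID == 4625
| summarize Failures = count() by Account, Computer
"""

print(EVENTS_PER_COMPUTER.strip())
print(FAILED_LOGINS.strip())
```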
Advanced Query Techniques
Effective KQL usage involves understanding common patterns. Track resource changes through the AzureActivity table. Monitor performance metrics like processor percentage over time. One of the most powerful features is the join operator, which correlates data from multiple tables. This helps you find relationships between events and resource changes.
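To show what a join does conceptually, the sketch below emulates an inner join over two small in-memory tables in Python. It is an illustration of the idea, not of how Log Analytics executes joins; the sample rows and column names are invented, and note that KQL's default join kind differs from the plain inner join shown here.

```python
from collections import defaultdict


def inner_join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Pair every left row with every right row that shares the same key value."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})  # merge columns from both sides
    return joined


# Hypothetical rows: resource changes and error counts keyed by resource.
changes = [
    {"ResourceId": "vm-01", "OperationName": "Restart Virtual Machine"},
    {"ResourceId": "vm-02", "OperationName": "Delete Disk"},
]
errors = [
    {"ResourceId": "vm-01", "ErrorCount": 12},
]
print(inner_join(changes, errors, "ResourceId"))
```

Only vm-01 appears in both tables, so it is the only row in the result, now carrying both the operation and the error count.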
Many administrators struggle with KQL syntax initially, making it ideal for flashcard study. Create flashcards pairing query objectives with code snippets. This approach solidifies your knowledge of specific functions, operators, and their proper syntax.
Alerts, Action Groups, and Notification Strategies
Azure Monitor alerts notify you, and can trigger automated responses, when monitoring data meets specified conditions. They enable proactive issue management and help keep problems from escalating.
Alert Types
There are four main alert types:
- Metric alerts trigger based on metric values (e.g., CPU > 80%)
- Log alerts run KQL queries on schedules to identify patterns
- Activity log alerts monitor subscription-level events
- Smart detection alerts from Application Insights identify anomalies automatically
Creating Effective Alerts
Effective alerts require defining three elements: the scope (which resources to monitor), condition (the threshold or query logic), and action (what happens when triggered). Action groups are the delivery mechanism for alerts, specifying how notifications are sent.
Common actions include sending emails, SMS messages, pushing notifications to mobile apps, triggering webhooks, or executing runbooks for automated remediation.
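The scope, condition, and action elements described above can be modeled in a few lines of Python. This is a simplified sketch with hypothetical class and field names; it is not the schema Azure uses, and it only models email actions.

```python
from dataclasses import dataclass, field


@dataclass
class ActionGroup:
    name: str
    emails: list[str] = field(default_factory=list)


@dataclass
class MetricAlertRule:
    scope: str              # which resource to monitor
    metric: str             # condition: metric name
    threshold: float        # condition: fire when the value exceeds this
    action_group: ActionGroup  # action: who gets notified

    def evaluate(self, value: float) -> list[str]:
        """Return the notifications that would be sent for this value."""
        if value <= self.threshold:
            return []
        msg = f"{self.metric} on {self.scope} is {value} (> {self.threshold})"
        return [f"email to {addr}: {msg}" for addr in self.action_group.emails]


ops = ActionGroup(name="ops-team", emails=["ops@example.com"])
rule = MetricAlertRule("vm-01", "Percentage CPU", 80.0, ops)
print(rule.evaluate(92.5))
```

Keeping the action group separate from the rule reflects the point that one delivery mechanism can be reused across many alerts.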
Preventing Alert Fatigue
A well-designed alert strategy prevents alert fatigue. Too many unnecessary notifications cause administrators to ignore important ones. Instead of alerting on every CPU spike, alert when CPU exceeds 80% for more than 5 minutes. This reduces false positives significantly.
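The difference between alerting on every spike and alerting on a sustained breach can be sketched as follows. The function assumes one CPU sample per minute and treats "more than 5 minutes" as more than five consecutive samples over the threshold; real metric alerts typically aggregate over a time window instead, so this is only an illustration of the idea.

```python
def sustained_breach(samples: list[float], threshold: float, minutes: int) -> bool:
    """True if more than `minutes` consecutive one-minute samples exceed threshold."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run > minutes:
            return True
    return False


brief_spike = [40, 95, 42, 41, 39, 40, 38]
long_breach = [85, 88, 91, 86, 90, 87, 84]
print(sustained_breach(brief_spike, threshold=80, minutes=5))
print(sustained_breach(long_breach, threshold=80, minutes=5))
```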
Alert processing rules (formerly called action rules) let you suppress notifications during maintenance windows or off-hours, either once or on a recurring schedule. Actionable alerts include clear descriptions of what's wrong and suggested remediation steps. Understanding alert latency is crucial: metric alerts can evaluate as often as every minute, while log alerts run their queries on a schedule you configure. The evaluation frequency affects how quickly you detect issues.
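Suppressing notifications during a maintenance window comes down to a time-range check. The sketch below assumes a daily window defined by a start and end time and handles windows that cross midnight; the function name is hypothetical and time zones are ignored for simplicity.

```python
from datetime import datetime, time


def notifications_suppressed(fired_at: datetime, start: time, end: time) -> bool:
    """True if an alert fired inside a daily maintenance window [start, end)."""
    t = fired_at.time()
    if start <= end:
        return start <= t < end
    # Window wraps past midnight, e.g. 22:00 to 02:00.
    return t >= start or t < end


window_start, window_end = time(22, 0), time(2, 0)
print(notifications_suppressed(datetime(2024, 6, 1, 23, 30), window_start, window_end))
print(notifications_suppressed(datetime(2024, 6, 1, 9, 0), window_start, window_end))
```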
Application Insights and Custom Metrics
Application Insights is Azure's application performance management (APM) service. It monitors web applications, microservices, and mobile apps from the application's perspective, covering both server-side and client-side behavior, in contrast to infrastructure monitoring.
What Application Insights Tracks
Application Insights starts collecting telemetry once an application is instrumented, either through autoinstrumentation (no code changes) or by adding the SDK to your code. It collects telemetry about requests, exceptions, page views, and dependencies like database calls and external API requests, plus any custom events you log from code.
Availability tests simulate user interactions from multiple geographic locations. You get alerts if your application becomes unreachable or slow. Performance analytics show request rates, response times, and failure rates. The application map visualizes how components interact, making it easier to trace performance issues through the application stack.
Custom Metrics and Diagnostics
Custom metrics let developers track business-specific measurements, such as checkout completion rates or login failures. The profiler captures performance traces that show where your code spends its time. The snapshot debugger preserves debug snapshots at the moment exceptions occur, allowing post-mortem analysis.
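As an illustration of a business-specific metric, the sketch below computes a checkout completion rate in Python. The CheckoutStats class is hypothetical and does not call any Application Insights API; in a real application the resulting value would be sent through whichever telemetry SDK you use.

```python
from dataclasses import dataclass


@dataclass
class CheckoutStats:
    started: int = 0
    completed: int = 0

    def record(self, completed: bool) -> None:
        """Count one checkout attempt and whether it finished."""
        self.started += 1
        if completed:
            self.completed += 1

    def completion_rate(self) -> float:
        """Fraction of started checkouts that completed (0.0 if none started)."""
        return self.completed / self.started if self.started else 0.0


stats = CheckoutStats()
for finished in (True, True, False, True):
    stats.record(finished)
print(stats.completion_rate())
```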
Integration with Azure Monitor
Application Insights integrates seamlessly with Azure Monitor. Data routes to Log Analytics for advanced querying. For Azure administrators, understanding Application Insights is important for monitoring application health and configuring alerts on application metrics. Flashcards should cover key capabilities, the types of telemetry it collects, and how to configure availability tests and alerts based on application performance metrics.
Azure Advisor and Best Practice Recommendations
Azure Advisor is an automated recommendation engine that analyzes your Azure resources. It provides personalized suggestions for improving reliability, security, performance, operational excellence, and cost based on Azure best practices.
Five Recommendation Categories
Advisor examines your resource configurations against Microsoft's experience with millions of Azure deployments. The five categories are:
- Reliability (preventing downtime through redundancy and backup strategies)
- Security (identifying vulnerabilities and compliance gaps)
- Performance (improving speed and efficiency)
- Operational excellence (streamlining management and processes)
- Cost (reducing unnecessary spending)
How Advisor Works
Advisor evaluates your subscriptions periodically, generating recommendations with an impact level of high, medium, or low. Each recommendation includes a description of the issue, the business impact, and specific steps to resolve it.
For example, Advisor might detect that virtual machines lack backup protection and recommend enabling Azure Backup. Or it might flag unassociated public IP addresses that still incur charges.
Accessing and Using Recommendations
You access Advisor recommendations through the Azure portal and filter them by category or impact. Set up email alerts for new recommendations. Dismissing or postponing recommendations helps tailor the view to your organization's priorities.
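The filtering described above can be sketched in Python over a list of recommendations ranked high, medium, or low. The sample records and the field names category, impact, and text are invented for illustration and are not the format Advisor returns.

```python
IMPACT_RANK = {"High": 3, "Medium": 2, "Low": 1}


def filter_recommendations(recs, category=None, min_impact="Low"):
    """Keep recommendations in `category` (if given) at or above `min_impact`."""
    floor = IMPACT_RANK[min_impact]
    kept = [
        r for r in recs
        if (category is None or r["category"] == category)
        and IMPACT_RANK[r["impact"]] >= floor
    ]
    # Highest impact first so the most urgent items lead the list.
    return sorted(kept, key=lambda r: IMPACT_RANK[r["impact"]], reverse=True)


# Hypothetical sample data.
recommendations = [
    {"category": "Cost", "impact": "Low", "text": "Delete unused public IP address"},
    {"category": "Reliability", "impact": "High", "text": "Enable backup on VM"},
    {"category": "Cost", "impact": "Medium", "text": "Right-size underused VM"},
]
for rec in filter_recommendations(recommendations, category="Cost"):
    print(rec["impact"], rec["text"])
```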
Unlike alerts that respond to current conditions, Advisor provides forward-looking recommendations. For exam study, flashcards should cover the five categories, how to access recommendations, and understanding action items for common scenarios.
