In this chapter, we will cover security operations and exactly what they entail. Just as it is important to have a technical operations team, it is equally important to have a security operations team, or Security Operations Center (SOC), and a program in place. This team's day-to-day responsibilities include 24/7 monitoring and responding to any security-related incidents within your environment or with your users. This is a critical component and a necessity of the overall security program today.
In this chapter, we will focus on the Microsoft technologies available that can support your SOC and provide the insights needed to ensure your servers, end user devices, and users are safe. We will first cover an introduction to a SOC and provide an overview of what is needed to make this a successful operation. We will then review the Microsoft 365 (M365) security center, which provides a centralized place for monitoring Microsoft security solutions. Next, we will cover Microsoft Cloud App Security (MCAS) and how to configure and use Azure Advanced Threat Protection (ATP).
In the following sections, we will review Azure Security Center alerts and incidents, before providing an overview of Azure Sentinel, Microsoft's cloud Security Information and Event Management (SIEM) tool. We will then review Microsoft Defender ATP alerts and incidents, and review automated investigations before finishing off with a brief overview of Business Continuity Planning (BCP) and Disaster Recovery (DR) and their place within the security program. To recap, this chapter will cover the following topics:
- Introducing the SOC
- Using the M365 security portal
- Using MCAS
- Configuring Azure ATP
- Investigating threats with Azure Security Center
- Introducing Azure Sentinel
- Microsoft Defender Security Center
- Planning for business continuity and DR
In order to follow along with the overviews in this chapter and complete the how-to instructions, the following requirements are recommended:
- An Azure subscription with contributor rights: https://azure.microsoft.com/en-us/free/
- MCAS (trial/MCAS + EMS E3 or E5): https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE2NXYO
- Azure ATP: https://docs.microsoft.com/en-us/azure-advanced-threat-protection/atp-prerequisites
- Azure Security Center (free/standard): https://docs.microsoft.com/en-us/azure/security-center/security-center-pricing
- Azure Sentinel (per-capacity or pay-as-you-go): https://azure.microsoft.com/en-us/pricing/details/azure-sentinel/
- Microsoft Defender ATP (Windows 10 E5/M365 E5): https://docs.microsoft.com/en-us/windows/security/threat-protection/microsoft-defender-atp/minimum-requirements
In addition to these products, we will also provide an overview of Microsoft's data loss prevention technology using Azure Information Protection (AIP). AIP requires at least Azure AD Premium P1 to configure. Many of the services mentioned in the preceding bullet points also allow you to set up a trial or include free versions to help you get started. We recommend working with a sales representative at Microsoft to understand the best options regarding licensing these products for your organization.
Operations within the technical world have become a very standard and mature process. This function is core to the ongoing success of ensuring your users, systems, and applications are always available and running efficiently for your business. If there is an outage or an issue, operations teams typically follow very strict Service Level Agreements (SLAs) to return the service back to normal. This same concept is applicable to the security world. The concept of a SOC has grown exponentially over recent years, to the point where it is a necessity for maintaining normal business operations.
In short, a SOC manages and oversees the day-to-day functions of security operations for your organization. It typically operates 24/7 to monitor and detect potential security risks and alerts within your organization. If any alerts are detected, it is the SOC's responsibility to investigate and remediate them. A major part of this process also includes identifying the impact and potential damage your organization may face as a result of a security incident.
There are a few different variations of the security incident response life cycle, but they mostly overlap and follow a similar process. NIST Special Publication 800-61, Revision 2, Computer Security Incident Handling Guide, uses a four-phase process for the incident response life cycle:
- Preparation
- Detection and analysis
- Containment, eradication, and recovery
- Post-incident activity
To learn more about the NIST incident response life cycle, see the NIST Computer Security Incident Handling Guide at https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-61r2.pdf.
One important document that needs to be produced after any incident is the Root Cause Analysis (RCA) document. Finding the root cause is critical to ensure not only that the threat has been contained but also that the vulnerability that caused the incident has been remediated. In addition, and sometimes depending on the severity of the incident, the RCA may need to be provided to the leadership team for review. For this reason, the RCA must clearly define why there was an incident, what the impact or damage was, how the incident was remediated, and the long-term resolution that ensures it doesn't happen again and that the risk is mitigated.
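As a minimal sketch of the four elements an RCA must define, the record below models them as fields and flags any that were left empty. The class name, field names, and sample values are hypothetical and exist only for illustration; they are not part of any Microsoft or NIST template.

```python
from dataclasses import dataclass, fields


@dataclass
class RootCauseAnalysis:
    """Hypothetical RCA record; the field names are illustrative only."""
    root_cause: str            # why there was an incident
    impact: str                # what the impact or damage was
    remediation: str           # how the incident was remediated
    long_term_resolution: str  # how recurrence is prevented and risk mitigated

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Made-up example: the long-term resolution has not been written yet.
rca = RootCauseAnalysis(
    root_cause="Phishing email led to a compromised user account",
    impact="One mailbox accessed; no evidence of data exfiltration",
    remediation="Password reset and active sessions revoked",
    long_term_resolution="",
)
print(rca.missing_sections())  # ['long_term_resolution']
```

A check like this could act as a simple gate before an RCA is sent to the leadership team for review.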
There are different variations of SOC models that can be adopted within your organization. Different factors will come into play as you decide on which model makes sense. These factors may include the size of your organization, the industry of the business, regulatory requirements, and budget considerations, to name a few. In general, an internal SOC can be created that is managed and operated by internal staff in the organization. This is most likely the case for large enterprises with bigger budgets that can afford to recruit top talent. Alternatively, organizations commonly opt for a fully outsourced model where you contract your SOC services to an external vendor that specializes in security services. This is referred to as a Managed Security Service Provider (MSSP). The MSSP model is an attractive service for small and medium-sized businesses that may not have the budget to implement a fully functioning in-house SOC. Hybrid models also exist that maintain some functions of the SOC internally and outsource a subset of specialty services to the MSSP.
An additional service that needs to be accounted for when implementing a SOC program is a Digital Forensics and Incident Response (DFIR) service. This is a very specialized service and skillset that requires detailed analysis for an investigation into breaches, so internal staff or the MSSP may not provide it as part of their standard service offering. With the increasing number of active breaches that organizations face today, having a DFIR service available to engage quickly is beneficial to providing detailed forensic analysis of what, if anything, has been impacted. If your business has purchased a cyber insurance policy, the cyber insurance company should have an approved list of vendors that you are allowed to engage for any DFIR services.
A SOC's ability to be efficient will depend on the tools it has at its disposal to allow the best visibility into your environment and the activities taking place. Throughout this chapter, we will be reviewing many of the Microsoft tools that make up a well-rounded and extremely robust security operations program to best protect your users, Windows devices, and Windows servers. We will first look at the M365 security portal as part of your security operations.
With all of the security tools that are deployed these days and all the different data points, it becomes challenging to keep up with different management consoles and ongoing feature enhancements. Microsoft offers a whole suite of security features and functionality that can be enabled for your organization. To help centralize and manage your security, Microsoft provides the M365 security center, which is a place to view, manage, access, and monitor all the M365 security features. This is a very powerful tool for your security operations team and one that will be constantly accessed by that team. To access the M365 security center, browse to https://security.microsoft.com and log in to the management console:
To access the M365 security center, you will need to be either a global administrator, security administrator, security operator, or security reader within Azure Active Directory (AD).
- Home provides an overview of the security center and a customizable dashboard.
- Incidents will have a consolidated list of all your incidents, which can be sorted based on status, severity, service source, and so on.
- Alerts from all your active Microsoft security tools will be consolidated here for viewing. Some examples include Office 365 ATP and Microsoft Defender ATP.
- Action Center shows any current or historical investigations within your environment. Some remediation actions may require manual approval or rejection and others may be automatic.
- Reports is a consolidated place to view and access all activity reports for all active security features.
- Secure score provides insight into all your Secure Score metrics across the environment.
- Hunting is where you can perform deep analysis and search for threats within your environment.
- Classification is where you will find Data Loss Prevention (DLP) features, which include Sensitivity labels, Retention labels, Sensitive info types, and Label analytics.
- Policies provides links to all the security policies that can be managed within your environment.
- Permissions is where admins can manage roles and access to the M365 security center.
- Settings will show any configurable M365 settings.
- More resources provides links to all other Microsoft security consoles within your environment.
- Customize navigation is where you can customize what you would like to view within the navigation pane.
Expanding the More resources section will show you all the different management consoles that make up most of the security and compliance portfolio for Microsoft and include quick links. For more information about the Microsoft security portals and admin centers, visit https://docs.microsoft.com/en-us/microsoft-365/security/mtp/portals?view=o365-worldwide.
One important area that will help with providing a more secure environment is Microsoft Secure Score. Microsoft Secure Score provides a numeric score of your environment based on an analysis of the current settings and configurations. It will provide recommendations based on your current state to help improve your overall security posture. Each recommendation has a different impact on your secure score and the higher the score, the more secure your environment is considered.
The current data sources used to build Secure Score reports include Office 365 (Exchange Online, OneDrive for Business, SharePoint Online, and Microsoft Information Protection), Azure AD, Microsoft Defender ATP, and Cloud App Security. Microsoft continues to add more data sources as the product evolves. Secure Score is currently broken down into five different categories: Identity, Data, Device, Apps, and Infrastructure. To access Microsoft Secure Score, follow these steps:
- Browse to https://security.microsoft.com and log in.
- Click on Secure Score in the left menu.
- You will be presented with the Secure Score dashboard with an overview of the different category scores:
To improve your score, click on the Improvement actions option at the top of the page to view a list of all the recommendations from Microsoft. From here, you can click on a recommendation to view more details on why Microsoft is providing it, how it will better protect you, and the next steps to take. Once the action has been taken, your Secure Score dashboard will reflect the changes and add the points to your score for that specific category group, as well as the overall score. If you are using a third-party tool to manage any of the recommendations, you can click on Resolved through third-party to also receive the points.
In the following example, Microsoft recommends Turn on user risk policy, which will help detect potentially compromised accounts, allowing actions to be taken:
Recommendations for Azure ATP and Microsoft Defender ATP will be available soon within Microsoft Secure Score.
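To illustrate how completed improvement actions add points to a category and to the overall score, here is a minimal sketch. It is not Microsoft's scoring algorithm: the point values, the status names, and every action except Turn on user risk policy are made up for the example.

```python
from collections import defaultdict

# Hypothetical improvement actions: (name, category, max_points, status).
# The point values and status strings are assumptions for illustration only.
actions = [
    ("Turn on user risk policy", "Identity", 7, "completed"),
    ("Example data action", "Data", 5, "to_address"),
    ("Example device action", "Device", 9, "third_party"),
]

# In this sketch, actions that are completed or resolved through a
# third-party tool both earn their full points.
EARNING_STATUSES = {"completed", "third_party"}


def score_by_category(actions):
    """Return {category: (achieved_points, possible_points)}."""
    achieved = defaultdict(int)
    possible = defaultdict(int)
    for _name, category, max_points, status in actions:
        possible[category] += max_points
        if status in EARNING_STATUSES:
            achieved[category] += max_points
    return {category: (achieved[category], possible[category]) for category in possible}


def overall_percentage(actions):
    """Return the overall score as a percentage of all possible points."""
    per_category = score_by_category(actions)
    achieved = sum(a for a, _ in per_category.values())
    possible = sum(p for _, p in per_category.values())
    return round(100 * achieved / possible, 1) if possible else 0.0


print(score_by_category(actions))
print(overall_percentage(actions))
```

With the sample data, the Data category earns nothing until its action is addressed, which is the kind of gap the Improvement actions list is meant to surface.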
Expand the Classification menu item or open Office 365 Security & Compliance Center (https://protection.office.com/) within the More resources menu item. These are the data protection tools for your organization. Although they are not directly related to hardening your Windows 10 device or Windows servers, it is critical that you are aware of the tools that will help protect the data your users are accessing from their Windows 10 devices and any data stored on Windows servers. There are three primary technologies to be familiar with to better protect your company data:
- Data Loss Prevention (DLP)
- Azure Information Protection (AIP)
- Windows Information Protection (WIP)
These technologies can be used as separate products from Microsoft and may appear to overlap, but each has its own unique uses, and they complement each other to provide additional protection for your company's data. Let's look at each of these technologies.
DLP tools have been around for a while and provide great benefits to help protect against data exfiltration in your environment. With Microsoft DLP, you can protect sensitive information within Office 365 environments (Exchange Online, SharePoint Online, OneDrive for Business, and Microsoft Teams) from leaking out of your environment. Within the DLP engine, you can create policies that allow you to select which type of data you would like to protect. For the data type, you can select from many predefined templates that scan for Personally Identifiable Information (PII), finance, health, and GDPR types of information. You can then define the technology that you would like to search within, along with the rules, conditions, and actions that determine how sensitive data is handled if detected.
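To give a feel for how a sensitive information type can be detected, the following sketch scans text for candidate credit card numbers and keeps only those that pass a Luhn checksum. This is a generic pattern-plus-validation technique written for illustration; it is an assumption that it resembles what a DLP engine does internally, and it is not Microsoft's implementation. The function names and sample text are hypothetical.

```python
import re

# Candidate card numbers: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Validate a digit string with the Luhn checksum, ignoring separators."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return pattern matches that also pass the Luhn check."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]


sample = "Pay with 4111 1111 1111 1111 or 4111 1111 1111 1112."
print(find_card_numbers(sample))  # ['4111 1111 1111 1111']
```

Combining a pattern with a checksum reduces false positives, which matters when a policy action such as blocking a message is attached to a match.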
To access Microsoft's DLP, log in to the Office 365 Security & Compliance console at https://protection.office.com/homepage, and select Data Loss Prevention. Click on Policy to set up and review your policies.
The licensing requirements constantly change with Microsoft, so it’s always good to confirm what the current licensing is. You can view the latest Security & Compliance licensing at https://docs.microsoft.com/en-us/office365/servicedescriptions/office-365-platform-service-description/office-365-securitycompliance-center.
AIP, in short, is a technology that provides the ability to classify data by applying both sensitivity and retention labels. You can then enforce protection mechanisms, such as encryption, on the data based on its label type and sensitivity. Data can be classified manually by users or automatically, depending on your license type. In addition, you can extend the labeling to your on-premises files using the AIP scanner. You can view the different functionalities based on license type at https://azure.microsoft.com/en-au/pricing/details/information-protection/.
To access Microsoft's AIP, access the Office 365 Security & Compliance console by logging in to https://protection.office.com/homepage, then select Classification. Here, you can manage Sensitivity labels, Retention labels, and Sensitive info types.
The final technology we will review is WIP. WIP is the technology available to help prevent the accidental leakage or loss of data from your enterprise documents in Windows 10 version 1607 and later. With WIP, you can create policies that prevent data from being moved out of your environment, such as preventing data from being copied to USB drives. WIP also helps bridge the gap between users bringing their own devices and isolating corporate and personal data without impacting the user experience. In addition to running Windows 10 version 1607 or later, you will also need to be licensed for Intune or to run Microsoft Endpoint Configuration Manager. Third-party solutions can also apply WIP policies through the EnterpriseDataProtection Configuration Service Provider (CSP). Information about the CSP can be found at https://docs.microsoft.com/en-us/windows/client-management/mdm/enterprisedataprotection-csp?redirectedfrom=MSDN.
To access Microsoft's WIP, log in to the Microsoft Endpoint Manager admin center at https://devicemanagement.microsoft.com and click on Apps. Then, click on App Protection Policies under the Policy menu. Here, you can create and manage your WIP policies, including MAM policies for iOS and Android. For more information about creating a WIP policy using MDM, visit https://docs.microsoft.com/en-us/windows/security/information-protection/windows-information-protection/create-wip-policy-using-intune-azure.
We have just reviewed the M365 security center and highlighted its most important areas. This management resource provides a centralized view of Microsoft's security, including links to many additional security resources. The M365 security center is an essential resource for your security operations team. Next, we will review MCAS and the benefits it provides to protect your environment.
MCAS is a Cloud Access Security Broker (CASB). In short, a CASB is a service and/or tool that can extend your security footprint from on-premises into the cloud and provide better visibility and control over your data across a multitude of services. Additionally, it helps to add visibility through the discovery of shadow IT processes, which is traditionally a challenge for many organizations. One of the major benefits of MCAS is its native and simple integration with Microsoft security technologies. In addition, MCAS also integrates with other cloud providers for visibility into all your combined cloud environments in a single console.
As with any security tool, it will take time to fully configure all the MCAS features, implement your monitoring and policies correctly, and ensure that ongoing maintenance and operations are running efficiently. At a high level, you can set up MCAS relatively quickly to start gaining visibility into your environment. To access MCAS, follow these steps:
- Browse to http://portal.cloudappsecurity.com/. You can also access it through the M365 security portal. Then, log in.
You will be presented with the general dashboard. Here, you will be provided with an overview of all the environments that are being monitored. From the dashboard, you will be able to link to other app or risk-type dashboards, alerts, activity or content matches, and any of your investigative activities:
MCAS requires the correct per-user licensing assignments depending on the features you wish to enable. Visit the following link to view more information about license requirements: https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE2NXYO
- The first thing to do is to connect your apps by clicking on the Connect apps option at the top of the dashboard. Once you are on the Connected apps page, click on the + sign to connect to an app, then follow the instructions to add the app:
- The next step is to set up policies based on the provided templates. To access policies and templates, click on the Control menu item in the left-side navigation pane. Depending on the apps you have connected, there may already be predefined policies with the Policy severity and Category options configured. To set up alerts on any policy, go to the Policies page and click on the Edit policy cog, then configure your notifications within the Alerts section.
The following article provides instructions to integrate Microsoft Defender ATP with MCAS for device alerts:
There is a lot more involved in the setup of MCAS. To access the documentation library available from Microsoft, which contains more details on MCAS, visit https://docs.microsoft.com/en-us/cloud-app-security/.
Next, let's go into further detail about MCAS by reviewing the activity log.
The activity log in MCAS shows all the activities that have been performed by your users. The SOC can use these logs during an investigation to identify sign-ins and anomalous behavior and to review activity throughout Azure, Office, and all your connected apps to help identify risky behavior. Activity logs can be filtered to help narrow down the scope of an action if you are trying to identify a specific activity. Some examples of filterable queries include the following:
- Admin activities
- Failed log in
- Impersonation activities
- Security risks
- Password changes and reset requests
The following screenshot is filtered to Failed log in. One way in which the SOC team can use this query is to see which user accounts may be a target for brute-force login attempts. The results return a list of the activity performed, the user account, the application with which the activity originated, the IP address, the location, the device type, and the date:
When investigating the query results, the SOC team identifies a failed login attempt to Microsoft Exchange Online originating from Thailand. The users are blanked out for privacy reasons, but the SOC team is aware that their users have never visited Thailand, so the failed login activity from that location seems suspicious. Using the filtering options, the SOC adds an additional condition to scope the location to Thailand and try to understand more about the source of these login attempts. After the filter is added, multiple failed login attempts to Microsoft Exchange Online are presented, all originating from Thailand with activities being logged every few minutes.
Clicking on a specific activity will open the activity drawer, which provides additional details about the alert, as well as insights about the user and IP in question. The SOC analyst clicks to open the activity drawer and then selects the User tab to return more details, such as the ones in the following screenshot. Here, they can see a map view of the user's frequent locations, which only includes the United States. This further confirms the activity from Thailand as suspicious:
The activity log is a great way to look at all the activities from your connected apps to identify potential security risks and suspicious activity. More information about the MCAS activity logs, including the activity drawer, can be found at https://docs.microsoft.com/en-us/cloud-app-security/activity-filters.
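The failed login investigation described in this section can be sketched offline as a filter over activity records. The record layout, field names, and sample data below are hypothetical and do not represent the MCAS export format or API; the sketch only shows the logic of scoping to failed logins and then to unexpected locations.

```python
from collections import Counter

# Hypothetical activity records; field names and values are illustrative only.
activities = [
    {"user": "user1@example.com", "activity": "Failed log in",
     "app": "Microsoft Exchange Online", "country": "Thailand"},
    {"user": "user1@example.com", "activity": "Failed log in",
     "app": "Microsoft Exchange Online", "country": "Thailand"},
    {"user": "user2@example.com", "activity": "Failed log in",
     "app": "Microsoft Exchange Online", "country": "United States"},
    {"user": "user1@example.com", "activity": "Log on",
     "app": "Microsoft Exchange Online", "country": "United States"},
]


def failed_logins_outside(activities, expected_countries):
    """Count failed logins per (user, country) from unexpected countries."""
    counts = Counter()
    for record in activities:
        if record["activity"] != "Failed log in":
            continue  # keep only failed login activity
        if record["country"] in expected_countries:
            continue  # ignore locations where users are known to work
        counts[(record["user"], record["country"])] += 1
    return counts


suspicious = failed_logins_outside(activities, {"United States"})
for (user, country), count in suspicious.most_common():
    print(f"{user}: {count} failed logins from {country}")
```

Repeated failures for the same account from a location the user has never visited is the pattern that prompted the SOC to dig deeper in the example above.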
Next, let's look at investigating user accounts to get more comprehensive details about a user's activity.
The Users and accounts page in MCAS lets you view activity relating to a user whose information is gathered from your connected apps. This is helpful for SOC teams as it allows a unified view of the identity across all the applications without needing to look at various monitoring systems. The landing page of Users and accounts lists all users, most recent first, and includes the username, investigation priority, type of account, email address, apps, groups, and the last seen information.
Users and accounts is filterable by username, affiliation, or account type, or you can select a specific connected app or user group. Affiliation filters include internal accounts as well as external accounts, such as Business-to-Business (B2B)-invited guests, and are denoted by green and yellow icons. Any user who has admin privileges in your environment also has a red tie on their icon, as seen in the User name column in the following screenshot:
By selecting the Investigation priority column to sort from highest to lowest, you can order the users by their Investigation priority score. This score is a point-value system made up of alerts and risky activities, based on what MCAS sees as anomalous behavior for the user over the past 7 days. A higher number indicates that the account should be prioritized for investigation and could indicate a compromise or even an internal employee behaving maliciously.
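As a rough sketch of a point-value score over a rolling window, the function below sums the points of events that fall within the past 7 days. The real Investigation priority score is calculated by MCAS and its method is not reproduced here; the event list, point values, and function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def investigation_priority(events, now, window_days=7):
    """Sum the points of events that occurred within the rolling window."""
    cutoff = now - timedelta(days=window_days)
    return sum(points for timestamp, points in events if cutoff <= timestamp <= now)


# Hypothetical (timestamp, points) pairs for one user.
now = datetime(2020, 6, 15, 12, 0)
events = [
    (datetime(2020, 6, 14, 9, 30), 20),  # alert raised yesterday
    (datetime(2020, 6, 10, 16, 0), 15),  # risky activity five days ago
    (datetime(2020, 6, 1, 8, 0), 40),    # two weeks ago, outside the window
]

print(investigation_priority(events, now))  # 35
```

Because older events drop out of the window, a user's score in a model like this falls again once the anomalous behavior stops.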
Clicking on the username will bring up a summary of the connected apps where the identity exists. Click on View User to see more detailed information about the user. The following screenshot contains the user summary from MCAS. You can see a timeline of the user score over the past 2 weeks, which days accrued points that led to the increased score, and a timeline view with the summary of actions deemed anomalous:
After the SOC analyst performs their analysis and concludes that the account may have been compromised, the User actions dropdown offers options for immediate remediation. These include Sign in again, Suspend user, and Confirm user compromised. Selecting Confirm user compromised will raise the user's risk level to high, and any Azure AD policy set for high-risk users will apply.
Clicking on View all user alerts from the user summary will bring up all the alerts associated with the user, which can be reviewed and either dismissed or resolved:
For more information about investigating risky users using MCAS, visit https://docs.microsoft.com/en-us/cloud-app-security/tutorial-ueba.
Azure ATP is a cloud security service from Microsoft that's used to analyze domain network traffic. This solution helps the SOC identify attacks and malicious movements in your AD environment. Telemetry data is collected by installing the ATP sensor on a domain controller, which forwards that information to the Azure ATP cloud service for investigation using the ATP portal. Each tenant has its own instance of the ATP portal, which has a similar style and feel to the other cloud-based security portals that Microsoft offers. The SOC team can use the portal to investigate alerts in a timeline view to correlate activity throughout different phases of the attack kill chain. The ATP sensor will capture the following information and forward it to the ATP service:
- Domain controller network traffic
- Windows events
- Remote Authentication Dial-In User Service (RADIUS) accounting information for a VPN
- User and computer data from AD
In order to use the ATP service, the Forest Functional Level (FFL) of your AD domain must be Windows 2003 or higher.
For more detailed information about the Azure ATP architecture, visit https://docs.microsoft.com/en-us/azure-advanced-threat-protection/atp-architecture.
- An Azure tenant with global or security administrator privileges to configure the ATP instance
- Enterprise Mobility + Security E5 (EMS E5) or a standalone Azure ATP license: https://www.microsoft.com/cloud-platform/enterprise-mobility-security-pricing
- Rights to install the ATP sensor on a domain controller with internet connectivity to the Azure ATP cloud service
For more detailed information about the technical prerequisites, including network configuration, proxy setup, and portal requirements, visit https://docs.microsoft.com/en-us/azure-advanced-threat-protection/atp-prerequisites.
Microsoft also has a capacity planning guide and tool that can help you determine the number of sensors recommended for a rollout. To download the ATP sensor tool and for more information about the CPU and memory requirements, visit https://docs.microsoft.com/en-us/azure-advanced-threat-protection/atp-capacity-planning.
The activation of the Azure ATP portal instance is straightforward. Once you have verified the prerequisites and understand the capacity planning, installation can be achieved by following these high-level steps:
- Using an Azure global administrator or security administrator account, activate your instance of the Azure ATP portal by logging into https://portal.atp.azure.com. Follow the onscreen instructions to confirm the activation.
- From your AD, configure a read-only (domain user) account and password.
- In the Azure ATP portal, click on the Directory Services section. Enter the username and password of the account created in the previous step, then enter the FQDN of your domain.
- Click on the Sensors tab under System. Download and install the sensor on a domain controller using the access key provided in the portal.
- Once the sensor is installed, it will show up in the Sensors section after you next refresh the page.
For more information about installing the Azure ATP sensor, visit the following link. There are also instructions for a silent installation if you need to deploy the sensor to more than one system using a tool such as Configuration Manager:
Next, let's look at how Azure ATP recognizes sequences of events to identify the attack kill chain.
Once the sensor is installed, the information that is monitored is relayed back to the cloud service for analysis. Azure ATP is designed to detect common attack methodologies within the cyber kill chain. The kill chain is defined as a sequence of events typically followed by a malicious actor to gain knowledge of your environment and ultimately achieve domain dominance. Azure ATP recognizes these events in the following stages:
- Reconnaissance
- Compromised credentials
- Lateral movements
- Domain dominance
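To make the idea of correlating alerts to kill chain stages concrete, here is a minimal sketch using the stages named in this chapter (reconnaissance, compromised credentials, lateral movement, and domain dominance). The mapping of alert names to stages is an illustrative assumption written for this example; it is not queried from Azure ATP, and the function name is hypothetical.

```python
from enum import IntEnum


class KillChainStage(IntEnum):
    """Stages ordered by how far an attack has progressed."""
    RECONNAISSANCE = 1
    COMPROMISED_CREDENTIALS = 2
    LATERAL_MOVEMENT = 3
    DOMAIN_DOMINANCE = 4


# Illustrative mapping of alert names to stages (an assumption for this sketch).
ALERT_STAGE = {
    "User and IP address reconnaissance (SMB)": KillChainStage.RECONNAISSANCE,
    "Honeytoken activity": KillChainStage.COMPROMISED_CREDENTIALS,
}


def furthest_stage(alert_names):
    """Return the most advanced stage seen across the alerts, or None."""
    stages = [ALERT_STAGE[name] for name in alert_names if name in ALERT_STAGE]
    return max(stages, default=None)


open_alerts = ["Honeytoken activity", "User and IP address reconnaissance (SMB)"]
stage = furthest_stage(open_alerts)
print(stage.name if stage else "No mapped alerts")
```

Ordering the stages lets the SOC prioritize: the further along the chain an entity's alerts reach, the more urgent the response.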
Having a solution to correlate events and identify these stages in the attack kill chain will help the SOC increase their chances of stopping an attack. This is the real value of adding Azure ATP as a security enhancement to your AD environment. The following table lists the types of alerts that Azure ATP will report on during each stage of the kill chain:
Azure ATP can be integrated with additional security solutions, such as Microsoft Defender ATP, and can forward alerts to MCAS. Combining these solutions creates a robust suite of security tools. Microsoft Defender ATP handles endpoint protection and antivirus with the real-time detection of files and processes, while Azure ATP analyzes the traffic and activity through your domain. These solutions can all be monitored through the MCAS portal, creating a unified view for the SOC. As covered earlier in this chapter, adding MCAS allows organizations to analyze signals from many sources, including Microsoft Online Services apps, such as SharePoint, Exchange, and OneDrive, Microsoft Azure, Defender ATP, Azure ATP, and other third-party app providers, such as Box, Amazon Web Services, Salesforce, and Google Cloud, to name a few. For more information about enabling Azure ATP integration with Microsoft Defender ATP, visit https://docs.microsoft.com/en-us/azure-advanced-threat-protection/integrate-wd-atp.
For more information about using Azure ATP with MCAS, visit https://docs.microsoft.com/en-us/azure-advanced-threat-protection/atp-mcas-integration.
After logging in to the ATP portal, you are taken to the timeline view, where alerts are listed in chronological order with the latest opened alert first. The filter options allow you to view the alert status by Open, Closed, and Suppressed, as well as by High, Medium, and Low severity. Clicking on the three dots, you can choose to close, suppress, or download the alert details. Clicking on the alert heading will open the alert details page. In the following screenshot, the ATP security timeline shows two open alerts that were flagged with medium severity. The first is for the use of a Honeytoken account and the other is for a user and IP address reconnaissance activity:
Let's look at the Honeytoken activity first. Honeytoken accounts are useful detection mechanisms to lure attackers. They can help identify whether there is potentially active reconnaissance going on inside the domain that may have malicious intent. In the Azure ATP portal, a Honeytoken account can be configured from the Configuration menu in the Entity tags settings, under Detection:
In the Sensitive section of Entity tags, you can tag additional accounts, groups, or servers as sensitive so that they are easily identifiable in the Azure ATP portal. By default, sensitive entities already include domain admins, administrators, domain controllers, enterprise admins, and other privileged objects. The following screenshot shows the Domain Admins security group labeled as sensitive. Any sensitive entity that shows up in an alert will have an S icon beside it:
Heading back to the security alert timeline, let's look at the second alert for User and IP address reconnaissance (SMB). Clicking on the heading will bring up more information about the activity. In the following screenshot, you can see that Emily Young enumerated SMB session details on the MTLABDC01 domain controller from the WinSVR1 source system. This activity could indicate malicious intent and that the account has been compromised. Using this reconnaissance technique, the actor successfully enumerated the SMB session details, as shown at the bottom of the alert, and now has a list of network locations and accounts to target:
Now that the user accounts and locations have been discovered, further reconnaissance can be carried out. The BreakTheGlass account looks enticing, and more information can be found by running a net user "BreakTheGlass" /domain command. If the account is a member of a highly privileged group, such as Domain Admins or Enterprise Admins, the attacker now has a viable target that is worth going after. Let's look into this a little further by clicking on the user in the alert to pull up the user-specific activity log. As seen in the following screenshot of the user activities list, there are multiple entries that show successful Security Account Manager Remote Procedure Call (SAMR) queries for all domain groups and the Domain Admins security group:
Based on the preceding example, it's highly probable that the account in question has been compromised and that the attack is currently in the reconnaissance phase of the kill chain. The SOC team should reach out to the user and attempt to stop the attack by changing the user's password or blocking sign-ins altogether. If they are using MCAS or if integration with Microsoft Defender ATP has been configured, they can check for additional alerts triggered from the user's PC and perform advanced hunting to try to identify the source of the compromise.
It's important to understand that once a user's account is compromised, the attacker may have the ability and skills to move laterally throughout systems. Using publicly available tools, such as mimikatz, attackers can attempt to dump credentials stored in memory on the compromised system. If the NTLM hashes are successfully captured, the attacker can use the overpass-the-hash technique to acquire a Ticket-Granting Ticket (TGT) of another user who may have logged into the system and gain further access by acting on their behalf. As shown in the following screenshot, the attacker was able to dump the BreakTheGlass account's NTLM hash and acquire a TGT. Luckily, Azure ATP has flagged it as a suspected overpass-the-hash attack:
Assuming BreakTheGlass has elevated permissions on many systems, the attacker can move laterally by executing code remotely with tools such as PSExec and mimikatz to dump credentials from remote systems. If successful, this will allow them to gain more knowledge of the environment. Azure ATP will also flag attempts to execute services remotely by creating a Remote code execution attempt alert, as in the following screenshot:
Now that the attacker has dumped credentials from a remote system and applied similar reconnaissance techniques, they have acquired a new user account that is a member of Domain Admins. In the next phase, the attacker will use the Domain Admins privileges to dump all credentials from AD using the mimikatz DCSync command or a similar domain-replication technique. The following screenshot is of an Azure ATP alert for a suspected DCSync replication:
If the replication is successful, the password hash of the Kerberos Ticket-Granting Ticket (KRBTGT) account could be compromised and used in a golden ticket attack. The KRBTGT account is the service account used by the Kerberos Key Distribution Center (KDC) to encrypt and sign TGTs. Once its hash has been collected, it can be used to forge TGTs for any account. The attacker now has everything they need to gain domain dominance and exfiltrate data quite easily. As the following screenshot shows, an Azure ATP alert was generated for a suspected golden ticket attack:
Once this level of access has been obtained, the attacker has control over resources that rely on Kerberos tickets for authentication. Even if a compromised user's password has been changed, the attacker can still impersonate the account by forging tickets with the KRBTGT hash. The best protection against these types of attacks is to adopt the recommendations given in these chapters to prevent them from occurring and to have monitoring in place to detect and alert you to these intrusions. Ensure that privileged accounts are limited and that appropriate access management solutions are in place. Enforcing security baselines and enabling endpoint protection will allow you to lock down the use of PowerShell scripts and hacking tools commonly loaded into malicious payloads. If the KRBTGT account has been compromised, it is recommended to reset its password twice to remove any passwords stored in the password history. Resetting the KRBTGT password will be disruptive because it invalidates tickets that have already been issued. Affected systems will likely need to be rebooted so that a new ticket can be issued, and users who have active sessions to resources may be required to re-authenticate to services. For more information on resetting the KRBTGT account password, visit https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/manage/ad-forest-recovery-resetting-the-krbtgt-password.
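The reason for resetting the KRBTGT password twice can be illustrated with a toy model. The following Python sketch assumes the account keeps a history of two keys (the current one and the previous one) and that a ticket signed with either is still accepted. The class and method names are hypothetical, and this is a simplified simulation rather than real Kerberos behavior:

```python
from collections import deque


class KrbtgtModel:
    """Toy model of the KRBTGT key history (hypothetical, not real Kerberos)."""

    def __init__(self, initial_key, history_depth=2):
        # Assumption: the current key plus one previous key remain valid.
        self._keys = deque([initial_key], maxlen=history_depth)

    def reset(self, new_key):
        # Appending beyond maxlen silently drops the oldest key.
        self._keys.append(new_key)

    def accepts_ticket_signed_with(self, key):
        # A forged ticket is only useful while its signing key is in history.
        return key in self._keys


krbtgt = KrbtgtModel("stolen-key")
after_zero = krbtgt.accepts_ticket_signed_with("stolen-key")

krbtgt.reset("new-key-1")
after_one = krbtgt.accepts_ticket_signed_with("stolen-key")

krbtgt.reset("new-key-2")
after_two = krbtgt.accepts_ticket_signed_with("stolen-key")

print(after_zero, after_one, after_two)  # True True False
```

In this model, a single reset leaves the stolen key in the history, so forged tickets would still be accepted; only the second reset pushes it out.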
This was just one example of a potential kill chain in an attack on an AD environment. It shows the value of implementing security solutions such as Azure ATP, which gives the security team the insights needed to help stop these attacks before domain dominance is obtained. For more information about the security alerts in the Azure ATP portal, visit https://docs.microsoft.com/en-us/azure-advanced-threat-protection/understanding-security-alerts.
Next, we will review how to view and investigate any threats with Azure Security Center.
In Chapter 11, Security Monitoring and Reporting, we enabled and configured the standard version of Azure Security Center to gain the benefits of all the available premium features. ATP is included in the standard tier for your Azure environment, including your Windows machines. To view and investigate any threats that have been triggered by Azure Security Center, do the following:
- Log in to https://portal.azure.com.
- Search for Security Center and open it.
- Click on Security Alerts within the Threat Protection section.
- Here, you will see all the generated alerts from your environment:
To further investigate an alert, simply click on the alert and you are provided with additional details. In addition, you will be provided with any available remediation steps by scrolling further down the details page. The following is an example of the details page of an Antimalware Action Failed alert:
To ensure your SOC receives alerts, within the Azure Security Center management console, click on Pricing & Settings in the left-side navigation panel, click on your subscription name, then click on Email notifications to configure an email notification for high-severity alerts. You can also use the Workflow automation section to trigger automated responses to your alerts, and the Continuous export section to export alerts to a SIEM.
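The routing idea described here, where high-severity alerts trigger an email and alerts are also exported to a SIEM, can be sketched in a few lines of Python. The function, destination names, and sample alerts are hypothetical and are not part of any Azure Security Center API; the sketch also assumes that every alert is exported:

```python
def route_alert(alert, email_severities=("High",)):
    """Return the destinations an alert is sent to (illustrative only)."""
    # Assumption: every alert is exported to the SIEM.
    destinations = ["siem_export"]
    # Only alerts at a configured severity generate an email to the SOC.
    if alert["severity"] in email_severities:
        destinations.append("soc_email")
    return destinations


# Hypothetical sample alerts.
sample = [
    {"name": "Example alert A", "severity": "High"},
    {"name": "Example alert B", "severity": "Low"},
]

routes = {alert["name"]: route_alert(alert) for alert in sample}
print(routes)
```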
The security alerts map adds further insight by showing where your alerts are being generated from. To access the map, click on Security alerts map within the Threat Protection section. Within the map, you can click on the circles to view details on the alert:
In the next section, we will provide an overview of Microsoft's cloud-built SIEM service.
Azure Sentinel is a modernized SIEM and Security Orchestration, Automation, and Response (SOAR) solution that is built on Microsoft cloud technology. It provides an intelligent, robust life cycle that covers the collection of data, the detection of threats, the investigation of threats, and responses to incidents. Because Azure Sentinel is a cloud-built solution, its ease of setup and integration makes it an extremely attractive and powerful service for your security needs, especially compared to a traditional SIEM, which typically requires massive amounts of infrastructure and storage to support the ongoing log collection and the compute power to analyze data. To enable Azure Sentinel, follow these steps:
- Log in to https://portal.azure.com.
- Search for Azure Sentinel and open it.
- Click on Add or Connect Workspace.
- Select a workspace to connect to, then click on Add Azure Sentinel or click on Create a new workspace to build a new Log Analytics workspace to add to Azure Sentinel.
- It will take a few minutes to create, then you will be redirected to the Azure Sentinel console:
As you have just seen, your cloud SIEM was deployed within minutes. This goes to show the power of cloud technology and, even more, the benefits for security. Now that Azure Sentinel is set up and ready to use, you will first need to set up your data sources to begin collecting data. To do this, follow these steps:
- Ensure Azure Sentinel is open.
- Click on Data Connectors within Configuration.
- Within Connector name, select the connector you would like to connect, then click on Open connector page. As an example, search for Azure Security Center and select the connector, then click on Open connector page.
- Follow the instructions to set up the connector. For Security Center, click on Select All or select the individual subscriptions to connect to by clicking on Connect. You can also allow the automatic creation of incidents by clicking on Enable under Create incidents.
- Click on Next steps to review any recommended workbooks, query samples, or relevant analytic templates.
- Once configured, you have just successfully set up a connector to collect logs within Azure Sentinel:
You can go back to set up any other relevant connectors for your environment. Some of the more important connectors that provide insight into your users and servers include security events, Azure AD, and Office 365, to name a few.
You can view the latest pricing for Azure Sentinel at https://azure.microsoft.com/en-us/pricing/details/azure-sentinel/.
Next, you will want to set up Workbooks to allow enhanced visibility into your environment and data, along with configuring detections to investigate threats. Both of these can be accessed and configured from the navigation pane on the left within the Workbooks and Analytics menu items. There is a lot more to learn with Azure Sentinel than what we have covered here, so visit the Azure Sentinel documentation library to learn more:
The Microsoft Defender Security Center portal is used to investigate and monitor threats directly impacting your Windows devices. To log in to Defender Security Center, visit https://securitycenter.windows.com.
Security Center is useful for SOC teams to monitor, track, and respond to security threats using multiple analysis dashboards, automated investigations, real-time remediation actions, and threat-tracking with an incident management system. The landing page after logging in to Security Center takes you to the Security Operations dashboard. This dashboard provides a high-level overview for the SOC analyst to quickly explore active alerts, investigations, workstations, and users at risk over the last 30 days. The left column in the portal includes the navigation links to all the features of the ATP service. For more information about the Security Center portal, visit the official Microsoft Docs page at https://docs.microsoft.com/en-us/windows/security/threat-protection/microsoft-defender-atp/portal-overview.
By design, access to view the Security Center portal is locked due to the sensitivity of information available through it. Access to the portal can be managed by using Role-Based Access Control (RBAC) or by using basic permissions and assigning Azure AD roles. For basic permissions, the following Azure AD roles can be assigned directly to users:
- Security Administrator allows full access to the portal, including all system information, alerts, and administrative functions.
- Security Reader allows read-only access to log in and view all alerts and other information. You will not be able to perform administrative functions.
For more granular control and to separate permissions based on the SOC job role, use RBAC or roles from the Security Center portal. For example, your organization's SOC may need to read all the data and act on the remediation actions, but they don't need administrative access to change the security settings of your tenant or to enable any features.
If you are using roles for RBAC, any user who holds only the Security Reader role will be denied access to the portal until they are assigned to a Microsoft Defender ATP role.
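The access rules described in this section can be summarized as a small decision function. This Python sketch is only a mental model: the function name, return values, and the SOC L3 analyst role name are hypothetical, and it assumes that Security Administrator keeps full access whether or not RBAC roles are in use:

```python
def portal_access(aad_roles, rbac_enabled, defender_roles):
    """Return 'full', 'read-only', 'role-based', or 'denied' (illustrative)."""
    if "Security Administrator" in aad_roles:
        # Assumption: administrators keep full access in both modes.
        return "full"
    if not rbac_enabled:
        # Basic permissions: Security Reader gets read-only access.
        return "read-only" if "Security Reader" in aad_roles else "denied"
    # With RBAC roles in use, access comes only from assigned portal roles.
    return "role-based" if defender_roles else "denied"


print(portal_access({"Security Reader"}, False, set()))
print(portal_access({"Security Reader"}, True, set()))
print(portal_access({"Security Reader"}, True, {"SOC L3 analyst"}))
```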
To create a role, go to the settings icon in the navigation pane and select Roles under the Permissions menu. Click on + Add Role to create a new role. The following screenshot provides an example of a role that a level 3 SOC analyst could be assigned. It would enable permissions to perform almost all actions inside the Security Center portal, except managing the security settings:
Hovering your mouse over each of the Permissions settings will provide additional details about what each setting is scoped to include within the portal. More information about managing portal access using RBAC can be found at https://docs.microsoft.com/en-us/windows/security/threat-protection/microsoft-defender-atp/rbac.
In addition to creating Microsoft Defender ATP roles for RBAC to the portal, creating machine groups allows you to organize your workstations and devices into groups and assign users access to manage them. Machine groups also allow setting the automation level for remediation actions from the action center and automated investigations, which we will discuss next.
A use case for creating machine groups is if workstations are in different regions or if there is a need to separate servers from Windows 10 workstations. Different teams may be responsible for security remediation on servers compared to workstations or PCs in another region. To grant permissions to manage a machine group, an Azure AD security group must be assigned a role in Microsoft Defender ATP. To create a machine group in the Security Center portal, go to Settings and choose Machine groups under Permissions. Currently, the criteria for members include the following conditions:
Click on the User access tab to select an ATP role group that would be responsible for managing this machine group. In the following screenshot, you can see the conditions, as well as the different automation levels that can be set for investigations for the group:
After you specify the member criteria, you can use the preview button to show up to 10 machines that match the conditions before saving. To better organize your workstations when creating a machine group, it is recommended to leverage tags. Tags are a useful way to create logical groups and provide additional identification beyond what the member criteria allow. Tags can be set manually through the Security Center portal or by setting the following value under the HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows Advanced Threat Protection\DeviceTagging registry key:
- Registry key value (REG_SZ): Group
- Registry data: Name of tag
The registry keys can be deployed through Group Policy or with a Configuration Manager baseline applied to specific collections of devices. Then, use the Tag condition when creating a machine group to create the logical grouping in Security Center. More information on managing machine groups and tags can be read at https://docs.microsoft.com/en-us/windows/security/threat-protection/microsoft-defender-atp/machine-tags.
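As a rough illustration of the registry setting described above, the following Python sketch builds the text of a .reg file that sets the Group value under the DeviceTagging key. The function name and the EU-Workstations tag are hypothetical, and in practice the value would more likely be deployed through Group Policy or a Configuration Manager baseline as described:

```python
# Registry key path taken from the text above.
KEY_PATH = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft"
    r"\Windows Advanced Threat Protection\DeviceTagging"
)


def build_reg_file(tag_name):
    """Return .reg file text that sets the Group (REG_SZ) value to tag_name."""
    # Backslashes and double quotes must be escaped inside .reg string data.
    escaped = tag_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "Windows Registry Editor Version 5.00\r\n\r\n"
        f"[{KEY_PATH}]\r\n"
        f'"Group"="{escaped}"\r\n'
    )


# Hypothetical tag name used only for illustration.
reg_text = build_reg_file("EU-Workstations")
print(reg_text)
```

The generated text could be saved with a .reg extension and tested on a lab machine before the setting is rolled out more widely.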
Next, let's look at reviewing the alerts queue from the portal.
The alerts queue lists the most recent events that were detected by Microsoft Defender ATP signals over the last 30 days. It's a good place for the SOC team to get a holistic view of all the alerts that were triggered by Defender. The ability to filter the alerts by severity helps to understand any current high-priority risks that devices may pose to the organization. The alerts queue also contains a category column, which Microsoft uses to label the alert type by analyzing the signals against the Enterprise Tactics definitions by MITRE. More information about the Enterprise Tactics categories can be found at https://attack.mitre.org/tactics/enterprise/.
The following screenshot is of the Microsoft Defender ATP alerts queue filtered by severity over the last 30 days:
Clicking on each alert will bring up the alert investigation page with additional details and real-time remediation actions. Depending on the number of devices in your organization, the alerts queue can be noisy and can make it difficult for the SOC team to investigate. To help with this, Microsoft Defender ATP includes a feature known as Automated Investigations.
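To picture how filtering by severity cuts down a noisy queue, consider the following Python sketch. The alert records, field names, and the triage_queue function are hypothetical and do not represent the Microsoft Defender ATP data model; the category values are borrowed from the MITRE enterprise tactics:

```python
from datetime import datetime, timedelta

# Lower number means more severe.
SEVERITY_ORDER = {"High": 0, "Medium": 1, "Low": 2, "Informational": 3}


def triage_queue(alerts, now, days=30, min_severity="Medium"):
    """Keep recent alerts at or above min_severity, most severe first."""
    cutoff = now - timedelta(days=days)
    threshold = SEVERITY_ORDER[min_severity]
    recent = [
        a for a in alerts
        if a["detected"] >= cutoff and SEVERITY_ORDER[a["severity"]] <= threshold
    ]
    # Sort newest first, then by severity; the stable sort keeps the newest
    # alerts on top within each severity level.
    recent.sort(key=lambda a: a["detected"], reverse=True)
    recent.sort(key=lambda a: SEVERITY_ORDER[a["severity"]])
    return recent


now = datetime(2020, 6, 1)
alerts = [
    {"title": "Example alert 1", "severity": "Low",
     "category": "Discovery", "detected": datetime(2020, 5, 30)},
    {"title": "Example alert 2", "severity": "High",
     "category": "Credential Access", "detected": datetime(2020, 5, 20)},
    {"title": "Example alert 3", "severity": "High",
     "category": "Lateral Movement", "detected": datetime(2020, 4, 1)},
    {"title": "Example alert 4", "severity": "Medium",
     "category": "Persistence", "detected": datetime(2020, 5, 25)},
]

queue = triage_queue(alerts, now)
for alert in queue:
    print(alert["severity"], alert["category"], alert["title"])
```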
Automated investigation and remediation in Defender ATP is a feature that can help keep the noise down on alerts and reduce the volume so that the SOC team can focus more on higher-priority incidents that require immediate action. An automated investigation is initially triggered by an alert, which is analyzed by a series of algorithms known as security playbooks. Once the investigation begins, it is automatically placed in the investigations queue for categorization. Here, the analyst can see additional information, such as the alert that triggered the investigation, the assigned investigation ID, the detection source, and entities such as the workstation's PC name. In the following screenshot, you can see an investigation that has a Pending approval status and requires action to be taken from the action center:
During an open investigation, any additional alerts generated from the compromised device or similar alerts detected on other workstations will automatically be grouped under the initial investigation.
From the customized column's dropdown, select the Remediation Actions option. That will show the number of actions required to perform the recommended remediation, which is determined by the automation level configured for the machine groups. To view the pending actions, open the action center from the Automated Investigations menu in the left-side navigation pane. Here, you can view all the pending remediation actions and the remediation history of past actions. In the following screenshot, the investigation ID correlates with the ID from the Automated Investigations dashboard:
The action center contains the actions that require user approval before a remediation action can occur. By default, all onboarded machines are placed into an ungrouped machines group, which has the automation level set to require approval. As a result, any investigation that requires action will be placed under a Pending approval status and into the action center for review. By setting the automation level to Full – remediate threats automatically, all remediation actions will be performed automatically and no approval is needed from the SOC team. To view more information about how threats are remediated with Automated Investigations, visit https://docs.microsoft.com/en-us/windows/security/threat-protection/microsoft-defender-atp/automated-investigations.
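The relationship between machine groups, automation levels, and the action center can be sketched as follows. This Python example models only the two levels discussed here, and the constants, group names, and function are hypothetical:

```python
FULL = "full"                          # stands in for the Full automation level
REQUIRE_APPROVAL = "require_approval"  # default for the ungrouped machines group

# Hypothetical mapping of machine groups to automation levels.
GROUP_LEVELS = {"Servers": REQUIRE_APPROVAL, "EU workstations": FULL}


def remediation_status(machine_group, group_levels=GROUP_LEVELS):
    """Return the status a recommended remediation action ends up in."""
    # Machines outside any configured group fall back to requiring approval.
    level = group_levels.get(machine_group, REQUIRE_APPROVAL)
    return "Remediated" if level == FULL else "Pending approval"


for group in ("EU workstations", "Servers", "Ungrouped"):
    print(group, "->", remediation_status(group))
```

Anything that returns Pending approval in this model corresponds to an item waiting for review in the action center.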
Reviewing the incidents queue
The incidents queue is another place where the SOC analyst can go to review all the incidents that were created in the Security Center portal. The incidents system provides an additional view in which incidents can be sorted by severity, the number of active alerts included in the incident, the detection source, and how many machines are affected. The queue also contains the categories based on the MITRE enterprise tactics for each incident. The following screenshot shows the incidents queue in the Security Center portal. By default, only 30 days' worth of incidents are shown:
The incidents queue also adds value from a service management perspective. Each incident can be assigned an owner for follow-up and additional investigation. Here, the SOC analyst can set the incident status to an active or resolved state and determine whether the alert is real by setting the classification label as true or false. Setting the alert classification label helps Microsoft Defender Security Center learn from the alerts and improve the efficiency of the alert-identifying algorithms and security playbooks.
Selecting an incident will open the quick summary incident management pane in the web browser. This quick view lets you perform actions such as viewing all the alerts tied to the incident, assigning an owner, or quickly setting a classification:
Click on Open incident page to view the full incident details. The incident page view further breaks down what comprises each incident by sorting alerts, machines, the number of investigations, the evidence that triggered the alerts, and a visual graph to help understand the attack from a graphical perspective. The following screenshot shows the evidence summary that lists all files, processes, and persistence methods that correlated to trigger an alert in Microsoft Defender ATP:
The Graph option is currently in beta release at the time of writing.
Additional information about investigating and managing incidents in the Security Center incidents queue can be found at https://docs.microsoft.com/en-us/windows/security/threat-protection/microsoft-defender-atp/view-incidents-queue.
We have just covered an overview of how the SOC can leverage Microsoft Defender Security Center for daily operational security tasks. Next, let's look at how organizations can approach BCP and DR.
To finish this chapter, we are going to cover BCP and DR and the importance they play as they relate to security. When we look at BCP and DR, it is important to understand that these are separate functions that serve different purposes. BCP is a business-specific function that focuses on the business as a whole to ensure the continued operation of the business. The DR function is technical in nature and focuses on the recovery of IT infrastructure and systems. The DR plan falls within the larger BCP plan for the entire organization.
A BCP is not simple to put in place, as it requires a lot of time and resources to build. In addition to building a well-documented plan, it is just as important to ensure that everyone is familiar with the plan and that it has been coordinated and tested in some way. When it comes to executing the BCP plan in a real-world scenario, you don't want to be doing so for the first time without at least being familiar with the process and steps involved. Examples of situations that could require the execution of BCP include natural disasters, such as hurricanes, earthquakes, and floods. Depending on their severity, fires or power outages could also cause a BCP plan to be executed, as could the more common threats that we see today, such as cyberattacks, which can easily bring a business to a halt because of its dependency on technology.
At the time of writing, we are undergoing one of the biggest BCP exercises of our lifetime: the COVID-19 pandemic. This situation has forced most of the world to shift to a fully remote workplace almost overnight, a situation that most companies were certainly not prepared for. Fortunately, technology has enabled many businesses to continue their operations, but the situation has revealed a major gap for many: security. There will be a lot of lessons learned from this situation, and it is one that I'm sure will have many companies revisiting their BCP strategy and looking at security very closely. This situation alone will bring more visibility to the importance of BCP, and this will be an area of focus for many companies for years to come.
As we look at today's threats in the security space, there has been an increase in advanced cyberattacks, and we are seeing more sophisticated ransomware attacks that prevent businesses from operating efficiently, or even at all in some instances. Many businesses aren't prepared for these levels of attacks and it could take them days, weeks, or even months to get back to normal operation. A business that isn't prepared for these types of attacks could even lose business-critical data and, in some instances, go out of business, depending on the damage. Therefore, it is critical that your organization fully understands the threats it faces and how best to deal with them through a well-defined BCP plan that allows ongoing operations in the event of a cyberattack. One important part of this process as it relates to security is to ensure your leadership team has gone through some form of cyber-incident tabletop or simulation exercise. These types of exercises will show the importance of having a good BCP plan and will provide insight into some of the difficult decisions that may need to be made as part of an actual event.
When we look at DR, it's just as important to have a well-defined and documented plan specifically for DR because of its technical specialization. This plan will fall within the overall BCP program, but the execution of a DR plan will be unique to the situation and may not impact the entire business. As stated, DR is focused on the technical aspects and on ensuring that the technology, systems, and applications that support the business continue to operate in an outage. As mentioned earlier, events that trigger BCP will most likely have some form of impact on your systems and may require the execution of DR. Some examples include a natural disaster, such as a hurricane or a fire taking out a data center, as well as a cyber incident that could take down all your systems.
Your DR plan may not need to be fully executed depending on the situation. If a business-critical application becomes corrupted, you may only need to execute DR for that specific application to ensure restoration. There are many different instances that could cause DR to be required, so it's critical to account for each of these scenarios and make sure you are able to accommodate recovery for individual systems or entire data centers. An important part of your DR plan is understanding the impact of a service or function and the Maximum Tolerable Downtime (MTD) a service or function can withstand before your business is negatively impacted. Two other important factors that also come into play are the Recovery Time Objective (RTO) and Recovery Point Objective (RPO). The RTO is the maximum amount of time a system can be unavailable before its outage negatively impacts a service or function within the business. The RPO is the maximum amount of data, measured as a period of time before the outage, that the service or function can afford to lose without being negatively impacted.
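A quick way to reason about RTO and RPO is to compare them against your backup interval and your estimated recovery time. The following Python sketch uses hypothetical figures and a hypothetical function name, and it assumes that the worst-case data loss equals one full backup interval:

```python
from datetime import timedelta


def meets_objectives(backup_interval, estimated_recovery, rpo, rto):
    """Check a recovery design against its RPO and RTO targets."""
    # Worst case, an outage hits just before the next backup runs, so the
    # data lost equals one full backup interval.
    worst_case_data_loss = backup_interval
    return {
        "rpo_met": worst_case_data_loss <= rpo,
        "rto_met": estimated_recovery <= rto,
    }


# Hypothetical figures for a single business-critical application.
result = meets_objectives(
    backup_interval=timedelta(hours=24),
    estimated_recovery=timedelta(hours=3),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=8),
)
print(result)  # {'rpo_met': False, 'rto_met': True}
```

In this example, a nightly backup cannot satisfy a four-hour RPO, so backups would need to run more often even though the recovery time is within the RTO.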
As you build out your recovery plan and understand the expected restoration of each system based on the RTO and RPO, you are going to need to ensure that you have the technology and proper planning in place to meet those requirements. A few examples of considerations include the following:
- Having High-Availability (HA) configurations for systems may help you recover from local issues within a data center but will come at a cost.
- Understanding your backup strategy as it relates to full backups, incremental backups, and differential backups, as well as considering how often each backup needs to be taken and how long it needs to be retained.
- Considering what type of failover is needed for a complete restoration of a data center and whether you should have a cold, warm, or hot site available.
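To make the backup strategy consideration more concrete, the following Python sketch works out which backups are needed to restore to the most recent point. It assumes the common definitions in which a differential backup holds all changes since the last full backup and an incremental backup holds changes since the most recent backup of any type. The function name and schedules are hypothetical:

```python
def restore_chain(backups):
    """Return the backups needed to restore to the most recent one.

    `backups` is a chronological list of (label, kind) tuples, where kind is
    "full", "differential", or "incremental". Raises ValueError if the list
    contains no full backup.
    """
    # Find the most recent full backup; everything earlier is irrelevant.
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    chain = [backups[last_full]]
    for backup in backups[last_full + 1:]:
        if backup[1] == "differential":
            # A differential supersedes everything applied since the full.
            chain = [backups[last_full], backup]
        else:
            # Each incremental must be replayed in order.
            chain.append(backup)
    return chain


incremental_week = [("Sun", "full"), ("Mon", "incremental"),
                    ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Sun", "full"), ("Mon", "differential"),
                     ("Tue", "differential"), ("Wed", "differential")]

incremental_labels = [label for label, _ in restore_chain(incremental_week)]
differential_labels = [label for label, _ in restore_chain(differential_week)]
print(incremental_labels)   # ['Sun', 'Mon', 'Tue', 'Wed']
print(differential_labels)  # ['Sun', 'Wed']
```

The trade-off shows up in the length of the chain: incremental backups are smaller to take but require more pieces to restore, which can affect whether you meet your RTO.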
All of these considerations will depend on the business requirements and needs, along with the cost. Having a hot site on standby will come at a much greater cost than that of a warm or cold site. More importantly, as you implement your DR plan and backup strategies, the role security plays in each should be understood. Ensuring that your data is backed up securely, that any off-site storage of data is secure, and that your standby data centers maintain the same level of security as your primary data centers should all be taken into consideration. Ransomware attacks, as discussed earlier, also create new challenges for DR. These types of attacks can make your entire network and systems inoperable and unrecoverable, so having an offline backup system that is isolated from your production system will be critical. If you rely on highly available technologies only and have backup systems connected to your network without the correct measures in place, a well-executed ransomware attack could also impact your backup or failover systems. For many, a full, clean restoration may be the only option in some cyberattack situations.
Thorough BCP and DR planning requires a lot of thought and collaboration between the IT architecture, operations, and security teams, as well as the business stakeholders. We want to ensure that the importance of these activities is called out and accounted for as they influence the structure of your overall security program. One of the biggest advantages available when building an effective BCP and DR plan today is the cloud. The cloud allows the enablement of services at a pace not seen before. The ability to span your data center, services, and applications regionally or even geographically all over the globe allows the true redundancy of services, including isolated backups of your data.
There are many frameworks available to assist with your BCP/DR programs. Since we've referenced NIST throughout this book, we will also reference the NIST publication SP 800-34 Rev. 1, Contingency Planning Guide for Federal Information Systems, as a great resource for your BCP/DR planning:
In this chapter, we covered security operations and reviewed the tools and technologies available from Microsoft that offer enterprise-class protection. We began the chapter with an introduction to the SOC and the importance of its place in an enterprise. We then introduced the M365 security portal and provided an overview of the feature. Next, we reviewed Microsoft's version of a CASB, known as MCAS. Then, we learned how to activate an instance of Azure ATP and review alerts throughout the cyber kill chain.
Other tools and features reviewed in this chapter included Azure Security Center to review and investigate alerts, Microsoft's SIEM, known as Azure Sentinel, and Microsoft Defender Security Center for alert and incident management. We finished off the chapter with an overview of BCP and DR.
In the next chapter, Chapter 13, Testing and Auditing, we will review validating controls to ensure the security measures that have been agreed on are actually in place. We will then review vulnerability scanning and testing to ensure your controls are working correctly, before finishing off with an overview of penetration testing and remediation.