With the announcement of Microsoft 365 Copilot availability for businesses of all sizes (no minimums), many people have been eager to get their hands on the widely hyped product. Before blindly reselling this product, it’s important to understand the security and data privacy concerns that AI brings to an organization. In this article, I will cover those concerns with Copilot and provide some recommendations on best practices for security and data governance. By adopting the recommended best practices, Managed Service Providers (MSPs) can not only unlock a lucrative revenue stream but also significantly enhance their clients’ security and data management posture.
First things first, why should I be concerned?
At a high level, when Copilot is enabled in an organization, it begins to index data across various sources such as user mailboxes, SharePoint repositories, Teams chats, etc. The architecture is built to respect existing access controls and Microsoft 365 compliance policies for what data it provides back to a user when they prompt for information.
This means that if the user asks a question about data in their mailbox such as “Provide me a summary from all emails from Bruce Wayne over the past week”, it will provide that summary, but if the user asks “Provide me a summary of the emails from our CEO’s mailbox in the past week”, it will tell the user it cannot fulfill that request.
Concerns come when we talk about files, file repositories, and sensitive documents a user may have inadvertent access to. I showed this more clearly in my last YouTube video, where I asked Copilot to provide me a list of all documents and chats that referenced Live Chat. It provided me a citation of a document I didn’t know existed and a Teams Channel that I didn’t know I was a member of. While this document was not confidential and contained no sensitive information, it illustrates a glaring problem you might come across for sensitive data in your organization. A common example people use is asking Copilot for salary information about other employees and Copilot providing that data because the user has inadvertent access. The questions to start asking would be:
- What if someone got access to sensitive HR documents?
- What if someone had access to key IP or financial information about the company?
- As a healthcare company, what if someone was able to access medical records?
And on and on. Your mind can wander thinking about the risk to your business or the businesses you manage.
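To make the inadvertent-access problem concrete, here is a minimal sketch of permission-trimmed search. It is a toy model, not how Copilot is actually implemented: the `Document` class, the `search` function, the group names, and the sample documents are all hypothetical. It only illustrates the idea that an assistant which respects existing access controls will still surface anything the user can already open, including documents shared too broadly by mistake.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for an indexed file and the groups allowed to open it."""
    title: str
    text: str
    allowed_groups: set[str] = field(default_factory=set)


def search(index: list[Document], user_groups: set[str], term: str) -> list[str]:
    """Return titles of documents that mention `term` AND that the user can already open."""
    return [
        doc.title
        for doc in index
        if term.lower() in doc.text.lower() and doc.allowed_groups & user_groups
    ]


index = [
    Document("Help desk runbook", "Live Chat escalation steps", {"HelpDesk"}),
    # "AllStaff" was added to this HR document by mistake (oversharing).
    Document("2023 salary bands", "Salary bands and Live Chat staffing budget", {"HR", "AllStaff"}),
    Document("Board minutes", "Acquisition discussion", {"Executives"}),
]

# Bruce works on the help desk and, like everyone, is in "AllStaff".
bruce_groups = {"HelpDesk", "AllStaff"}

print(search(index, bruce_groups, "live chat"))    # includes the overshared HR document
print(search(index, bruce_groups, "acquisition"))  # empty: Bruce has no access at all
```

The access check works exactly as designed in both cases. The exposure comes from the overshared permission, which is why the rest of this article focuses on auditing access and labeling data rather than on Copilot itself.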
Some other key security considerations:
- Insider Threat: Expanding on the above, any insider could have access to sensitive data and choose not to disclose that fact. If or when they leave, they could exploit that information.
- Data Exfiltration: Users with inadvertent access to data could end up sharing or storing the data in unsecure or unauthorized locations. They could easily save a copy or share with other users (internal OR external) because they have rights to the document(s).
- Threat Actors: If a user account is compromised, the speed at which a bad actor can get to sensitive information increases dramatically. They could easily go to M365 Copilot chat and use prompts to get what they need, or use what they find as a social engineering tactic to move laterally throughout an organization. Copilot will be a powerful weapon for these individuals.
Here is the current list of supported file types for the user-level index and tenant-level index that Copilot works with:
How can I protect my organization?
I think the good part of AI and Copilot is that it will start to force organizations to formulate a data governance strategy. Most businesses (especially in SMB) are not going to have a good answer to questions such as the following:
- How does your organization define sensitive data within the business?
- Do you know how data flows through your organization?
- Where does sensitive data exist within your organization?
- How are users granted access to sensitive data or documents?
- What are your sharing policies within your Microsoft environment?
These are some fundamental definitions businesses need to make as part of an AI readiness assessment (more on that to come in a future blog post).
When we think about putting protections in place, it really comes down to Access Controls and Data Protection. If we mirror this to a framework like the CIS Controls, it would be controls 3 (Data Protection) and 6 (Access Control Management). I am going to propose a hypothetical assessment an organization could go through to perform an audit and begin to define a data governance framework tailored to their business.
1. Define what sensitive data means to your business
Sensitive or confidential data will vary by business. Sensitive data at a healthcare company, for instance, will be a lot more complicated than at a small business that repairs iPhones. You need to think about stored data that contains internal information as well as external customer/patient information. Some high-level topics to think about:
- Where is our HR, Payroll, and Expense information stored?
- Where are the company financials located?
- Do we have any PII such as credit card info, social security numbers, routing numbers, etc.?
- What data would we be concerned about getting in the wrong hands?
I encourage you to perform this exercise with key executive peers in the company.
2. Identify where sensitive data lives in Microsoft 365
This is challenging to do and can vary depending on how your organization operates. SharePoint is the storage backbone for both SharePoint sites and Teams channel document repositories, so it is the key location to audit. This is where all shared documents will live. Depending on your licensing model, you may have access to more advanced features of Microsoft Purview (Microsoft’s compliance solution) that allow more automated scanning of documents within the organization. Most business licensing (like M365 Business Premium) does not come with these advanced features. Even if you did have the licensing, the scanning looks for more traditional forms of PII rather than what YOUR business might consider confidential. I would recommend the following:
- Start with the key departments in your company (think HR, Finance, Legal, etc.)
- Identify Teams channels and SharePoint sites that these departments leverage
- Audit these locations first as a priority.
- Document any sensitive files and folders
We need to consider Time + Impact here and spending time in these areas will provide the biggest impact in the sense of blanketed protection. I think many companies will look to a 3rd party solution that can help them find sensitive data in their organization (I have no solid recommendations as of yet).
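As a rough illustration of what pattern-based scanning for traditional PII looks like, here is a minimal sketch in Python. It is not a substitute for Purview or a 3rd party product, and the patterns and the `find_sensitive` helper are my own assumptions for illustration. It looks for US social security number formats and for card-like digit runs that pass the Luhn checksum, which cuts down on false positives such as order numbers.

```python
import re

# Illustrative patterns only; real classifiers use many more signals than these.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, then sum all digits."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_sensitive(text: str) -> dict[str, list[str]]:
    """Return SSN-like strings and Luhn-valid card-like strings found in `text`."""
    cards = [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]
    return {"ssn": SSN_PATTERN.findall(text), "credit_card": cards}


sample = "Card 4111 1111 1111 1111 on file, SSN 123-45-6789, order 1234 5678 9012 3456."
print(find_sensitive(sample))
```

Notice what this cannot do: it finds card numbers and SSNs, but it has no idea that a pricing model or an acquisition memo is confidential to your business. That gap is why the manual, department-first audit above still matters.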
3. Evaluate Existing Sharing Policies (Internal and External)
By default, the settings in Microsoft are configured to allow users to share files with anyone either internally or externally. While this is done to not cause user friction, it’s not the recommended setup if you want to avoid data exfiltration or inadvertent sharing.
You should review these settings in the SharePoint admin center, as well as on the individual sites identified in your review from the previous step.
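If you export the sharing level of each site, a short script can flag the ones that are more permissive than your policy allows. The sketch below is a hedged example: the level names follow the SharePoint Online sharing capability values as I understand them, while the export format, the `sites_exceeding` helper, and the contoso URLs are hypothetical.

```python
# SharePoint Online external sharing levels, ordered from most to least restrictive.
SHARING_LEVELS = [
    "Disabled",                         # no external sharing
    "ExistingExternalUserSharingOnly",  # only guests already in the directory
    "ExternalUserSharingOnly",          # new and existing guests who sign in
    "ExternalUserAndGuestSharing",      # "Anyone" links, no sign-in required
]


def sites_exceeding(sites: list[dict[str, str]], max_allowed: str) -> list[str]:
    """Return URLs of sites whose sharing level is more permissive than `max_allowed`."""
    limit = SHARING_LEVELS.index(max_allowed)
    return [s["url"] for s in sites if SHARING_LEVELS.index(s["sharing"]) > limit]


# Hypothetical export; in practice this would come from the admin center or a script.
sites = [
    {"url": "https://contoso.sharepoint.com/sites/HR", "sharing": "ExternalUserAndGuestSharing"},
    {"url": "https://contoso.sharepoint.com/sites/Finance", "sharing": "Disabled"},
    {"url": "https://contoso.sharepoint.com/sites/Marketing", "sharing": "ExternalUserSharingOnly"},
]

print(sites_exceeding(sites, "ExistingExternalUserSharingOnly"))
```

Running the check against the sites you flagged as sensitive in the previous step gives you a short, prioritized list to fix first.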
4. Formulate a data classification taxonomy
Microsoft Purview has a product feature called information protection labels (also known as sensitivity labels). It comes with licenses such as Microsoft 365 Business Premium and allows you to tag documents with labels that match their sensitivity. These labels support custom controls such as encrypting the document, preventing external sharing, limiting visibility, and more. Common labels you could use are Public, Private, and Confidential as a basic example. Coming back to our Copilot architecture, this is an additional control you can put in place on documents that restricts users from analyzing or getting responses about a document when they ask Copilot questions, EVEN if they have access to the repository.
As an example, let’s say Bruce Wayne has access to a Teams channel called Finance because he was inadvertently added to a group that is part of that channel. Bruce is not part of the finance team and works on the help desk. Bruce has access to the Finance document repository because he is part of that channel. Flash Gordan is also in the Finance Teams channel but actually works in Finance. Flash creates a new document that details all of the company’s financials for 2023. Flash tags the document with a “Confidential” label, which has protections that prevent anyone other than 3 specific members of the org from viewing the document. Flash saves this document in the Finance Teams channel. If Bruce were to ask Copilot questions about 2023 financials, Copilot would not provide him any data because it would recognize the label applied to the document.
This is a basic example, but it is very powerful when you think about protecting your sensitive data. If you are just getting started with information protection labels, here are some of my recommendations:
- Start simple: Even just starting with one label of “Confidential” is a great way to protect your most important data. You can always add labels over time. Even later on, you should not have more than 5 labels total in my opinion to not confuse your users.
- Apply more granular protections over time: I will talk more on this later, but getting users to adopt labeling is a hard thing to do. You can make it mandatory that they apply a label to every document they save, but it creates chaos if proper training has not been put in place.
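The Bruce and Flash scenario above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not Microsoft’s implementation: the `Label` and `Document` classes, the user names, and the `copilot_can_use` function are hypothetical. The one detail borrowed from Microsoft’s documentation, as I understand it, is that for encrypted labeled content a user needs both the VIEW and EXTRACT usage rights before Copilot will return or summarize it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Label:
    """Hypothetical sensitivity label with per-user usage rights from its encryption settings."""
    name: str
    rights: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class Document:
    title: str
    channel_members: set[str]      # who can reach the file through the Teams channel
    label: Optional[Label] = None  # None means the file is unlabeled


def copilot_can_use(doc: Document, user: str) -> bool:
    """True only if the user can reach the file AND the label (if any) grants VIEW and EXTRACT."""
    if user not in doc.channel_members:
        return False
    if doc.label is None:
        return True
    return {"VIEW", "EXTRACT"} <= doc.label.rights.get(user, set())


# The "Confidential" label grants rights to only 3 people (names are made up).
confidential = Label(
    "Confidential",
    {u: {"VIEW", "EXTRACT", "EDIT"} for u in ("flash", "cfo", "controller")},
)
finance_channel = {"flash", "cfo", "controller", "bruce"}  # bruce was added by mistake

financials = Document("2023 financials", finance_channel, confidential)
lunch_menu = Document("Team lunch menu", finance_channel)

print(copilot_can_use(financials, "bruce"))  # False: in the channel, but the label blocks him
print(copilot_can_use(financials, "flash"))  # True
print(copilot_can_use(lunch_menu, "bruce"))  # True: unlabeled files follow channel access only
```

The last line is the reason to start labeling with your most sensitive files: anything left unlabeled is governed by repository access alone.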
5. Evaluate existing access controls
Part of your audit should be a review of your existing access controls as they relate to your users and groups. Group membership is the preferred method of applying rights to sites and document repositories, rather than assigning rights to individual users. This requires you to take a step back and look at your change management process, specifically user onboarding, offboarding, and lateral changes in the organization. The preferred method of rights management within Entra ID is to leverage Dynamic Groups (requires Entra ID P1) so that you can automate access controls based on specific attributes of users such as their job title, department, and/or whether their account is enabled. This will streamline users getting access to the correct documents, Teams Channels, and SharePoint sites over time, and will also remove that access when it no longer applies.
You need to evaluate memberships to sites and Teams Channels to determine if there are members that do not or should not need access to these repositories.
If you have Entra ID P2 licensing, I would also recommend looking into catalogs and entitlement management with Entra ID Governance. Most SMB customers won’t have P2, but most MSPs will, and enterprise licensing such as E5 includes it.
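To show what attribute-based membership buys you during an audit, here is a small sketch that evaluates a rule against user attributes and compares the result with who is actually in a group today. It simulates the idea in plain Python rather than calling Entra ID; the `User` class, the `finance_rule` function, and the sample accounts are hypothetical. The rule mirrors a dynamic membership rule along the lines of `(user.department -eq "Finance") and (user.accountEnabled -eq true)`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class User:
    """Hypothetical subset of directory attributes used for the rule."""
    upn: str
    department: str
    account_enabled: bool


def finance_rule(user: User) -> bool:
    """Plain-Python equivalent of a department + enabled-account membership rule."""
    return user.department == "Finance" and user.account_enabled


def membership_drift(
    users: list[User], current_members: set[str], rule: Callable[[User], bool]
) -> dict[str, list[str]]:
    """Compare actual membership with what the rule says it should be."""
    expected = {u.upn for u in users if rule(u)}
    return {
        "remove": sorted(current_members - expected),
        "add": sorted(expected - current_members),
    }


users = [
    User("flash@contoso.com", "Finance", True),
    User("bruce@contoso.com", "Help Desk", True),
    User("diana@contoso.com", "Finance", True),
    User("clark@contoso.com", "Finance", False),  # offboarded, account disabled
]
current_members = {"flash@contoso.com", "bruce@contoso.com", "clark@contoso.com"}

print(membership_drift(users, current_members, finance_rule))
```

The "remove" list is the inadvertent access you are hunting for in this step. With a real Dynamic Group the directory keeps membership in line with the rule automatically, so this kind of drift should not build up in the first place.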
6. Apply Sensitivity Labels to the most sensitive data
Now that you have completed your audit and prep work, you will want to start taking action. The first thing you can do is apply the sensitivity labels you created earlier to the most sensitive information you discovered in your audit. This will have the biggest impact as you prepare for the arrival of AI/Copilot.
7. Update Access Controls
Depending on your audit, this could include (but is not limited to) the following:
- Removing users from repositories, Team Channels, SharePoint sites, etc.
- Updating group memberships
- Creating new Dynamic Groups
- Archiving certain teams channels/SharePoint sites
- Defining a new SOP for change management (user onboard, offboard, lateral movement, Teams access, SharePoint site access, etc.)
8. Apply Sharing and Repository creation restrictions
As noted above, you should have more granular controls as it relates to the internal and external sharing policies. It is also key to note that if everyone in your company can create a new Teams Channel then that is A PROBLEM. Outside of the mess this can make, it can easily lead to data exfiltration. There are steps you can take to limit new teams creation. Here are some helpful resources on these topics:
- Restrict Users who can Create Teams Channels – Tminus365 Docs
- Private Channels shall be utilized to restrict access to sensitive information – Tminus365 Docs
- External User Access SHALL Be Restricted – Tminus365 Docs
- File and Folder Links Default Sharing Settings SHALL Be Set to Specific People – Tminus365 Docs
- Sensitive SharePoint Sites SHOULD Adjust Their Default Sharing Settings – Tminus365 Docs
- Exclude SharePoint sites from Copilot Semantic index: Semantic Index for Copilot | Microsoft Learn
9. Develop a plan for data and access control lifecycle
Executing on the items above will certainly be a great win, but there needs to be an ongoing process that the organization follows, and definitions need to be created for the organization’s data governance policy. I think it’s also important to perform a gap analysis today on where you are and set some goals for the future. For example:
- Maybe you only have one label today and want to have 3 defined by the end of the year
- Maybe you have a manual process today for how you review group memberships and you want to look at automated Access Reviews in Entra ID in the future.
- Maybe you do not force end users to apply labels to documents when saving but want to do so by the end of the year after proper training
- Maybe you want to incorporate automatic labelling if you are a company with more sensitive PII within documents
The list could go on here, but the key point is that you are setting goals and progressing.
I think another key call out here is formulating retention policies against your data as part of the lifecycle. CIS Control safeguards call for a data destruction definition and I think that is important to apply within your organization. Defining how long you keep your data can reduce the data privacy and security footprint if you are not keeping things indefinitely.
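A retention definition can start as something as simple as a table of how long each class of data is kept. The sketch below is only an illustration of that idea: the retention periods, the label names, and the `past_retention` helper are hypothetical, and in Microsoft 365 the enforcement itself would be done with retention policies rather than a script.

```python
from datetime import date, timedelta

# Hypothetical retention periods per label, in days. Set these with legal/compliance input.
RETENTION_DAYS = {"Confidential": 7 * 365, "Private": 3 * 365, "Public": 365}


def past_retention(files: list[tuple[str, str, date]], today: date) -> list[str]:
    """Return names of files whose last-modified date is older than their label's retention period."""
    return [
        name
        for name, label, last_modified in files
        if today - last_modified > timedelta(days=RETENTION_DAYS[label])
    ]


# (file name, label, last modified) - sample data for illustration only.
files = [
    ("old-payroll.xlsx", "Private", date(2020, 1, 15)),
    ("handbook.pdf", "Public", date(2024, 1, 10)),
    ("2019-financials.xlsx", "Confidential", date(2019, 12, 31)),
]

print(past_retention(files, today=date(2024, 6, 1)))
```

Anything this kind of review flags is data that Copilot can still surface today but that the business has already decided it no longer needs to keep.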
More Tips and helpful articles:
- Data Loss Prevention Policies: I didn’t touch on these here but these can also be leveraged to help protect sensitive data from moving across the organization to users who should not have access (this is part of Microsoft 365 Business Premium).
- Privileged Identity Management (PIM): One of the other components you have to consider here is highly privileged roles (think SharePoint admin, Teams admin, etc.). Because of these users’ rights, they will have access to more information via Copilot. For this reason, it’s my recommendation to:
  - Keep these roles assigned to service accounts rather than users with an active license.
  - Whether or not you take my first piece of advice, make these roles eligible upon activation with PIM. PIM applies just-in-time and just-enough access for these roles so that users do not perpetually have these rights. (PIM requires Entra ID P2.)
- Scanning Documents from on-prem file shares: Learn about the Microsoft Purview Information Protection scanner | Microsoft Learn
- Microsoft Purview data security and compliance protections for Microsoft Copilot | Microsoft Learn
- Apply principles of Zero Trust to Microsoft Copilot for Microsoft 365 | Microsoft Learn
- Data, Privacy, and Security for Microsoft Copilot for Microsoft 365 | Microsoft Learn
- SMB AI Readiness Assessment: https://aka.ms/CopilotM365assessment
Licensing Considerations
Copilot for Microsoft 365 is an add-on plan with the following licensing prerequisites:
- Microsoft 365 E5
- Microsoft 365 E3
- Office 365 E3
- Office 365 E5
- Microsoft 365 A5 for faculty
- Microsoft 365 A3 for faculty
- Office 365 A5 for faculty
- Office 365 A3 for faculty
- Microsoft 365 Business Standard
- Microsoft 365 Business Premium
A majority of the advanced access control and compliance features are only going to be found in licenses like E5. If I only had Business Standard, I would not touch Copilot as there is really no automated way to govern the data management within the tenant.
For customers in SMB, Business Premium has the best mix of features including:
- Information Protection Labels
- Data loss prevention Policies
- Retention Policies
- Dynamic Groups
- On-prem file share scanner
You still will not have any automated way to see where data exists in the org without bumping up to a higher enterprise plan. For this reason, I think many users will look to a cheaper 3rd party solution to help them find sensitive data.
For MSPs, who likely have access to Entra ID P2 licensing, I highly recommend checking out some more of the advanced Governance Features for access controls: