🧩 Understanding Sensitive Information Types (SITs) in Microsoft Purview

In today’s cloud-driven world, organizations store vast amounts of data across multiple platforms — Microsoft 365, Azure, SharePoint, Teams, file servers, and SaaS apps.
The challenge? Knowing what data is sensitive and where it lives.

That’s where Sensitive Information Types (SITs) come in — they are the core detection mechanism behind Microsoft Purview’s Information Protection and Data Loss Prevention (DLP) capabilities.

🔍 What Are Sensitive Information Types?

A Sensitive Information Type (SIT) is a pattern-based rule used by Microsoft Purview to automatically detect and classify sensitive content in your environment.

Each SIT is built using:

Regular expressions (regex) for pattern detection
Keyword dictionaries for contextual matching
Checksum algorithms for validation (e.g., credit card numbers)
Confidence levels to score accuracy (Low, Medium, High)

When Microsoft Purview scans data, it uses SITs to detect things like credit card numbers, ID documents, health records, or financial data, helping organizations protect and govern this information intelligently.

💡 SITs are the detection engine behind features like DLP, Auto-Labeling, the Information Protection Scanner, and Insider Risk Management.

🧠 How SITs Work

Each Sensitive Information Type combines a primary element, supporting elements, and a confidence level:

Primary element – The actual pattern (e.g., a 16-digit number).
Supporting element – Keywords or context that increase accuracy.
Confidence level – Indicates how sure the system is about a match.

For example, the built-in Credit Card Number SIT:

Uses regex to find candidate digit sequences (14 to 16 digits, formatted or unformatted)
Validates the number using the Luhn checksum
Looks for keywords like Visa, Mastercard, or Amex nearby

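If all of these conditions are met, Purview reports the match with high confidence. A number that passes the checksum but has no supporting keywords nearby is reported at a lower confidence.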
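To make the primary/supporting/confidence idea concrete, here is a minimal Python sketch of that flow. It is not Purview's implementation: the 16-digit regex, the keyword list, the 300-character proximity window, and the two-level confidence mapping are all simplifying assumptions chosen for illustration.

```python
import re

# Assumption: candidates are 16 digits, optionally grouped in fours with
# spaces or hyphens. The real built-in SIT covers more formats than this.
CARD_PATTERN = re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")
KEYWORDS = ("visa", "mastercard", "amex")  # supporting elements
PROXIMITY = 300  # hypothetical window, in characters, around a match


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_cards(text: str) -> list[tuple[str, str]]:
    """Return (digits, confidence) for every Luhn-valid candidate in text."""
    results = []
    lowered = text.lower()
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if not luhn_valid(digits):
            continue  # checksum failed, so skip this candidate
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        has_keyword = any(k in window for k in KEYWORDS)
        results.append((digits, "high" if has_keyword else "low"))
    return results
```

With this sketch, `detect_cards("Visa card: 4111 1111 1111 1111")` returns one match at "high" confidence, the same number without a nearby keyword comes back as "low", and a number that fails the checksum is dropped entirely.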

📦 Built-in Sensitive Information Types

Microsoft provides over 400 built-in SITs across 50+ regions, covering major regulatory and compliance frameworks.

🌍 Common Categories

Category	Examples
💳 Financial	Credit Card, SWIFT Code, IBAN, Bank Account Number
🧾 Personal Identifiers	Passport, Driver’s License, National ID, SSN
🏥 Health	ICD-10 Code, NHS Number, Health Insurance ID
💼 Corporate Data	Employee ID, Payroll Number, Tax File Number
🌐 Regional Regulations	EU ID, Aadhaar (India), SIN (Canada), INSEE (France)

You can view all available SITs in the Microsoft Purview Compliance Portal → Data Classification → Sensitive Info Types.

🧰 Custom Sensitive Information Types

If your organization has unique data formats, you can create custom SITs to identify them.

For example:

Law Firm: Case File Numbers (e.g., CFN-1234)
Bank: Loan Reference IDs
Manufacturer: Product Serial Numbers
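Before building a custom SIT, it helps to prototype the pattern locally. The sketch below uses Python's `re` module purely for illustration and assumes a case file number is "CFN-" followed by exactly four digits, as in the CFN-1234 example. The real format in your organization, and the regex dialect Purview accepts, may differ.

```python
import re

# Assumption: "CFN-" plus exactly four digits. The word boundaries keep the
# pattern from matching inside longer tokens such as "XCFN-0001" or "CFN-98765".
CASE_FILE_PATTERN = re.compile(r"\bCFN-\d{4}\b")


def find_case_file_numbers(text: str) -> list[str]:
    """Return every case file number found in the text, in order."""
    return CASE_FILE_PATTERN.findall(text)
```

For example, `find_case_file_numbers("See CFN-1234 and CFN-98765, not XCFN-0001.")` returns only `["CFN-1234"]`, because the other two tokens do not match the assumed format.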

⚙️ Steps to Create a Custom SIT

Go to Microsoft Purview → Data Classification → Sensitive Info Types
Click Create → define using regex or keyword patterns
Set confidence levels (Low / Medium / High)
Test detection using sample files
Publish for use in DLP, auto-labeling, or retention policies
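The "test detection" step can also be rehearsed offline before you touch the portal. The following is a rough sketch, not a Purview feature: it scores a candidate regex against a handful of hand-labeled sample strings and reports precision and recall. The pattern, the samples, and the `evaluate` helper are all hypothetical.

```python
import re

# Hypothetical pattern for a custom SIT: "CFN-" followed by four digits.
PATTERN = re.compile(r"\bCFN-\d{4}\b")

# Hand-labeled samples: (text, should_this_be_detected)
SAMPLES = [
    ("Matter CFN-1234 opened", True),
    ("Invoice CFN-12 overdue", False),
    ("cfn-5678 archived", True),                  # lowercase variant
    ("Ticket CFN-0000 is a placeholder", False),  # right format, not real data
]


def evaluate(pattern: re.Pattern, samples: list[tuple[str, bool]]) -> dict[str, float]:
    """Return precision and recall of the pattern on the labeled samples."""
    tp = fp = fn = 0
    for text, expected in samples:
        found = pattern.search(text) is not None
        if found and expected:
            tp += 1
        elif found and not expected:
            fp += 1
        elif not found and expected:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}
```

On these four samples, `evaluate(PATTERN, SAMPLES)` gives a precision and recall of 0.5 each: the lowercase variant is missed and the placeholder is a false positive. Results like these show that the pattern, or its supporting keywords, needs tuning before publishing.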

Advanced users can also define SITs in an XML rule package and upload it with Security & Compliance PowerShell (the New-DlpSensitiveInformationTypeRulePackage cmdlet).

🧠 Trainable Classifiers (AI-Based SITs)

Beyond regex and keywords, Purview offers Trainable Classifiers powered by machine learning.
These classifiers learn from real examples of your documents — identifying content by context and meaning, not just patterns.

Built-in Classifiers Include:

Resume
Contract
Source Code
Financial Document
Health Record

You can also create custom classifiers for business-specific documents by uploading a labeled training set in the Purview portal.

🧠 Trainable classifiers help discover unstructured or context-rich data that static SITs can’t easily detect.

⚙️ Where SITs Are Used in Microsoft Purview

Feature	Purpose of SITs
🧭 Data Loss Prevention (DLP)	Detects sensitive data in motion and applies rules to block or warn.
🏷️ Auto-Labeling	Automatically applies sensitivity labels based on detected SITs.
📊 Information Protection Scanner	Scans file shares and on-premises repositories for sensitive data.
🔎 Data Classification Reports	Provides visibility into where sensitive information exists.
🧠 Insider Risk Management	Correlates user activities with sensitive data access and sharing.

SITs form the foundation of data discovery, labeling, and protection in Microsoft Purview.
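To show how a detection can turn into a DLP outcome, here is a small conceptual sketch. It is not how Purview evaluates policies internally: the `Detection` and `Rule` classes, the thresholds, and the "allow / warn / block" actions are hypothetical names that model the idea of a rule firing when a SIT is found often enough and with enough confidence.

```python
from dataclasses import dataclass

CONFIDENCE_RANK = {"low": 1, "medium": 2, "high": 3}
SEVERITY = {"allow": 0, "warn": 1, "block": 2}


@dataclass
class Detection:
    sit_name: str
    confidence: str  # "low", "medium" or "high"
    count: int       # number of instances found in the content


@dataclass
class Rule:  # hypothetical rule shape, for illustration only
    sit_name: str
    min_confidence: str
    min_count: int
    action: str  # "warn" or "block"


def decide(detections: list[Detection], rules: list[Rule]) -> str:
    """Return the strictest action among rules that match, else 'allow'."""
    outcome = "allow"
    for rule in rules:
        for det in detections:
            matches = (
                det.sit_name == rule.sit_name
                and CONFIDENCE_RANK[det.confidence] >= CONFIDENCE_RANK[rule.min_confidence]
                and det.count >= rule.min_count
            )
            if matches and SEVERITY[rule.action] > SEVERITY[outcome]:
                outcome = rule.action
    return outcome
```

With one rule that warns on a single medium-confidence credit card match and another that blocks on five or more high-confidence matches, two high-confidence hits produce "warn" and six produce "block". Layering rules this way mirrors the common practice of warning on small exposures and blocking bulk ones.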

🧾 Roles and Permissions

To view, manage, or create Sensitive Information Types, you need specific Purview roles:

Role / Group	Access Level
🛡️ Compliance Administrator	Full control to create and manage SITs
🔍 Security Administrator	Monitor SIT detections and alerts
📄 Information Protection Contributor	Create custom SITs and manage classifiers
👁️ Content Explorer Viewer	View SIT matches in files
🧰 Global Administrator	Full tenant access (for initial setup only)

📖 Microsoft Doc: Purview permissions

💼 Licensing Requirements

The ability to use and manage Sensitive Information Types depends on your Microsoft 365 license.

Feature	Required License
Use built-in SITs in DLP or labeling	Microsoft 365 E3 (partial), Microsoft 365 E5 (full)
Create custom SITs	Microsoft 365 E5 / A5 / G5
Use Trainable Classifiers	Microsoft 365 E5 / E5 Compliance
Auto-labeling with SITs	Microsoft 365 E5 Information Protection & Governance add-on

📚 Microsoft Doc: Purview Service Description

✅ Best Practices

Start with built-in SITs — they’re optimized for accuracy.
Test before enforcing — use simulation or audit mode first.
Combine SITs with sensitivity labels for layered protection.
Use trainable classifiers for complex or free-text documents.
Continuously review detections in Content Explorer to fine-tune policies.

📚 Check Microsoft Documentation

For more detailed technical references, visit the official Microsoft Learn articles:

🧭 These pages provide up-to-date lists, XML structures, and API references for developers and administrators managing SITs in enterprise environments.

🏁 Final Thoughts

Sensitive Information Types (SITs) are the intelligence behind Microsoft Purview’s data protection ecosystem.
They enable automatic detection, labeling, and governance of sensitive information — ensuring your organization stays secure and compliant.

💡 If Sensitivity Labels are the wrappers, SITs are the detectors that tell Purview when to protect your data.

#MicrosoftPurview #SensitiveInformationTypes #DataClassification #InformationProtection #Compliance #Microsoft365 #DLP #MicrosoftSecurity

Abou Conde's Blog

Cloud and Infra Security
