🧩 Understanding Sensitive Information Types (SITs) in Microsoft Purview

In today’s cloud-driven world, organizations store vast amounts of data across multiple platforms — Microsoft 365, Azure, SharePoint, Teams, file servers, and SaaS apps.
The challenge? Knowing what data is sensitive and where it lives.

That’s where Sensitive Information Types (SITs) come in — they are the core detection mechanism behind Microsoft Purview’s Information Protection and Data Loss Prevention (DLP) capabilities.


🔍 What Are Sensitive Information Types?

A Sensitive Information Type (SIT) is a pattern-based rule used by Microsoft Purview to automatically detect and classify sensitive content in your environment.

Each SIT is built using:

  • Regular expressions (regex) for pattern detection
  • Keyword dictionaries for contextual matching
  • Checksum algorithms for validation (e.g., credit card numbers)
  • Confidence levels to score accuracy (Low, Medium, High)

When Microsoft Purview scans data, it uses SITs to detect things like credit card numbers, ID documents, health records, or financial data, helping organizations protect and govern this information intelligently.

💡 SITs are the detection engine behind features like DLP, Auto-Labeling, the Information Protection Scanner, and Insider Risk Management.


🧠 How SITs Work

Each Sensitive Information Type combines a primary element, optional supporting elements, and a confidence level:

  • Primary element – The actual pattern (e.g., 16-digit number).
  • Supporting element – Keywords or context that increase accuracy.
  • Confidence level – Indicates how sure the system is about a match.
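
To make the relationship between these three pieces concrete, here is a minimal Python sketch. It is not Purview's implementation: the employee ID format, the function name score_matches, the numeric levels (65 for low, 85 for high), and the 300-character proximity window are all assumptions chosen for illustration. A primary match with no supporting keyword nearby gets the low score, and the same match with a keyword inside the window gets the high score.

```python
import re

# Illustrative values only; treat the numbers as assumptions, not Purview settings.
LOW_CONFIDENCE = 65
HIGH_CONFIDENCE = 85
PROXIMITY_WINDOW = 300  # characters inspected on each side of the primary match


def score_matches(text, primary_pattern, supporting_keywords, window=PROXIMITY_WINDOW):
    """Return (matched_text, confidence) for every primary-element match.

    A match scores HIGH_CONFIDENCE when any supporting keyword appears within
    `window` characters of it, and LOW_CONFIDENCE otherwise.
    """
    results = []
    lowered = text.lower()
    for match in re.finditer(primary_pattern, text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        context = lowered[start:end]
        has_support = any(keyword.lower() in context for keyword in supporting_keywords)
        results.append((match.group(), HIGH_CONFIDENCE if has_support else LOW_CONFIDENCE))
    return results


# Hypothetical employee ID format: "EMP-" followed by six digits.
EMPLOYEE_ID_PATTERN = r"\bEMP-\d{6}\b"
```

Calling score_matches(text, EMPLOYEE_ID_PATTERN, ["employee id"]) on a document returns one tuple per ID found, so you can see which matches would need extra context before a policy acts on them.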

For example, the built-in Credit Card Number SIT:

  • Uses regex to find candidate digit sequences (14 to 16 digits, with or without separators)
  • Validates the number using the Luhn checksum
  • Looks for keywords like Visa, Mastercard, or Amex nearby

If these conditions are met, Purview flags the content as Sensitive with high confidence.
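
The checksum step is what separates a real card number from a random run of digits. The sketch below shows the Luhn check in Python, paired with a simplified candidate regex. The regex only covers 16-digit numbers with optional spaces or hyphens, and the names CARD_CANDIDATE, luhn_valid, and find_card_numbers are illustrative, not part of Purview.

```python
import re

# Simplified candidate pattern (assumption): 16 digits, optionally separated
# by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return candidate strings that match the pattern and pass the Luhn check."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]
```

For example, the well-known test number 4111 1111 1111 1111 passes the check, while changing its last digit to 2 makes it fail, so only the first would be reported as a card number.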


📦 Built-in Sensitive Information Types

Microsoft provides over 400 built-in SITs across 50+ regions, covering major regulatory and compliance frameworks.

🌍 Common Categories

Category | Examples
💳 Financial | Credit Card, SWIFT Code, IBAN, Bank Account Number
🧾 Personal Identifiers | Passport, Driver’s License, National ID, SSN
🏥 Health | ICD-10 Code, NHS Number, Health Insurance ID
💼 Corporate Data | Employee ID, Payroll Number, Tax File Number
🌐 Regional Regulations | EU ID, Aadhaar (India), SIN (Canada), INSEE (France)

You can view all available SITs in the Microsoft Purview Compliance Portal → Data Classification → Sensitive Info Types.


🧰 Custom Sensitive Information Types

If your organization has unique data formats, you can create custom SITs to identify them.

For example:

  • Law Firm: Case File Numbers (e.g., CFN-1234)
  • Bank: Loan Reference IDs
  • Manufacturer: Product Serial Numbers

⚙️ Steps to Create a Custom SIT

  1. Go to Microsoft Purview → Data Classification → Sensitive Info Types
  2. Click Create → define using regex or keyword patterns
  3. Set confidence levels (Low / Medium / High)
  4. Test detection using sample files
  5. Publish for use in DLP, auto-labeling, or retention policies
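
Before pasting a regex into the portal, it can help to try it locally against a few labeled samples. Here is a minimal Python sketch for the case file example above. The pattern, the sample sentences, and the helper check_pattern are all hypothetical, and Purview's regex engine may not behave exactly like Python's re module, so treat a clean local result as a first pass and still run the portal test in step 4.

```python
import re

# Hypothetical pattern for case file numbers such as CFN-1234.
# The word boundaries stop it from matching inside longer tokens.
CASE_FILE_PATTERN = re.compile(r"\bCFN-\d{4}\b")

# Each sample pairs a sentence with whether the pattern should match it.
SAMPLES = [
    ("Please review case CFN-1234 before Friday.", True),
    ("Reference CFN-12345 is an invoice number.", False),
    ("See XCFN-1234 in the archive.", False),
    ("No case reference in this sentence.", False),
]


def check_pattern(pattern, labeled_samples):
    """Return the sample texts where the pattern disagrees with the label."""
    mismatches = []
    for text, should_match in labeled_samples:
        matched = pattern.search(text) is not None
        if matched != should_match:
            mismatches.append(text)
    return mismatches
```

An empty list from check_pattern(CASE_FILE_PATTERN, SAMPLES) means the pattern agrees with every label. Trying a looser pattern such as CFN-\d{4} without the word boundaries shows why they matter: it also matches the invoice number and the XCFN token.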

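Advanced users can also define SITs in an XML rule package and upload it through Security & Compliance PowerShell, for example with the New-DlpSensitiveInformationTypeRulePackage cmdlet.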


🧠 Trainable Classifiers (AI-Based SITs)

Beyond regex and keywords, Purview offers Trainable Classifiers powered by machine learning.
These classifiers learn from real examples of your documents — identifying content by context and meaning, not just patterns.

Built-in Classifiers Include:

  • Resume
  • Contract
  • Source Code
  • Financial Document
  • Health Record

You can also create custom classifiers for business-specific documents by uploading a labeled training set in the Purview portal.

🧠 Trainable classifiers help discover unstructured or context-rich data that static SITs can’t easily detect.


⚙️ Where SITs Are Used in Microsoft Purview

Feature | Purpose of SITs
🧭 Data Loss Prevention (DLP) | Detects sensitive data in motion and applies rules to block or warn.
🏷️ Auto-Labeling | Automatically applies sensitivity labels based on detected SITs.
📊 Information Protection Scanner | Scans file shares and on-premises repositories for sensitive data.
🔎 Data Classification Reports | Provides visibility into where sensitive information exists.
🧠 Insider Risk Management | Correlates user activities with sensitive data access and sharing.

SITs form the foundation of data discovery, labeling, and protection in Microsoft Purview.
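
As a rough illustration of how a DLP rule might turn SIT detections into an action, here is a small Python sketch. It is a simplified model, not how Purview evaluates policies: the Detection class, the decide_action function, and the thresholds (warn at 1 instance, block at 10, minimum confidence 85) are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One SIT result for a piece of content (hypothetical structure)."""
    sit_name: str
    confidence: int  # e.g. 65, 75, or 85
    count: int       # number of instances found at that confidence


def decide_action(detections, sit_name, min_confidence=85, warn_at=1, block_at=10):
    """Return 'block', 'warn', or 'allow' based on qualifying instance counts.

    Only detections of `sit_name` at or above `min_confidence` are counted.
    """
    total = sum(
        d.count
        for d in detections
        if d.sit_name == sit_name and d.confidence >= min_confidence
    )
    if total >= block_at:
        return "block"
    if total >= warn_at:
        return "warn"
    return "allow"
```

The point of the sketch is the design choice: low-confidence matches are ignored rather than blocked, and the action scales with how many high-confidence instances appear, which keeps false positives from interrupting users.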


🧾 Roles and Permissions

To view, manage, or create Sensitive Information Types, you need specific Purview roles:

Role / Group | Access Level
🛡️ Compliance Administrator | Full control to create and manage SITs
🔍 Security Administrator | Monitor SIT detections and alerts
📄 Information Protection Contributor | Create custom SITs and manage classifiers
👁️ Content Explorer Viewer | View SIT matches in files
🧰 Global Administrator | Full tenant access (for initial setup only)

📖 Microsoft Doc: Purview permissions


💼 Licensing Requirements

The ability to use and manage Sensitive Information Types depends on your Microsoft 365 license.

Feature | Required License
Use built-in SITs in DLP or labeling | Microsoft 365 E3 (partial), Microsoft 365 E5 (full)
Create custom SITs | Microsoft 365 E5 / A5 / G5
Use Trainable Classifiers | Microsoft 365 E5 / E5 Compliance
Auto-labeling with SITs | Microsoft 365 E5 Information Protection & Governance add-on

📚 Microsoft Doc: Purview Service Description


✅ Best Practices

  1. Start with built-in SITs — they’re optimized for accuracy.
  2. Test before enforcing — use simulation or audit mode first.
  3. Combine SITs with sensitivity labels for layered protection.
  4. Use trainable classifiers for complex or free-text documents.
  5. Continuously review detections in Content Explorer to fine-tune policies.

📚 Check Microsoft Documentation

For more detailed technical references, visit the official Microsoft Learn articles:

🧭 These pages provide up-to-date lists, XML structures, and API references for developers and administrators managing SITs in enterprise environments.


🏁 Final Thoughts

Sensitive Information Types (SITs) are the intelligence behind Microsoft Purview’s data protection ecosystem.
They enable automatic detection, labeling, and governance of sensitive information — ensuring your organization stays secure and compliant.

💡 If Sensitivity Labels are the wrappers, SITs are the detectors that tell Purview when to protect your data.


