Designing a Scalable, Resilient PI System Architecture
Meta description: Practical guidance for PI System architecture: building blocks, high availability patterns, DMZ zoning, multi-site scaling and common anti-patterns.
A robust PI System architecture keeps data flowing reliably, supports growth without rework, and remains maintainable under operational pressure (patching, outages, upgrades, changing data sources, and expanding use cases). This guide focuses on enduring design principles and proven patterns for PI Admins, PI Engineers, and OT/IT architects.
What “good architecture” means for PI
Good PI architecture is outcome-driven, not diagram-driven. Key outcomes:
- Data integrity: trustworthy historian (correct timestamps, no silent gaps, predictable backfill).
- Operational resilience: single-point failures (NIC, host, service, link) do not break business-critical functions.
- Performance headroom: add tags, users, analyses and visualisations without brittle tuning.
- Security by design: clear access boundaries, intentional data flows, and restricted admin interfaces. See Security, Identity & Compliance
- Maintainability: patching and upgrades with minimal downtime and low risk.
- Clear ownership: defined responsibilities, change control, and practical runbooks (see Running the PI System Day-to-Day: A PI Admin’s Playbook).
Architecture should make the right thing easy and the wrong thing hard.
Core architectural building blocks
Most PI deployments use the same functional blocks. Design choices are about placement, scaling and failure modes.
- Data ingestion layer (interfaces, connectors, edge)
- Sources: PLCs, DCS, SCADA, MQTT, files, APIs.
- Decisions: collector placement (local vs central), store‑and‑forward behaviour, tag naming, buffering and time synchronisation.
See: Data Ingestion & Integration
Practical note: collector placement is as much an operational and network decision as a technical one — match ownership and operational capability.
- PI Data Archive (historian services and storage)
- Consider tag count, event rate (current and projected 3–5 years out), point configuration standards, backup/recovery objectives and HA approach.
- Plan disk layout, antivirus exclusions and peak query patterns early.
See: Keeping PI Fast, Stable, and Predictable at Scale
- Asset Framework (AF) and analytics
- AF is the semantic layer: models, templates, attributes and analyses.
- Decisions: AF server placement relative to PI‑DA and clients, SQL Server HA strategy, boundaries between enterprise vs site models, and governance for templates and analyses. Treat AF as critical infrastructure.
- Visualisation (PI Vision)
- Drives concurrency and load patterns. Consider user population, authentication, concurrency capacity, and dependency on AF and PI‑DA. Decide whether Vision requires the same availability level as PI‑DA.
- Identity, security and access control
- Define trusted boundaries, align with enterprise identity, implement least privilege and auditability, and plan remote access without “permanent quick fixes”. Start with Securing the AVEVA PI System in Modern Enterprise Environments
- Supporting services: time, DNS, certificates, backups, monitoring
- These are essential: time drift, DNS failures, certificate/TLS issues and untested backups are common root causes of PI outages.
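These dependencies lend themselves to routine, scripted checks. Below is a minimal pre-flight sketch, assuming the third-party ntplib package is available; the hostnames are placeholders, and the thresholds should be adapted to your environment.

```python
# Pre-flight checks for common PI outage root causes: time drift, DNS and TLS expiry.
# Hostnames, thresholds and the NTP server are placeholders -- substitute your own.
import socket
import ssl
import time

import ntplib  # third-party: pip install ntplib

HOSTS = ["pi-da01.example.local", "pi-af01.example.local"]  # hypothetical names
MAX_DRIFT_SECONDS = 1.0
MIN_CERT_DAYS = 30

def check_time_drift(ntp_server: str = "pool.ntp.org") -> None:
    """Compare the local clock against an NTP reference."""
    offset = ntplib.NTPClient().request(ntp_server, version=3).offset
    status = "OK" if abs(offset) <= MAX_DRIFT_SECONDS else "WARN"
    print(f"[{status}] local clock offset vs {ntp_server}: {offset:+.3f}s")

def check_dns(host: str) -> None:
    """Confirm the name still resolves."""
    try:
        addrs = {ai[4][0] for ai in socket.getaddrinfo(host, None)}
        print(f"[OK]   {host} resolves to {', '.join(sorted(addrs))}")
    except socket.gaierror as exc:
        print(f"[FAIL] {host} does not resolve: {exc}")

def check_tls_expiry(host: str, port: int = 443) -> None:
    """Report days until the server certificate expires (requires a trusted cert chain)."""
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with ssl.create_default_context().wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
        days_left = int((ssl.cert_time_to_seconds(not_after) - time.time()) // 86400)
        status = "OK" if days_left >= MIN_CERT_DAYS else "WARN"
        print(f"[{status}] {host}:{port} certificate expires in {days_left} days")
    except OSError as exc:
        print(f"[FAIL] {host}:{port} TLS check failed: {exc}")

if __name__ == "__main__":
    check_time_drift()
    for h in HOSTS:
        check_dns(h)
        check_tls_expiry(h)
```

Running something like this from the PI servers themselves, on a schedule, turns three common surprise outages into routine warnings.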
High availability patterns
HA should be driven by business outcomes and realistic failure scenarios: detection speed, failover process and recovery without data loss matter more than mere duplication.
High availability for PI Data Archive (PI‑DA)
- Goals: continue collection through outages, maintain key read access, prevent split‑brain and minimise recovery effort.
- Common patterns:
- Buffering/store‑and‑forward at collectors: the most effective first line of resilience. Ensure buffer capacity, robust time synchronisation and tested backfill behaviour (a gap-check sketch follows this list).
- PI Data Archive collective: provides archive redundancy but increases operational complexity (change discipline, runbooks, replication monitoring). Use only when uptime targets justify the overhead.
- Separate collection from user query tiers: keep collectors and PI‑DA focused on uninterrupted collection; allow user tiers to be less available if acceptable.
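Backfill is only “tested” if you can prove afterwards that nothing went missing. The sketch below scans a list of recorded timestamps, exported however you normally query the archive, for gaps larger than a multiple of the expected scan interval; the sample data and threshold are illustrative.

```python
# Detect silent gaps in a sequence of recorded timestamps after a failover/backfill.
# Input format and threshold are assumptions -- feed it timestamps exported from
# your own historian queries.
from datetime import datetime, timedelta
from typing import Iterable

def find_gaps(timestamps: Iterable[datetime],
              expected_interval: timedelta,
              tolerance: float = 3.0) -> list[tuple[datetime, datetime, timedelta]]:
    """Return (gap_start, gap_end, gap_length) where spacing exceeds tolerance x expected."""
    gaps = []
    ordered = sorted(timestamps)
    limit = expected_interval * tolerance
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > limit:
            gaps.append((earlier, later, later - earlier))
    return gaps

if __name__ == "__main__":
    # Illustrative 10-second scan with a deliberate hole between 00:01:00 and 00:04:30.
    base = datetime(2024, 1, 1, 0, 0, 0)
    sample = [base + timedelta(seconds=10 * i) for i in range(7)]          # 00:00:00 to 00:01:00
    sample += [base + timedelta(seconds=270 + 10 * i) for i in range(5)]   # 00:04:30 onwards
    for start, end, length in find_gaps(sample, timedelta(seconds=10)):
        print(f"gap of {length} between {start} and {end}")
```

Running the same check against each collective member after a failover exercise is a cheap way to confirm they actually agree.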
High availability for PI AF
- Focus on SQL Server availability (Always On, clustering) and redundant AF application tier. SQL HA meets strict targets but requires DBA maturity (patch sequencing, quorum, backups). Test AF changes for performance in non‑production.
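It also pays to verify, from the AF application tier itself, that SQL connectivity behaves as expected after a replica failover. A minimal probe sketch, assuming the pyodbc package and an ODBC driver are installed; the listener name is a placeholder, and PIFD is the default AF database name (adjust if yours differs).

```python
# Quick connectivity probe for the SQL Server behind AF.
# Assumes: pip install pyodbc, plus a suitable ODBC driver on the host.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=af-sql-listener.example.local;"   # AG listener, hypothetical name
    "DATABASE=PIFD;"                          # default AF database name
    "Trusted_Connection=yes;"
)

def probe() -> None:
    conn = pyodbc.connect(CONN_STR, timeout=5)
    try:
        row = conn.execute(
            "SELECT @@SERVERNAME, SERVERPROPERTY('IsHadrEnabled')"
        ).fetchone()
        print(f"connected to replica {row[0]}, Always On enabled: {bool(row[1])}")
    finally:
        conn.close()

if __name__ == "__main__":
    probe()
```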
High availability for PI Vision
- Typical approach: multiple Vision servers behind a load balancer, with consistent authentication and session handling. Design load‑balancer health checks to reflect actual service health. Separate scaling for concurrency from redundancy for outages.
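A health check that only confirms port 443 is open will happily keep routing users to a node whose dependencies are down. Below is a minimal external probe sketch using placeholder hostnames and the default /PIVision/ path; it treats an authentication challenge as “up”, assumes the probe host trusts the server certificates, and omits the Windows authentication handling a real probe usually needs.

```python
# External probe for load-balanced PI Vision nodes (hostnames are placeholders).
# A node answering 200/302/401 is at least serving HTTP; connection errors or 5xx
# responses indicate it should be pulled from the pool.
import urllib.error
import urllib.request

NODES = ["pi-vision01.example.local", "pi-vision02.example.local"]
PATH = "/PIVision/"                 # default virtual directory; adjust if yours differs
HEALTHY_STATUSES = {200, 302, 401}  # 401 = auth challenge, service is up

def probe(node: str) -> bool:
    url = f"https://{node}{PATH}"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code                    # 401/403/5xx still land here
    except (urllib.error.URLError, OSError) as exc:
        print(f"[FAIL] {node}: {exc}")       # includes untrusted-certificate failures
        return False
    ok = status in HEALTHY_STATUSES
    print(f"[{'OK' if ok else 'FAIL'}] {node} returned HTTP {status}")
    return ok

if __name__ == "__main__":
    for n in NODES:
        probe(n)
```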
Multi‑site and enterprise designs
Multi‑site failures often stem from forcing a single pattern across sites with different networks, maturity and latency.
Key early questions:
- Do sites need autonomy if the WAN fails?
- Where are historians located: per site, regional or central?
- Are models and naming standards global or local?
- Who owns uptime: central IT, site OT, or a shared model?
Common patterns
- Site historian + enterprise aggregation: each site collects locally, then shares/replicates enterprise data. Pros: local resilience and performance. Cons: more servers, governance required.
- Central historian with remote collection: fewer historians to manage but the WAN is critical; buffering mitigates but does not eliminate risk (a buffer-sizing sketch follows this list).
- Hybrid: local collectors with buffering, central PI‑DA/AF for enterprise access, and local visualisation where needed.
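Wherever the WAN is the critical path, size collector buffers against a realistic outage window rather than a hopeful one. The figures in this back-of-envelope sketch (event rate, bytes per buffered event, outage duration) are assumptions to replace with measured values:

```python
# Back-of-envelope buffer sizing for a store-and-forward collector during a WAN outage.
# All inputs are illustrative assumptions -- replace them with measured values.
EVENTS_PER_SECOND = 5_000        # sustained event rate at the site
BYTES_PER_EVENT = 50             # rough on-disk cost per buffered event (assumption)
OUTAGE_HOURS = 48                # outage you want to survive without data loss
SAFETY_FACTOR = 2.0              # headroom for bursts and overhead

buffered_bytes = EVENTS_PER_SECOND * BYTES_PER_EVENT * OUTAGE_HOURS * 3600
required_gib = buffered_bytes * SAFETY_FACTOR / 2**30

print(f"Buffer disk needed for a {OUTAGE_HOURS}h outage: ~{required_gib:.1f} GiB")
# 5,000 ev/s * 50 B * 48 h is roughly 40 GiB raw, ~80 GiB with 2x headroom.
```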
Governance for AF
- Avoid “one global model” that doesn’t fit or “every site on its own” that breaks reporting. Standardise template conventions and attribute names for enterprise KPIs, allow controlled local extensions, and apply change control for shared templates.
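Conventions only hold if someone, or something, checks them. A minimal sketch of a naming-convention check over exported template and attribute names; the regular expressions and sample names are illustrative, and the export itself is left to your own tooling:

```python
# Check exported AF template/attribute names against an agreed convention.
# The regexes and sample names are illustrative -- export real names with your own tooling.
import re

TEMPLATE_PATTERN = re.compile(r"^[A-Z][A-Za-z0-9 ]+ Template$")      # e.g. "Pump Template"
ATTRIBUTE_PATTERN = re.compile(r"^[A-Z][A-Za-z0-9 ]*[A-Za-z0-9]$")   # no trailing spaces/symbols

def check(names: list[str], pattern: re.Pattern, kind: str) -> list[str]:
    violations = [n for n in names if not pattern.match(n)]
    for name in violations:
        print(f"[WARN] {kind} name breaks convention: {name!r}")
    return violations

if __name__ == "__main__":
    templates = ["Pump Template", "compressor_template", "Heat Exchanger Template"]
    attributes = ["Discharge Pressure", "flow rate ", "Running Status"]
    check(templates, TEMPLATE_PATTERN, "template")
    check(attributes, ATTRIBUTE_PATTERN, "attribute")
```

Wiring a check like this into change control for shared templates keeps the enterprise KPI layer reportable without blocking local extensions.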
Network zoning and DMZ patterns
Zoning determines how robust and secure the PI architecture will be. For every connection, document: initiator, protocol/ports, crossed zones, business justification and monitoring.
Refer to Security, Identity & Compliance for fundamentals.
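One way to keep that per-connection documentation honest is to hold it as structured data alongside the firewall change records, so reviews and audits can be scripted. A minimal sketch with hypothetical zone names, hosts and flows:

```python
# A structured register of cross-zone data flows (all entries are hypothetical examples).
# Keeping this as data makes "what crosses the DMZ and why" scriptable for reviews/audits.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataFlow:
    initiator: str               # which side opens the connection
    destination: str
    protocol_port: str
    zones_crossed: tuple[str, ...]
    justification: str
    monitored_by: str

FLOWS = [
    DataFlow("site-collector01", "pi-da01", "TCP/5450",
             ("Operations",), "Buffered data submission to PI-DA", "site monitoring"),
    DataFlow("pi-da01", "dmz-relay01", "TCP/443",
             ("Operations", "DMZ"), "Publish enterprise KPI data outbound", "SOC"),
    DataFlow("corp-reporting01", "dmz-relay01", "TCP/443",
             ("Corporate IT", "DMZ"), "Read-only reporting access", "SOC"),
]

def flows_crossing(zone: str) -> list[DataFlow]:
    return [f for f in FLOWS if zone in f.zones_crossed]

if __name__ == "__main__":
    for flow in flows_crossing("DMZ"):
        print(f"{flow.initiator} -> {flow.destination} "
              f"({flow.protocol_port}): {flow.justification}")
```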
Typical zones
- Control network/OT: PLCs, DCS, SCADA — low tolerance for change.
- Operations zone: OT servers, collectors, site PI infrastructure.
- DMZ: controlled boundary for OT↔IT sharing.
- Corporate IT: enterprise apps and reporting.
- External/partner access: vendor and remote support.
DMZ patterns
- Pattern A — Data egress via controlled intermediary: avoid IT‑initiated inbound connections to OT; publish/replicate data across a monitored path. Reduces blast radius and simplifies firewall justification.
- Pattern B — Dedicated DMZ services for visualisation/access: possible, but adds certificate, authentication and patch complexity. Explicitly define allowed data flows and directions.
- Pattern C — “It’s just a dashboard” (avoid): exposing user‑facing services without boundaries leads to permissive firewalls, shared accounts and permanent exceptions. Treat user services as full applications with lifecycle management.
Scaling considerations from day one
Most PI performance issues are architectural. Design with headroom.
- Understand growth and peaks: capture current event rates, tag counts, planned instrumentation, consumer growth and peak patterns (shift changes, reports); a storage-sizing sketch follows this list.
- Separate workloads: avoid co‑locating PI‑DA, AF, Vision and SQL on one server. Separation improves fault isolation and troubleshooting.
- Storage and I/O often limit PI‑DA: plan archive disk layout, antivirus exclusions, backup/snapshot strategies and query patterns.
- Plan for consumer scale, not just tag scale: simultaneous PI Vision users, heavy AF analyses and poorly designed integrations create load.
- Design for operability: patching strategies, representative test environments, rollback capability and monitoring that alerts before users do. See: Keeping PI Fast, Stable, and Predictable at Scale and Running the PI System Day-to-Day: A PI Admin’s Playbook
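As a sanity check on “design with headroom”, a rough archive growth estimate helps anchor the storage conversation early. The bytes-per-stored-event and growth figures below are assumptions to calibrate against your own observed archive sizes:

```python
# Rough archive storage growth estimate over a planning horizon.
# Bytes-per-event and annual growth are assumptions -- calibrate against observed archives.
EVENTS_PER_SECOND_TODAY = 20_000
BYTES_PER_STORED_EVENT = 12      # assumed average after compression/exception reporting
ANNUAL_GROWTH = 0.25             # 25% more events per year (new instrumentation, new sites)
YEARS = 5

total_bytes = 0.0
rate = float(EVENTS_PER_SECOND_TODAY)
for year in range(1, YEARS + 1):
    year_bytes = rate * BYTES_PER_STORED_EVENT * 365 * 24 * 3600
    total_bytes += year_bytes
    print(f"year {year}: ~{rate:,.0f} ev/s -> ~{year_bytes / 1e12:.2f} TB added")
    rate *= 1 + ANNUAL_GROWTH

print(f"cumulative archive growth over {YEARS} years: ~{total_bytes / 1e12:.1f} TB")
```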
Architectural anti‑patterns
Avoid recurring fragile or risky patterns:
- “One server to rule them all”: simplifies initial deployment but complicates patching, scaling and troubleshooting.
- No or untested buffering: expect WAN and network outages; untested buffering leads to data loss and frantic recovery.
- DMZ as a dumping ground: an improvised DMZ signals unclear boundaries and technical debt.
- Mixing environments and roles: dev/test systems on production infrastructure cause unpredictable load and change risk.
- Designing around a single expert: centralised tribal knowledge slows recovery and project delivery. Mitigate with diagrams, runbooks, change control and cross‑training. See Careers in the AVEVA PI System World
- “Just open the firewall”: undocumented temporary exceptions become permanent liabilities.
- Treating naming standards and AF modelling as optional: skip these and you pay later in unusable dashboards and integrations.
When to involve an integrator
Consider an external integrator when:
- Crossing major boundaries (OT↔IT, multi‑site rollouts, DMZ design).
- You require strict HA, uptime targets and auditability.
- Migrating legacy PI versions or consolidating historians.
- Internal teams lack time to prototype and test failure scenarios.
- Multiple stakeholders (network, cybersecurity, DBAs, OT) need structured design authority.
A good integrator delivers an operable design and transfers knowledge.
Getting help without handing over ownership
For short‑term expertise (design reviews, HA runbooks, migrations, DMZ patterns) find specialist integrators or browse https://piadmin.com/directory. Choose partners who document decisions, transfer knowledge and align with your operational ownership of patching, monitoring, backups and on‑call.
Pragmatic reference architecture
A common resilient baseline:
- OT/site zone: local collectors with buffering, clear ownership and patching model.
- PI Data Archive tier: sized for event rate and query load; HA where justified; tested backups.
- AF + SQL tier: aligned with enterprise SQL standards; HA where required; AF governance.
- PI Vision tier: separated and scalable; load‑balanced if necessary; authentication designed up front.
- Network boundaries: explicit data flows; minimal inbound connections to OT; DMZ only when justified.
- Operations tooling: monitoring, runbooks and a change process tied to day‑to‑day admin practice.
Adjust to constraints (WAN reliability, cyber policy, support capability, budget).
