Jan 10, 2025
Designing a Rule Engine for Industrial Automation
Master industrial rule engines: from simple alerts to complex multi-device workflows. Learn how Proxus combines visual rules with C# scripting to handle 10,000+ rules per second with zero allocation overhead.
Industrial automation projects typically start with a simple requirement:
"Send an alarm if this tag exceeds that value."
But real plants quickly outgrow simple threshold checks. Within weeks, customers ask for:
- Multi‑step workflows (IF X THEN Y, THEN Z)
- Time-based dependencies (IF X for >5 minutes THEN...)
- Dependencies across multiple lines or sites (IF Line1.OEE < 60% AND Line2.Running THEN...)
- Integration with ERP, CMMS, and ticketing systems (THEN create SAP maintenance order)
- Feedback loops and state machines (Rule A triggers Rule B, which triggers Rule A again?)
Most off-the-shelf rule engines falter here. They either offer simplicity (and shatter under complex logic) or complexity (and require a PhD to understand). Proxus takes a different approach: combining visual rules for the 80% case with code for the 20% case, all on a single, unified execution runtime that scales to 10,000+ rules per second.
The Problem with Traditional Rule Engines
Before diving into Proxus's solution, let's understand why rule engines are so hard:
Challenge #1: Scope Creep
A simple threshold rule grows into a state machine. What started as "alert when temp > 80°C" becomes "alert when temp > 80°C for >2 minutes, but not if we're in cooldown mode, but do alert if temp > 85°C immediately."
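To make that growth concrete, here is a minimal sketch of the final requirement as a small state holder. The class name `SustainedThresholdAlarm` and its parameters are hypothetical and not part of the Proxus SDK; the sketch assumes the caller passes in the current time and the cooldown-mode flag, and that the critical limit is higher than the warning limit.

```csharp
using System;

// Hypothetical helper for illustration only; not a Proxus API.
public sealed class SustainedThresholdAlarm
{
    private readonly double _warnLimit;      // e.g. 80 (°C)
    private readonly double _criticalLimit;  // e.g. 85 (°C), assumed > _warnLimit
    private readonly TimeSpan _holdTime;     // e.g. 2 minutes
    private DateTime? _aboveSince;           // first sample seen above _warnLimit

    public SustainedThresholdAlarm(double warnLimit, double criticalLimit, TimeSpan holdTime)
    {
        _warnLimit = warnLimit;
        _criticalLimit = criticalLimit;
        _holdTime = holdTime;
    }

    // Returns true when this sample should raise an alert.
    public bool ShouldAlert(double value, DateTime now, bool inCooldownMode)
    {
        if (value <= _warnLimit)
        {
            _aboveSince = null;              // back below the limit: reset the timer
            return false;
        }

        _aboveSince ??= now;                 // start timing on the first high sample

        if (value > _criticalLimit) return true;   // critical: alert immediately
        if (inCooldownMode) return false;           // cooldown mode suppresses the slow path
        return now - _aboveSince.Value > _holdTime; // sustained longer than the hold time
    }
}
```

Even this small version already carries state between evaluations, which is why a plain threshold comparison stops being enough.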
Challenge #2: Multi-Device Correlation
"Trigger alarm when Robot A is running AND Pressure Sensor B shows declining pressure AND not in maintenance window." This requires:
- Reading multiple devices
- Correlating their states
- Checking context (time windows, override flags)
- Triggering actions conditionally
Challenge #3: Performance at Scale
A plant with 5,000 devices and 2,000 rules needs to evaluate millions of conditions per second. A naive approach (query database for each rule, parse JSON) hits performance limits fast.
Challenge #4: Versioning & Deployment
When you update a rule, how do you:
- Test it without affecting production?
- Roll back if it causes issues?
- Track which version is running where?
Most rule engines offer poor answers.
Proxus Rule Engine: Visual + Code Unified
Proxus solves these by offering two complementary approaches on a single runtime:
1. Visual Rule Builder (No-Code)
For engineers without a programming background, the visual rule builder provides an intuitive drag-and-drop interface. Think of it as "Excel for automation."
Building Blocks
Triggers (When to evaluate)
- Tag change: Rule triggers when a specific tag changes value
- Schedule: Rule triggers on time-based patterns (daily 8 AM, every hour, etc.)
- Event from external system: Rule triggers when MES publishes a message or webhook fires
- Manual trigger: Operators can trigger rules via UI for testing
Conditions (What to check)
- Simple comparisons: Temperature > 80
- Range checks: Humidity between 40% and 60%
- Boolean logic: (A > 10) AND (B < 5) OR (C == "Fault")
- Time windows: During 06:00-18:00 or Not during maintenance_window
- State checks: If device status == "Running"
- Thresholds with hysteresis: Prevents flickering (trigger at >80, clear at <75)
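The hysteresis condition is the least obvious item in the list, so here is a minimal sketch of the idea. The class name `HysteresisAlarm` is hypothetical and not a Proxus type; the 80/75 values in the usage note come from the example above.

```csharp
using System;

// Hypothetical illustration of a threshold with hysteresis; not a Proxus API.
public sealed class HysteresisAlarm
{
    private readonly double _setPoint;    // alarm turns on above this value
    private readonly double _clearPoint;  // alarm turns off below this value

    public bool IsActive { get; private set; }

    public HysteresisAlarm(double setPoint, double clearPoint)
    {
        if (clearPoint >= setPoint)
            throw new ArgumentException("clearPoint must be lower than setPoint");
        _setPoint = setPoint;
        _clearPoint = clearPoint;
    }

    // Feeds one sample and returns the alarm state after it.
    public bool Update(double value)
    {
        if (!IsActive && value > _setPoint) IsActive = true;
        else if (IsActive && value < _clearPoint) IsActive = false;
        return IsActive; // values between the two points keep the previous state
    }
}
```

With `new HysteresisAlarm(80, 75)`, a reading that hovers between 75 and 80 leaves the alarm in whatever state it was already in, which is what prevents flickering.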
Actions (What to do)
- Notifications: Email, SMS, Slack, Teams
- PLC writes: Write values back to devices (turn on pump, close valve)
- MQTT publish: Send messages to the UNS for other systems to consume
- HTTP calls: Webhook integration (call an external API over HTTP)
- Create tickets: Automatic CMMS/SAP integration (create maintenance order, change request)
- Log event: Record to local database or ClickHouse
- Escalation: If not acknowledged in X minutes, escalate to higher tier
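As a rough illustration of the escalation action, the sketch below picks the tier to notify from the time an alert has gone unacknowledged. The `Escalation` class, the tier names, and the delays are assumptions made for this example; they are not Proxus configuration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical escalation helper for illustration only.
public static class Escalation
{
    // Returns the highest tier whose delay has elapsed, or null when the
    // alert is acknowledged or no tier is due yet.
    // 'tiers' is assumed to be sorted by After, ascending.
    public static string TierToNotify(
        DateTime raisedAt, DateTime now, bool acknowledged,
        IReadOnlyList<(TimeSpan After, string Tier)> tiers)
    {
        if (acknowledged) return null;

        string current = null;
        TimeSpan elapsed = now - raisedAt;
        foreach (var (after, tier) in tiers)
        {
            if (elapsed >= after) current = tier;
        }
        return current;
    }
}
```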
Example Visual Rule
TRIGGER: Tag "Line1.Robot.Temperature" changes
CONDITION:
IF Temperature > 85°C
AND Robot.Running == true
AND NOT (MaintenanceMode == true)
AND (Alert_Temperature_Cooldown was last set more than 5 minutes ago)
ACTION:
- Stop Robot (write 0 to "Line1.Robot.MotorCommand")
- Publish to MQTT: "ProxusMfg/Istanbul/Line1/alerts/TemperatureEmergency"
- Create SAP notification: "Emergency stop triggered - high temperature"
- Set flag: Alert_Temperature_Cooldown = now()
- Send Slack: "@production_team Robot stopped due to temperature"
2. C# Scripting for Power Users
When visual rules cannot express the logic, drop down to C# code. Proxus provides a well-defined SDK:
Why C#?
- Familiar: C# is widely used in industry
- Safe: Compiled, typed (no runtime interpretation errors)
- Fast: JIT-compiled to machine code
- Batteries included: LINQ, async/await, memory pools
Example: Multi-Device Correlation with ML
public class PredictiveMaintenanceRule : RuleScript
{
    public async Task EvaluateAsync(RuleContext context)
    {
        // Get latest values for multiple devices
        var vibration = context.GetTag("Production/Motor/Vibration");
        var temperature = context.GetTag("Production/Motor/Temperature");
        var age_hours = context.GetTag("Production/Motor/AgeHours");

        // Complex logic: apply ML model
        var anomaly_score = await context.AI.ScoreAsync(new[]
        {
            vibration, temperature, age_hours
        });

        if (anomaly_score > 0.85)
        {
            // Multi-step action
            await context.Publish("alert/motor_failure_risk", new
            {
                score = anomaly_score,
                timestamp = DateTime.UtcNow,
                recommended_action = "Schedule maintenance within 48 hours"
            });

            // Create CMMS ticket
            await context.External.CreateTicket(new TicketRequest
            {
                System = "SAP",
                Type = "Maintenance",
                Priority = "High",
                Description = $"Motor failure predicted (ML score: {anomaly_score:P})"
            });
        }
    }
}
Auto-Injected Safety
Proxus automatically wraps your code with:
// Your code is wrapped like this:
try
{
    // Your code executes here
    // With automatic disposal of resources
}
catch (Exception ex)
{
    logger.LogError($"Rule failed: {ex.Message}");
    // Automatic alerting
}
This means you cannot accidentally crash the platform or leave resources open.
Zero-Allocation Patterns
For rules that execute 1,000+ times per second, Proxus recommends pooling:
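using System.Buffers;

// Shared pool: buffers are rented and reused instead of allocated per evaluation
private static readonly ArrayPool<byte> _pool = ArrayPool<byte>.Shared;

public Task EvaluateAsync(RuleContext context)
{
    // Rent from the pool instead of allocating; the array may be larger than requested
    var buffer = _pool.Rent(1024);
    try
    {
        // Use buffer
        ProcessData(buffer);
    }
    finally
    {
        // Always return the buffer, even if processing throws
        _pool.Return(buffer);
    }
    // Nothing is awaited here, so return a completed task instead of marking the method async
    return Task.CompletedTask;
}
This minimizes garbage collection pressure (critical for sub-millisecond latency rules).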
3. A Single Execution Model
Here is the magic: both visual rules and C# scripts compile to the same actor-based runtime. This means:
Unified Performance
Visual rule: Compiled to bytecode → Actor execution engine
C# script: Compiled to IL → Actor execution engine
Result: Same performance profile, same execution semantics
Horizontal Scalability
Single gateway: 10,000 rules/second ✓
2 gateways: 20,000 rules/second ✓
10 gateways: 100,000 rules/second ✓
Rules execute in parallel across gateways with zero coordination needed
Clear Evolution Path
Start: Simple visual rule
Grow: Add conditions and time windows (still visual)
Scale: Extract complex logic to C# (reuse visual rule as scaffold)
Evolve: Full C# engine with caching and optimization
No rewriting. No rip-and-replace. Just smooth evolution.
Advanced Use Cases
Use Case 1: OEE Calculation
Trigger: Every minute (on schedule)
Action: Compute OEE for each production line
Availability = (Planned Time - Downtime) / Planned Time
Performance = (Ideal Cycle Time × Pieces Produced) / Run Time
Quality = Good Pieces / Total Pieces
OEE = Availability × Performance × Quality
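The formulas above could be computed along these lines. This is a sketch, not Proxus code: the `Oee` class and its parameter names are hypothetical, and it assumes Run Time equals Planned Time minus Downtime and that all times share one unit.

```csharp
using System;

// Hypothetical OEE helper for illustration only.
public static class Oee
{
    public static double Compute(
        double plannedTime, double downtime, double idealCycleTime,
        int totalPieces, int goodPieces)
    {
        double runTime = plannedTime - downtime;   // assumption: run time = planned - downtime

        // Guard against division by zero for idle or unplanned periods.
        if (plannedTime <= 0 || runTime <= 0 || totalPieces <= 0) return 0.0;

        double availability = runTime / plannedTime;
        double performance = idealCycleTime * totalPieces / runTime;
        double quality = (double)goodPieces / totalPieces;

        return availability * performance * quality;
    }
}
```

For example, 480 planned minutes with 48 minutes of downtime, a 1-minute ideal cycle, 324 pieces produced and 243 good pieces gives 0.9 × 0.75 × 0.75, or about 50.6% OEE.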
Result: Publish OEE to UNS for dashboards
Use Case 2: Dynamic Threshold Based on Context
Trigger: Tag "Production.Temperature" changes
Condition (C# logic):
- If night shift: Temperature threshold = 75°C
- If day shift: Temperature threshold = 78°C
- If production line warming up: Threshold = 85°C (first 10 min)
- If maintenance mode: No alert (bypass rule)
Action: Alert only if the temperature exceeds the context-aware threshold
Use Case 3: Multi-Site Correlation
Trigger: Any line reports downtime
Action (C#):
1. Query all lines: Are 2+ lines down simultaneously?
2. If yes: Check if it's a utility failure (power, compressed air, water)
3. If utility failure confirmed: Escalate to site director
4. If specific to one line: Alert line supervisor
5. Log correlation for analytics
Performance: 10,000+ Rules Per Second
How does Proxus achieve this?
1. Compiled Execution
Rules are compiled to native code, not interpreted.
2. Actor-Based Concurrency
Each rule runs in isolation. No locks, no contention.
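The general actor idea can be sketched with a mailbox and a single consumer loop. This is an illustration built on `System.Threading.Channels`, not the Proxus runtime; the `RuleActor` class and its threshold rule are hypothetical.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical actor-style rule: one mailbox, one loop, no shared state.
public sealed class RuleActor
{
    private readonly Channel<double> _mailbox = Channel.CreateUnbounded<double>();
    private readonly Task _loop;
    private int _alerts; // only ever touched by the actor's own loop

    public RuleActor(double threshold)
    {
        _loop = Task.Run(async () =>
        {
            // Messages are processed one at a time, so no lock is needed.
            await foreach (var value in _mailbox.Reader.ReadAllAsync())
            {
                if (value > threshold) _alerts++;
            }
        });
    }

    // Any thread may post; the write does not block.
    public void Post(double value) => _mailbox.Writer.TryWrite(value);

    // Closes the mailbox, waits for the loop to drain it, returns the alert count.
    public async Task<int> StopAsync()
    {
        _mailbox.Writer.Complete();
        await _loop;
        return _alerts;
    }
}
```

Because each actor owns its state and only talks to the outside world through its mailbox, many such rules can run side by side without contending for locks.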
3. Smart Caching
First evaluation: Read tags from device (slow)
Cached result: Read from in-memory cache (fast)
Invalidation: Cache cleared only when tag actually changes
4. Lazy Evaluation
If Condition A is false, Condition B and C are never evaluated.
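A minimal sketch of this short-circuit behavior, assuming conditions are represented as delegates combined with AND (the `LazyConditions` helper is hypothetical, not a Proxus API):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper showing short-circuit (lazy) evaluation of AND-ed conditions.
public static class LazyConditions
{
    // Evaluates conditions in order and stops at the first one that is false.
    public static bool All(IEnumerable<Func<bool>> conditions)
    {
        foreach (var condition in conditions)
        {
            if (!condition()) return false; // later conditions are never invoked
        }
        return true;
    }
}
```

Ordering cheap or frequently false conditions first gets the most out of this, since expensive checks further down the list are skipped more often.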
5. Batch Processing
Multiple rules that depend on the same device are evaluated together, with a single device read.
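One way to picture batching is to group rules by the tag they read and fetch each tag once per group. The `TagRule` record, the `BatchEvaluator` class, and the `readTag` delegate below are assumptions made for this sketch, not Proxus types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rule shape: a name, the tag it depends on, and a condition on the tag value.
public sealed record TagRule(string Name, string Tag, Func<double, bool> Condition);

public static class BatchEvaluator
{
    // Returns the names of the rules that fired.
    // readTag is invoked once per distinct tag, not once per rule.
    public static List<string> Evaluate(IEnumerable<TagRule> rules, Func<string, double> readTag)
    {
        var fired = new List<string>();
        foreach (var group in rules.GroupBy(r => r.Tag))
        {
            double value = readTag(group.Key); // single read shared by the whole group
            fired.AddRange(group.Where(r => r.Condition(value)).Select(r => r.Name));
        }
        return fired;
    }
}
```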
Result: A modern laptop can evaluate 10,000 rules/second with <1 ms latency.
Security & Code Injection Prevention
Allowing users to upload C# code is risky. Proxus mitigates this via:
1. Namespace Blacklist
Dangerous namespaces are blocked:
- ❌ System.Reflection (code generation)
- ❌ System.Net (external network access)
- ❌ System.IO (file system access)
- ✅ Whitelisted namespaces: System.Linq, System.Collections, System.Text
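To give a feel for what such a check involves, here is a deliberately simplified sketch that scans `using` directives with a regular expression. It is not how Proxus implements the check: the `NamespaceCheck` class is hypothetical, and a text scan like this misses fully qualified names such as `System.IO.File`, so a real check would need proper syntax and semantic analysis.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical, simplified check: flags using directives that import blocked namespaces.
public static class NamespaceCheck
{
    private static readonly string[] Blocked =
        { "System.Reflection", "System.Net", "System.IO" };

    // Matches "using Some.Namespace;" and "using static Some.Type;" at the start of a line.
    private static readonly Regex UsingDirective = new Regex(
        @"^\s*using\s+(?:static\s+)?([A-Za-z_][\w.]*)\s*;",
        RegexOptions.Multiline);

    public static IReadOnlyList<string> FindBlockedUsings(string source)
    {
        return UsingDirective.Matches(source)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            // A namespace is blocked if it equals a blocked entry or is nested under one.
            .Where(ns => Blocked.Any(b =>
                ns == b || ns.StartsWith(b + ".", StringComparison.Ordinal)))
            .ToList();
    }
}
```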
2. Code Analysis
Before compilation, the C# code is analyzed for suspicious patterns:
- Infinite loops
- Allocations exceeding limits
- Calls to blacklisted APIs
3. Sandboxing
Each C# rule runs in its own AppDomain with resource limits:
- Memory: 512 MB max
- Execution time: 5 seconds max (timeout)
- CPU affinity: Constrained to specific cores
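The execution-time limit can be approximated in plain C# with a cancellation token and a timer. The sketch below is an assumption about how such a limit might look, not the Proxus sandbox: `RuleTimeout` is a hypothetical helper, and cancellation here is cooperative, so it only stops rules that observe the token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical timeout wrapper for illustration only.
public static class RuleTimeout
{
    // Returns true if the rule finished within the timeout, false if it was cut off.
    public static async Task<bool> RunWithTimeoutAsync(
        Func<CancellationToken, Task> rule, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();
        Task work = rule(cts.Token);

        // Whichever finishes first wins: the rule or the timer.
        Task finished = await Task.WhenAny(work, Task.Delay(timeout));
        if (finished == work)
        {
            await work;   // rethrows any exception the rule raised
            return true;
        }

        cts.Cancel();     // ask the rule to stop; it must honor the token
        return false;
    }
}
```

A rule that blocks synchronously or ignores the token would not be stopped by this pattern, which is one reason a platform-level sandbox is needed on top of it.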
4. Audit Trail
All rule uploads, modifications, and executions are logged with user attribution.
Versioning & Deployment Strategy
Rules evolve. The question is: how do you manage versions safely?
Proxus Versioning Model
Rule: "Emergency Stop - Temperature"
Version 1 (2025-01-10):
Threshold = 85°C
Status: ACTIVE (all gateways)
Version 2 (2025-01-15):
Threshold = 82°C (more conservative)
Status: STAGING (test gateway only)
Testing for 7 days...
Version 3 (2025-01-22):
Threshold = 80°C (even more conservative)
Status: APPROVED (but not deployed yet)
Ready to deploy on user approval
Version 2 (2025-01-20):
Status: ROLLED_BACK (bug found, reverted to v1)
Safe Deployment
- Write rule on central server
- Test on designated gateway
- Approve after validation
- Deploy to production gateways (can schedule for off-hours)
- Monitor execution for anomalies
- Roll back instantly if issues are detected
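Instant rollback is easiest to reason about when the deployment history is kept as a stack. The sketch below illustrates that idea only; `RuleVersionHistory` is a hypothetical class and says nothing about how Proxus stores versions internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical deployment history: the top of the stack is the active version.
public sealed class RuleVersionHistory
{
    private readonly Stack<int> _deployed = new Stack<int>();

    // The currently active version, or null if nothing has been deployed.
    public int? Active => _deployed.Count > 0 ? _deployed.Peek() : (int?)null;

    public void Deploy(int version) => _deployed.Push(version);

    // Drops the active version and returns the one that becomes active again.
    public int? Rollback()
    {
        if (_deployed.Count > 0) _deployed.Pop();
        return Active;
    }
}
```

Because earlier versions are never modified, rolling back is a pointer change rather than a redeployment of edited logic.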
Integration with External Systems
Rules do not live in isolation. They trigger actions in:
Webhooks
Action: HTTP POST to external API
Example: When production exceeds target
POST https://api.supplier.com/orders/create
Body: { "product_id": "X", "quantity": 100, "due_date": "2025-01-20" }
Message Brokers
Action: Publish to Kafka topic
Example: Real-time anomaly detection
Topic: "manufacturing/anomalies"
Message: { "device_id": "motor_1", "anomaly_type": "vibration", "score": 0.92 }
Consumers: ML models, dashboards, archival systems
ERP Integration
Action: Create SAP maintenance order
Proxus automatically:
- Authenticates to SAP (OAuth/SAML)
- Creates notification
- Links to equipment master data
- Assigns to maintenance team
FAQ: Rule Engine Myths
Q: Can I use visual rules for everything?
A: ~80% of use cases. The remaining 20% (complex correlations, ML) need C#. Start visual; evolve to code only where needed.
Q: What if my rule has a bug?
A: Rules execute in sandboxes with timeouts. A runaway rule cannot crash the platform. Plus, you can roll back instantly to a previous version.
Q: Can I run thousands of rules?
A: Yes. Proxus can handle 10,000+ rules/second per gateway. Distribute across gateways for higher throughput.
Q: How do I test rules?
A: Deploy to a test gateway, execute with replay data, validate results. Rules are versioned, so you can test in parallel with production.
Q: Can rules call external APIs?
A: Yes, via webhook actions. But be cautious: external APIs can fail or be slow. Use timeouts and circuit breakers.
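For readers unfamiliar with the circuit breaker pattern, here is a minimal sketch. The `CircuitBreaker` class, its thresholds, and the explicit `now` parameter are assumptions made for this example; it is not thread-safe and is not a Proxus API.

```csharp
using System;

// Hypothetical, single-threaded circuit breaker for illustration only.
public sealed class CircuitBreaker
{
    private readonly int _failureThreshold;  // consecutive failures before opening
    private readonly TimeSpan _openDuration; // how long to block calls once open
    private int _consecutiveFailures;
    private bool _isOpen;
    private DateTime _openedAt;

    public CircuitBreaker(int failureThreshold, TimeSpan openDuration)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
    }

    // True if a call to the external API may be attempted at 'now'.
    public bool AllowRequest(DateTime now)
    {
        if (!_isOpen) return true;
        return now - _openedAt >= _openDuration; // after the pause, trial calls are allowed
    }

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        _isOpen = false;
    }

    public void RecordFailure(DateTime now)
    {
        _consecutiveFailures++;
        if (_consecutiveFailures >= _failureThreshold)
        {
            _isOpen = true;     // stop calling the failing API for a while
            _openedAt = now;
        }
    }
}
```

A rule would check `AllowRequest` before each webhook call and report the outcome afterwards, so a slow or failing API is skipped for a while instead of delaying every evaluation.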
Migration from Legacy Rule Engines
If you are migrating from Wonderware, FactoryTalk, or other platforms:
Phase 1: Audit Existing Rules (1 week)
- Export all rules from legacy system
- Document logic and actions
- Identify patterns
Phase 2: Re-implement in Proxus (2-4 weeks)
- Start with most-used rules (80/20 principle)
- Implement in visual builder first
- Migrate complex rules to C# as needed
Phase 3: Validation (1-2 weeks)
- Run both systems in parallel
- Compare outputs
- Validate edge cases
Phase 4: Cutover (1 day)
- Switch to Proxus
- Rollback procedure ready
- Monitor closely
Best Practices
- Start simple: Use visual rules for initial implementations
- Version everything: Never modify rules; create new versions
- Test rigorously: Use test gateways before production rollout
- Monitor executions: Track rule latency, errors, and trigger frequency
- Document logic: Add comments explaining why, not just what
- Use timeouts: All C# rules have built-in timeouts; respect them
- Iterate based on data: Tune thresholds based on production telemetry
- Use sandboxing: Contain complex rules to prevent cascading failures
Getting Started
Simple Rule (5 minutes)
- Open Proxus UI
- Click "New Rule"
- Select trigger: Tag change
- Select condition: Temperature > 80
- Select action: Send email
- Deploy to edge gateway
Advanced Rule (30 minutes)
- Create C# rule script
- Implement multi-device correlation
- Test locally
- Deploy to test gateway
- Monitor for 24 hours
- Promote to production
Conclusion: Rules as First-Class Citizens
In Proxus, rules are not an afterthought bolted onto the platform. They are first-class citizens: versioned, tested, monitored, and optimized. Whether you need simple alerts or complex industrial workflows, the unified visual + code approach scales from hobby projects to mission-critical automation running thousands of rules per second.
Ready to automate? Deploy your first rule or explore complex rule engine patterns in our guide.
For advanced consultation on rule design and optimization, contact our automation team.