Mainframe Log Analytics in Dynatrace: From Ingest APIs to SYSLOG Parsing Magic
Who This Article Is For
This article is intentionally written for multiple audiences:
- Engineers - you will see concrete APIs, real configuration paths, working parsers, and practical examples. Nothing here is abstract or theoretical.
- Architects - you will see a platform-native, scalable approach built on Dynatrace OpenPipeline and Grail — not a collection of scripts or one-off integrations.
- Decision makers - you will see that this is not a “hack” or a niche solution, but a clean, extensible observability model that fits naturally into an enterprise observability strategy.
The Journey We Will Take
The structure of this article mirrors the natural lifecycle of observability data in Dynatrace:
- Ingesting logs and metrics using official Dynatrace APIs
- Understanding how data flows through OpenPipeline into Grail
- Querying data using Dynatrace Query Language (DQL)
- Dealing with Mainframe SYSLOG and its structural challenges
- Applying Dynatrace Pattern Language (DPL) to build a parser that is resilient to Mainframe-specific edge cases
By the end of this article:
- Mainframe logs will be searchable and analyzable in Dynatrace,
- SYSLOG messages will be parsed into meaningful, structured fields,
- and the foundation will be in place for automation, security workflows, and advanced analytics.
Why This Matters
Mainframe environments generate exceptionally rich operational data, but historically this data has been:
- difficult to access,
- hard to parse,
- and expensive to operationalize.
The combination of z-Rays and Dynatrace changes this dynamic:
- lowering the barrier to entry,
- improving data quality,
- and enabling fast, iterative value creation.
What follows is not theory — it is a practical, production-grade approach you can build on.
Log and Metric Ingestion Fundamentals in Dynatrace
Before we touch OpenPipeline, Grail, DQL, or DPL, we need to establish how data actually enters Dynatrace.
This matters because everything that follows — parsing, analytics, automation — depends on ingestion semantics. If you understand ingestion, the rest of the platform becomes predictable rather than mysterious.
Dynatrace exposes two primary ingestion primitives relevant for Mainframe observability:
- Log Ingest API
- Metric Ingest API
They solve different problems, and using both together is not only valid — it is often essential.
Log Ingest API – what it is and when to use it
The Log Ingest API is designed to ingest unstructured or semi-structured event data into Dynatrace.
Official documentation: https://docs.dynatrace.com/docs/discover-dynatrace/references/dynatrace-api/environment-api/log-monitoring-v2
Typical use cases include:
- application logs,
- system logs,
- audit and security events,
- Mainframe SYSLOG streams.
At a conceptual level, the Log Ingest API:
- accepts log records over HTTPS,
- allows attaching metadata and attributes,
- forwards the raw content into Dynatrace’s ingestion pipeline.
Key characteristics:
- logs are stored as events, not metrics,
- full message content is preserved,
- structure can be applied later using parsing and enrichment.
This is especially important for Mainframe environments:
- SYSLOG messages are dense and irregular,
- semantics are often implicit (message IDs, positional meaning),
- premature normalization destroys valuable context.
Rule of thumb
If you need the entire message and want to interpret it later — use the Log Ingest API.
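To make this concrete, here is a minimal sketch of sending raw SYSLOG lines to the Log Ingest API. The tenant URL, token, and attribute names attached to each record are placeholders you would adapt; only the endpoint path and the `Api-Token` authorization scheme come from the official documentation above.

```python
import json
from urllib import request

# Placeholder values - substitute your own tenant URL and an API token
# that carries the logs.ingest scope.
TENANT = "https://your-tenant-id.live.dynatrace.com"
API_TOKEN = "dt0c01.EXAMPLE_TOKEN"

def build_log_records(lines, source="logstream"):
    """Wrap raw SYSLOG lines as Log Ingest API records.

    The full message is preserved in `content`; any extra keys become
    attributes that can be used for filtering and parsing later.
    """
    return [{"content": line, "log.source": source} for line in lines]

def ingest_logs(records):
    """POST records to the Log Ingest API (log-monitoring v2)."""
    req = request.Request(
        f"{TENANT}/api/v2/logs/ingest",
        data=json.dumps(records).encode("utf-8"),
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Api-Token {API_TOKEN}",
        },
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status  # 204 means the payload was accepted

# Illustrative SYSLOG line - not a real captured message.
records = build_log_records(["IEF403I MYJOB - STARTED - TIME=10.15.32"])
print(json.dumps(records, indent=2))
```

Note that nothing is parsed at this stage: the entire line travels as `content`, and interpretation is deferred to OpenPipeline, exactly as the rule of thumb suggests.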
Metric Ingest API – why metrics still matter
While logs explain what happened, metrics explain how the system behaves over time.
The Metric Ingest API is used to ingest:
- numeric time-series data,
- at high frequency,
- with minimal overhead.
Official documentation: https://docs.dynatrace.com/docs/discover-dynatrace/references/dynatrace-api/environment-api/metric-v2/post-ingest-metrics
Typical Mainframe-related metric examples:
- throughput and transaction rates,
- queue depths,
- subsystem response times,
- capacity and utilization indicators.
Why metrics remain essential:
- they scale extremely well,
- they are cheap to store and query,
- they are ideal for baselining, alerting, and anomaly detection.
Metrics ingested via this API:
- become native time series in Dynatrace,
- can be correlated with logs and events,
- are directly usable by Dynatrace AI.
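As a sketch of what that ingestion looks like, the Metric Ingest API accepts a plain-text line protocol: one metric per line, with optional dimensions after the key. The metric keys and dimensions below are hypothetical examples for illustration, not part of any official catalog.

```python
def metric_line(key, value, dimensions=None):
    """Render one line of the Metric Ingest API's plain-text protocol:
    metric.key,dim1=val1,dim2=val2 <value>
    """
    dims = ""
    if dimensions:
        dims = "," + ",".join(f"{k}={v}" for k, v in dimensions.items())
    return f"{key}{dims} {value}"

# Hypothetical Mainframe metrics - keys and dimensions are illustrative.
payload = "\n".join([
    metric_line("mainframe.queue.depth", 42, {"lpar": "LPAR1"}),
    metric_line("mainframe.tx.rate", 1250, {"lpar": "LPAR1", "subsystem": "CICS"}),
])
print(payload)
# POST this payload to <tenant>/api/v2/metrics/ingest with
# Content-Type: text/plain and a token holding the metrics.ingest scope.
```

Each line becomes a native time series, so a dimension such as `lpar` is immediately available for splitting, charting, and alerting.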
Logs vs metrics: complementary signals, not alternatives
A common mistake in observability projects is treating logs and metrics as competing approaches.
In reality:
- logs explain behavior,
- metrics quantify behavior.
On Mainframe platforms, this distinction is even more important:
- a single SYSLOG message can explain why a metric spiked,
- a metric anomaly can guide you to the right subset of logs.
Dynatrace is designed to handle both signals natively — which is why understanding both ingest paths upfront is critical.
Checkpoint
At this point, you should understand:
- how logs and metrics enter Dynatrace,
- why Mainframe observability requires both,
- and why ingestion is deliberately flexible.
Now we follow the data further.
From Ingest API to Grail: Understanding the OpenPipeline Flow
Once logs or metrics are ingested, the next natural question is: What happens to this data inside Dynatrace?
The answer is OpenPipeline.
What OpenPipeline really does
OpenPipeline is Dynatrace’s data processing layer that sits between ingestion and storage.
Official documentation: https://docs.dynatrace.com/docs/platform/openpipeline
Conceptually, OpenPipeline:
- receives raw data from multiple ingestion sources,
- applies optional processing steps, which makes it flexible for any third-party data you ingest,
- routes the data into the Grail data lakehouse,
- and performs all of this at very large scale.
It is important to understand what OpenPipeline is not:
- it is not a custom script engine,
- it is not a vendor-specific black box,
- it does not require writing code.
Instead, it is a configuration-driven pipeline designed for:
- parsing,
- enrichment,
- classification,
- normalization,
- and routing of observability data.
Why OpenPipeline matters for Mainframe logs
For Mainframe observability, OpenPipeline is a critical enabler because it allows you to:
- ingest raw SYSLOG messages without losing information,
- defer parsing decisions until you understand the data,
- apply robust, schema-aware parsing later using DPL,
- enrich events with derived fields and classifications.
This separation is intentional:
- ingestion stays simple and resilient,
- transformation becomes explicit and controlled,
- changes can be applied without touching the source system.
In other words:
OpenPipeline allows you to evolve your understanding of Mainframe logs without re-ingesting or rewriting data.
From OpenPipeline to Grail
After processing, data flows into Grail, Dynatrace’s unified data lakehouse.
At this stage:
- logs, metrics, events, and traces coexist,
- everything is queryable using DQL,
- data becomes available for analytics, dashboards, and automation.
We will dive into Grail in detail in the next section — but for now, the key takeaway is this:
OpenPipeline is the bridge that turns raw Mainframe data into analyzable, enterprise-grade observability signals.
Checkpoint
You should now understand:
- where OpenPipeline fits in the Dynatrace architecture,
- why it is essential for Mainframe log analytics,
- and how it prepares data for Grail and DQL.
Grail Data Lakehouse: The Foundation for Mainframe Observability
Mainframe Observability Data Flow (High-Level)
Before we go deeper into Grail, it helps to visualize the full data path we are building:
Mainframe
|
| (SYSLOG, system metrics, subsystem events)
|
z-Rays Agent
|
| HTTPS (Logs & Metrics Ingest APIs)
|
Dynatrace Ingest Layer
|
| OpenPipeline
| - optional parsing
| - enrichment
| - routing
|
Grail Data Lakehouse
|
| DQL / DPL / Analytics / Automation
This flow is intentionally simple:
- ingestion stays lightweight,
- intelligence is applied centrally,
- and all data ends up in a single, queryable foundation.
With ingestion and OpenPipeline in place, all Mainframe data ultimately lands in Grail.
Understanding Grail is essential, because it defines what is possible in Dynatrace — not only for logs, but for metrics, events, traces, and business context.
Official documentation: https://docs.dynatrace.com/docs/platform/grail
What Grail actually is
Grail is Dynatrace’s unified data lakehouse for observability data.
Unlike traditional log storage or metric databases, Grail:
- stores logs, metrics, events, traces, and business data together,
- keeps them in a schema-flexible but strongly queryable form,
- is optimized for high-volume, high-cardinality data.
For our Mainframe use case, this matters because:
- SYSLOG data is irregular and verbose,
- metrics are dense time series,
- security and audit events require long retention and fast search.
Grail is designed to handle all of this in one place, without forcing you to choose between “log tooling” and “metric tooling”.
Why a data lakehouse matters for Mainframe data
Historically, Mainframe observability relied on:
- specialized log viewers,
- siloed performance databases,
- offline reports and dumps.
This fragmentation made correlation expensive and slow.
A lakehouse model changes the equation:
- raw data is stored once,
- structure can evolve over time,
- analytics and queries operate on a shared foundation.
In practical terms:
- a SYSLOG event and a performance metric can be queried together,
- security events can be correlated with system load,
- anomalies can be detected across signal types.
For Mainframe platforms — where cause and effect are rarely visible in a single signal — this is critical.
Grail and scale: why this works for Mainframe
Mainframe environments generate:
- millions of log lines per day,
- high-frequency metrics,
- bursts of security and audit events.
Grail is built to:
- ingest at this scale,
- store data efficiently,
- and query it interactively.
Key properties relevant to our case:
- columnar storage optimized for analytics,
- late binding of schema (parse when you need it),
- fast, cost-efficient queries even on large datasets.
This is what allows us to:
- ingest raw SYSLOG safely,
- delay parsing decisions,
- iterate on DPL parsers without re-ingesting data.
Grail as a foundation for automation and AI
Because all data in Grail is:
- time-aligned,
- queryable,
- and semantically enrichable,
it becomes a natural foundation for:
- automation workflows,
- security response actions,
- AI/ML-driven anomaly detection.
This is especially important for Mainframe observability:
- the volume of data exceeds human inspection,
- early signals are often weak and distributed,
- automation must be driven by correlated context, not single alerts.
Grail enables this by design.
Checkpoint
At this point, you should understand:
- what Grail is and why it matters,
- why a lakehouse model is particularly well-suited for Mainframe data,
- and how it enables both analytics and automation downstream.
With this foundation in place, we can finally start asking questions of the data.
Querying Mainframe Data with DQL
With all data landing in Grail, Dynatrace Query Language (DQL) becomes the primary way to explore, understand, and operationalize Mainframe observability data.
Official documentation: https://docs.dynatrace.com/docs/platform/grail/dynatrace-query-language
What DQL is — and what it is not
DQL is:
- a declarative query language,
- designed for observability data,
- optimized for large-scale analytics.
DQL is not:
- SQL with a different syntax,
- a scripting language,
- tied to a single data type.
Instead, it allows you to:
- search logs,
- aggregate metrics,
- correlate events,
- and prepare data for further transformation.
Thinking in DQL
DQL encourages a pipeline-based mindset:
- select a dataset,
- filter what matters,
- extract or compute fields,
- aggregate or visualize.
For Mainframe logs, this is crucial:
- messages are noisy,
- structure is implicit,
- meaning emerges through filtering and grouping.
DQL lets you do this incrementally, without committing to a fixed schema.
Preparing data for parsing and automation
Before applying DPL parsing:
- DQL helps identify patterns,
- isolate message classes,
- validate assumptions.
This reduces risk and makes parsing deterministic rather than guess-based.
DQL is how you learn your data before you formalize it.
Checkpoint
At this stage:
- Mainframe logs are searchable,
- metrics are correlated,
- and DQL gives you full visibility into what you actually have.
In the next section, we will move from exploration to practice and show concrete DQL examples — including screenshots — before tackling SYSLOG and DPL parsing.
DQL in Practice – Example Queries
A sample notebook helps report the most error-prone sysnames and LPARs.
- Grouping SYSLOG by message:

fetch logs
| filter SOURCE == "logstream"
| filter sysname != "Other"
| fields MFID, sysname, message_id, message
| filterOut isNull(MFID)
| summarize amount = count(), by:{message}
| sort amount desc

- Grouping errors by subsystem:

fetch logs, from: now()-1h, to: now()
| filter SOURCE == "logstream" //and MSGTYPE == "ERROR"
| fields MFID, LPAR, sysname, MSGTYPE
| summarize amount = count(), by:{MFID, LPAR, sysname, MSGTYPE}
| sort amount desc

- Time-based analysis:

fetch logs, from: now()-7d, to: now()
| filter SOURCE == "logstream" and MSGTYPE == "ERROR"
| fields MFID, LPAR, sysname, message_id
| summarize amount = count(), by:{MFID, LPAR, sysname, message_id}
| sort amount desc
For full details, see the sample notebook in my public GitHub repo: https://github.com/s2mich/dt-playground/blob/main/mainframe/openpipeline/syslog-processor.dql
Mainframe SYSLOG: Why This Format Is a Nightmare for Regex (and Other Tools)
Mainframe SYSLOG is fundamentally different from modern log formats – it appears deceptively simple at first glance. It's just text, right? Lines of system messages that should be easy to parse with regular expressions. This assumption has led countless organizations down a path of frustration, custom development, and ultimately, incomplete observability.

The reality is far more complex. Mainframe SYSLOG represents decades of evolution across multiple system components (MVS, JES2, TSO, CICS, DB2, IMS), each with its own logging conventions, message formats, and quirks. What looks like a structured log format is actually a collection of different formats masquerading as one.
Ask any mainframe engineer about SYSLOG format, and you'll hear: "It's fixed-position."
This is partially true and wholly misleading. Yes, some fields are fixed-position.
But the challenge is hidden in the details:
- Optional fields - Many fields appear only in certain contexts
- Component-specific formats - JES2, TSO, and MVS messages follow different conventions
- Variable message IDs - Not all message IDs are 8 characters (IEF403I vs $HASP373 vs IKJ56455I)
- Continuation lines - Multi-line messages with context-dependent formatting
- Timestamp variations - At least three different timestamp formats across components
Each of these variations requires different parsing logic. The job name appears in different positions, the timestamps use different formats, and multi-line messages use indentation to indicate continuation.
Why Regular Expressions Fail
The typical first attempt at parsing Mainframe SYSLOG involves regex patterns. This seems reasonable - after all, we can see patterns in the data. But position-based extraction breaks on optional fields, and multi-line messages make matters even harder.
There is nothing wrong with regex - it is excellent for simple cases - but it falls short for SYSLOG.
Even well-established vendors like Elastic face a challenge here. In Elastic, each line arrives as a separate event, and the aggregate filter must:
- Detect continuation lines (indentation pattern)
- Buffer lines until message complete
- Reconstruct the full message
- Parse the structured data within
This approach is fragile, slow, and requires deep knowledge of each message type's continuation patterns.
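To illustrate why that buffering is fragile, here is a naive Python sketch of the reconstruction step such a pipeline must perform. The 4-space indentation rule is an assumption for the example; real SYSLOG continuation markers vary by component, which is exactly the problem.

```python
def reassemble(lines, indent=" " * 4):
    """Naively merge indented continuation lines into the preceding
    message - a sketch of the stateful buffering generic pipelines
    must do when each line arrives as a separate event.
    """
    messages = []
    for line in lines:
        if line.startswith(indent) and messages:
            messages[-1] += " " + line.strip()  # continuation: append
        else:
            messages.append(line.rstrip())      # a new message starts
    return messages

# Illustrative lines, not captured SYSLOG output.
raw = [
    "IEF403I MYJOB - STARTED",
    "IEF404I MYJOB - ENDED -",
    "    TIME=10.15.32",
]
print(reassemble(raw))
```

Every message type with a different continuation convention needs its own variant of this logic, and any out-of-order delivery breaks the state, which is why this approach does not scale.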
The Magic of DPL: Building a Resilient SYSLOG Parser
Dynatrace took a fundamentally different approach. Instead of treating SYSLOG as generic text requiring pattern matching, DPL understands Mainframe-native data structures and semantics.
Dynatrace Pattern Language (DPL) allows building robust parsers.
DPL doesn't just match patterns - it understands what it's parsing. That goes far beyond the SYSLOG parsing use case.
Example DPL processor:
parse content, """(
(DIGIT:reply_id SPACE)?
(('$' | '+')? ([A-Z]{3,4}[AEIWS]{0,1}[0-9]{3,5}[A-Z]*):message_id SPACE)
(([A-Z0-9]{4,}):sysname (SPACE | ','))?
((TIMESTAMP('MMM dd')):date SPACE)?
((TIMESTAMP('hh:mm:ss')):time SPACE)?
([a-zA-Z0-9]{8,}:job_id ':'? SPACE)?
(
(LD LF?)
(LD LF?)?
(LD LF?)?
(LD LF?)?
):message
(EOF)
)"""
| fieldsAdd mf.sysname = if(isNull(sysname), "Other", else: sysname)
| fieldsAdd mf.sysname = if(startsWith(message_id, "ICH"), "RCAF", else: mf.sysname)
| fieldsAdd mf.sysname = if(startsWith(message_id, "IEA"), "z/OS", else: mf.sysname)
| fieldsAdd mf.sysname = if(startsWith(message_id, "IEF"), "z/OS", else: mf.sysname)
| fieldsAdd mf.sysname = if(startsWith(message_id, "IOE"), "ZFS", else: mf.sysname)
| fieldsAdd mf.sysname = if(startsWith(message_id, "HZS"), "CHECKIBM", else: mf.sysname)
| fieldsAdd mf.sysname = if(startsWith(message_id, "HAS"), "JES", else: mf.sysname)
| fieldsAdd mf.pipeline_check = "Hit"
| fieldsRename sysname = mf.sysname
This parser:
- tolerates missing fields,
- extracts semantics,
- enriches events,
- avoids brittle regex logic.
Go to pipeline UI (https://your-tenant-id.apps.dynatrace.com/ui/apps/dynatrace.settings/settings/openpipeline-logs/pipelines/) in your tenant to access the settings for logs.
Create a custom pipeline for Mainframe SYSLOG - it's a one-time action.
Then paste the DPL processor definition from our example.
Note: The most recent version of the log processor is available here: https://github.com/s2mich/dt-playground/blob/main/mainframe/openpipeline/syslog-processor.dql
Some Conclusions
Real Mainframe environments generate millions of SYSLOG records daily. Parsing efficiency matters.
Elastic Stack:
- Must check multiple Grok patterns per line
- Aggregate filter requires buffering and state management
- Custom Painless scripts add overhead
- Performance degrades with pattern complexity
- Typical throughput: 5,000-10,000 messages/second per node
Dynatrace DPL:
- Single parsing pass with built-in intelligence
- Efficient state management for continuation lines
- Native Mainframe optimizations
- Consistent performance regardless of message complexity
- Typical throughput: 50,000+ messages/second per node
This 10x performance difference translates into lower infrastructure costs and real-time observability.
Let's calculate the real cost of "free" open-source parsing:
Elastic Stack Implementation:
- Initial development: 3 engineer-months
- Pattern coverage: ~60% of message types
- Ongoing maintenance: 0.5 FTE
- Infrastructure: 2x capacity for parsing overhead
- Annual cost: ~$200K+ (engineering + infrastructure)
- Limitations: Poor continuation handling, no semantic understanding
Dynatrace DPL:
- Initial setup: 2 days
- Pattern coverage: ~95% out-of-box
- Ongoing maintenance: Minimal (catalog updates included)
- Infrastructure: Minimal overhead
- Annual cost: License fee
- Advantages: Complete parsing, semantic understanding, job tracking
The "free" tool costs significantly more in total cost of ownership.
Parsing Mainframe SYSLOG isn't impossible with generic tools - it's just impractical. You can spend months building custom parsers, maintaining pattern libraries, and handling edge cases. Or you can use a tool purpose-built for the job.
Dynatrace DPL represents decades of Mainframe expertise codified into parsing intelligence. It understands:
- Every major message format
- Component interactions
- Job lifecycles
- Error patterns
- Resource topology
This isn't pattern matching - it's Mainframe fluency.
The question isn't whether DPL parses better. It's whether you want to spend your time building parsers or gaining insights from your Mainframe infrastructure.
Getting the Data: Streaming SYSLOG with the z-Rays Agent
Collecting SYSLOG in enterprise environments is often technically complex, particularly in legacy and mission-critical systems where changes to logging configurations carry operational risk. As a result, logs are frequently available only locally, inconsistently formatted, or entirely inaccessible for central analysis, security monitoring, or compliance use cases. z-Rays addresses this gap by providing a low-impact, system-level agent that enables reliable SYSLOG collection without requiring invasive reconfiguration of existing platforms, making critical operational data available where traditional mechanisms fall short.
z-Rays agent provides:
- reliable SYSLOG streaming,
- hundreds of Mainframe metrics,
- native Dynatrace integration.
Z-Rays Installation & Deployment
Z-Rays is designed for rapid deployment in mainframe environments. The installation is lightweight, structured, and can be completed in just a few simple steps with minimal impact on existing systems.
Setup Overview
- z/OS – RACF setup, PARMLIB & PROCLIB updates
- OMVS – Directory creation, file deployment, permissions
- DB2 – Required grants
- Configuration – Parameter update & address space start
Mainframe Infrastructure Observability in Dynatrace
z-Rays provides a native Dynatrace app for comprehensive Mainframe infrastructure observability.
Coming soon: Article 3 will dive into infrastructure monitoring, dashboards, and capacity management.
Stay tuned – Soon I'll demonstrate a straightforward approach to Mainframe observability.
Sneak peek: Screenshot from the z-Rays App for Dynatrace below:
Vice President of Omnilogy
Mainframe expert.