An Overview of Pyroclast

Pyroclast is a Platform as a Service for building realtime event-processing applications. In this guide, we'll walk through the concepts of Pyroclast from the ground up. We'll look at how event data is stored, processed, and queried. By the end, you'll have a good understanding of what Pyroclast is and how to build your own applications with it.


The big picture: Streaming as a Service

You can think of Pyroclast as a managed, high performance stream processor as a service to bridge the gap between your events and the rest of your architecture. Fire off events from any data source into our hosted storage. Spin up services to enrich and aggregate those events. Query those services using HTTP from your existing applications.




It's all about events

Pyroclast is a platform for working with events. In essence, an event is a single unit of data representing something that happened. It's a fact that's true as of a particular point in time. Every software system has events, whether or not they're called that by name.

{"event-type": "purchase-items",
 "timestamp": 1495564557000,
 "user": "mauvestorm3630",
 "first-name": "Ralphie",
 "last-name": "Bertilo",
 "payment": {
     "payment-token": "n1qbO45"
 }
 "cost": {
     "currency": "USD",
     "value": 132.50
 }
"items": [
     {
         "item-id": "B000M8YMEU",
         "quantity": 3
     }
 ]
}
      

Pyroclast stores events

Whenever an event happens in your system, you can send it to Pyroclast to durably store it for you. Pyroclast accepts events from our API language bindings, data extraction tools we call Plumes, or over a plain HTTP API. The language bindings offer comfortable libraries of functions to start sending events from your applications with minimal effort. Plumes are useful for extracting data from mediums that aren't running applications - like files, database transaction logs, and OS monitors. The HTTP API enables access from any language.

Events can be sent in realtime as they are generated in your application, or from previous history. Plumes are particularly helpful for extracting events that already happened. These tools make it relatively straightforward to backfill historical events into Pyroclast.

Pyroclast processes events

Pyroclast lets you build services ontop of your event streams. You can design services to enrich your events, join them, deduplicate them, aggregate over them with queries, and more. When we say "service", we mean like any ordinary web service that you'd stand up for your architecture. It's a web server, it speaks HTTP, it has documented endpoints for talking to it, it's authenticated - the whole nine yards. There's a bonus though.

While these will feel like ordinary services from the outside, on the inside there's something incredible happening. Every time a service is started with Pyroclast, it automatically and invisibly pairs with a very high performance stream processor. The backing stream processor for every service translates your service definition into something that it can execute, and begins reading incoming events. Pyroclast's backing stream processor operates with extraordinarily fast performance, and scales up to handle billions of events with ease. Latency in processing is on the order of milliseconds.

While your service reads events and answers requests through its endpoints from your application, on the inside the stream processor that it pairs with performs the work you specified, seamlessly recovering from any internal problems.

Pyroclast queries events

Pyroclast services are good for a number of streaming tasks, but one in particular stands out and deserves attention - the ability to ask a query about all of your events. Service definitions may include queries about all events that pass through them. These aren't normal queries like what you'd see in a database, though.

With Pyroclast, queries are incrementally updated. As events pass over them, the answer to the query is adjusted, factoring in the newfound information. When you want to know the answer to a query, you send an HTTP request to the service to get the answer. Since queries are incrementally updated, no work is performed at the time of reading. This means queries return instantaneously.


Services with Superpowers

Services in Pyroclast have a simple interface, resembling any other web service that you've worked with before. By invisibly pairing every service with a backend stream processor, Pyroclast makes it easy to work with your events in ways that previously required high amounts of engineering effort.


Share events without conflicts

When events are stored within Pyroclast, multiple services can non-destructively consume them without conflicts or contention. Pyroclast is able to achieve this without expensive locking or replication techniques. This approach enables teams to build various applications that do not depend on each other, eliminating contention across use cases and improving scalability.

Time travel

The hallmark attribute of an event is its timestamp. Pyroclast tracks both the event's occurrence time via the provided timestamp as well as the time that Pyroclast received the event. Pyroclast is able to leverage this information to let you travel through time. Services can "rewind" their event streams to the past, only factoring in event for particular regions of time. Time travel is the ultimate ability for working with events. It removes the barriers that make auditing and speculative queries intractable for most teams.

Incorporate new information

Sometimes you don't have all the information to interpret a series of events accurately - you might learn new information later that changes the way you view an event stream. Not only does Pyroclast let you travel through time, Pyroclast also lets you rewind history to the beginning. Launch services with updated configurations that reflect your new world view, and Pyroclast will optionally replay events from the very first moment your application started sending them.

Correct mistakes

In contrast to incorporating new information about how to interpret events, you might currently be interpreting events with incorrect information. Mistakes can be transparently fixed in the same with time travel approach. Instead of making corrections inside a database by hand, Pyroclast lets you launch another service side-by-side with the existing one. Compare results to ensure your fix worked, then spin down the old service. Pyroclast allows you to deploy fixes without downtime.


The Details

Now that you have a high level understanding of what events and what Pyroclast does with them, let's explore how exactly Pyroclast stores and computes them. We'll look closer at what's going on inside each service that you launch, too.


Topics

A topic is a named stream of events, typically representative of a feed or a category of information. When your data enters a topic, it is durably stored and replicated for a specified amount of time. Topics have several characteristics that make the highly suitable to event stream processing. We’ll go through each of these in turn.

Events are immutable

Once an event be written to a topic, it cannot be deleted unless the entire topic is deleted. This is called immutability. The topic can only be modified by appending more events to its tail. An event represents a fact of information about something that occurred. Facts are true as of a particular point in time, and cannot be modified or erased from history. This mode of thinking enables rich analysis of historical information, and maximizes knowledge about the present by retaining as much data about the past as possible.

Event retention

Events are stored in a topic for a specified duration - both in terms of number of gigabytes and number of days. While the events in a topic are immutable, they will be deleted when they go beyond their retention policy. This is a matter of practicability. Information about the past is typically only useful within a certain bound of time. For example, if the retention policy for a topic is to store events for no more than 10 days or 50 gigabytes, events that are either more than 10 days old, or events that are not within the most recent 50 gigabytes received will be deleted.

Replayable

The central idea behind Pyroclast is that you author services which read events from a topic and materialize a view of those events, which is consumed from an HTTP API. Since topics are fully immutable, and thus can be shared, topics may also be replayed by a service. We say that a service replays a topic when that service is started, reads same data from the topic, is stopped, and then started again. When you restart a service, you can specify where to in the topic events should start to be read from. You can either resume from the point in the topic where the service previously left off, or you can replay the topic from the beginning (or from any arbitrary point time).

This ability is incredibly useful for repairing services that had a bug, or a change in requirements. A brand new view of events can be materialized without disrupting the contents of the topic.

Chaining services

Just as you can write information to a topic using HTTP or Plumes, services which read information from a topic can transform events and write them back to a topic. Such types of services are loosely named "data pipelines". They’re given their name because they route events through a sequence of operations and out the other end to a new topic, similar to water flowing through a pipe. This is most typically used by having one main topic for ingestion, and several other "derived" topics which contain events that have been transformed from the initial topic. This encourages you to write small, focused services, and thus gain reusability.

It is possible to create a service that reads from a topic and writes back to the same topic, creating a cycle.

Conflict-free Sharing

Topics are different from traditional databases in that sharing their contents to different consumers is completely safe.

When considering which topic an event should be written to, it might be tempting to write it to multiple topics and to design a service per topic to each perform a different task. This is an anti-pattern that is discouraged in Pyroclast. Instead, write each event to exactly one topic and design several services that each consume from the same topic. Since topics are immutable, there is no penalty to sharing topics across as many services as you’d like. Fewer topics also means fewer sources of data to manage.


Services

While topics store events, services are responsible for processing those events to produce meaningful information. Services are definitions of how to transform, filter, and aggregate the events that it receives. Each service automatically and invisibly pairs itself with a backing high-performance stream processor. Services can both read from and write to topics.

Elements of a service

Designing a service involves creating a sequence of steps to be applied to events being read from a topic and choosing what to do with the result of those steps being applied. A service that reads from a topic can apply three functional changes to the events it receives: transform, filter, and aggregate.

Transformations

A transformation function takes an event, updates it in some way, then returns the updated event. An example of a transformation function is the Capitalize function. This function takes a key in an event which has the value of a string, then returns an updated event with the specified key’s value changed into a capitalized form.

Filters

A filter function takes an event and applies a predicate function to it. If the predicate evaluates to true, the event proceeds to the next step in the service. If the predicate evaluates to false, it is discarded. This is useful for extracting a particular selection of events in a topic that a service is interested in. Filter functions that one event, and either return that event, or no events.

Filter predicates can be negated, allowing only events that return a false predicate. Predicates can optionally throw an exception they fail to match on an event. This is useful when filters are being used as a sanity check, or a mechanism to detect malformed events. In those cases, it’s useful not only to not process those events, but to also throw an exception, which will in turn display the contents of the event that failed. In this way, filters can act as a debugger.

Aggregates

An aggregate function reduces a sequence of events into a smaller value representative of the entire topic. Aggregates, like sum, min, max, and so forth, are rolling computations whose value updates as more events are processed from a topic. Aggregates automatically roll up sequence of events for you from a topic, and do not operate on a single key within each event. Aggregates are specified by windowing type. A global window aggregates all events in a topic, irrespective of their timestamps. Other window types (fixed, sliding, and session) take into account a timestamp associated with each event to group aggregations over particular intervals of time.

Templated services

When a service is created, it needs to be given a name so that we can refer back to it over time. A service also needs to be created from a template. Templates are lightweight suggestions on how to structure the sequence of steps that a service will employ to reshape your events. Templates simply set up a few example steps to get you going. A template doesn’t lock you into a particular kind of service. You can change every aspect of how your service functions after it’s been created.

Data Pipeline

A data pipeline is a loose term for a service that reads events from a topic and writes its output events to another topic. When the template for this is chosen, the editor automatically sets up the sequence of steps to terminate in an output step.

Materialized View

A materialized view is a data structure that represents an aggregate of your data. This is the value that is accessed via an HTTP endpoint when a deployment has begun running, allowing you to extract information from your event stream out of Pyroclast. When this template is chosen, the editor automatically adds a final step to terminate in an aggregate.

Metrics

A metrics service operates on metadata of events from a designated input topic. When this template is chosen, the input task step has the "Meta topic?" checkbox automatically selected. When this service is deployed, instead of receiving a sequence of events, the service instead receives a meta-event. Meta-events contain the number of bytes in the ingested record, a timestamp when Pyroclast received the record, and the record itself. This is useful for writing auditing services that produce time-delineated aggregates of how much data was received when for particular attributes.

Manifest file

Clone a previously-saved service with the manifest option. This is useful for sharing services.


Deployed services

The running instantiation of a service is called a deployment. Deployments actively process events from a topic and update aggregates, which may be queried from HTTP endpoints. Deployments run within Pyroclast’s cloud.

Service Endpoints

So far, we’ve covered how to make a topic to ingest events, how to design a service that will change those events in some way, and how to launch that service. Endpoints are the means by which data is extracted from Pyroclast. Each aggregation in a service will receive an endpoint, such that when a deployment for that service is launched, invoking the endpoint will return the contents of the aggregate. This is similar to querying a database, except that the results to this sort of query are continuously computed and constantly available. Each aggregate gets its own endpoint, and endpoints are "stable" for a deployment. That means repeatedly deploying the production build of a service with an aggregate will maintain the same endpoint information, and downstream consumers who invoke that endpoint won’t need to change anything.

Runtime Exceptions

Sometimes services will try to apply a function to an event in an incompatible way, and an exception will be thrown. For example, you might choose the plus function to add 50 the value in key "score" of the event. If the event doesn’t have a value for that key, an exception will be thrown. Exceptions do not stop the deployment from proceeding. Each exception (up to 100) will be logged on the right side of the deployment screen, denoting the type of exception, the event that caused the exception, and it’s stacktrace. This information can be used to debug what went wrong, and you might modify and rebuild your deployment in response.

Metrics

Each running deployment maintains a set of metrics about how many events are being processed, and how latent they are. Pyroclast exposes the amount of records being received in the input step per second, the amount of records either being aggregated or output per second, and the 90th percentile of latency in milliseconds that it took each event to fully makes its way through all the steps.

Activating & Upgrading

After designing a service, the next thing you’ll want to do is launch it. Choose "Commit" in the upper right hand corner of the editing screen, then click on the Deployments bar the bottom of the screen. This will drop up a list of deployments that can be activated or upgraded. Before a deployment can take place, the service must be committed, and it must be in a valid state. If the service is invalid or has not been committed, the buttons will be disabled.

Choose "activate" (if this is the first time this deployment will run) or "upgrade" (available if this deployment had already been started). A modal will pop up confirming that you are clear for launch. You may either rebuild state, or resume from the last place this deployment left off - depending on the circumstances.

Rebuilding state from scratch

A deployment can always launch and rebuild its state, representing aggregates, from scratch. Rebuilding will begin this deployment at the very first event the input topic has received. You’d usually choose this option if you’re making a correction to an existing service that had been deployed.

Resuming a deployment

The other option that can be chosen is to resume this deployment. Resuming will pick back up from the last point where this deployment stopped processing. Pyroclast will only let you resume when it’s safe to do so. This button will be disabled if you changed any property that would cause an aggregate to be computed in a different way that would make it inconsistent with how it previously operated. This option is handy for avoiding recomputing the entire stream of records in a topic when picking back up from the last record would work just as well.