LMDB Backend¶
LMDB Backend utilizes python lmdb package wrapper for LMDB data store.
This backend provides following characteristics:
storage of all event structures
querying by all query data parameters
optimization for latest events querying
optimization for time series querying
fragmentation of time series entries based on event types
in memory caching of event registration with periodic disk persistence
configurable condition rules for filtering event by their content
Storage models¶
Storage of events in LMDB Backend is based on models:
key-value storage
time series storage
event id based reference storage
Key-value storage¶
Key-value storage is used as storage model for persisting latest events for each event type. Configuration provides parameter defining which events, based on their event type, are stored in key-value store. During backend initialization, content of whole key-value storage is loaded in memory and continuously updated with new event registrations. This enables quick access to event data without need for disk read operations during query executions.
Time series storage¶
Time series storage provide storage model where all events are orderly stored based on their event timestamp or event source timestamp. Configuration enables definition of multiple sets defined by event types and time source. For each configured set, single series storage is created. This fragmentation of stored events into independent time series enables fine tuning and optimizations based on expected query types. For general cases, usage of single set of configured event types, which includes all event that should be persisted, is suggested. Events without source timestamp can’t be stored in storage configured for source timestamp storage. Querying of events is based on sequential checking of query conditions which occurred in time segments constrained by query data.
event id based reference storage¶
In addition to key-value and time series event storage, additional event references are stored in dedicated event id based reference storage. These references are kept only for events available in key-value or time series storage. Once event is removed (e.g. due to configured limits) and not accessible in other storages, its reference is also removed from reference storage. This storage enables quick and sequential access to events based on their event id.
Event filtering¶
During registration and event querying, this backend discards all event occurrences which do not satisfy configured conditions. Conditions are defined and applied based on event type. These conditions provide plausibility of condition composition (by operators all and any) and testing based on event content.
Query planning¶
During event querying, simple query planning is done based on following steps:
if query data filters only latest events for each queried event type, then only key-value storage is searched for possible event instances
for all other queries, time series partition is searched, in order as they are defined in configuration, for first series that contains at least one possible event type constrained by query data - only that time series storage partition is used as source of event instances returned by query
if no time series storage could be found, empty list is returned as query result
In case of querying event sequences ordered by event id and based on starting event id, event id based reference storage is used (backends query_flushed method).
Warning
Query which utilizes time series search will retrieve only events from single partition. Thus, time series partitioning should be used in accordance with expected queries as additional optimization technique. It is strongly advised to configure only single timestamp and single source timestamp partition as initial configuration.
Disk persisting¶
Registration of new events doesn’t initiate immediate disk writes. All changes
are cached in memory and periodically written to disk as part of single
transaction which includes all storages. This period is defined by
configuration (flush_period
). Writing of memory cache is also part of
backend’s standard closing procedure. Single transaction responsible for
writing all memory caches to disk also include cleanup operation which enables
deletion of oldest entries in time series storages. Each time series storage
can have its own configuration defined limit which specifies event persistent
period based on event timestamp or event source timestamp, number of events in
partition or partitions approximated disk usage. event id based reference
storage is updated only during this periodic write transaction. Because of this,
queries utilizing event id reference storage will return only events that are
persisted on disk.
Limiting time series size¶
Each time series storage can optionally limit number of stored events. This functionality is associated with disk persisting procedure and is part of same database transaction. Supported limits include:
min_entries
Minimum number of events preserved in database despite of other limits. This property can be used as “low water mark” - number of entries always available.
max_entries
Maximum number of event preserver in database. This property can be used as “high water mark” - number of entries will never exceed this number.
duration
Time in seconds representing maximum duration between current time and time used as time series key. All entries which exceed this duration will be removed.
size
Size in bytes allocated for associated time series. Once time series exceeds this storage size, some of the oldest entries are removed. Number of removed entries is calculated based on average entry size and limiting storage size.
Configuration¶
$schema: "https://json-schema.org/draft/2020-12/schema"
$id: "hat-event://backends/lmdb.yaml"
title: LMDB backend
type: object
required:
- db_path
- identifier
- flush_period
- cleanup_period
- conditions
- latest
- timeseries
properties:
db_path:
type: string
identifier:
type:
- "null"
- string
flush_period:
type: number
cleanup_period:
type: number
conditions:
type: array
items:
type: object
required:
- subscriptions
- condition
properties:
subscriptions:
$ref: "hat-event://backends/lmdb.yaml#/$defs/event_types"
condition:
$ref: "hat-event://backends/lmdb.yaml#/$defs/condition"
latest:
type: object
required:
- subscriptions
properties:
subscriptions:
$ref: "hat-event://backends/lmdb.yaml#/$defs/event_types"
timeseries:
type: array
items:
type: object
required:
- order_by
- subscriptions
properties:
order_by:
enum:
- TIMESTAMP
- SOURCE_TIMESTAMP
subscriptions:
$ref: "hat-event://backends/lmdb.yaml#/$defs/event_types"
limit:
$ref: "hat-event://backends/lmdb.yaml#/$defs/limit"
timeseries_max_results:
type: integer
default: 4096
$defs:
event_types:
type: array
items:
type: array
items:
type: string
limit:
type: object
properties:
min_entries:
type: number
description: |
number of entries kept despite of other limits
max_entries:
type: number
description: |
maximum number of entries
duration:
type: number
description: |
limit for the persisted history based on keys
expressed as duration in seconds
size:
type: number
description: |
memory consumption size in bytes that triggers
additional cleanup based on average entry size
condition:
oneOf:
- $ref: "hat-event://backends/lmdb.yaml#/$defs/conditions/all"
- $ref: "hat-event://backends/lmdb.yaml#/$defs/conditions/any"
- $ref: "hat-event://backends/lmdb.yaml#/$defs/conditions/json"
conditions:
all:
type: object
required:
- type
- conditions
properties:
type:
const: all
conditions:
type: array
items:
$ref: "hat-event://backends/lmdb.yaml#/$defs/condition"
any:
type: object
required:
- type
- conditions
properties:
type:
const: any
conditions:
type: array
items:
$ref: "hat-event://backends/lmdb.yaml#/$defs/condition"
json:
type: object
required:
- type
properties:
type:
const: json
data_path:
$ref: "hat-event://backends/lmdb.yaml#/$defs/path"
data_type:
enum:
- "null"
- boolean
- string
- number
- array
- object
data_value: {}
path:
oneOf:
- type: string
- type: integer
- type: array
items:
$ref: "hat-event://backends/lmdb.yaml#/$defs/path"