Container Monitor with Prometheus

Container Monitor is a Prometheus-compatible interface to performance metrics for all your instances on Triton.

Container Monitor allows you to use the Prometheus-compatible ecosystem of monitoring solutions to visualize the status of your applications and track alerts for your performance thresholds. Learn more about Prometheus, an open source application that can read Container Monitor metrics, at prometheus.io.

Any solution that can read a Prometheus-compatible metrics exporter can use Container Monitor, but the following configuration documentation is written for Prometheus itself.

If you haven't already, add an SSH key to your Triton account. You can upload an existing key or have one generated for you. This key authenticates you with all of your account's containers and Joyent's APIs, and Container Monitor uses it to identify you and authenticate your access to its interfaces.

Container Monitor exposes a metrics endpoint for every instance in your account. Rather than manually configuring (and reconfiguring) Prometheus for every instance, you can use the Triton service discovery configuration in Prometheus to automate it.

The Triton configuration block in the prometheus.yml file looks like the following:

- job_name: 'triton'
  scheme: https
  triton_sd_configs:
    # The account username to use for discovering new target containers
    - account: <string>
      # The API is versioned, the current version is "1"
      version: 1
      # The DNS suffix which should be applied to target containers
      # For Triton Public Cloud, this is cmon.<data center name>.triton.zone (example: cmon.us-sw-1.triton.zone)
      dns_suffix: <string>
      # The Triton discovery endpoint
      # For Triton Public Cloud, this is cmon.<data center name>.triton.zone (the same value as dns_suffix)
      endpoint: <string>
      # TLS configuration.
      tls_config:
        ca_file: '<path to the CA file>'
        cert_file: '<path to the cert file>'
        key_file: '<path to the key file>'
        insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_triton_machine_alias]
      target_label: instance
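
Because dns_suffix and endpoint take the same value on Triton Public Cloud, the block can be generated from a data center name rather than typed by hand. The following is a minimal sketch, not an official tool: the function names, the account name, and the key paths are hypothetical, and it assumes tls_config belongs inside the triton_sd_configs entry.

```python
import json
from typing import Optional


def cmon_suffix(data_center: str) -> str:
    """Return the Container Monitor DNS suffix for a Triton Public Cloud data center."""
    return f"cmon.{data_center}.triton.zone"


def triton_job(account: str, data_center: str, cert_file: str, key_file: str,
               ca_file: Optional[str] = None) -> dict:
    """Build a scrape job dictionary that mirrors the Triton configuration block."""
    suffix = cmon_suffix(data_center)
    tls = {
        "cert_file": cert_file,
        "key_file": key_file,
        "insecure_skip_verify": True,
    }
    if ca_file is not None:
        tls["ca_file"] = ca_file
    return {
        "job_name": "triton",
        "scheme": "https",
        "triton_sd_configs": [
            {
                "account": account,
                "version": 1,
                # On Triton Public Cloud both fields take the same value.
                "dns_suffix": suffix,
                "endpoint": suffix,
                "tls_config": tls,
            }
        ],
        "relabel_configs": [
            {
                "source_labels": ["__meta_triton_machine_alias"],
                "target_label": "instance",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account name and key paths, for illustration only.
    job = triton_job("example-account", "us-sw-1",
                     "/etc/prometheus/cert.pem", "/etc/prometheus/key.pem")
    # JSON flow syntax is accepted by most YAML parsers, so the output
    # can usually be placed under scrape_configs as-is.
    print(json.dumps([job], indent=2))
```

Generating the job this way keeps dns_suffix and endpoint from drifting apart when you move to a different data center.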

You must also enable Triton CNS in order to use Container Monitor. All new containers will get a Container Monitor CNAME record automatically. Similarly, proxy records will be added and removed when proxies come and go, along with corresponding CNAME records.

Each CNAME record represents a virtual Prometheus endpoint backed by a proxy.
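
To make the shape of such an endpoint concrete, here is a small sketch of how a scrape address could be derived for one instance. It rests on assumptions that are not stated in this document: that the record name is the instance UUID followed by the DNS suffix, that the port is 9163 (the default port of Prometheus's Triton discovery), and that metrics are served at the conventional /metrics path. The UUID shown is made up.

```python
def cmon_target(instance_uuid: str, dns_suffix: str, port: int = 9163) -> str:
    """Return the host:port a Prometheus server would scrape for one instance.

    Assumes the naming scheme <instance UUID>.<dns_suffix>:<port>.
    """
    return f"{instance_uuid}.{dns_suffix}:{port}"


def metrics_url(instance_uuid: str, dns_suffix: str, port: int = 9163) -> str:
    """Return the full HTTPS URL of the assumed metrics endpoint."""
    return f"https://{cmon_target(instance_uuid, dns_suffix, port)}/metrics"


if __name__ == "__main__":
    # Made-up instance UUID, for illustration only.
    uuid = "0a1b2c3d-0000-4000-8000-000000000000"
    print(metrics_url(uuid, "cmon.us-sw-1.triton.zone"))
```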

To retrieve the details of a container that a Prometheus server can scrape for metrics:

$ triton inst get <container>

For users running pre-existing Prometheus servers, the suggested service discovery mechanism is file-based service discovery in conjunction with our Prometheus Autopilot Pattern. That way, those servers get an experience equivalent to CloudAPI-based discovery without having to upgrade their Prometheus installation.
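
Independent of how the Autopilot Pattern produces it, a file-based discovery file is a JSON list of target groups, each with a "targets" list and optional "labels". The sketch below writes such a file from a mapping of instance UUIDs to aliases. The function names, the example instances, and the <UUID>.<dns_suffix>:9163 address scheme are assumptions made for illustration, not a description of the Autopilot Pattern's implementation.

```python
import json
import os
import tempfile


def file_sd_groups(instances: dict, dns_suffix: str, port: int = 9163) -> list:
    """Build file_sd target groups from a {instance UUID: alias} mapping."""
    return [
        {
            "targets": [f"{uuid}.{dns_suffix}:{port}"],
            # Mirrors the relabel rule that maps the machine alias to "instance".
            "labels": {"instance": alias},
        }
        for uuid, alias in sorted(instances.items())
    ]


def write_file_sd(path: str, groups: list) -> None:
    """Write the target groups to path, replacing any previous file in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(groups, handle, indent=2)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Made-up instances, written to a throwaway directory for illustration.
    out_dir = tempfile.mkdtemp()
    out_file = os.path.join(out_dir, "triton.json")
    groups = file_sd_groups(
        {"0a1b2c3d-0000-4000-8000-000000000000": "web-1"},
        "cmon.us-sw-1.triton.zone",
    )
    write_file_sd(out_file, groups)
    print(open(out_file).read())
```

A Prometheus server would then reference the written file from a file_sd_configs entry in its scrape job and pick up changes whenever the file is rewritten.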

Each container's single metrics endpoint exposes the following metrics:

  • cpu_user_usage: User CPU usage in nanoseconds

  • cpu_sys_usage: System CPU usage in nanoseconds

  • cpu_wait_time: CPU wait time in nanoseconds

  • load_average: Load average

  • mem_agg_usage: Aggregate memory usage in bytes

  • mem_limit: Memory limit in bytes

  • mem_swap: Swap in bytes

  • mem_swap_limit: Swap limit in bytes

  • mem_anon_alloc_fail: Anonymous allocation failure count

  • net_agg_packets_in: Aggregate inbound packets

  • net_agg_packets_out: Aggregate outbound packets

  • net_agg_bytes_in: Aggregate inbound bytes

  • net_agg_bytes_out: Aggregate outbound bytes

  • tcp_failed_connection_attempt_count: Failed TCP connection attempts

  • tcp_retransmitted_segment_count: Retransmitted TCP segments

  • tcp_duplicate_ack_count: Duplicate TCP ACK count

  • tcp_listen_drop_count: TCP listen drops (connections refused because the backlog was full)

  • tcp_listen_drop_Qzero_count: Total number of connections refused because the half-open queue (q0) was full

  • tcp_half_open_drop_count: TCP connection dropped from a full half-open queue

  • tcp_retransmit_timeout_drop_count: TCP connection dropped due to retransmit timeout

  • tcp_active_open_count: TCP active open connections

  • tcp_passive_open_count: TCP passive open connections

  • tcp_current_established_connections_total: TCP total established connections

  • vfs_bytes_read_count: VFS number of bytes read

  • vfs_bytes_written_count: VFS number of bytes written

  • vfs_read_operation_count: VFS number of read operations

  • vfs_write_operation_count: VFS number of write operations

  • vfs_wait_time_count: VFS cumulative wait (pre-service) time

  • vfs_wait_length_time_count: VFS cumulative wait length*time product

  • vfs_run_time_count: VFS cumulative run (service) time

  • vfs_run_length_time_count: VFS cumulative run length*time product

  • vfs_elements_wait_state: VFS number of elements in wait state

  • vfs_elements_run_state: VFS number of elements in run state

  • zfs_used: ZFS space used in bytes

  • zfs_available: ZFS space available in bytes

  • time_of_day: System time in seconds since epoch
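
As an illustration of how these metrics combine, the sketch below parses two scrapes in the Prometheus text format and derives a memory utilization ratio and a CPU utilization figure. It assumes that cpu_user_usage is a cumulative counter and that time_of_day is in seconds, as listed above. The sample values are invented, and the parser is deliberately simplified: it keeps one value per metric name and ignores labels.

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text-format samples into {metric name: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Drop the label set; label values may contain spaces.
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            name, _, rest = line.partition(" ")
        # The value is the first field; an optional timestamp may follow.
        samples[name] = float(rest.split()[0])
    return samples


def memory_utilization(samples: dict) -> float:
    """Fraction of the memory limit in use (mem_agg_usage / mem_limit)."""
    return samples["mem_agg_usage"] / samples["mem_limit"]


def cpu_utilization(prev: dict, curr: dict, metric: str = "cpu_user_usage") -> float:
    """Average CPU use between two scrapes, as a fraction of one CPU."""
    elapsed_s = curr["time_of_day"] - prev["time_of_day"]
    if elapsed_s <= 0:
        raise ValueError("scrapes must be in increasing time order")
    # The counter is in nanoseconds, so convert the interval to nanoseconds.
    return (curr[metric] - prev[metric]) / (elapsed_s * 1e9)


# Invented sample scrapes taken ten seconds apart.
SCRAPE_T0 = """\
# TYPE cpu_user_usage counter
cpu_user_usage 1000000000
mem_agg_usage 268435456
mem_limit 1073741824
time_of_day 1500000000
"""

SCRAPE_T1 = """\
cpu_user_usage 4000000000
mem_agg_usage 268435456
mem_limit 1073741824
time_of_day 1500000010
"""

if __name__ == "__main__":
    t0, t1 = parse_metrics(SCRAPE_T0), parse_metrics(SCRAPE_T1)
    print(f"memory utilization: {memory_utilization(t1):.2f}")
    print(f"user CPU utilization: {cpu_utilization(t0, t1):.2f}")
```

In practice Prometheus performs the same calculation with its own query functions once the data is scraped; the sketch only shows what the raw counters mean.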