
Enriching and Externalizing Gateway Metrics to Splunk

Overview

Gateway metrics are critical intel for determining the health of Services and the traffic throughput on an API Gateway. Using Layer7 API Management as an example, there are a few ways to obtain this information:

  • The Policy Manager Dashboard
  • PAPIM (Precision API Monitoring)
  • Sending metrics to a monitoring solution via the Gateway Metrics Tactical Assertion (version 9.3 and earlier)
  • Enabling Service Metrics (version 9.4 onward)

These methods provide standard Gateway metrics such as the number of requests, Gateway processing latency, routing latency, policy errors, and service errors. Because the Gateway plays an integral role in our customers’ infrastructure, this data is often not enough to provide API-level insights such as user/client authentication latency, authentication methods, API status codes, and requestor information. In this article, we demonstrate how to enrich Gateway metrics, send them off-box to Splunk, and build Splunk dashboards.

Implementation Samples

In this article, we will build the following Splunk dashboard, which provides an at-a-glance view of the API Gateways and the Services that run on them. This data helps us analyze and troubleshoot abnormalities that occur at runtime:

  • Traffic throughput and request status
  • Service status code
  • Minimum, maximum, and average latency (Gateway, routing, and authentication)
  • Message Sizes
Splunk Dashboard with API Gateway Runtime Stats
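In Splunk, each of these panels is a search over the key-value events described later in this article. The Python sketch below is a rough, non-authoritative illustration of what the panels compute. It takes events that have already been parsed into dictionaries and derives per-API request counts, status-code counts, and minimum, maximum, and average latencies. The field names mirror the audit detail keys used later. Three details are our own assumptions: the function name summarize, the treatment of latency values as numeric strings (presumably milliseconds), and the sample events.

```python
from collections import Counter, defaultdict
from statistics import mean

# Latency keys taken from the audit detail template; treating them as numeric is an assumption.
LATENCY_FIELDS = ("elapsetime", "routingtotaltime", "authenticationtime")


def summarize(events):
    """Group parsed events by apiname and compute dashboard-style statistics."""
    per_api = defaultdict(list)
    for event in events:
        per_api[event.get("apiname", "unknown")].append(event)

    summary = {}
    for api, api_events in per_api.items():
        stats = {
            "requests": len(api_events),  # traffic throughput for this API
            "status_codes": Counter(e.get("responsecode", "") for e in api_events),
        }
        for field in LATENCY_FIELDS:
            values = []
            for e in api_events:
                try:
                    values.append(float(e[field]))
                except (KeyError, TypeError, ValueError):
                    # Missing or empty values (e.g. no authentication step) are skipped.
                    continue
            if values:
                stats[field] = {"min": min(values), "max": max(values), "avg": mean(values)}
        summary[api] = stats
    return summary


if __name__ == "__main__":
    sample = [  # hypothetical events, invented for illustration
        {"apiname": "Test Metrics", "responsecode": "200", "elapsetime": "40",
         "routingtotaltime": "25", "authenticationtime": "5"},
        {"apiname": "Test Metrics", "responsecode": "500", "elapsetime": "80",
         "routingtotaltime": "60", "authenticationtime": ""},
    ]
    print(summarize(sample)["Test Metrics"]["elapsetime"])
    # prints {'min': 40.0, 'max': 80.0, 'avg': 60.0}
```

In Splunk itself, these aggregations would be expressed as searches, for example with the stats command, rather than as client-side code. The sketch only shows the arithmetic behind the panels.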

How Do We Do This?

The diagram below illustrates the components involved and the flow of data. The components are the API Gateway, the SSG custom logger, the Splunk universal forwarder, and the Splunk server. We designed the solution for efficiency: the Gateway writes a single log entry per inbound request to a dedicated custom logger (a custom log sink). This eliminates the costly step of filtering out irrelevant data on the Splunk server. The entries are written to local disk, which incurs some I/O. This is still the recommended approach, because it ensures data delivery during server and network outages. Once the server or network recovers, the Splunk forwarder resumes from the last log position it sent.

Alternatively, a more efficient option is to send log events from the API Gateway to syslog via UDP and have syslog transport the data to the Splunk server. For data transport, network transport is preferred over writing to disk. Note that, according to the Splunk documentation, syslog is the primary method for Splunk to receive network events. Use TCP for reliable transmission or UDP for performance.
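The syslog alternative can be sketched from the sender's side. The example below is a minimal Python illustration, not the Gateway's own syslog sink configuration. It uses the standard library's logging.handlers.SysLogHandler to ship one key-value line per request. It sends over UDP by default and over TCP when use_tcp is set, which reflects the reliability versus performance trade-off above. The helper names make_syslog_logger and format_event are hypothetical. Port 514 is simply the conventional syslog port.

```python
import logging
import logging.handlers
import socket


def make_syslog_logger(host, port=514, use_tcp=False):
    """Return a logger that ships each record to a syslog server.

    UDP (the default) favours throughput; TCP favours delivery reliability.
    """
    transport = "tcp" if use_tcp else "udp"
    logger = logging.getLogger(f"custommetrics.{host}.{port}.{transport}")
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        socktype = socket.SOCK_STREAM if use_tcp else socket.SOCK_DGRAM
        handler = logging.handlers.SysLogHandler(address=(host, port), socktype=socktype)
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't duplicate events to the root logger
    return logger


def format_event(fields):
    """Render a dict as key="value" pairs, the layout used by the audit detail."""
    return ", ".join(f'{key}="{value}"' for key, value in fields.items())
```

For example, make_syslog_logger("syslog.example.com").info(format_event({"apiname": "Test Metrics"})) would send one UDP datagram to a hypothetical syslog host.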

Visualization of components and flow of data

Implementation:

High-level steps:

  1. Install Splunk Enterprise Server and configure inbound ports to accept connections from the universal forwarder
  2. Install the Splunk universal forwarder on the API Gateway(s)
  3. Create dedicated custom logger in Gateway
  4. Create policy to write entries to custom log
  5. Configure Splunk forwarder to ingest custom log
  6. Build/Load Splunk dashboard

Install and configure Splunk Enterprise Server:

  1. On your Linux server, download the Splunk rpm file, e.g. splunk-7.2.1-be11b2c46e23-linux-2.6-x86_64.rpm
  2. Change the permissions of the downloaded rpm file with chmod 744
  3. Install Splunk from the rpm file with rpm -i
  4. Splunk is installed in /opt/splunk. Change to /opt/splunk/bin and start Splunk: ./splunk start
  5. To bypass the “Terms and Conditions” prompt, run ./splunk start --accept-license, which accepts the license automatically.
  6. By default, Splunk serves the web UI on port 8000, and 9997 is the conventional port for receiving data. Ensure your firewall rules allow inbound connections on both ports.
  7. Make sure Splunk listens on port 9997 to receive data. You can configure this in the web UI under Settings > Forwarding and Receiving > Receive Data.
  8. On CentOS, to list the currently open ports, run: firewall-cmd --list-ports
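Before moving on to the forwarders, it can help to confirm that both ports are reachable from a Gateway host. The Python sketch below is an assumed helper (check_ports is a hypothetical name) that attempts a plain TCP connection to each port. A successful connection only proves that something is listening, not that Splunk is configured to receive data.

```python
import socket
import sys


def check_ports(host, ports=(8000, 9997), timeout=3.0):
    """Return {port: bool} indicating whether a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, DNS failure, ...
            results[port] = False
    return results


if __name__ == "__main__":
    # Usage: python check_ports.py <splunk-server-host>
    target = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for port, ok in check_ports(target).items():
        print(f"{target}:{port} {'reachable' if ok else 'NOT reachable'}")
```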

Install the Splunk universal forwarder on each API Gateway:

  1. On your API Gateway, download the Splunk universal forwarder rpm file, e.g. splunkforwarder-7.2.1-be11b2c46e23-linux-2.6-x86_64.rpm
  2. Change the permissions of the downloaded rpm file with chmod 744
  3. Install the Splunk forwarder from the rpm file with rpm -i
  4. The Splunk forwarder is installed in /opt/splunkforwarder. Change to /opt/splunkforwarder/bin/ and start the forwarder: ./splunk start --accept-license
  5. To register the Splunk server, run ./splunk add forward-server :9997, prefixing :9997 with the Splunk server’s host name or IP address.
  6. For the changes to take effect, restart the Splunk forwarder: /opt/splunkforwarder/bin/splunk restart
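If you manage several Gateways, the forwarder steps above could be scripted. The following Python sketch builds the same CLI commands in order. It runs them with subprocess only when dry_run is False. The helper names, the default binary path constant, and the host name splunk.example.com are assumptions for illustration. Also note that add forward-server may prompt for the forwarder's admin credentials.

```python
import subprocess

# Assumed default install location of the universal forwarder binary.
SPLUNK_FORWARDER_BIN = "/opt/splunkforwarder/bin/splunk"


def forwarder_setup_commands(splunk_server, receiving_port=9997, splunk_bin=SPLUNK_FORWARDER_BIN):
    """Return the start, add forward-server, and restart commands, in order."""
    return [
        [splunk_bin, "start", "--accept-license"],
        [splunk_bin, "add", "forward-server", f"{splunk_server}:{receiving_port}"],
        [splunk_bin, "restart"],
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it (stopping on failure) only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical Splunk server name; dry run just prints the commands.
    run_commands(forwarder_setup_commands("splunk.example.com"))
```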

Create a dedicated custom logger on the Gateway:

  1. Create a custom log (custommetrics) via Manage Log Sink with the following filters:
Filters for the dedicated custom logger on the Gateway

Create policy to write entries to custom log:

There are two ways to write custom log entries: (1) in the global Message Completed policy, or (2) at the Service policy level. The instructions below demonstrate the latter approach.

Note that if you write custom logs from the Message Completed policy, you need to export the Service-level context variables to the global policy with the “Export Variables from Fragment” assertion. This exposes manually set context variables, such as ${custom_metrics.isrequestcompleted}, to the global policy. There, they are available as shared context variables ${request.shared.*}, where * is the name of the exported context variable. In our case, that is ${request.shared.custom_metrics.isrequestcompleted}.

An alternative to a custom logger is to set up a traffic logger, which by default captures all service/endpoint traffic events on the Gateway.

  1. Open and modify the Service policy; in our example, Test Metrics.

Our metrics-gathering approach requires adding policies to individual Services. For these Services, we need to initialize and track a set of metrics context variables, as seen on lines 2-20. Note that some of these context variables are set manually along the way to capture information such as authentication status and routing status. The captured information is then handed to a generic policy fragment, Metrics Processing (line 29.17), which writes the log entry. For reusability, we suggest packaging this metrics-gathering policy as a Policy Fragment or Encapsulated Assertion. The policy in this example also highlights a policy development pattern that provides flow control, flexibility, and reusability, and this pattern makes metrics gathering much easier. The pattern has other benefits for policy writing, which we will explain in a separate article.

2. Add an audit detail that writes to the custom logger custommetrics:

requesttimems="${custom_metrics.request_time}", configname="${custom_metrics.hostname}", apiname="${custom_metrics.service_name}", requesturl="${custom_metrics.request_url}", remoteip="${custom_metrics.remote_ip}", httpmethod="${custom_metrics.request_method}", responsecode="${custom_metrics.status_code}", isrequestcompeleted="${custom_metrics.isRequestCompleted}", isrequestauthorized="${custom_metrics.isRequestAuthorized}", elapsetime="${custom_metrics.elapse_time}", authenticationtime="${custom_metrics.authentication_time}", authenticationtype="${custom_metrics.authentication_type}", routingtotaltime="${custom_metrics.total_routing_time}", requestsize="${custom_metrics.request_size}", responsesize="${custom_metrics.response_size}"

NOTE: We use key-value pairs to define the log data structure, which lets us easily query the data in Splunk by these keys.
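Splunk extracts key="value" pairs automatically at search time, which is what makes these key-based queries possible. To show roughly what that extraction yields, the Python sketch below pulls the pairs out of a log line with a regular expression. It is an approximation for illustration, not Splunk's extraction logic. The sample line is hypothetical.

```python
import re

# key="value" pairs as written by the audit detail template; values may be empty.
KV_PATTERN = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics_line(line):
    """Return the key="value" pairs found in a custommetrics log line as a dict."""
    return dict(KV_PATTERN.findall(line))


if __name__ == "__main__":
    # Hypothetical log line with invented values.
    line = ('apiname="Test Metrics", httpmethod="GET", responsecode="200", '
            'authenticationtime="", elapsetime="42"')
    fields = parse_metrics_line(line)
    print(fields["responsecode"], fields["elapsetime"])  # prints 200 42
```

This simple pattern assumes that logged values never contain a double quote themselves.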

3. Save and activate.

Configure Splunk forwarder to ingest custom log:

  1. On each API Gateway, configure the Splunk forwarder to monitor the custom log file in /opt/SecureSpan/Gateway/node/default/var/logs/ and forward it to the Splunk server. To do so, run ./splunk add monitor with the log file path,
    e.g. ./splunk add monitor /opt/SecureSpan/Gateway/node/default/var/logs/custommetrics_0_0.log
  2. For the changes to take effect, restart the Splunk forwarder: /opt/splunkforwarder/bin/splunk restart
  3. Verify that the Splunk server receives the log events by running a search, e.g. source="/opt/SecureSpan/Gateway/node/default/var/logs/custommetrics_0_0.log"
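The same verification search can also be run programmatically through Splunk's REST API. The Python sketch below only builds the request for the search/jobs/export endpoint. It assumes the default management port 8089, basic authentication, a hypothetical host and credentials, and an arbitrary 15-minute time window. Sending the request (for example with urllib.request.urlopen) would typically require an SSL context that trusts Splunk's certificate, which is self-signed by default.

```python
import base64
import urllib.parse
import urllib.request


def build_export_request(host, username, password, source,
                         management_port=8089, earliest="-15m"):
    """Build (but do not send) a POST request that streams search results as JSON."""
    query = f'search source="{source}" earliest={earliest}'
    body = urllib.parse.urlencode({"search": query, "output_mode": "json"}).encode()
    request = urllib.request.Request(
        f"https://{host}:{management_port}/services/search/jobs/export",
        data=body,
        method="POST",
    )
    # HTTP basic authentication with the Splunk user's credentials.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    req = build_export_request(
        "splunk.example.com", "admin", "changeme",  # hypothetical host and credentials
        "/opt/SecureSpan/Gateway/node/default/var/logs/custommetrics_0_0.log",
    )
    print(req.get_method(), req.full_url)
```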

Build a custom dashboard or import the sample dashboards:

The sample dashboards shown above are included in this attachment.