Open Source and Infrastructure Management
For some time now it has been increasingly obvious that the future of Digital Infrastructure Management belongs to solutions based on Open Source Big Data platforms. This really shouldn’t come as a surprise. Whether monitoring networks, servers or applications, digital infrastructure management was arguably the original Big Data problem. Legacy solutions have long struggled to provide value at the scale required by their largest customers. Despite a few “neo legacy” solutions having tried to bridge the gap over the last few years, the age of Big Data Infrastructure Management Solutions is clearly upon us.
I admit that I have been beating this drum for a while. And while I could point to the many vendors in this space who have struggled over the last few years, or to the number of users who are busy exporting data from legacy systems and sending it to their “data lake” for reporting and analytics… I figure I can best make my case by demonstrating some of the things that are within the realm of the possible. So let’s take a look at some of the value that can be provided by using the Elastic Stack to collect and report on network flow data.
What is Flow Data?
Flow data is collected by network devices, such as routers and switches, by observing, and in some cases sampling, network traffic. Information about each conversation (for example, the connection between my laptop and Koiossian’s web server as I type this article) is tracked and exported to a flow collector. The collector may process the data further or simply forward it to a data store. Flow data provides a lot of detail about the traffic traversing a network, and many insights can be gained if the right solution is used to collect and analyze it.
There are a number of Flow technologies, including Netflow (v5 and v9 are common), IPFIX (sometimes referred to as Netflow v10) and sFlow (typically implemented by switches and operating at layer 2). There are also a handful of vendor-specific flow implementations, which are closely related to Netflow.
To extract the most value, it is important that any Flow collection solution be able to normalize the data from the various disparate flow sources without losing any of the original information in the flow record. This allows common reporting and analytics to be applied to the entirety of the data, without losing the ability to take advantage of things like sFlow’s counter samples.
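To make the idea of normalizing without losing information concrete, here is a minimal Python sketch. It is not the solution's actual Logstash configuration: the field names in FIELD_MAP, the common names they map to, and the normalize function are all assumptions chosen for illustration.

```python
from typing import Any, Dict

# Hypothetical mapping from source-specific field names to common names.
# Real codecs emit many more fields; these are illustrative only.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "netflow": {
        "ipv4_src_addr": "src_addr",
        "ipv4_dst_addr": "dst_addr",
        "in_bytes": "bytes",
        "in_pkts": "packets",
    },
    "sflow": {
        "src_ip": "src_addr",
        "dst_ip": "dst_addr",
        "frame_length": "bytes",
    },
}


def normalize(flow_type: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Map known fields to common names and keep the untouched original."""
    mapping = FIELD_MAP[flow_type]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    # The full source record is preserved so nothing is lost in translation.
    return {"flow_type": flow_type, **common, "original": dict(record)}
```

Common dashboards can then rely on fields such as src_addr and bytes, while anything source-specific remains available under original.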
What is Elastic Stack?
Elastic Stack is a suite of open source solutions developed and managed by Elastic. While these technologies can be applied to a large number of use-cases, from business intelligence to scouting professional sports teams, they are right at home in the world of Digital Infrastructure Management.
Elastic Stack includes the following primary components, which can be leveraged to build a first-class Flow collection and analytics solution.
- Elasticsearch – a distributed, JSON-based search and analytics engine designed for horizontal scalability, maximum reliability, and easy management. Elasticsearch is where your flow data will be stored.
- Logstash – a dynamic data collection pipeline with an extensible plugin ecosystem and strong Elasticsearch synergy. It is one of the primary mechanisms to collect data and get it into Elasticsearch. Logstash will serve as the flow collector in the solution, and will also provide additional processing capabilities.
- Kibana – gives shape to your data and is the extensible user interface for configuring and managing all aspects of the Elastic Stack. This is how you view your data.
As mentioned earlier, my goal is to introduce you to the realm of the possible. While I am not going to cover all of the little details necessary to create the solution, I do want to highlight a few things.
Logstash serves as the flow collector, and is configured to receive Netflow v5, Netflow v9, IPFIX and sFlow. Codecs are used to parse the raw records and provide the initial event fields. Flow data records refer to the source and destination addresses associated with a network connection. For example, when you request a webpage from your PC, your PC is the source and the web server is the destination. When the web server responds, it is the source and your PC is the destination. In some scenarios this can be useful. However, the context more frequently desired is to consider this traffic as a conversation between a client and a server. An algorithm, implemented within the Logstash configuration, accurately determines the client and server of any conversation.
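The algorithm used in the Logstash configuration is not shown in this article. As a rough illustration of the idea only, the Python sketch below uses one common heuristic: the side with the lower port number is treated as the server, since servers usually listen on low, well-known ports while clients use high ephemeral ports. The Conversation type and the to_conversation function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    client_addr: str
    client_port: int
    server_addr: str
    server_port: int


def to_conversation(src_addr: str, src_port: int,
                    dst_addr: str, dst_port: int) -> Conversation:
    """Turn a directional flow into a client/server conversation.

    Assumed heuristic: the lower port belongs to the server. When the
    ports are equal, the source is kept as the client.
    """
    if src_port < dst_port:
        # The flow is a response: the source is the server.
        return Conversation(dst_addr, dst_port, src_addr, src_port)
    return Conversation(src_addr, src_port, dst_addr, dst_port)


# A request and its response collapse into the same conversation.
request = to_conversation("192.168.1.10", 51514, "203.0.113.5", 443)
response = to_conversation("203.0.113.5", 443, "192.168.1.10", 51514)
```

A port comparison like this misjudges some traffic, for example services listening on high ports, so a production algorithm would likely need additional rules.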
The Data Model
At the heart of all Koiossian solutions is the KOIOS Data Model. The event fields from all flow sources have been mapped to the model’s Flow-related types, allowing reporting and analysis from a set of common Kibana dashboards. Index templates enable Elasticsearch to assign the correct data types to each field as dictated by the model.
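As an illustration of how an index template assigns data types, the sketch below builds a template body as a Python dictionary in the composable format used by recent Elasticsearch versions. The KOIOS Data Model itself is not shown in this article, so the index pattern and every field name here are assumptions. Only the field types (date, ip, keyword, long, geo_point) are standard Elasticsearch types.

```python
import json

# Hypothetical template body: index pattern and field names are assumed.
FLOW_INDEX_TEMPLATE = {
    "index_patterns": ["flows-*"],
    "template": {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "client_addr": {"type": "ip"},
                "server_addr": {"type": "ip"},
                "service": {"type": "keyword"},
                "bytes": {"type": "long"},
                "packets": {"type": "long"},
                "server_geo_location": {"type": "geo_point"},
            }
        }
    },
}


def field_type(template: dict, field: str) -> str:
    """Look up the data type a template assigns to a field."""
    return template["template"]["mappings"]["properties"][field]["type"]


if __name__ == "__main__":
    # The JSON form is what would be sent to Elasticsearch.
    print(json.dumps(FLOW_INDEX_TEMPLATE, indent=2))
```

Typing addresses as ip rather than as plain text is what makes range and subnet queries possible, and geo_point is what allows map visualizations.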
Exploring the Data
The dashboards built within Kibana have been designed to work together as a single application for navigating through the collected network flow data, allowing users to drill into specific areas of interest for more detailed analysis.
The Flow Overview dashboard provides a summary of the basic traffic data. This is a good place to set any filters before drilling into the data. Kibana has been configured to pin filters by default so they persist when navigating between dashboards. This provides a more application-like experience for the user. The navigation pane at the top allows the user to switch between dashboards easily.
You will notice the charts refer to clients and servers rather than source and destination, as discussed earlier.
Another concept introduced here is Traffic Locality. The solution is designed to determine whether a conversation is between systems within the private network or whether one, or even both, of the participants are public addresses. This is useful for a number of use-cases, especially those related to possible security issues.
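A minimal Python sketch of a locality check follows. It assumes that "private" means the RFC 1918 IPv4 ranges, and the labels private, mixed and public are hypothetical; the solution's actual rules and labels are not described here.

```python
import ipaddress

# Assumption: private means the RFC 1918 IPv4 ranges only.
PRIVATE_NETS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private(addr: str) -> bool:
    """Return True if the address falls inside one of the private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETS)


def traffic_locality(client_addr: str, server_addr: str) -> str:
    """Classify a conversation by how many participants are private."""
    private_count = sum(is_private(a) for a in (client_addr, server_addr))
    # Hypothetical labels: both private, exactly one private, neither private.
    return {2: "private", 1: "mixed", 0: "public"}[private_count]
```

A real deployment would probably make the list of private networks configurable, since some organizations route public address space internally.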
Related to Traffic Locality, the Geo Location of all public participants is determined. This information is summarized here and also provides the basis for the Geo Analyzer dashboard.
The Traffic Analyzer allows users to easily identify top conversations between clients and servers based on the traffic volume per service in bytes or packets.
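In the solution this ranking is produced by Kibana visualizations on top of Elasticsearch. The Python sketch below only illustrates the underlying computation: sum a metric per client, server and service, then keep the largest totals. The flow dictionary keys and the top_conversations function are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str, str]  # (client, server, service)


def top_conversations(flows: Iterable[Dict], metric: str = "bytes",
                      n: int = 10) -> List[Tuple[Key, int]]:
    """Sum a metric ("bytes" or "packets") per conversation and service."""
    totals: Counter = Counter()
    for flow in flows:
        key = (flow["client"], flow["server"], flow["service"])
        totals[key] += flow[metric]
    # most_common returns the n largest totals in descending order.
    return totals.most_common(n)


# Hypothetical sample flows for illustration.
sample_flows = [
    {"client": "pc-1", "server": "web-1", "service": "https", "bytes": 100, "packets": 2},
    {"client": "pc-1", "server": "web-1", "service": "https", "bytes": 50, "packets": 10},
    {"client": "pc-2", "server": "web-1", "service": "ssh", "bytes": 120, "packets": 3},
]
```

Ranking the same flows by packets instead of bytes can surface different conversations, which is why the dashboard offers both.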
You will notice that most of the traffic in our lab is related to the various Elastic Stack components currently running. (Although it does look like we caught someone in the middle of a FaceTime call 🙂)
Although we are serving this data from a multi-node cluster, this is only due to some other testing we were performing. The solution you see here can easily be implemented on a single node with as little as 4GB of memory. Of course this is Elasticsearch, so it can also scale… MASSIVELY. Ingesting multiple terabytes of network flow data daily is well within the realm of the possible.
The Graph Analyzer allows users to graphically browse the connections between hosts both inside and outside of their network. The circles are servers and rectangles are clients. The size of the circle indicates the volume of data in bytes, and the thickness of the line represents the number of connections (i.e. the number of flows).
The Flow Analyzer provides another view of the data using a Sankey diagram. Sankey diagrams can be very insightful for some specific use-cases related to SD-WAN and Quality-of-Service management, both of which are topics for future articles.
The Geo Analyzer provides insights into traffic that travels between private networks and the public Internet. Two maps are used to provide a quick visual indication of the volume of traffic that may be malicious. If all traffic were valid bi-directional conversations, such as someone accessing a webpage or fetching email, the maps would look identical. This dashboard makes it easy to drill down and identify suspected bad actors.
After filtering for only the traffic coming from the public Internet to one of our public-facing IP addresses, you can see above a broad range of access attempts from all across the globe. We do serve a bit of content from this location, so some of these connections are valid. However, when looking at something other than web traffic, for example telnet, you can clearly see some questionable access attempts.
Raw Flow Records
After using the other dashboards to navigate and pivot through the data, the underlying raw flow records can be accessed and exported for any other required actions. Navigating here from the Geo Analyzer shows us the records for each of those suspect telnet access attempts.
Clearing the filters reveals all of the flow records received in the selected time period. In case you are wondering, that is about 8,000 raw flow records over 30 minutes, from 20 physical and 8 virtual interfaces. Expanding the time period to the last 7 days returns ~1.7 million raw flow records, and all of the dashboards are able to render in about 5 seconds or less.
If you need deeper insights into the usage of your network infrastructure, Flow data may provide exactly the help that you are searching for. Open Source Big Data platforms, such as the Elastic Stack based solution described here, allow you to collect, visualize and analyze your flow data at larger scale, at lower cost and with greater flexibility than commercial offerings.
If your organization lacks the skills, experience or simply the time to invest in implementing such solutions, the Koiossian team is ready to help. We have a deep understanding of the challenges of operating today’s complex and dynamic digital infrastructures, and we are experts at applying open source big data solutions to overcome these challenges. Our KOIOS Reference Architecture, which includes the Network Flow capabilities described above, is designed to provide immediate insight into the operational state of your environment. Additionally, we can work with you to customize the solution to meet any of your specific requirements.
If you would like to further explore the capabilities of our solutions, don’t hesitate to contact us.