Wednesday, September 10, 2014

ELK and Elasticsearch

Distributed Architecture with Cloud Computing

As the number of applications designed to run in the cloud increases, distributed architecture has become more mainstream. Instead of a monolithic application that scales vertically, more applications are built to scale horizontally, and many are even aware of the cloud environment they run in. These applications are deployed as clusters of components over a network of compute nodes, each component specialized for its specific goal. These components constantly communicate with each other as they carry out the application's business functions.

This distributed design allows each component to be monitored with the infrastructure monitoring tools typically used by Ops teams. The nature of distributed architecture means there are now many more processes to watch and logs to collect. The value of a single pane of glass that shows what is going on at the overall application level has never been greater.

As the information collected at the infrastructure level reveals deeper insight into the running application, the work of Devs and Ops is now more intertwined. Ops teams need to work closely with Devs to help them understand production status through the collected information, and Devs need to be able to put together the information from different processes to interpret the results. Those who can adapt to these new working dynamics can build successful products in the cloud era.

Devs and Ops together need a tool that is horizontally scalable, powerful enough to process all the data efficiently without sacrificing the user experience, and equipped with a flexible, easy-to-use UI. More importantly, with the explosion of data, the total cost of ownership needs to be easy to understand and estimate.

The ELK stack appeals to teams that are proficient with deploying and monitoring their own services. It is built on top of Lucene, which is production-tested. Elasticsearch and Marvel both offer a smooth out-of-the-box experience. The decoupling among Logstash, Elasticsearch, Kibana and Marvel provides the most flexibility for deployment, something Ops teams always value most highly. Elasticsearch can scale without sacrificing the power of search. Its REST API is a must-have for the cloud era.
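Because Elasticsearch speaks JSON over HTTP, any script or service can feed it and query it. The Python sketch below uses only the standard library to build an indexing request and a `match` search request. It assumes a node on `localhost:9200` (Elasticsearch's default HTTP port) and a recent Elasticsearch version that adds documents through `/<index>/_doc`; the index name and document fields are hypothetical.

```python
import json
import urllib.request

# Assumed local node; 9200 is Elasticsearch's default HTTP port.
ES_URL = "http://localhost:9200"


def index_request(index, doc):
    """Build a request that stores one JSON document in `index`.

    Assumes a recent Elasticsearch version, where documents are added via
    POST /<index>/_doc (older 1.x releases used /<index>/<type> instead).
    """
    return urllib.request.Request(
        f"{ES_URL}/{index}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_request(index, field, text):
    """Build a full-text search using the Query DSL's `match` query."""
    query = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send a request and decode the JSON reply (requires a running node)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical index name following Logstash's default daily naming.
    index = "logstash-2014.09.10"
    print(index_request(index, {"message": "disk full", "host": "web-1"}).full_url)
    print(search_request(index, "message", "disk").full_url)
```

Building the requests separately from `send` keeps the sketch testable without a cluster; against a live node, `send(search_request(...))` returns the search response as a Python dict.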

Developer-centric teams, often found in early-stage startups, might end up spending more time than they deem necessary working out the details of an ELK deployment that do not come out of the box. Examples include:

  • A tougher out-of-the-box experience for the Logstash agent than for the Elasticsearch server, even though both are meant to run as services
  • The programming effort involved in configuring Logstash
  • Users still need to dig around the internet to find the right grok patterns for common platform components
  • No out-of-the-box security; setting it up requires knowing, or researching, how to configure a reverse proxy.
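Grok, in essence, gives reusable regular expressions names and lets a configuration refer to them as `%{PATTERN:field}`, so each matched piece of a log line becomes a named field. The Python sketch below imitates that idea with a tiny, hypothetical pattern table. It is not Logstash's implementation, and its patterns are far looser than the ones Logstash ships; it only illustrates why finding the right patterns matters.

```python
import re

# A tiny, hypothetical subset of grok-style named patterns (simplified;
# Logstash's real definitions are larger and much more precise).
PATTERNS = {
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "WORD": r"\w+",
    "NUMBER": r"\d+(?:\.\d+)?",
    "URIPATH": r"/\S*",
}

# Matches %{NAME} or %{NAME:field} references inside a grok-style expression.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(expr):
    """Expand %{NAME:field} references into a compiled Python regex.

    References with a field name become named groups; bare %{NAME}
    references become non-capturing groups.
    """
    def expand(m):
        name, field = m.group(1), m.group(2)
        body = PATTERNS[name]
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return re.compile(GROK_REF.sub(expand, expr))


def parse(expr, line):
    """Return the named fields extracted from `line`, or None if no match."""
    m = grok_to_regex(expr).match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse("%{IP:client} %{WORD:method} %{URIPATH:request} %{NUMBER:status}",
                "10.0.0.1 GET /index.html 200"))
    # → {'client': '10.0.0.1', 'method': 'GET', 'request': '/index.html', 'status': '200'}
```

Every log format a team collects needs an expression like the one in the usage example, which is where the hunting for community-maintained patterns comes from.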
Compared to a SaaS product like Loggly, the ELK stack still comes with the cost of hosting and maintenance. For example, some teams will probably need to use a monitoring-as-a-service product to avoid the monitoring inception problem (who monitors the monitoring system?) and to ensure the much higher SLA required of the ELK stack.

Log analysis has long been a challenging problem, and it has become harder than ever with the explosion of processes in the cloud computing industry. Devs and Ops see their goal as solving production issues and keeping systems running. They will build and tinker with tools if necessary, but that is not their primary job. These users see a product like ELK as an essential part of their toolset and are willing to place a high value on services that let them spend less time on chores and more time following the insights. With the ever-growing community behind Elasticsearch and Logstash, Elasticsearch (the company) can become the thought leader in the community, driving discussion and fostering innovation that leverages the ELK stack. In addition to support and monitoring licenses, there can also be revenue from training and consulting.

At the same time, the market for archiving, searching and analyzing machine-generated big data is already competitive. Since Lucene is released under the Apache Software License, there is no barrier to entry for using the technology. Apache Solr is one example (a two-year-old feature comparison can be found here). As the cloud computing market continues to grow, so will the need for log and event systems. Other factors, such as the falling costs of compute, disk and UI development, will continue to lower the barrier to entry for new end-to-end solutions, encouraging new entrants to attack the problem from other angles. For example, there will always be appeal in a logging-as-a-service product like Loggly, which offers a pay-as-you-go model and is maintenance-free for its users. By nature, the early adopters of ELK will always be looking for better alternatives.

In summary, with the community behind the ELK stack, a team formed organically from the main contributors to the open source projects, and a clear go-to-market strategy, Elasticsearch as a company has the wind at its back to grow its user base and increase adoption and usage of the ever-popular ELK stack.

Beyond ELK: Democratizing the Google Search
Given that the existing ELK user experience is designed to appeal to Devs and Ops, and that the technology behind Lucene is more general-purpose, it is worth asking what other customers Elasticsearch might be able to serve.

Google is known for its ease of use. Many teams are willing to push their email data to Google, sometimes in violation of company policy, because finding related email and getting their job done is much easier there than with Outlook or Exchange Server. There will always be a need to search at the local level within a limited scope, and yet there is no clear winner in this category the way Google leads internet search.

Can there be a product with the vision of "Democratizing the Google Search", bringing a "Search UI to every company, organization and maybe even home"? How big would that market be?

If we assume an average license and support fee of $50 per person in the workforce, then, given the latest Bureau of Labor Statistics figure of 150 million working people, the total market value can be estimated at 150 million × $50 = $7.5 billion.

Assuming one third of that workforce (50 million people) would use a SaaS solution charged by volume, with each person averaging 10 GB at $0.20/GB/month, the total market value can be estimated at 50 million × 10 GB × $0.20 × 12 months ≈ $1.2 billion per year. To clarify, this is different from online storage like Box or Dropbox. The product is more like a search plugin for Yahoo! or …, where users can search as easily as Gmail users do.
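Both estimates are simple products of the stated assumptions, so they are easy to recheck. The Python sketch below recomputes them. It assumes the $0.20/GB/month price is charged every month for a full year, which is an interpretation rather than something spelled out above; under that assumption the SaaS estimate comes to about $1.2 billion per year. The constant and function names are illustrative only.

```python
# Back-of-the-envelope check of the two market-size estimates.
# All inputs are assumptions stated in the text, not measured data.

WORKFORCE = 150_000_000      # working population (BLS figure cited in the text)
LICENSE_FEE_USD = 50         # assumed average license + support fee per person

SAAS_USERS = WORKFORCE // 3  # assume one third choose a SaaS option
GB_PER_USER = 10             # assumed average data volume per SaaS user
USD_PER_GB_MONTH = 0.20      # assumed volume price


def license_market(workforce: int, fee_usd: float) -> float:
    """License/support revenue: every worker pays the average fee once."""
    return workforce * fee_usd


def saas_market_per_year(users: int, gb_per_user: float, usd_per_gb_month: float) -> float:
    """Annual SaaS revenue: users x volume x monthly price x 12 months."""
    return users * gb_per_user * usd_per_gb_month * 12


if __name__ == "__main__":
    lic = license_market(WORKFORCE, LICENSE_FEE_USD)
    saas = saas_market_per_year(SAAS_USERS, GB_PER_USER, USD_PER_GB_MONTH)
    print(f"License model: ${lic / 1e9:.1f} billion")          # → $7.5 billion
    print(f"SaaS model:    ${saas / 1e9:.1f} billion per year")  # → $1.2 billion per year
```

Keeping each assumption as a named constant makes it easy to see how sensitive the totals are; for instance, the SaaS figure scales linearly with both the per-user volume and the price per GB.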