A modern approach to monitoring applications with a set of “off-the-shelf” open source tools.
[Nir Dotan is a system architect at Amdocs, and one of those people I enjoy working with. Here’s what he has to say about monitoring large-scale systems.]
If you have a software application running in production, you surely want to monitor it, in order to make sure that it is doing what it’s supposed to. Because if it isn’t, then someone is probably losing money, and it may very well be your operation.
It seems that monitoring made its way into the IT world from the direction of its next-door neighbor, the network family. These two families may be moving in together, because it seems that their real-estate agent, Mr. Virtual Machine, sold them the same house (or should I say leased?), but that’s a different topic altogether.
Back to the kingdom of network elements, where SNMP and PING are king and queen, it is often a complex and expensive operation to set up effective monitoring, especially if it’s a large network with lots of proprietary equipment.
In the IT world, things are actually getting easier and cheaper to monitor, despite, or maybe thanks to, the migration of the industry to highly distributed architectures and the spread of the cloud.
If you have basic development skills, and just a little bit of Linux background, you can set up a pretty effective monitoring system with relatively little effort, no license costs (FOSS), and most importantly, you might enjoy the journey. I certainly did, and actually still am.
The applications are providing services to our customers. The first thing we want to make sure of is that they are available to their users. We also need to verify that their response time is reasonable, that error rates are low, and that backlogs are not growing in our asynchronous queues; if they grow too rapidly, we may not be able to catch up. Once again, the enemy is not a CPU or a disk. The enemy is poor customer experience, which must be avoided.
Having said this, of course we do want to monitor our OS and HW. Applications cannot serve customers well if they don’t have sufficient resources.
There are many FOSS monitoring tools out there. The first problem with many of them is that they simply do not scale, so look out for that.
The second problem is that, apparently, there’s no single tool that does everything that is needed and excels at it. In my opinion, at least two different tools, if not more, are required.
So what are the centerpieces of required functionality?
For collection and aggregation, I would go with Etsy’s StatsD. As a matter of fact, I would take a close look at anything that Etsy is using or developing in the DevOps area; they’re quite a leader in this industry and have many devoted followers.
You can use the StatsD client from your application code in most programming languages, and send out your metrics to the StatsD aggregator server as a UDP message, with minimal impact on your application’s performance. Sending a metric is as easy as this (in PHP):
StatsD::increment("grue.dinners");
Be sure to read all about it in Etsy’s blog http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
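To make the UDP part a bit more concrete, here is a minimal Python sketch of what a StatsD client does under the hood when it increments a counter. It is not Etsy’s client code. The function names are my own, and the host and port are assumptions (8125 is the port StatsD listens on by default); the datagram layout follows the plain-text `bucket:value|type` format that StatsD accepts.

```python
import socket


def format_counter(bucket: str, delta: int = 1) -> bytes:
    """Build a StatsD counter datagram, e.g. b"grue.dinners:1|c"."""
    return f"{bucket}:{delta}|c".encode("ascii")


def increment(bucket: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one counter increment over UDP (fire and forget).

    host and port are assumptions for illustration; point them at
    wherever your StatsD aggregator is actually listening.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No connection and no reply: the application does not wait on StatsD.
        sock.sendto(format_counter(bucket), (host, port))
    finally:
        sock.close()


# Usage (hypothetical metric name taken from the PHP example above):
# increment("grue.dinners")
```

Because nothing is read back from the socket, a slow or missing aggregator does not hold up the application, which is where the minimal performance impact comes from.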
Now that we have metrics, we can finally do something interesting and get to the central piece of this post: Graphite, which integrates natively with StatsD.
Graphite is a highly scalable product which specializes in efficient storage, trend calculations and graph presentations in its own webapp UI. In addition, it exposes a simple yet very robust graph rendering API, which you can access from other systems.
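As a rough illustration of calling the rendering API from another system, the Python sketch below builds a request URL for Graphite’s `/render` endpoint using its `target`, `from` and `format` parameters. The hostname and the metric path are assumptions made up for the example; the path under which StatsD publishes a counter depends on how your StatsD instance is configured.

```python
from urllib.parse import urlencode


def render_url(base_url, targets, time_from="-1h", fmt="json"):
    """Build a Graphite render API URL for one or more metric targets.

    base_url and the target names are supplied by the caller; nothing
    here is specific to any particular Graphite installation.
    """
    # Graphite accepts the "target" parameter repeated once per series.
    params = [("target", t) for t in targets]
    params += [("from", time_from), ("format", fmt)]
    return f"{base_url.rstrip('/')}/render?{urlencode(params)}"


# Hypothetical host and metric path, for illustration only.
url = render_url("http://graphite.example.com", ["stats.grue.dinners"])
print(url)
```

Asking for `format=json` returns the raw datapoints instead of a rendered image, which is the form other tools usually want to consume.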
Graphite is great and very popular, and there’s a ton of information and tips that you can find online. The best thing about Graphite, in my opinion, is that it is very simple and intuitive to use. I could easily write 10 more pages about Graphite, but instead I’ll refer you to the best resources that I’ve come across, in addition to the reference documentation.
Jason Dixon has valuable tips and insights, which he blogs about. I also recommend following Jason’s installation instructions webcast. Once you’re up to speed and really want to understand how the product works, read this article written by Chris Davis, the master himself who’s behind Graphite.
While Graphite is highly regarded as a great product, it falls short in 2 areas:
The Graphite UI does not look good. It lacks the trendy dashboard widgets concept, and does not make use of modern JS charting libraries that people have come to expect.
That’s not a very big problem, because there are many “pretty faces” that you can add on top of Graphite. Here are my favorites:
Graphite lacks alerting capabilities. For some people, alerting is the most important aspect of monitoring. While I favor the approach of taking it easy on the alerts, you do need to be able to alert in critical situations. I must admit that I did not find a perfect FOSS solution to this. The most common solution is Nagios or one of its forks. Be warned that they are all licensed under various versions of the GPL, which, at least in my organization, is hard to get past the legal department.
If you do use Nagios, just for status and alerting, there’s no real need to start monitoring all your servers, which will quickly get you into Nagios’ configuration nightmare. All you have to do is monitor the Graphite HTTP API, and for this I recommend going with Jason Dixon’s version of the check_graphite plugin. Read all about it in his blog post.
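To show the idea behind this kind of check (this is not the check_graphite plugin’s actual code), here is a small Python sketch that takes a JSON response from Graphite’s render API and maps it to the standard Nagios plugin exit codes. The function name, the averaging rule and the thresholds are assumptions chosen for illustration.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_series(render_json: str, warn: float, crit: float) -> int:
    """Compare the average of all non-null datapoints to two thresholds.

    render_json is the body returned by /render?...&format=json, i.e. a
    list of {"target": ..., "datapoints": [[value, timestamp], ...]}.
    """
    series = json.loads(render_json)
    # Graphite reports gaps as null values; skip them.
    values = [v for s in series for v, _ts in s["datapoints"] if v is not None]
    if not values:
        return UNKNOWN  # no data at all is worth flagging, not ignoring
    average = sum(values) / len(values)
    if average >= crit:
        return CRITICAL
    if average >= warn:
        return WARNING
    return OK


# Hypothetical sample response, for illustration only.
sample = '[{"target": "stats.grue.dinners", "datapoints": [[1.0, 1], [null, 2], [3.0, 3]]}]'
print(check_series(sample, warn=5, crit=10))
```

A real plugin would fetch the JSON over HTTP and exit with the returned code, so Nagios only ever needs to know about one host: the Graphite server.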