Category: Programming

Influx Telegraf and inputs.exec

I’m a fan of Influxdb for capturing data over time. Coupling it with Grafana and interesting dashboards come to life.

Part of Influx’s tool set is Telegraf, their data collection tool. It comes with a slew of data input and output plugins that are reasonably easy to configure and use. I use two of them fairly regularly, inputs.snmp and inputs.exec. The inputs.snmp plugin uses the long standing SNMP protocol to pull data from network devices. Configuration is fairly straight forward. Here’s a sample for collecting data from a network switch:

[[inputs.snmp]]
  agents = ["NAME_OR_IP"]
  version = 2
  community = "COMMUNITY"
  timeout = "60s"

  [[inputs.snmp.field]]
    oid = "RFC1213-MIB::sysUpTime.0"
    name = "uptime"

  [[inputs.snmp.field]]
    oid = "RFC1213-MIB::sysName.0"
    name = "source"
    is_tag = true

  [[inputs.snmp.table]]
    oid = "IF-MIB::ifTable"
    name = "interface"
    inherit_tags = ["source"]

    [[inputs.snmp.table.field]]
      oid = "IF-MIB::ifDescr"
      name = "ifDescr"
      is_tag = true

Change NAME_OR_IP to the device name / IP address and the COMMUNITY to the configured SNMP community on the device and Telegraf will pull data from the switch every 60 seconds.

I put one of these configuration files in the

\etc\telegraf\telegraf.d

directory for each device. I use the device name as the file name. So for network switch ns1, the configuration file is

\etc\telegraf\telegraf.d\ns1.conf

At home, the network has 4 switches and there are 4 .conf files in the telegraf.d directory. The inputs.snmp plugin handles all the .conf files and processes the data from all the network devices as expected.

The second Telegraf plugin I often use is inputs.exec. This will launch a program and collect the output to send to the influx database. CSV, JSON, etc. all work to feed the Influx engine.

A typical configuration file looks like:

[[inputs.exec]]
  commands = [
    "/usr/local/bin/purpleair_json.py https://www.purpleair.com/data.json?show=DEVICEID&key=APIKEY"
  ]

  interval = "60s"
  timeout = "10s"
  data_format = "json"
  name_suffix = "_purpleair"
  tag_keys = [
    "ID",
  ]

In this case, the exec will run the /usr/local/bin/purpleair_json.py program and capture the data from a PurpleAir device every 60 seconds.

The problem is that the inputs.exec plugin doesn’t allow for multiple instances as with the inputs.snmp plugin. If there are more than one .conf file with inputs.exec, only the last one read by telegraf will be used. As such, more than one program cannot be used to feed via telegraf into influxdb. Rather annoying.

To get around this, I create another instance of the telegraf service. That includes a new systemd service file, a separate /etc/telegraf_EXECNAME folder and supporting configuration files.
In /lib/systemd/system/telegraf_EXECNAME.service:

[Unit]
Description=The plugin-driven server agent for reporting metrics into InfluxDB
Documentation=https://github.com/influxdata/telegraf
After=network.target

[Service]
EnvironmentFile=-/etc/default/telegraf_EXECNAME
User=telegraf
ExecStart=/usr/bin/telegraf -config /etc/telegraf_EXECNAME/telegraf.conf -config-directory /etc/telegraf_EXECNAME/telegraf.d $TELEGRAF_OPTS
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartForceExitStatus=SIGPIPE
KillMode=control-group

[Install]
WantedBy=multi-user.target

In the /etc/systemd/system/multi-user.target.wants directory, a symbolic link to the new services file:

cd /etc/systemd/system/multi-user.target.wants/
ln -s /lib/systemd/system/telegraf_EXECNAME.service .

In the /etc/telegraf_EXECNAME/telegraf.conf:

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  logtarget = "file"
  logfile = "/var/log/telegraf/telegraf_EXECNAME.log"
  logfile_rotation_interval = "1d"
  logfile_rotation_max_size = "50MB"
  logfile_rotation_max_archives = 10
  hostname = ""
  omit_hostname = false


[[outputs.influxdb]]

[Add the needed options to the influxdb section for where the influxdb is hosted]

Note that this .conf file has removed all the collection information for the localhost, that remains in the original telegraf instance.

The .conf for the inputs.exec plugin are placed in

\etc\telegraf_EXECNAME\telegraf.d\EXECNAME.conf

To kick off the new service:

systemctl daemon-reload
systemctl enable telegraf_EXECNAME
systemctl start telegraf_EXECNAME

[In all the examples above, replace EXECNAME with a name that describes what’s being run.]

Creating multiple instances of the Telegraf service is annoying but it does allow me to collect the data from multiple places by running programs that reach out, gather the data and format for use with Telegraf and then into an InfluxDB database. See https://github.com/pkropf/telegraf for some examples.

Tags : , ,

Finding Raspberry Pi’s

I’m using Raspberry Pi’s for all sorts of projects. They’re fabulous, inexpensive computers that run Linux and can interact with the physical world. But sometimes, I have trouble finding them on my network. I know they’re there but I don’t have any idea of their IP address. For Pi’s that are connected to the display, keyboard and mouse, this isn’t much of a problem. But when the Pi is running headless, it’s a bit annoying trying to find it.

An easy way is to use nmap to find the all hosts on the local network with a specific MAC address prefix. For Raspberry Pi’s, there are two: B8:27:EB for older RPy models 1, 2, 3 and DC:A6:32 for RPy 4.

#! /bin/sh

nmap -sP 192.168.1.0/24 | awk '/^Nmap/{ip=$NF}/B8:27:EB/{print ip}'
nmap -sP 192.168.1.0/24 | awk '/^Nmap/{ip=$NF}/DC:A6:32/{print ip}'

nmap will find hosts on the specified network and awk will pull out the IP address of the host if the MAC address prefix matches those of the RPy’s.

Note that you’ll need to change the 192.168.1.0/24 to match your local network.

Nagios Monitoring of Directories

I had a recent occurrence at work that caused me to look around for a tool to monitor a directory for any changes made. Since there didn’t seem to be anything out there, I created a check called dirchanged. It looks at all the files in a directory and creates an sha256 has of the names and contents of the files. That hash is compared to a known value to determine if there have been any changes made.

There are a couple of issues with this check specifically that it doesn’t look into subdirectories and that the hash for comparison is passed on the command line from within the Nagios configuration files. I think the first issue will be fixed soon enough w/ a flag to indicate if the directory tree is to be traversed. The second issue is more cumbersome in that the hash value has to be stored somewhere. I’m not yet certain that putting it in the Nagios configuration files is better than putting it somewhere on the target file system. From the security standpoint, having the check not stored on the target file system is better, much less chance of it being changed by bad guys.

I’ll let it run for a while and see how it behaves and if changes are warranted.

Tags : ,

Chasing a Rollover Crash

I was reminded recently that when writing code in C, you have to take care to understand how variable are going to be used when declaring them. I was had just finished working on the code used to control the fire effects at The Crucible‘s Maker Faire 2013 booth when the system just seemed to come to a halt. That’s not quite what it was supposed to do.

The system was designed to have 3 24′ towers as the central part of the booth. On top of the towers would be accumulator based fire effects – a 24″ round sphere w/ a 2″ exhaust port, a 9″ x 24″ oblong tank w/ a massive 3″ pneumatic solenoid / exhaust and three smaller accumulators based on old fire extinguishers. The solenoids on the fire effects would all be controlled with an Arduino. The idea was that there would be no direct user interaction this year but the system would run automatically. Plug in the Arduino and away we go.

The code would run one of a number of possible sequences, pause between 30 and 90 seconds, randomly run the next sequence, pause . . . And it did that, most of the time. A couple of times after starting up the Arduino, several sequences would run and then nothing else would happen. Made me wonder if I had crashed the Arduino.

I added some Serial.print statements to the code to dump out details on what was happening internally and ran the code again. This time it ran without issue for almost 2 hours before coming to a halt. Looking at the output on the serial console showed that that pause value was -31438. Of course everything came to a halt, the system was attempting to pause negative 31,438 milliseconds! This didn’t make much sense until I reread the Arduino docs and saw that ints are 16 bit values. Of course it rolled over into a negative number.

Digging into the code I realized that I had used int’s in several places where an unsigned long was needed. Once fixed, all was right with the world and the system went on to work just fine for both days of the Maker Faire.

Perhaps I need to start writing these systems on a Raspberry Pi where I can use Python 😉

Tags : , ,

Python Meets the Arduino

I gave a talk at PyCon 2012 on using Python to control external devices through an Arduino.

Seems I was just a wee bit nervous giving the talk. Perhaps some more practice next time…

 

Tags : , , ,