Category Archives: DevOps

Tiny linux disk usage reporting script – dux.bash

I decided to write a very small disk usage reporting script that provides some extra information beyond what running ‘du‘ directly gives you.  The script of course uses du, along with a few other command-line tools and parsing commands, to generate the little report.  It suits my needs for the moment, and hopefully others out there will also find it useful.

Firstly, here is the script output to give you an idea of how to execute it, what it does, and how it looks:

$ ./dux.bash /home/jbl /home/jbl
Building extended du reports for /home/jbl in /home/jbl ...
[du extended report]:
13G     ./Personal
4.7G    ./DATA
2.4G    ./Pictures
1.4G    ./Downloads
Total (GB): 21.5
373M    ./core
260M    ./tmp
73M     ./game-saves
37M     ./new-music
33M     ./new-books
32M     ./vim-env
24M     ./random-tools
15M     ./stuff
Total (MB): 847

The script takes two arguments, the directory you want to analyze, and the directory where you want to store the reports.

As you will see, the script provides me an at-a-glance look at the following characteristics of a particular directory structure:

  • Grouping and separation of larger (GB sized) directories from smaller (MB sized) directories.
  • Directories sorted by size in each group.
  • Total sizes for each group of files and directories.

With this output I can see clearly which directories are consuming the most space, and how much of an impact they have compared to the others.

The script is very crude, and probably needs some work and error correction (accounting for files off of root, etc.).  It also creates some temporary text files (used to construct the report), which is the reason for the second argument to the script.  However, for now it’s doing what I need, so I figure it’s worth sharing.  Here it is:

#!/bin/bash
echo "Building extended du reports for $1 in $2 ..."
cd "$1" || exit 1
du -sh "$1"/* > "$2/du-output.txt"
grep -E '[0-9][0-9]M' "$2/du-output.txt" > "$2/du-output-MB.txt"
grep -E '[0-9]G' "$2/du-output.txt" > "$2/du-output-GB.txt"
sort -hr "$2/du-output-MB.txt" > "$2/du-output-MB-sorted.txt"
sort -hr "$2/du-output-GB.txt" > "$2/du-output-GB-sorted.txt"
echo '[du extended report]:'
cat "$2/du-output-GB-sorted.txt"
echo -n "Total (GB): " && perl -pe 's/^(\d+\.\d+|\d+)\w*.*/$1/' "$2/du-output-GB-sorted.txt" | paste -sd+ | bc
echo ""
cat "$2/du-output-MB-sorted.txt"
echo -n "Total (MB): " && perl -pe 's/^(\d+\.\d+|\d+)\w*.*/$1/' "$2/du-output-MB-sorted.txt" | paste -sd+ | bc

I’m not sure what more I will need from it going forward, so it may not get much love in the way of improvements.  Since it would be nice to have it on each system I’m using, I may convert it to a single script that has no need to generate temporary text files.  
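As a rough sketch of what that temp-file-free version might look like, here's one possible approach using command substitution and awk instead of intermediate files. The script name and structure here are my own illustration (assuming GNU `sort -h`), not a final version:

```shell
#!/bin/bash
# dux-pipe.bash -- sketch of a temp-file-free variant of dux.bash.
# Assumes GNU coreutils (sort -h); takes one optional directory argument.
dir="${1:-.}"

# Capture the du output once, then slice it in memory.
report=$(du -sh "$dir"/* 2>/dev/null)

gb=$(printf '%s\n' "$report" | grep -E '^[0-9.]+G' | sort -hr)
mb=$(printf '%s\n' "$report" | grep -E '^[0-9][0-9]+M' | sort -hr)

echo '[du extended report]:'
printf '%s\n' "$gb"
printf 'Total (GB): '
# Strip the trailing G from the size column and sum.
printf '%s\n' "$gb" | awk '{sub(/G$/, "", $1); total += $1} END {print total+0}'
echo ""
printf '%s\n' "$mb"
printf 'Total (MB): '
printf '%s\n' "$mb" | awk '{sub(/M$/, "", $1); total += $1} END {print total+0}'
```

The awk one-liners replace the perl/paste/bc pipeline: they strip the unit suffix from the first column and accumulate the total in a single pass.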

Nuff said!  Hopefully you find either the script, or portions of it useful in some way!

For convenience, I’ve also added this script as a public gist so you can download dux.bash from github.


Checking OS Version Across Multiple Hosts with Ansible

Often when you are maintaining a large number of servers, it is useful to be able to query those systems all at once to find out information like IP address, configured hostnames, and even the OS version.  

In this post we’re going to focus on pulling the OS version from multiple systems at once with a single Ansible playbook.  

First, let’s get an idea of how our directory structure should look in the end, and then we’ll break things down:

Ansible playbook and file-system layout:

$ tree .
.
├── get-os-version.yaml
├── group_vars
│   └── linux
├── hosts
│   └── home
│       ├──
│       ├──
│       └──
└── host_vars

4 directories, 5 files

Now that we know how things should look in the end, let’s set up our host configurations.

Host configurations:

$ tree hosts/
hosts/
└── home
    ├──
    ├──
    └──

1 directory, 3 files
$ cat hosts/home/ 

As you can see, the host configurations can be fairly simple and straightforward to start, following the directory structure outlined above. Now let’s set up our group_vars.
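The actual host file contents aren't shown above, but as a purely illustrative example (the hostname and IP below are invented placeholders, not my real ones), a host file under hosts/home/ could look something like this:

```ini
# hosts/home/server1 -- hypothetical inventory fragment; names and IPs are placeholders
[linux]
server1.home.lan ansible_host=192.168.1.10 ansible_user=jbl
```

Each file in the hosts/home/ directory is picked up as part of the inventory when you point ansible-playbook at hosts/home with -i, and the [linux] group is what the group_vars/linux file and the playbook's `hosts: linux` line refer to.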

Group configuration:

$ cat group_vars/linux 
ansible_ssh_private_key_file: /home/jbl/.ssh/jbldata_id_rsa

In this case, each of the servers I’m dealing with is secured by a password-protected SSH key, so I’m setting up my group vars to reference the correct SSH private key to use when connecting to these servers.  Pretty simple so far?  Great, now let’s look at the playbook.

The Ansible playbook: get-os-version.yaml

- name: Check OS Version via /etc/issue
  hosts: linux
  tasks:
    - name: cat /etc/issue
      shell: cat /etc/issue
      register: etc_issue

    - debug: msg="{{ etc_issue.stdout_lines }}"

This playbook is very simple, but does exactly what we need.   Here we are specifying the use of the ‘shell’ module in order to execute the cat command on our remote servers.

We use the ‘register’ keyword to save the resulting output of the command in a variable called ‘etc_issue’.  We then use the ‘debug’ module to print the contents of that variable via ‘etc_issue’.  

When executing a command via the ‘shell’ module, there are several return values that we have access to, which are also now captured in the ‘etc_issue’ variable. In order to access the specific return value we are interested in, we use ‘debug’ to dump the STDOUT return value specifically via ‘etc_issue.stdout_lines’.
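As a sketch of what else is available, the following variation prints a few of the shell module's other documented return values. The `rc`, `stdout`, and `stderr` fields are real attributes of the registered result; the task names are just illustrative:

```yaml
- name: Inspect other shell return values
  hosts: linux
  tasks:
    - name: cat /etc/issue
      shell: cat /etc/issue
      register: etc_issue

    # rc is the command's exit code; stdout/stderr are the raw output streams.
    - debug: msg="Return code was {{ etc_issue.rc }}"
    - debug: msg="Raw stdout was {{ etc_issue.stdout }}"
    - debug: msg="Stderr (if any) was {{ etc_issue.stderr }}"
```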

Now we have an Ansible playbook and associated configuration that allows us to quickly query multiple servers for their OS version.

It’s important to note that since I’m using password-protected SSH keys, I load my key into ssh-agent before I execute the playbook.  This only has to be done once for repeated runs of the same playbook within your current terminal session, for example:

$ eval "$(ssh-agent)"
Agent pid 4995
$ ssh-add ~/.ssh/jbldata_id_rsa
Enter passphrase for /home/jbl/.ssh/jbldata_id_rsa: 
Identity added: /home/jbl/.ssh/jbldata_id_rsa (/home/jbl/.ssh/jbldata_id_rsa)

Now, we’re ready to execute the ansible playbook.  Here’s the resulting output:

$ ansible-playbook -i hosts/home get-os-version.yaml 
PLAY [Check OS Version via /etc/issue] *****************************************
TASK [setup] *******************************************************************
ok: []
ok: []
ok: []
TASK [cat /etc/issue] **********************************************************
changed: []
changed: []
changed: []
TASK [debug] *******************************************************************
ok: [] => {
    "msg": [
        "Ubuntu 14.04.3 LTS \\n \\l"
    ]
}
ok: [] => {
    "msg": [
        "Ubuntu 14.04.5 LTS \\n \\l"
    ]
}
ok: [] => {
    "msg": [
        "Debian GNU/Linux 5.0 \\n \\l"
    ]
}
PLAY RECAP *********************************************************************
 : ok=3    changed=1    unreachable=0    failed=0
 : ok=3    changed=1    unreachable=0    failed=0
 : ok=3    changed=1    unreachable=0    failed=0

And that’s pretty much it!  Now we just have to add more hosts under our hosts/ configuration, and we can query as many servers as we want from a single command.  Happy orchestrating!

Ansible Playbooks – Externalization and Deduplication


Externalization and Deduplication

Developers who understand the concepts of modularity and deduplication should immediately recognize the power behind being able to include settings and commands from external files.   It is seriously counter-productive to maintain multiple scripts or playbooks that have large blocks of code or settings that are exactly the same.   This is an anti-pattern.

Ansible is a wonderful tool; however, it can often be implemented in counter-productive ways.  Let’s take variables, for example.

Instead of maintaining a list of the same variables across multiple playbooks, it is better to use Variable File Separation.

The Ansible documentation provides an excellent example of how to do this.  However, I feel that the reasoning behind why you would want to do it falls short in describing the most common use-case: deduplication.

The documentation discusses the possible needs around security or information sensitivity, but I believe that deduplication should be added to that list.  Productivity around how playbooks are managed can be significantly increased if they are implemented in a modular fashion using Variable File Separation, or vars_files.  The same, by the way, goes for use of the include_vars module.
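As a minimal sketch of what vars_files usage looks like (the file path and variable names here are hypothetical, purely for illustration):

```yaml
# A playbook pulling its shared settings from one external file.
# vars/common.yaml would contain entries like:
#   server_1_ip: 192.168.1.10
#   app_env: production
- name: Deploy using shared variables
  hosts: linux
  vars_files:
    - vars/common.yaml
  tasks:
    - debug: msg="Deploying to {{ server_1_ip }} ({{ app_env }})"
```

Any number of playbooks can reference the same vars/common.yaml, so a value changes in exactly one place.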

Here is a list of reasons why you should immediately consider a deduplication project around your Ansible playbook variables:

Save Time Updating Multiple Files

This may seem like a no-brainer, but depending on the skills and experience of the person writing the playbook, this can become a significant hindrance to productivity.   Because of Ansible’s agent-less and decentralized nature, playbooks can be written by anyone who wants to get started with systems automation.  Often, these are folks without significant proficiency in programmer-oriented text editors such as Vim, Emacs, or Eclipse, or without bash scripting experience around command-line tools like awk, sed, and grep.

It is easy to imagine a Java developer without significant Linux command-line experience opening up one playbook at a time, and modifying the value for the same variable, over and over… and over again.

The best way for folks without ninja text-editing skills to stay productive is to deduplicate, and store common variables and tasks in external files that are referenced by multiple playbooks.
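The same externalization idea applies to tasks as well as variables. As a hedged sketch (file names invented for illustration), a shared task file can be pulled into multiple playbooks like this:

```yaml
# tasks/common-setup.yaml would contain a plain task list shared by many playbooks,
# for example:
#   - name: Ensure base packages are present
#     apt: name=htop state=present
- name: Web server playbook
  hosts: linux
  tasks:
    - include: tasks/common-setup.yaml
```

A fix to a common task then lands in one file instead of being repeated across every playbook that needs it.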

Prevent Bugs and Inconsistent Naming Conventions

In a perfect world, everyone would understand what a naming convention is.  All our variables would be short enough to type quickly, clear enough to convey their purpose, and simple enough that there would never be a misspelling or typo.  This is rarely the case.

If left unchecked, SERVER_1_IP can also become SERVER1_IP, Server_1_IP, and server_1_ip: all different variable names across multiple files, referencing the same value for the exact same purpose.

This mess can be avoided by externalizing this information in a shared file.

Delegate Maintenance and Updates to Variables That Change Frequently

In some environments, there may be playbook variables that need to change frequently.  If these variables are part of some large all-encompassing playbook that only a few key administrators have access to modify, your teams could be left waiting for an administrator to have free cycles available just to make a simple change.  Again, deduplication and externalization to the rescue!  Externalize these often-changing variables so that the users who need changes immediately can commit them to very specific, isolated files within your version control system that they have special rights to modify.

Cleaner Version Control History (and therefore Audit History)

If you have the same variables referenced by multiple files, and you make changes to each of those files before you commit them to version control, then your version control history can become a complete mess.  Your version control history will show a change to a single value affecting multiple files.  If you come from a software development background, and are familiar with the concept of code reviews, then you can appreciate being able to look at a simple change to a hard-coded value (or a constant), and see that it only affects one or two files.

I hope the reasons above convince some of you to start browsing your playbook repositories for possible candidates for deduplication.  I really believe that such refactoring projects can boost productivity and execution speed for individuals and teams looking to push changes faster while minimizing obstacles around configurations shared by multiple systems.  Send me a note if this inspires you to start your own deduplication project!