Killing a process by name with Ansible Ad-hoc

I’ve been learning my way through Ansible’s ad-hoc method of distributed management. It’s been a great start to re-familiarize myself with the important little details of Linux while getting the basics of Ansible down.

I’m also slowly working my way up in complexity toward a distributed platform for performance analysis. But this post isn’t about that story. It’s about solving a problem.

The system architecture today is:

  • Deploy VMs from template with DHCP addresses (via PowerShell)
  • Install latest versions of Ansible and fio (via bash… for now)
  • Auto-discover peer systems and separate them into zones inside /etc/ansible/hosts (via python thanks to a coworker)
  • Set crontab with performance profile dictated by host zone (via Ansible)
  • Reboot (via Ansible) and run the workload (via bash)

Then I/O begins. It’s pretty nifty, even in its duct tape and glue design right now.

But I kept finding myself in need of updating the fio workload profile and then restarting the bash script managing workloads. Until today, I’ve been:

      • Using a shared NFS mount to distribute the updates:
        ansible all -m shell -a "cp nfs1/*fio ." --ask-pass
      • Rebooting all hosts but this one (learned that lesson):
        ansible all:\!10.1.1.131 -m shell -a "reboot" --sudo --ask-pass
      • Then rebooting this host itself:
        sudo reboot

Then I found that sometimes I just wanted to turn off all workloads in a rush. I knew the answer would have the Linux command kill in it somewhere, but couldn’t figure out how to call it without knowing every system’s process ID (PID) for the command. So I asked on superuser.com and here’s what they taught me in just a few steps.

Step 1 – Find out the process name.

pgrep -lf bash [to find the bash shell script]

Let’s say it returns a process called bash-fio.

Step 2 – Kill it.

pkill bash-fio

And that’s it! You can confirm the kill with ps aux | grep bash, which should no longer list the process.
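You can rehearse both steps locally with a throwaway process before pointing them at real hosts. This is a minimal sketch on a Linux box with procps installed; a plain sleep stands in for the workload script:

```shell
# Stand-in for a long-running workload script (like the bash-fio wrapper)
sleep 300 &

# Step 1: find the process by name; -l prints the name beside the PID
pgrep -l sleep

# Step 2: kill every process whose name matches, no PID required
pkill sleep

# pkill sends SIGTERM by default; wait reaps the terminated job
wait

# Confirm: pgrep exits non-zero once nothing matches the name
pgrep sleep || echo "no matching process left"
```

Note that pgrep/pkill match against the process name by default; add -f to both when you need to match against the full command line instead.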

I now have a single ad-hoc command to kill this process across 15 different VMs:

ansible all -m shell -a "pkill bash-fio" --ask-pass

This kind of distributed kill switch comes in handy. I hope you find good use of it as well.

A Simple & Insecure way to use Ansible

[Update Aug 12th @ 9pm]

Ansible has brilliant security options that I have yet to learn or share here. Do not take this post as a reflection of the platform.

Please take note of the comment by Ansible CTO Michael DeHaan at the end of this post! The behavior I go over here is a beginner’s walk-through in an isolated lab and NOT anywhere close to best practice.

[/Update]

While I’ve been curious about DevOps since I read The Phoenix Project, I’ve just barely scratched the surface of the toolkits available.

I mentioned I’m managing a few lab environments adding up to 45+ VMs. Running commands one at a time across that many hosts is a major time sink. One typo in a script and you start over again. Writing bash scripts to ssh and execute commands across multiple hosts feels too much like throwaway work. I’ve instead focused on Ansible, what I find to be the simplest of configuration management solutions out there.

If you review the resources mentioned in the first post, you can get a handle on ad-hoc commands and even run through your first few.

Life gets harder, though, as you get more advanced. Here are two easy ways to make running subsequent commands much easier. 

NOTE: In this case, easier is synonymous with insecure.  I would never recommend these actions in an environment with inbound internet access, let alone a production environment.

1. Hardcode your password inside your inventory file

Your ansible inventory file is familiar in an /etc/hosts-kinda way. Get to know the syntax and you can send commands to multiple systems with ease. For instance:

$cat /etc/ansible/hosts
[workloads]
192.168.2
192.168.3
192.168.4
192.168.5

With this file, and with my lack of desire to manage SSH keys, I have to always use --ask-pass when I call ansible so it can prompt me for the password.

Well, we can do better. Given that I’m using simple templates in my VMware infrastructure, I know the username and password of each VM. I can use ansible_ssh_pass right next to each host to bypass that need:

$cat /etc/ansible/hosts
[workloads]
192.168.2 ansible_ssh_pass=Password
192.168.3 ansible_ssh_pass=Password
192.168.4 ansible_ssh_pass=Password
192.168.5 ansible_ssh_pass=Password

2. Disable host key checking

Another security feature that gets in the way of acting quickly within your ad-hoc environment is fingerprint confirmation. You’ll come across failures like these if you forget to accept a host before you pass it a command with --ask-pass:

[Screenshot: SSH host key checking failure]

To bypass this lovely behavior, disable host key checking:

$export ANSIBLE_HOST_KEY_CHECKING=False

Now you can run commands with --ask-pass without error, even if it’s the first time you’ve SSH’ed to these hosts.

Bonus: Clean up your inventory file

The two benefits I just outlined can lead to a lengthy and messy /etc/ansible/hosts file. Following this and this advice, you can consolidate the examples above. The host_key_checking setting belongs under [defaults] in Ansible’s config file, and the inventory can collapse the host list into a range:

$cat /etc/ansible/ansible.cfg
[defaults]
host_key_checking=False

$cat /etc/ansible/hosts
[workloads]
192.168.[2:5] ansible_ssh_pass=Password

You can see we’ve set a default to no longer require key checking, much like our export did on a per-session basis. We also have a range of hosts selected instead of listing them one at a time. The major advantage there is that you only have to add parameters once.
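One more consolidation the inventory format supports: group variables. Rather than repeating ansible_ssh_pass on the host line, a [workloads:vars] section sets it once for the whole group (the group name here matches the example above, and the same insecurity caveat applies):

```ini
[workloads]
192.168.[2:5]

[workloads:vars]
ansible_ssh_pass=Password
```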

What’s next

I’m still a few days away from getting into playbooks. If you have any advice before digging into it, feel free to share.

 

From Zero to Ping with Ansible

I’m fooling around with configuration management across a set of virtual machines in the lab. The reason is one of practicality: manually running commands in the terminal of 15 Ubuntu servers is tedious.

I’d like to make sure all my hosts are responding to ping and are on the same build of a few key packages, and I’d like to do so without scripting from scratch.

Thus, configuration management enters the arena. The reason I go toward Ansible before Puppet or Chef for this need is the simplicity factor: Ansible does not require a server, leverages SSH and is coded in Python. These are all familiar enough for me to dive right in.

So I dove right in.

The Ansible team gets that documentation determines adoption, which is a huge plus. The community is prolific and easy to understand. But they also assume some nuances about host-to-host communication I haven’t touched on in ages. Here’s how I got past the initial roadblocks:

From there I was able to use the ping module and receive responses from all hosts.
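For completeness, the ad-hoc command itself. This is the standard ping-module invocation (with the password prompt, since I’m not using SSH keys); each host answers with “pong” if Ansible can reach it and run Python there:

```shell
ansible all -m ping --ask-pass
```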

[Screenshot: the ping module returning “pong” from all hosts]

I’m still far from expertise on the matter, but it’s definitely a start.  I made a small and growing list of those I can talk to about Ansible on Twitter. You can subscribe here.

Do you use Ansible? Know of a great how-to article to share? Let me know!

 

 

Troubleshooting Short: Prompting NFS Reconnect with vmkping

I run a small partner validation lab these days and came across a hack worth sharing.

My NFS datastores were unpredictably disconnecting from one of my three ESXi hosts. We’re running ESXi 5.5, and it was a new behavior.

My instinct, coming from a block protocol background, was to click “Rescan All,” but that has no effect on NFS-based storage.

What my colleague Dan Perkins pointed me toward was an alternative route he discovered: run a vmkping out of the NFS vmkernel interface, and it seems to prompt a reconnection if one is possible.

Sure enough, I ran:

vmkping -I vmk2 10.10.10.100

And the ping got an ICMP response. Lo and behold, the datastore immediately showed as mounted in vCenter!

Mind you, this did not fix the problem.

Dan ended up discovering a faulty port on the storage controller after further isolation. Along the way, vmkping was a handy way to distinguish no connectivity from connectivity dropped under load.
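If you suspect MTU problems rather than a dead path, the commonly cited jumbo-frame variant of the same command is worth keeping at hand. This is a sketch assuming vmk2 is your NFS-facing interface and a 9000-byte MTU; -d sets the don’t-fragment bit and -s sizes the payload to fit just under the MTU:

```shell
# Basic reachability out a specific vmkernel interface
vmkping -I vmk2 10.10.10.100

# Jumbo-frame check: don't fragment (-d) with an 8972-byte payload (-s)
vmkping -I vmk2 -d -s 8972 10.10.10.100
```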

For more syntax that may be of help, check out this KB.

A Personal & Professional Pivot

 

I was asked a few weeks ago whether I was “all in on Sales” or not, and chose the latter.

Answering that question with honesty was terrifying and brought about a journey of career reflection, as well as a needed shift in how I saw myself.

The Plan

I joined Infinio as a Sales Engineer (SE).

While the role was a new one for me, the idea of new work has never been a concern of mine. It was actually part of the attraction.

I’ve always thought of my career aspirations as a map that will give me the “big picture” view of running an IT business. After the success of launching EMC Elect, I knew a startup was next for me. Transitioning to a SE role was a bonus: two checkboxes with one move.

[Image: mapping out my career planning]

I’ve always thought of this stage of my career as gathering up the big picture. Infinio gave me two checkboxes for one move.

It made sense from a business angle for Infinio as well.

Infinio launched with the idea of a “download and go” model of sales. It was perfect to have someone with social media marketing experience involved. That is, it made sense 7 months ago.

The Pivot

Remember, startups change all the time. The Lean Startup terminology for this situation is a pivot: a decision by leadership to shift how a product is positioned in order to gain momentum. The team behind Buffer offers a textbook example of this behavior in the wild.

Seamless installation was achievable, but results were totally dependent on sufficient workload.

This fact is no strike against the Infinio Accelerator product. I’ve spoken with half a dozen storage vendors who face this fact every day. Performance testing is difficult and a single instance of IOmeter doesn’t cut it.

Technical details aside, the sales process required more engagement. More engagement meant longer sales cycles, and thus a shift. What the pivot meant for the Sales team, and for me as part of it, is that our process now looks different. The shift made my SE role more core Sales and less cross-functional into Marketing.

Who I Am

Back to our original story.

When I was asked if I was all in on Sales, I had to think about the difference between what I could be and who I am. Here are the two facts that made the choice clear:

  • My top three concerns related to the business are longer-term brand loyalty, communication strategy, and measuring meaningful data
  • I believe an uncomfortable amount of public honesty keeps us all improving

These made one more detail true: if I have to fit into a strict org chart, my career will continue in Marketing. (That’s the first time I’ve written it down.)

What now?

The conclusion – the scary part of this honesty – was that I would have to find a new job.

I reached out to parts of our community that may know of openings that would fit my recent discoveries. My management team at Infinio was incredibly considerate as I searched out options.

I was near a decision on a few incredible companies when I was surprised by a new offer.

The team at Infinio had reconsidered the marketing budget to allow room for me to join. When the pitch was made to me, my gut told me everything I needed to know: I still had much to build as part of this team.

Today is my second full day as a Technical Marketer for Infinio, running a partner validation lab, owning some content creation and parts of the social media strategy.

Due Thanks

The hardest part of this process was that my first reaction was one of being a failure. It has only been through the incredible mentors I’m fortunate enough to have that I pushed past this falsity and got to the part that matters.

Thank you to the incredible friends in this community that help me think through these types of priorities. I love how we all make each other better in our careers.

 

Now, it’s time to get back to work. I have a startup to help build.

 

Introducing Neckbeard Influence

I outlined my content strategy in a recent post. It included a known gap: the place where I explore a big part of my passion for technology, the connection of people and the measurement of their influence.

[Diagram: where I blog]

Well, I figured out what’s next for my strategy. It’s called Neckbeard Influence.

[Diagram: content strategy plus Neckbeard Influence]

I’ve thought about the scope of this project for a while now. I see Influence Marketing, from the Influencer Diet that Amy Lewis blogs about to the evolution of Community influence that John Mark Troyer continues to develop, as the future.

My time on the Geek Whisperers has reinforced my love for Community building. There are tips on how to be an influencer, how to measure influence and how to live as an influencer that not many are covering just yet. I’d like to be one of them.

This site will remain my go-to for sharing technical content — from VMware expertise to my exploration of GitHub. You can read more on my experiences in Influence Marketing at Neckbeard Influence.

I hope you continue to enjoy both.

Quick Post: vCenter “Troubleshooting”

I ran into one of those vCenter Server Appliance (VCSA) situations that can only be called “curious.” The 5.5 server had been left alone for some time. Logging in was a no-go:

Damn you vSphere. Damn you.

As any reasonably respectable admin would do, I googled it. David Hill has an insightful post out there from a few years ago that’s still relevant. I ran through it. I ran into the same error afterwards.

I began toggling services from the UI at :5480, but nothing seemed to make sense.

Then I looked up. I noticed time melting off the clock as I investigated why I couldn’t administer this cluster. Doing some quick math, I accepted my fate and clicked the VM reset button. The Web Client worked like a charm after that.

A Disappointing & Important Conclusion

There’s nothing like vCenter to teach me important lessons of administration.

There are only so many hours in our days. As much as I would love to tell you that XYZ service had hung and you can run ABC command to clear it, I can’t. I just reboot the system.

Knowing that a vCenter reboot is reasonably non-disruptive to the data center, I accepted that time was more important than exact answers.

It’s not a satisfying observation, but it’s an example of the time management we all have to make throughout the day.

My Latest Obsession: Admins becoming Developers

More and more of what we do in Enterprise IT can be expressed as code.

While I won’t lead with click bait like Systems Administration is Dead, I do believe we’re not far from a fundamental skill set shift. It involves coding, but has less and less to do with the code.

Stay with me here. 

If you’re a ‘Technologist’ or ‘Engineer’ in any way that connects to operations and administration, you’ve written code. You learn enough Bash to get the job done. You read up on PowerCLI to make tomorrow a little easier than yesterday. You hack together some HTML on a wiki to make your documentation pretty. (Slight aside: please write documentation… even if it’s not pretty.)

Ergo, you’re a coder.

What you — and I, to make this story personal — are not yet is a developer. The core differentiator, as I’ve found it, is a learned set of skills around socialization.

It’s all about how code is shared.

As I play with Chef, dig deeper into Vagrant and get more curious about how DevOps has led to such change in our industry, I notice one commonality: sharing code.

Keeping It Personal

I get more certain every day that my core skill is Community building (with a capital ‘C’). My people are going on this journey from calculating IOPS to the automation behind DevOps. I spend my evenings and weekends making sure we’re on the same journey so I can continue to be part of the Community we’ve made.

If you’d like a tactical takeaway from this observation, learn Git. The best resource for that need is Git Immersion.

The more I explore, the more I find branching strategy to be just as crucial to success.
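To make that concrete, here’s a minimal local Git session you can run end to end: one repository, one feature branch, one merge. The paths and branch names are placeholders of my choosing, not anything Git requires:

```shell
# A scratch repository to practice in
rm -rf /tmp/git-demo && mkdir -p /tmp/git-demo && cd /tmp/git-demo
git init -q
git config user.email "demo@example.com"   # local identity just for this repo
git config user.name  "Demo"

# First commit on the default branch
echo "hello" > README
git add README
git commit -qm "initial commit"

# Branching strategy in miniature: isolate work, then merge it back
base=$(git symbolic-ref --short HEAD)      # master or main, depending on your Git
git checkout -qb feature/notes
echo "usage notes" >> README
git commit -qam "add usage notes"
git checkout -q "$base"
git merge -q feature/notes

git log --oneline
```

Because the default branch had no new commits of its own, the merge fast-forwards; the log shows both commits on one line of history.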

Lastly, find a project worth forking. As a VMware vExpert, I am inspired to give back to the pyVmomi project. I hope you do as well.

It’s all a start to something bigger. I hope you’ll join me for the ride.

Where I Blog: A New Content Strategy

 

Every time I’m about to write a long post, a few mental steps are taken automatically. They help me decide whether the idea will see the light of day or whether I spend my time elsewhere.

I take this process for granted given that I’ve been posting here and there on the internet since high school.  Well, I’ve had enough side conversations about why I post here and what I do there to want to explore it visually.

So here it is: my mental map to publishing content.

[Diagram: where I blog]

Here’s the breakdown:

  • Infinio – Contributing to my company’s blog at Infinio helps me articulate value and understand the server-side caching industry in a way I enjoy. I keep that fact in mind and aim to post every two weeks.
  • Industry – This blog has become a home base for me. The more my career evolves, the more I see the value of this site as a technical reference as I continue to grow my expertise. My posts come in waves at times as I explore, though I aim to keep it to once a week.
  • Marketing – There’s another part of my brain that I think deserves a proper place to live. It’s inspired by Geek Whisperers and continues to be a bigger part of my mind. How do businesses engage customers and employees in meaningful ways? How does storytelling and metrics impact business? There’s a big story to tell here and I plan to focus on it soon.
  • Other – Sometimes writing is like plumbing: you can’t get one idea out until you clear out the others. I find there are times when I need to dig into a concept via public dialogue, and that conversation doesn’t fit nicely into my available places. Thankfully I was introduced to Medium by my friend and old colleague Diego. Its design and tagging make it perfect for my potpourri of interests.

There’s one major omission: I contribute on The Geek Whisperers as we take turns writing up the podcast notes. I don’t think of that process as going through the same chain of events, however.

Unlike Twitter, where I stand by a unified self, I find longer-form content is most meaningful when well organized. I believe every conversation has its place.

 

Infrastructure as a Means, not an End

In philosophy, the term means to an end refers to any action (the means) carried out for the sole purpose of achieving something else (an end).

I run two small clusters of ESXi hosts for the SEs at Infinio.

They act as a microcosm of real infrastructure: shared amongst the team, at the mercy of the network team, often capacity or performance constrained, occasionally faulty for unexplained reasons.

What’s most realistic about this side-task is that managing the cluster is not the goal of my job. The clusters are a resource for my actual work. Maybe I need to test new code. Maybe I need to document a user experience. No matter the end, running the infrastructure is not what my work is about. It’s simply a means.

This last point resonates with me most of all. As I look to experience what my customers experience, I find this setup to be a perfect way to do so. Running a small cluster in order to demo our product makes it a means to a much more business-centric end. If something is broken for a while, I mostly don’t care. If it doesn’t work  when I need to demo, I care a great deal.

That’s real world infrastructure for you, and it’s probably more true than we like to admit. Those who prioritize their tasks do not always fix what’s broken. They fix what’s broken before it is needed, as quickly as they can, while doing it well. It’s a different mindset.

This experience teaches me two lessons I wish to share with you:

  1. If you want to sell infrastructure (that includes you, Marketing), you should run one. No matter how small. Download Autolab and get one running on your laptop.
  2. As organizational silos fall between storage, network, and compute, you can imagine even less time spent twiddling with infrastructure knobs. Automation – even if it’s just good enough automation – will eat up these menial tasks in even the smallest organizations. The only people left needing to know the details will be Technical Support.