Hire Like the Israeli Military

Recently I was tasked with hiring three software developers to join my team at work for a six-month project. I didn't have any previous experience hiring but based on what I've seen at my company and interviews I've had with other companies, here's what I'd typically do. I'd start by creating an exhaustive list of skills and traits for my future developer. Of course, any candidate for this position will need to be able to code in several languages. Naturally, they'll need experience with databases. Ideally, they'll have previous industry experience in silicon device manufacturing equipment. They'll need softer skills like leadership, teamwork, attention to detail, and organizational skills. And most importantly, they'll need to be "deeply dedicated" to building "robust code" and "insanely passionate" about "customer satisfaction".

I'd finalize all these skills and requirements in a job description and send them to my HR department to start receiving resumes. While reviewing the resumes, I'd select the ones that list the most skills in common with my job description. I'd evaluate the resume formatting to determine if the candidate has good organizational skills. And I'd immediately throw out all resumes with spelling errors because that demonstrates a lack of attention to detail so critical to this position.

During the interviews, a panel of my colleagues and I would interrogate each candidate about their work experience. And just like good interrogators, we would closely look for subtle reactions from the candidate to divine the true nature of not only his ability but also his moral fiber.
After each interview, my colleagues and I would discuss our overall impressions of the candidate's performance.

This candidate stumbled a little bit answering that heapsort question, so I'm worried about her ability to perform under pressure.
That candidate, despite having a weaker resume, was so obviously passionate about his previous project that we'll definitely hire him.
We're on the fence about another candidate, so we'll want more resumes to review.

After the discussion, we'd make a final hiring decision in the room.

We've all experienced a hiring process like this from one side of the interview table or the other. As interviewees, more often than not, we probably leave the room feeling exhausted and misunderstood. But as interviewers, we're pretty confident that we can pick a strong candidate with such a rigorous trial.

As interviewers, we are wrong - as Google discovered when they evaluated their own hiring practices. After looking at "tens of thousands of interviews" Google found "zero relationship" between the interviewer's rating of a candidate and that candidate's ultimate performance at Google. In other words, hiring with a process like this isn't any better than random chance. You might as well shuffle resumes and pick the one on top.

Why doesn't this work? Because a process like this relies heavily on our intuitions. By intuitions, I mean the heuristics and biases that we developed to help us solve problems in the wilderness where our brains evolved. While these intuitions work great in situations like tree climbing, where we get quick feedback and have lots of chances to correct our errors, they are ill-equipped to help us with the difficult task of selecting an employee from a large pool of candidates to perform a job for years based on a piece of paper and an hour-long interview. What a typical interview really tests, according to the book Quiet: The Power of Introverts in a World That Can't Stop Talking, is whether the candidate is an extrovert or an introvert. It allows us, the interviewers, to substitute the candidate's confidence for their intelligence, their likability for their ability. ¹

So should we just shuffle the resumes or are there better ways to evaluate candidates? Industrial-Organizational psychologists suggest behavioral-based interviews. As Dr Kathryn Keeton of Minerva Work Solutions, an IO psychology consultancy, explained to me:

"There are essentially two ways to do behavioral-based interviews- one in which you ask the candidate to describe situations in which they exhibited certain behaviors (eg tell me about a time in which you dealt with a conflict at work- how did you handle it); the other asks the candidate what they would do in certain situations (eg if this situation happened what would you do)."

Behavioral interviews are the intellectual descendants of the work of Daniel Kahneman, a Founding Father of cognitive heuristics. In his book Thinking, Fast and Slow, he describes how, at the age of 21, he was tasked with finding a way to evaluate candidates for combat duty in the Israeli military.

Before Kahneman, the military evaluated candidates based only on a single interviewer's holistic impression after a 20-minute interview. And just like the hiring process described above, these holistic impressions had no correlation with a candidate's eventual success in combat.

Kahneman, based on Paul E. Meehl's book Clinical Versus Statistical Prediction, believed he could more effectively predict candidates' success by scoring them on independent, specific, objective tests. To do so he created a simple set of questions to evaluate traits relevant to combat duty like responsibility, sociability, and masculine pride. ² The interviewers only had to mechanically score the answers to these questions during the interviews. Kahneman then created an algorithm to weight these trait-scores to determine the final overall score for each candidate. As you might have guessed, Kahneman's algorithmic evaluation of the candidates was more correlated with success in combat than the holistic impressions.

But the algorithmic evaluation was only part of the story. Kahneman's interviewers were frustrated at being reduced to mechanical counters. They, like most people, believed their intuitions were very accurate. So as a sop to their egos, Kahneman asked them to close their eyes after each interview, imagine the candidate as a combat soldier and score a final, holistic impression on a scale of one to 10. Surprisingly, Kahneman found that these holistic scores, which were previously useless, were now just as predictive of success as his algorithm. Kahneman believes that the specific questions of his algorithm primed the interviewers to think more analytically about each candidate. This priming ultimately made their intuitive impressions more accurate.

I decided to try Kahneman's process for hiring my three software developers. You can see the Excel file I used for my process here. To start with I created the job description. As per Kahneman's recommendation, I limited myself to six total skills - three Required, two Preferred, and one Nice-to-Have. I didn't require any education level or years of experience.

I rated each skill on the resumes I received by two methods. First, I found where the skill ranked in the resume's list of skills. Since people tend to list their strongest skills first, I gave more points to skills ranked higher in the list according to the table below.

Rank	Points
1	10
2	9
3	8
4	7
5	6
6	5
Greater than 6	5

For instance, George's resume ranks Django 4th in his skills list, so I would give him 7 points for Django. But if he ranked it 10th, I would give him 5 points.
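The rank table can be written as a one-line rule. This is only a sketch of the lookup, not the Excel formula itself; the function name `rank_points` is made up for illustration, and returning 0 for a skill that does not appear on the resume is an assumption based on the all-zero row in George's score table.

```python
from typing import Optional


def rank_points(rank: Optional[int]) -> int:
    """Points for a skill's position in the resume's skills list (1 = listed first).

    Assumption: a skill missing from the resume (rank=None) earns 0 points.
    """
    if rank is None:
        return 0
    if rank < 1:
        raise ValueError("rank is 1-based")
    # 10 points for 1st place, one point less per position, with a floor of 5.
    return max(11 - rank, 5)


print(rank_points(4))   # 7
print(rank_points(10))  # 5
```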

Second, I counted the number of projects or positions on the resume in which the candidate used the skill. I gave 2.5 points for each project up to a maximum of 10 points. For instance, George's resume lists 3 projects with Django, so I would give him 7.5 points. But if he listed 6 projects, I would give him 10 points.

I then added the rank and project points together to get a final score for the given skill. Since George received 7 rank-points and 7.5 project-points for Django, his total Django skill-score is 14.5.
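As a rough Python sketch of the project points and the combined skill score (the function names are illustrative, not taken from the spreadsheet), using the Django figures from George's score table:

```python
def project_points(n_projects: int) -> float:
    """2.5 points per project or position that used the skill, capped at 10."""
    return min(2.5 * n_projects, 10.0)


def raw_skill_score(rank_pts: float, n_projects: int) -> float:
    """Rank points plus project points for one skill (maximum 20)."""
    return rank_pts + project_points(n_projects)


print(project_points(3))      # 7.5
print(project_points(6))      # 10.0
print(raw_skill_score(7, 3))  # 14.5
```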

I then weighted each individual skill-score by importance, multiplying Required skills by 10, Preferred skills by 5, and Nice-to-Have skills by 1. George would have the following weighted skill-scores.

Skill	Rank Score	Project Score	Raw Skill-Score	Skill Importance	Skill Weight	Weighted Skill-Score
Django	7	7.5	14.5	Required	10	145
Python	10	10	20	Required	10	200
SQL	5	10	15	Required	10	150
Javascript	5	7.5	12.5	Preferred	5	62.5
HTML	0	0	0	Preferred	5	0
Angular.js	5	2.5	7.5	Nice-to-Have	1	7.5
Raw Resume-Score						565
Normalized Resume-Score						69

I then added the six weighted skill-scores to get the raw resume-score, in George's case 565. Finally, I normalized the resume-score such that the maximum possible score was 100 to make it easy for myself and others to understand. George's normalized resume-score is 69. In other words, he received 69% of the maximum score for his resume. Below is a graph of all scores for the resumes I received.

Graph Source Code
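For readers who prefer code to spreadsheets, here is a minimal sketch that reproduces George's resume score from the table. It assumes the maximum raw skill-score is 20 (10 rank points plus 10 project points), which gives a maximum raw resume-score of 820 for this mix of skills and matches the 69 shown above. The names `WEIGHTS` and `resume_scores` are illustrative only.

```python
# Importance weights as described in the text.
WEIGHTS = {"Required": 10, "Preferred": 5, "Nice-to-Have": 1}
MAX_RAW_SKILL = 20  # assumption: 10 rank points + 10 project points

# (skill, raw skill-score, importance) taken from George's table.
george = [
    ("Django", 14.5, "Required"),
    ("Python", 20.0, "Required"),
    ("SQL", 15.0, "Required"),
    ("Javascript", 12.5, "Preferred"),
    ("HTML", 0.0, "Preferred"),
    ("Angular.js", 7.5, "Nice-to-Have"),
]


def resume_scores(skills):
    """Return (raw, normalized) resume scores; normalized is on a 0-100 scale."""
    raw = sum(score * WEIGHTS[importance] for _, score, importance in skills)
    max_raw = sum(MAX_RAW_SKILL * WEIGHTS[importance] for _, _, importance in skills)
    return raw, 100 * raw / max_raw


raw, normalized = resume_scores(george)
print(raw, round(normalized))  # 565.0 69
```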

I invited candidates with high scoring resumes to an interview with myself and two developers from my team. I wanted every interviewee to perform their best ³ so to help them relax I first explained the project and asked them to talk about their previous experience.

All developers on my team need to be able to translate a user's problem statement into a fully coded feature without much specific direction from me. They also have to work with the other developers to integrate their code into the entire application. For these softer-skills, I asked three specific behavioral questions of the interviewees:

Tell us about a time that you had to develop a new feature or improve an old one based only on a user's problem statement without a list of requirements.
Pick one of the projects or positions on your resume and tell us how the team was organized and what your role within the team was.
Tell us about a time when you had to work with another developer to either solve a problem with the existing code or develop a new feature.

My two team members and I scored the interviewees' answers to the questions on a scale of one to 10.

Probably the biggest reason interviews fail to separate good employees from bad is that they really only test one skill - interviewing. But the candidate is not hired to be a professional interviewer. In my case, they are hired to code. While, as we saw from Quiet, interviews favor confident extroverts, coding typically favors thoughtful introverts.

To avoid or at least mitigate this misalignment, I wanted the most important part of my interview to resemble the actual coding environment as closely as possible. So I developed a coding challenge. I asked the candidates to code a simple address book application to help a user keep track of his contacts. Here is the challenge as I read it to the candidates:

Create a simple application based on the following problem statement from the user.
The user needs a simple address book to keep track of his contacts. At a minimum, he needs to be able to add, delete and view his contacts.
You will have a half an hour to code this application. You should try to solve the user's problem as best as possible. Feel free to add any features you think might be necessary beyond the user's requirements. Also feel free to use Google or any other resources while coding. We will view your application and let you explain to us how it works after you are done.

I chose an address book because it's simple and easy to understand. Because I wanted to test how well the candidate could turn a user's problem statement into functioning code, I didn't provide any specifics about how the application should work. I encouraged the candidate to use Google or any other resources since Googling is an essential skill for programmers today. Finally, I didn't want candidates to feel like they had to explain their code as they went, since this isn't what they will actually do when coding in the real world.

To grade the candidate's application, my team and I scored the following questions on a scale of one to 10.

Did the candidate solve the user's problem?
Did the candidate add any hidden features the user did not request?
Grade the candidate's code.
Grade the candidate's User Interface.

Half an hour is not a very long time to code even a simple application, so most candidates did not finish. ⁴ Even so, this test was far and away the most valuable part of our interview. One candidate was a native Chinese speaker and thus struggled with the behavioral questions, but after watching him code, we knew we had found our man.

Finally, based on Kahneman's example, I asked my team to close their eyes and imagine working with the candidate, then score their overall impression on a scale of one to 10.

I created the final interview score by summing my team's average scores for the behavioral questions, the coding challenge grades, and the overall impression. Like the resume score, I normalized the raw interview score such that the maximum possible score was 100. Here are the final results of the interviews for each candidate (zero means they weren't interviewed).

Graph Source Code
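The interview score can be sketched the same way. The version below assumes that all eight items (three behavioral questions, four coding-challenge grades, and one overall impression) carry equal weight, which the text does not state explicitly. The ratings are made-up numbers for illustration, not real candidate data.

```python
from statistics import mean


def interview_score(ratings):
    """Normalized interview score on a 0-100 scale.

    ratings maps each scored item to the list of scores (1-10) given by
    the interviewers. Each item is averaged across interviewers, the
    averages are summed, and the sum is scaled by the maximum possible.
    """
    raw = sum(mean(scores) for scores in ratings.values())
    max_raw = 10 * len(ratings)
    return 100 * raw / max_raw


# Hypothetical scores from three interviewers.
example = {
    "behavioral_1": [7, 8, 6],
    "behavioral_2": [6, 6, 6],
    "behavioral_3": [8, 9, 7],
    "solved_problem": [9, 9, 9],
    "extra_features": [5, 4, 6],
    "code_quality": [8, 7, 9],
    "user_interface": [6, 7, 5],
    "overall_impression": [8, 8, 8],
}

print(interview_score(example))  # 71.25
```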

Unlike the typical hiring example, my team and I did not discuss candidates after the interview. To make the final hiring decisions, I relied exclusively on a combination of the resume and interview scores. In another section of Thinking, Fast and Slow, Kahneman recommends giving more weight to a candidate's resume than her interview. ⁵ After all, a person's resume represents the entirety of their life's work, which should carry more weight than an hour-long interview. So for the candidate's final score, I multiplied their normalized resume score by 1.5 and added it to their normalized interview score. Here are the final results.

Graph Source Code
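In code, the final score and the ranking come down to a few lines. The candidates and their scores below are hypothetical placeholders, and `final_score` is an illustrative name rather than anything from the spreadsheet.

```python
def final_score(resume: float, interview: float, resume_weight: float = 1.5) -> float:
    """Weighted combination: the resume counts 1.5 times as much as the interview."""
    return resume_weight * resume + interview


# Hypothetical (normalized resume score, normalized interview score) pairs.
candidates = {"A": (69, 71), "B": (80, 60), "C": (55, 90), "D": (75, 75)}

ranked = sorted(
    candidates,
    key=lambda name: final_score(*candidates[name]),
    reverse=True,
)
top_three = ranked[:3]
print(top_three)  # ['D', 'B', 'A']
```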

After creating the final hiring score, picking which candidates to hire was easy. I simply picked the highest scoring candidates. Outside of making the final decision a snap, the Israeli military method of hiring provides several benefits.

For one, it makes evaluating resumes fast and easy. Instead of reading each resume in detail, I could just count up skills and plug the results into a spreadsheet. The weighting of more important skills made sure I didn't treat Required skills and Nice-to-Have skills the same, which, because of our human tendency to think categorically, would be difficult without an algorithm. Finally, the resume scoring system helped me ignore irrelevancies like spelling errors and formatting which are tempting to use as proxies for character traits like attention to detail and organization.

The interview process ensured that my team was able to separate different, important characteristics like independence and teamwork from more holistic traits like extraversion and likability. Most importantly, the coding challenge tested the candidates' skills under the conditions in which they would actually work.

Finally, the numeric scoring systems of both the resumes and interviews made it easy to objectively compare candidates to one another and make a final decision. When I developed this hiring plan I expected all of these benefits. However, there were a number of surprising benefits which I did not anticipate.

Contrary to what I indicated above, I didn't actually receive all the resumes up front, score each resume, select the best resumes for interviews, interview the candidates, score each interview, and, after completing all interviews, select candidates based on the highest final scores. The real world is much messier than that. I was constantly receiving new resumes while interviewing previous candidates. Often, I had to accept or reject a candidate before I could even see new resumes.

Didn't the messy real world destroy my nice, clean hiring plan? Actually just the opposite. After interviewing a few candidates, I realized I could correlate the resume scores and final scores with a linear regression. This way I could predict how good a new candidate's final score would be based only on their resume, making it easy to compare them to candidates I'd already interviewed. Here's a graph of the final scores versus the resume scores.

Graph Source Code

The linear regression line shows the predicted final score, based on a resume score. ⁶ Since we were hiring three developers, the cutoff line shows which resumes might belong to candidates better than my current top three. I calculated the cutoff line using the resume score of the third best candidate - in this case, Winston - minus 10%. ⁷ If a candidate's resume score was higher than the cutoff line, I brought them in for an interview. The best part of this process is that it constantly updated, becoming more accurate as I interviewed more candidates.
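A minimal sketch of the regression and the cutoff, in plain Python. Two things here are assumptions: "third best" is taken to mean third by final score, and "minus 10%" is read as 10 percent of that candidate's resume score (it could equally mean 10 points on the 100-point scale). The data is invented so that the fit is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def resume_cutoff(interviewed, hires=3, margin=0.10):
    """Resume score a new candidate must beat to earn an interview.

    interviewed is a list of (resume_score, final_score) pairs.
    Assumption: the cutoff is the resume score of the candidate ranked
    `hires`-th by final score, reduced by `margin` as a fraction.
    """
    by_final = sorted(interviewed, key=lambda c: c[1], reverse=True)
    return by_final[hires - 1][0] * (1 - margin)


# Hypothetical candidates already interviewed: (resume score, final score).
interviewed = [(50, 100), (60, 120), (70, 140), (80, 160)]

slope, intercept = fit_line(*zip(*interviewed))
cutoff = resume_cutoff(interviewed)

new_resume = 58
predicted_final = slope * new_resume + intercept
invite = new_resume > cutoff
print(round(cutoff, 2), round(predicted_final, 2), invite)  # 54.0 116.0 True
```

Refitting the line after each interview is what lets the prediction improve as more candidates are scored.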

There was one more surprising benefit to a numerical hiring plan: it prevents, or at least mitigates, stereotypical hiring. It so happened that the first few candidates we interviewed were women. After these interviews, a coworker explained to me that since this was a six-month project we would be ill-advised to hire a woman because they were more likely to take extended periods of time off, which we could not afford. As he explained, he had been burned by this before. I simply told him that we would hire the candidate with the highest score regardless of any other factors. And if he consistently rated women worse, I would know it.

Our innate cognitive biases lead us to believe that we are good at looking into a candidate's eyes and reading their soul. Science clearly tells us we are not. Therefore, it's critical to structure every step of the hiring process to combat these biases. However, as Kahneman says, implementing his hiring strategy "requires relatively little effort but substantial discipline". Acting on our intuitions feels good and people like to believe they can "take the measure of the man". ⁸

Throughout my hiring process, I received a lot of resistance. One member of my hiring team told me he didn't pay attention to the behavioral questions. Rather, he judged candidates on more intuitive metrics, like how long of a pause they took before responding to questions. ⁹ Another acquaintance told me that this method worked fine in our situation, but he served on an interview panel where they didn't have time for such a rigorous method and thus had to make up questions on the spot. I suggested that he should then trash the interview entirely and just shuffle the resumes since that would take less time and be just as effective.

The resistance leads to the larger lesson of analytical hiring and other analytical practices. As Kahneman says, his method is not perfect. It improved the Israeli military's evaluations from "completely useless" to "moderately useful". No intuition or algorithm can perfectly predict the future, despite how much comfort it might give us to believe that either can. But an improvement to "moderately useful" can, in the long run, provide a business with a significant edge and, as we've seen, mitigate many iniquitous results of intuitive practices. Perhaps the world would be a better place if we accepted "moderately useful" more often.

^{1. The book also suggests this is one reason why extroverts are overvalued in our society. ^}

^{2. Only men were eligible for officer training school in 1955. ^}

^{3. For some reason I find many interviewers intentionally try to be intimidating. ^}

^{4. I tested this coding challenge on my development team before the first interview to make sure it worked and help them sympathize with the interviewees. ^}

^{5. Again this is the opposite of what people do in a typical hiring process. In a discussion after the interview people naturally consider only the interview performance, rarely even referencing the resume. ^}

^{6. Note that the R² value is 0.76, so as we'd expect candidates with good resumes tend to do well in interviews. ^}

^{7. I subtracted 10% because someone might outperform their resume score and thus end up being a better choice than Winston. ^}

^{8. Or, as we've seen, dismiss the woman. ^}

^{9. Fortunately the numerical scores made it easy to deal with his intuitive methods. I just threw out his ratings. ^}
