Moving Beyond Black Box Fears: New Questions for AI in Hiring

New rules for evaluating AI hiring tools

October 10, 2025

It's conference season. In 2025, that means many talent leaders will be evaluating AI tools for hiring. If you're preparing to sit down with vendors, you'll probably hear familiar questions surface: What data did you train on? Won't this amplify bias? What were these models trained to do?

Those questions made sense a few years ago. They were shaped by real scandals and public missteps. But they don’t always fit the reality of how large language models (LLMs) are being used in hiring today.

Why the Concerns Exist

The skepticism around AI in hiring didn't appear out of thin air. A series of very public failures, from résumé-screening models that learned and repeated historical bias to ranking systems no one could explain, gave people reason to be cautious.

Those examples created a lasting impression. Many buyers came away thinking of AI as unsafe, biased, and impossible to understand.

From Black Box ML to LLMs

It’s worth remembering that these failures were built on machine learning approaches that were common at the time. Those models were trained on historical data, they operated like black boxes, and they often just reinforced the patterns of the past.

Large Language Models (LLMs) have brought us to a different place. Instead of ranking résumés or copying old decisions, they can understand natural language, probe reasoning, and evaluate responses to specific questions. Old machine learning tried to predict “fit.” LLMs are better at analyzing individual answers to structured questions and then showing the evidence behind the score.

What LLMs Are Trained On

Models like GPT-4 or Claude were trained on a mix of licensed data, publicly available internet data, books, articles, and code. The important point is that they were not trained on résumés or past hiring outcomes.

The bigger issue that should concern buyers in 2025 is not the base training itself, but how each vendor uses the LLM in practice. The real questions should be: How is the model fine-tuned for this use case? How are prompts designed to evaluate skills instead of people? How are the outputs validated for fairness and consistency?

A Better Foundation: Assess Skills, Not People or Résumés

The safest and most compliant approach to hiring starts with a simple principle: assess individual skills directly. Do not infer them from résumés, job titles, or pedigree.

That means:

  1. Define the skills that really matter for the role. Break them down into observable knowledge, skills, and abilities.
  2. Map structured questions to each skill. For example, questions that probe problem solving, technical reasoning, compliance knowledge, or customer empathy.
  3. Score every response on its own. Each answer is evaluated against job-relevant criteria, not lumped into a single score.
  4. Compile scores into a skills profile. Results show strengths, emerging skills, and areas for growth.
  5. Keep human judgment in its proper place. Recruiters and hiring managers evaluate the person using the evidence, not the model.
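The five steps above can be sketched as a minimal data model. This is an illustrative sketch, not any vendor's implementation: the skill names, questions, scores, and the `compile_profile` helper are all hypothetical, and in practice the per-response score and evidence would come from an LLM evaluation rather than being hand-entered.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical data model for steps 1-5: skills are defined up front,
# each question maps to one skill, and every response is scored on its
# own before being compiled into a skills profile.

@dataclass
class Question:
    text: str
    skill: str            # the one skill this question probes (step 2)

@dataclass
class ScoredResponse:
    question: Question
    answer: str
    score: float          # 0.0-1.0 against job-relevant criteria (step 3)
    evidence: str         # rationale a recruiter can review (step 5)

def compile_profile(responses: list[ScoredResponse]) -> dict[str, float]:
    """Step 4: aggregate per-question scores into a per-skill profile."""
    by_skill: dict[str, list[float]] = {}
    for r in responses:
        by_skill.setdefault(r.question.skill, []).append(r.score)
    return {skill: round(mean(scores), 2) for skill, scores in by_skill.items()}

# Illustrative usage with made-up questions and scores.
responses = [
    ScoredResponse(Question("Walk through debugging a failed batch job.", "problem_solving"),
                   "I would check the logs first...", 0.8,
                   "Names concrete diagnostic steps."),
    ScoredResponse(Question("Explain PII handling rules to a new hire.", "compliance_knowledge"),
                   "PII must be encrypted at rest...", 0.6,
                   "Covers storage but not access control."),
    ScoredResponse(Question("Describe a time a process broke mid-quarter.", "problem_solving"),
                   "We re-routed approvals manually...", 0.6,
                   "Workaround described, root cause not."),
]
profile = compile_profile(responses)
# profile == {"problem_solving": 0.7, "compliance_knowledge": 0.6}
```

Note that the profile never collapses into a single "fit" number: each skill keeps its own score, and each score traces back to specific answers and evidence, which is what keeps the human reviewer in the loop.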

Why AI Interviews Add Value

(yes, this one little section is the content marketing piece. Skip to the next section to avoid it) Once you have this structure in place, AI interviewing makes it possible to do at scale what humans cannot. Large language models don’t have to judge people or résumés. They can analyze responses, one question at a time, and provide depth that a résumé will never give you.

A résumé says, “This person held a certain title, so they must know X.” An AI interview shows, “Here is how this candidate actually responded to a question about X.”

One is an assumption. The other is evidence.

The Better Questions for 2025

Instead of asking, “What data did you train on?” ask, “How are individual question responses mapped to skills, and how is expertise scored?”

Instead of asking, “Won’t this amplify bias?” ask, “How do you make sure each skill is scored independently and transparently, with evidence I can review?”

Instead of asking, “What were LLMs trained on?” ask, “How is the model adapted to evaluate skills in a fair, explainable way for this role?”

Bottom Line

The scandals of the last decade explain the skepticism that still lingers. But ranking résumés, whether by humans or by machines, is not the right path forward.

The stronger path is structured skill assessment. Large language models make that possible at scale by evaluating responses to job-relevant questions, not résumés or people. Each answer can be scored independently, the results are transparent, and recruiters have real evidence to work with.

This isn’t about guessing who is qualified. It’s about giving every qualified candidate the chance to show what they know, fairly and consistently.
