2017 Graduation Speech to UNC Department of Statistics and Operations Research

Text as prepared.

YOU LOOK GREAT. A SEA OF BLUE. 20 YEARS AGO, I WAS ONE OF THOSE DROPLETS IN THE SEA OF BLUE.

I’M HONORED TO BE SPEAKING WITH YOU HERE TODAY. BUT I’M NOT HONORED SO MUCH AS I AM SURPRISED. BECAUSE THE DATA 20 YEARS AGO WOULD NEVER HAVE MODELED A SCENARIO IN WHICH I WOULD BE SPEAKING AT GRADUATION IN THE DEPARTMENT OF STATISTICS AND OPERATIONS RESEARCH.

Shannon Paylor and Ryan Thornburg
2017 STOR graduate and Carolina Data Desk fellow Shannon Paylor with me at the department’s graduation.

I WAS A WORDS GUY WHO IN COLLEGE TOOK ABSOLUTELY AS LITTLE MATH AS POSSIBLE. THE ONE QUANTITATIVE COURSE I DID TAKE IN COLLEGE WAS A STATISTICS CLASS. PROFESSOR LEDBETTER — WHO IS RETIRING THIS YEAR — WAS THE INSTRUCTOR, AND IT HAS THE HONOR OF BEING THE LOWEST GRADE ON MY PARTICULARLY UNREMARKABLE TRANSCRIPT.

IF 2017 RYAN HAD GIVEN A GRADUATION SPEECH TO 1997 RYAN, WHAT HE MIGHT HAVE TOLD HIM WAS THAT AT THIS MOMENT IN YOUR LIFE, ALL THE POSSIBLE FUTURES THAT YOU’VE IMAGINED FOR YOURSELF — AND EVEN THOSE YOU CAN’T IMAGINE — HAVE A PROBABILITY GREATER THAN 0 BUT LESS THAN 1. AND AT THIS POINT, YOU STILL DON’T YET HAVE ENOUGH DATA TO BUILD A GOOD MODEL FOR WHAT LIFE HAS IN STORE FOR YOU. AND THAT DATA, WHILE GOOD AT TELLING US WHAT HAS BEEN, AND WHAT IS, AND WHAT MIGHT BE — IS NOT AS POWERFUL AS STORIES TO HELP US IMAGINE THE LIFE WE WANT AND TO HELP US MOVE THE ODDS OF GETTING THERE CLOSER TO 1.

DURING MY 20-YEAR CAREER IN JOURNALISM, DATA HAS BECOME A POWERFUL TOOL FOR FINDING STORIES THAT WE MIGHT MISS IF WE RELIED ON INTERVIEWS AND OBSERVATION ALONE. AS JOURNALISTS WORK TO SHINE LIGHT IN DARK PLACES, HOLD POWERFUL PEOPLE ACCOUNTABLE AND EXPLAIN THE INCREASINGLY COMPLEX WORLD WE ALL SHARE, DATA HELPS US DEVELOP THE WHO, WHAT, WHEN AND WHERE OF OUR STORIES. BUT DATA ALONE CAN’T TELL US “WHY?” OR “SO WHAT?” UNTIL WE ADD SOME PLOT TO THE STORY, DATA IS JUST A COLD FOSSIL.

SEE, TO HAVE A GOOD STORY, YOU HAVE TO HAVE RICH MOMENTS. LIKE THIS MOMENT THAT WE’RE ALL SHARING NOW. IT IS ONE OF THE MOST IMPORTANT OF YOUR CAROLINA STORY, NOT BECAUSE WHAT I’M SAYING RIGHT NOW WILL BE PARTICULARLY MEMORABLE FOR YOU IN 20 YEARS. THIS MOMENT IS POWERFUL BECAUSE IT’S ONE TO WHICH YOU’VE ALREADY BEEN DOZENS OF TIMES IN YOUR IMAGINATION. THIS MOMENT — NOT AS IT ACTUALLY EXISTS, BUT AS IT EXISTED IN THE STORIES YOU TOLD YOURSELVES ABOUT IT — PROPELLED YOU THROUGH HARD TIMES OVER THE LAST FOUR YEARS, TIMES WHEN THE DATA MIGHT HAVE TOLD YOU YOU WOULDN’T MAKE IT.

DATA CAN TELL US A LOT OF IMPORTANT ATTRIBUTES OF THIS MOMENT. I CAN TELL YOU HOW MANY PEOPLE ARE SITTING HERE. AND, IN FACT, I CAN TELL YOU THAT YOUR NAME IS MORE LIKELY TO BE EMILY OR MATTHEW THAN ANYTHING ELSE. AND WITH ENOUGH DATA, I MIGHT EVEN BE ABLE TO UNDERSTAND HOW EACH OF YOU GOT HERE. BUT WHAT I CAN’T TELL YOU — AND WHAT NO DATA CAN TELL YOU — IS WHY YOU ARE HERE.

THE DAY IS NOT FAR OFF WHERE WE COULD GATHER ENOUGH DATA ABOUT THIS MOMENT TO RECREATE IT IN VIRTUAL REALITY WITH PRETTY HIGH FIDELITY. IN FACT, COLLABORATIONS AMONG JOURNALISM, STATISTICS AND COMPUTER SCIENCE RIGHT HERE AT UNC WILL MAKE THAT DAY ARRIVE SOONER RATHER THAN LATER.

BUT WHILE DATA MIGHT ONE DAY BE USED TO RECREATE MOMENTS, IT CAN’T TELL STORIES. BECAUSE STORIES ARE RICH MOMENTS THAT ARE STRUNG TOGETHER WITH MEANING. STORIES NEED THE WHO, WHAT, WHEN AND WHERE OF DATA. BUT THEY AREN’T STORIES UNTIL THEY ANSWER THE QUESTION “WHY?”

SO, WHY ARE YOU HERE? WHAT’S YOUR STORY?

YOU SEE, FACEBOOK, GOOGLE, THE UNIVERSITY—THEY ALL HAVE DATA ABOUT YOU. BUT YOU ARE THE ONLY ONES WHO GIVE THAT DATA MEANING, AND YOU GIVE IT MEANING BY TELLING YOURSELVES AND THE REST OF THE WORLD STORIES. AND IT’S THOSE STORIES THAT PROVIDE THE MORAL PROPULSION TO CHANGE THE DATA ITSELF.

SO, WHY ARE YOU HERE?

NOW, IF I COULD HAVE SUCKED UP ALL YOUR ANSWERS TO THE QUESTION WHY ARE YOU HERE AND RUN THEM THROUGH SOME NATURAL LANGUAGE PROCESSING APPLICATION I MIGHT BE ABLE TO REDUCE ALL OF THAT DATA TO A SINGLE, MEAN ANSWER: YOU’RE HERE BECAUSE YOU HAVE SUCCESSFULLY COMPLETED THE DEGREE REQUIREMENTS OF THE DEPARTMENT OF STATISTICS AND OPERATIONS RESEARCH AT THE UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL IN 2017.

THAT’S THE COMMON REASON WE ARE ALL HERE. BUT WHAT’S THE REASON ANY ONE OF YOU IS HERE? WHY ARE YOU HERE?

SO, WHILE YOU’RE ALL MEDITATING ON THAT QUESTION, I’M GOING TO TALK TO THE PEOPLE WHO GOT YOU HERE — YOUR FRIENDS, YOUR FAMILIES, AND THE MILLIONS OF NORTH CAROLINIANS WHO WORK HOURLY WAGE JOBS AND PAY THEIR TAXES BECAUSE THEY SEE YOU AND YOUR PURSUIT OF THIS DEGREE AS A WORTHY INVESTMENT.

TO THOSE OF YOU WHO CONTRIBUTED SO MUCH TO THESE STUDENTS’ SUCCESS, WHAT YOU HAVE BEFORE YOU ARE BUDDING PROFESSIONALS IN SOME OF THE HIGHEST-PAID AND FASTEST-GROWING FIELDS. IN THE U.S. ALONE, THERE ARE ABOUT 100,000 JOBS FOR STATISTICIANS, AND IN 2024, THERE WILL BE ANOTHER 30,000. THE MEDIAN SALARY FOR THESE FIELDS IS ABOUT DOUBLE THE MEDIAN SALARY FOR A NEWS REPORTER. ACCORDING TO MY FORMER EMPLOYER, USNEWS, THE JOB OF STATISTICIAN IS THE FOURTH “BEST” JOB IN AMERICA. OR, ANOTHER WAY OF LOOKING AT IT, IT’S THE TOP JOB WHERE YOU DON’T HAVE TO DEAL WITH OTHER PEOPLE’S BLOOD.

AS THIS DEPARTMENT’S WEBSITE WILL TELL YOU, THE GRADUATES SITTING HERE ARE WELL QUALIFIED TO “TACKLE CHALLENGING QUANTITATIVE PROBLEMS IN A BUSINESS OR SCIENTIFIC ENVIRONMENTS.” THEY WILL GO ON TO CAREERS IN GENOMICS, BIOLOGICAL MODELING, ENVIRONMENTAL STATISTICS, INSURANCE AND FINANCIAL MATHEMATICS, REVENUE, WORKFORCE, AND SUPPLY-CHAIN MANAGEMENT, TRAFFIC FLOW AND CONGESTION, AND TELECOMMUNICATIONS.

BUT REGARDLESS OF THE FIELD, A BIG PART OF MANY OF THEIR JOBS WILL BE ASSESSING RISK AND UNCERTAINTY.

GRADUATES, YOU ARE EMBARKING ON CAREERS IN WHICH YOU WILL BE ASKED TO REDUCE RISK AND UNCERTAINTY JUST AT THE MOMENT THE WORLD SEEMS TO PREFER EXACTLY THOSE THINGS.

AS JOURNALISTS WHO TRIED TO PREDICT THE OUTCOME OF LAST YEAR’S ELECTION CAN TELL YOU, OLD MODELS ARE SUDDENLY INSUFFICIENT. AND IT’S NOT JUST IN POLITICS WHERE THE WORLD SEEMS ON UNCERTAIN FOOTING. GLOBAL CLIMATE CHANGE IS ACCELERATING AND BECOMING MORE ERRATIC. MODELS OF MILITARY POWER AND THE VERY RELEVANCE OF NATIONS ARE UP FOR DEBATE. AND CLOSER TO HOME, THE PATH TO ADULTHOOD AND ECONOMIC INDEPENDENCE IS VEERING OFF A COURSE THAT IT’S BEEN ON FOR GENERATIONS.

WHEN MANY OF YOU WERE IN KINDERGARTEN, MEN TRAINED AND FINANCED BY THE AL QAEDA TERRORIST ORGANIZATION DESTROYED THE NEW YORK WORLD TRADE CENTER AND BADLY DAMAGED THE PENTAGON. THAT WAS A MOMENT OF GREAT UNCERTAINTY THAT IS RICH IN DATA, BUT IT’S THE STORIES THAT WE TELL OURSELVES ABOUT THAT MOMENT THAT GIVE IT SUCH MEANING THAT THE NUMBERS 9 AND 11 CAN NO LONGER STAND FOR ANYTHING ELSE.

WHEN YOU WERE IN MIDDLE SCHOOL, A COLLAPSE IN THE GLOBAL ECONOMY CAUSED JOBS LOST AND DREAMS DEFERRED TO BE PROMINENT TOPICS IN EITHER YOUR OWN HOUSE OR YOUR FRIENDS’ FAMILIES. THOSE MOMENTS ARE PART OF YOUR LIFE STORY JUST LIKE TODAY IS.

UNCERTAINTY SEEMS TO BE EVERYWHERE THESE DAYS. FOR MANY PEOPLE, THE INSTINCT IS TO CLING TO SECURITY AS IF THE SONG IS ABOUT TO STOP IN A GAME OF MUSICAL CHAIRS. OTHERS SEE IN UNCERTAINTY A CHANCE TO SOW DOUBT AND UNDERMINE EVIDENCE.

FOR A LOT OF PEOPLE, UNCERTAINTY IS A SYNONYM FOR FEAR AND AN EXCUSE FOR CYNICISM. AND THOSE PEOPLE ARE GOING TO ASK PEOPLE WITH YOUR SKILLS, GRADUATES, TO TELL THEM HOW TO PROFIT FROM FEAR AND HOW THEY SHOULD HEDGE THEIR BETS SO THAT WHEN THE SHUFFLE IS DONE THEIR CARD IS ON TOP.

LIKE JOURNALISTS, YOUR JOBS WILL BE TO SORT OUT THE SIGNAL FROM THE NOISE AND TURN DATA POINTS INTO ACTIONABLE INTELLIGENCE. AND ALSO LIKE JOURNALISTS, YOUR CHALLENGE IS CHOOSING WHICH SOURCES TO TRUST, WHICH BITS OF INFORMATION ARE RELEVANT, AND THEN HOW TO COMMUNICATE INHERENTLY UNCERTAIN FINDINGS SO THAT THEY CAN BE UNDERSTOOD BY PEOPLE WHO DON’T HAVE YOUR ANALYTICAL EXPERTISE.

THE CHALLENGE OF TACKLING THESE QUANTITATIVE PROBLEMS DOESN’T EXIST JUST IN BUSINESS AND SCIENTIFIC ENVIRONMENTS. THE REASON YOUR SKILLS ARE SO VALUABLE IS BECAUSE THE WORLD TODAY IS RICH WITH DATA BUT POOR IN OUR ABILITY TO MAKE SENSE OF IT.

THE NEWSROOM IS JUST ONE PLACE THAT IS WAKING UP TO THE VALUE OF YOUR SKILLS. ADVERTISERS, LEGISLATORS, TERRORISTS, TRUCKERS, AND TAXI DRIVERS; POLICE AND POLITICAL DISSIDENTS ALL WANT YOUR SKILLS. YOUR GENERATION OF STATISTICIANS AND DATA SCIENTISTS WILL HAVE INCREDIBLE PROFESSIONAL OPPORTUNITY, AND THE STORY YOU TELL YOURSELF ABOUT WHY YOU ARE HERE TODAY WILL HELP DETERMINE WHETHER HUMANITY USES DATA TO EMPOWER OR ENSLAVE ITSELF.

DATA-DRIVEN AUTOMATION IS ABOUT TO UPEND THE GLOBAL WORKFORCE. FOR EXAMPLE, IN THE FIELD OF JOURNALISM, DURHAM COMPANY AUTOMATED INSIGHTS TURNS DATA INTO ARTICLES FOR THE ASSOCIATED PRESS.

THIS KIND OF AUTOMATION IS EXCITING. BUT WILL WE BECOME SO ENTRANCED BY AUTOMATING THE DESCRIPTION OF WHAT IS THAT WE FORGET THAT DATA CANNOT TELL US INEVITABLY WHAT WILL BE? THAT’S LEFT UP TO US. WE CAN’T FORGET THAT EVERY FUTURE IS POSSIBLE AND THAT IF WE HAVE CURIOSITY, IMAGINATION AND PERSEVERANCE, WE CAN TIP THAT NEEDLE IN THE DIRECTION WE WANT.

NOW WE KNOW THAT ALL THE CURIOSITY, IMAGINATION AND PERSEVERANCE IN THE WORLD DOESN’T ALWAYS GET US WHAT WE WANT. THE STORIES OF OUR LIVES ARE NOT WORKS OF FICTION OF WHICH WE ARE THE OMNIPOTENT AUTHORS. OUR STORIES COLLIDE WITH THE STORIES OF OTHERS, AND VERY OFTEN, THE STORY THE REST OF THE WORLD SEEMS TO BE WRITING FOR US DOES NOT MATCH OUR PLOT AT ALL. REGARDLESS OF WHETHER YOU PUT ANY EFFORT INTO CREATING THE FUTURE YOU WANT FOR YOURSELF, THERE ARE PLENTY OF OTHER PEOPLE WHO WILL BE PUTTING EFFORT INTO CREATING A FUTURE YOU DON’T WANT.

IF THERE ARE 200 PEOPLE IN THIS ROOM, THEN THERE ARE 200 UNIQUE STORIES ABOUT WHY EACH OF US IS HERE. PART OF THE REASON THAT GRADUATION IS SO POWERFUL IS THAT IT’S A MOMENT IN WHICH ALL OF THOSE INDIVIDUAL STORIES WEAVE TOGETHER IN A HELIX THAT, FOR 224 YEARS, HAS CREATED A CULTURE ON THIS CAMPUS THAT STRIVES FOR THE MOTTO YOU ARE ABOUT TO SEE ON YOUR DIPLOMAS — LUX LIBERTAS — LIGHT AND LIBERTY.

AND NOW IT’S TIME FOR US TO GIVE YOU THOSE PAPERS AND TURN THE PAGE FROM THIS CHAPTER IN YOUR STORY TO THE NEXT.

YOU HAVE BEEN A PART OF THE LONG, IMPROBABLE STORY OF THE UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL ITSELF. TIME HERE MARCHES ON, PEOPLE HERE MARCH ON, BUT THE PLOT OF OUR SHARED STORY—TO BRING LIGHT AND LIBERTY TO THE PEOPLE OF THE STATE AND THE WORLD—REMAINS THE SAME.

YOU’VE WALKED THIS UNIVERSITY’S BRICK PATHS FOR THE LAST TIME AS STUDENTS, BUT AS YOU SET OFF INTO A WORLD OF UNCERTAINTY, THE CHOICES YOU MAKE IN YOUR UNIQUE PURSUIT OF YOUR UNIQUE STORIES BECOME THE GUIDING LIGHTS THAT FUTURE GENERATIONS WILL USE TO WALK THESE SAME BRICK PATHS.

YOUR CHOICES ON HOW TO APPLY THE POWERFUL SKILLS YOU LEARNED HERE AT UNC BECOME DATAPOINTS THAT STUDENTS 20 YEARS FROM NOW WILL USE TO BUILD THEIR MODEL OF THE CAROLINA WAY AND ALL THE POSSIBLE FUTURES IT MIGHT HAVE.

PARENTS, FAMILIES AND FRIENDS, THANK YOU FOR ENTRUSTING THESE STUDENTS YOU LOVE TO UNC. FACULTY, THANK YOU FOR PREPARING THESE STUDENTS TO BE LEADERS IN EVERY VOCATION AND AVOCATION. AND STUDENTS—GRADUATES—THANK YOU FOR EVERYTHING YOU’VE ADDED TO THE STORY OF UNC AND EVERYTHING YOU WILL CONTINUE TO ADD. I LOOK FORWARD TO LISTENING TO ONE OF YOU GIVE THE GRADUATION SPEECH AT THE JOURNALISM SCHOOL IN 2037.

Localizing the NYT Data Viz on Police Racial Disparity

One of the most important attributes of data-driven journalism is that it scales, and the primary goal of my OpenRural, Open N.C. and data dashboard projects has been to democratize data so that we start seeing the same types of reporting and presentation in small community papers that we see on the big national news sites. So when I saw Thursday’s New York Times graphic on the race gap in America’s police departments, I immediately thought that something similar could be done pretty quickly for North Carolina towns.

Rebecca Tippett at UNC’s Carolina Demography service was able to pull and clean the data within about three hours. She posted the data in CSV format to her blog, along with a nice explanation.

Being a words guy rather than a picture guy, I used the data visualization software Tableau to put together a prototype of something similar to what The Times had done. It is absolutely nowhere near as good as what they did, but I copied their concept, color scheme and fonts. And about two hours later I had something that told the same story.

Click on the image to see the interactive version of an embeddable graphic that can easily — and at no cost — be dropped into any news site or blog (except this one … because I’m still hosting it on javascript-averse wordpress.com).

Click to view the interactive version on Tableau Public.

The graphic alone doesn’t tell the whole story. Tippett pointed out when I showed her the chart that most of the Latinos in Siler City aren’t even eligible to join the city’s police force — 40% are not adults, and 80% of adult Hispanics there are not citizens.
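To make Tippett's point concrete, the two percentages can be combined into a single share. This is only a rough sketch: it assumes adulthood and citizenship are the only eligibility criteria that matter and treats the rounded figures above as exact. The function name is made up for illustration.

```python
from fractions import Fraction


def eligible_share(pct_not_adult, pct_adults_not_citizens):
    """Share of a population that is both adult and a citizen.

    Inputs are whole-number percentages. Fraction keeps the arithmetic
    exact instead of relying on floating point.
    """
    adult = 1 - Fraction(pct_not_adult, 100)
    citizen_given_adult = 1 - Fraction(pct_adults_not_citizens, 100)
    return adult * citizen_given_adult


# Figures from the paragraph above: 40% not adults, 80% of adults not citizens.
share = eligible_share(40, 80)
print(f"{float(share):.0%} of Latino residents are adult citizens")
```

Under those assumptions, only about one in eight Latino residents could even apply, which is a very different baseline from the total population share shown in the chart.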

And many of these police forces are very small, which makes it easy for them to end up with huge percentage disparities between the racial breakdowns of their police and their residents. Tiny Biscoe, for example, has only nine police officers. Wagram has two police officers: one is white and one is “other.”
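A quick calculation shows why small forces produce such large swings. The sketch below uses only the force sizes mentioned above; the helper name is hypothetical.

```python
def points_per_officer(force_size):
    """Percentage points of the force that a single officer represents."""
    return 100 / force_size


# Force sizes taken from the text above.
for town, officers in [("Biscoe", 9), ("Wagram", 2)]:
    points = points_per_officer(officers)
    print(f"{town}: one officer = {points:.1f} percentage points")
```

In a nine-officer department, a single hire or retirement moves the racial breakdown by more than 11 points, so a gap that looks dramatic on the chart may reflect one or two people.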

The other potential problem with the data is that it’s seven years old. But so is the data used by The Times.

This is just an example of how we might continue to democratize data. This graphic could be emailed to an editor of each news outlet in North Carolina, along with a list of suggested questions that local reporters could ask to quickly make the data more relevant.

Suggested Questions to Localize This Data Driven Story

  • “This data is seven years old. Does it still look accurate to you? Can you provide me with some more recent data of the racial and ethnic breakdown of the police department?”
  • “Why do you think your department has a higher percentage of white officers than the residents?”
  • “How does the racial disparity between the police department and local residents affect the way your department works?”
  • “Walk me through the hiring process for new officers. How does a candidate’s race factor into hiring decisions, if at all?”
  • “How do you publicize vacancies in the department? Do you do anything to recruit minority applicants?”
  • “What percentage of your officers live in the city? How important is it that officers come from within the city? Why?”
  • Also, seek opinions of others — both insiders such as city council members and community leaders as well as people on the street. Consider using social media such as Facebook or Twitter to ask people what they think about the data and these questions. This is the start of a conversation, not the end. Be sure to get a diversity of perspectives — age, gender, geography and certainly race and ethnicity.

The Challenge: News Deserts

But even if we acquire, clean and produce data along with some simple story guides, data-driven journalism may still not find its way into smaller newspapers if nobody is there to receive our help. At many papers, this would still be seen as enterprise reporting. As an editor with a staff you can count on one hand, do you send a reporter out prospecting for answers to these somewhat uncomfortable questions? Or do you have them write up the day’s arrests? Or preview this weekend’s chamber of commerce golf tournament?

North Carolina also has broad news deserts — whole counties that have no reporters shining light in dark places, holding powerful people accountable and explaining an increasingly complex and interconnected world. Siler City, for example, is in a county of 65,000 people with a single newspaper that reaches only 12 percent of them. The News & Observer provides scant coverage of the county.

What other story templates would you like to see? What would make them easier to use?

Why ‘Robot Reporters’ Are a Good Thing

First of all, let’s not allow the alluring alliteration to distract from what we’re really talking about: not robot reporters, but robot writers.

Mashable’s Lance Ulanoff asked me what I thought about the news that Durham’s Automated Insights would be writing automated business stories for the Associated Press.

This trend excites me about the future of journalism. I’ve been talking with folks about it for about five years, since I first saw similar work that was being incubated by Northwestern’s journalism school. That effort grew into the company Narrative Science, which has been writing earnings preview stories for Forbes.com. The Los Angeles Times uses an algorithm to write earthquake stories. The Washington Post has looked into using Narrative Science for high school sports stories.

The Guardian learned how hard it is to build a robot writer, but the automated stories I’ve seen written by both Automated Insights and Narrative Science are pretty good. And 46 media and communications undergrads couldn’t distinguish a computer-written story from one written by a human.

The trend in automation should free up the best writers and best reporters to add the how and why context that still needs to be done by humans. If I were a beat reporter at a newspaper, I’d be working as fast as I could to convince my editor to let a computer write the scut stories I have to write and free me up to do more explanatory and accountability reporting, or to craft beautifully written narratives.

One significant risk is that for the last decade we’ve seen “good enough” journalism growing in popularity. News organizations that continue to have a strategy of harvesting profits rather than investing in growth will no doubt cut reporters if machines can write commodity news at a lower cost.

If I were a young journalist looking for my first job, I’d be looking for news organizations that are sustaining a small margin and growing both expenses and revenues — the ones that are using both bots and humans.

The trend toward automation will result in an emphasis on the news value of impact. Mass customization is going to change the nouns in the leads of stories from the third person to the second — “investors” will become “you.”
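As a toy illustration of that shift from third person to second, here is a minimal template sketch. It is not how Automated Insights or Narrative Science actually generate stories; the function, its parameters and the company name are all invented for this example.

```python
def earnings_lead(company, change_pct, shares_held=None):
    """Write a one-sentence lead from data, personalized when holdings are known."""
    direction = "rose" if change_pct >= 0 else "fell"
    base = f"{company} shares {direction} {abs(change_pct):.1f} percent"
    if shares_held is None:
        # Generic, third-person version for a mass audience.
        return f"{base}, a move investors will be watching."
    # Second-person version built from the reader's own data.
    return f"{base}, and you hold {shares_held} of them."


# "Example Corp." and the numbers are placeholders, not real data.
print(earnings_lead("Example Corp.", 2.5))
print(earnings_lead("Example Corp.", -1.2, shares_held=100))
```

The same data point yields two different leads, and the only thing that changes is whether the system knows something about the reader.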

The trick is how to make money off this. News organizations that continue to see themselves as manufacturers of goods will probably increase the volume of digital commodity content they publish and continue to drive down ad rates.

But smart content companies are evolving from a manufacturing industry to a service industry, and trying to create, explain and capture the value they provide to each client by getting the right information to the right people at the right time.

What we see now as data is as unsophisticated as what many of us thought of as data when Google first made it its mission to organize all of it. We think of data now as numbers in tables (scores, money, temperatures), but we’ll soon see data as behavior and content metadata. And we will see automated stories that incorporate the user’s data and the data of her social network as well.

That level of concierge news service, though, is going to come at a price for users. If we’ve seen the democratization of media, this automation trend has the potential to create a world of media haves and have-nots. The haves will pay premium subscription fees to get highly personalized news from bots. The have-nots will get generic news (maybe written by bots as well).

The one thing from which I think everyone will benefit is an increase in the quality and frequency of narrative writing, and of explanatory and accountability reporting.

To aid that transition I’m working on the idea that we can use digital public records to build a newsroom dashboard system that will alert beat reporters to possible story ideas. Automated Insights and Narrative Science are scaling commodity news stories. I want to see if we can lower the human reporters’ opportunity cost of pursuing enterprise stories that land with much bigger and much longer lasting impact.

If you want a pithy quote from a journalism prof. on the effect that robot writers are going to have on the job market for journalism students, here it is: “My C students are probably screwed. My A students are going to do better than ever.”

Data Journalism Class Exercise (Or, Teaching Critical Thinking)

Here’s a great exercise for journalism professors who are introducing their students to data-driven journalism. It provides a good opportunity to show them that they have to get over the common perception that data is unbiased — clean and clear. It gives instructors an opportunity to talk about the need to “interview” the data.

The assignment is deceptively simple: Have the students download the Census Bureau’s list of rural and urban counties and calculate the population density for the counties in your state.

That’s it. Tell them no more. Depending on where they get stuck, slowly reveal to them the clues they need to complete the project. What you may not be surprised to find is that too many college undergrads seem to be accustomed to following step-by-step instructions and too few know how to break down a problem into smaller, sequential pieces. These are the kinds of critical thinking skills that they need to be good journalists or, as I like to say, to think journalistically regardless of their eventual profession.

Helping Them Get Unstuck

Force your students to get a quick start. Don’t let them sit and stare at their computer screens for even a second. Agitate them in whatever way you need to make them feel like an asteroid is about to smash the earth to smithereens. They can’t solve the whole problem all at once, so what are the pieces of the problem hidden inside this big problem?

  • Where can you find the Census list of rural and urban counties?

The answer — of course — is Google. So, there’s an opportunity to teach efficient search strategies.

Students will click around the Census site a bit trying to find what they want. Ask how many skimmed and how many read every word on each page. It’s a good opportunity to talk about the way people use information online.

You can help students find the data they need. And from there you can show them basic file-management and Excel techniques. Where does the file download on their computer? What’s the difference between a .csv and a .xlsx file?

With the data open in Excel, they’ll need to sort or filter to isolate just their state. But now what? Ask the students what they think each of the columns represents. What does it mean that something has a POP_UA of 10791 and a STATE of 37?
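For instructors who want to show the same filtering step outside Excel, here is a minimal Python sketch. The sample rows are made up for illustration; only the STATE and POP_UA column names and the values 37 and 10791 come from the exercise above, and COUNTY_LABEL is a hypothetical column.

```python
import csv
import io

# Made-up sample rows for illustration, not real Census figures.
SAMPLE = """STATE,COUNTY_LABEL,POP_UA
37,County A,10791
37,County B,0
45,County C,5200
"""


def rows_for_state(csv_file, state_code):
    """Return the rows whose STATE column matches a FIPS state code."""
    reader = csv.DictReader(csv_file)
    # Compare as integers so "37" and "037" match the same state.
    return [row for row in reader if int(row["STATE"]) == state_code]


# 37 is the FIPS code for North Carolina.
nc_rows = rows_for_state(io.StringIO(SAMPLE), 37)
for row in nc_rows:
    print(row["COUNTY_LABEL"], row["POP_UA"])
```

With a real download, `io.StringIO(SAMPLE)` would be replaced by an opened file, but the lesson is the same: the STATE column holds a code, not a name, and students have to look up what it means.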

Once they figure that out, they may note that the data includes some pre-calculated population density. But it’s not the information you asked them to find, so they’ll have to calculate population density themselves, using a commonly needed and very simple piece of journalism math: population divided by area.

This gives you a chance to explain that numbers are only meaningful in relation to other numbers. And how to do basic calculations in Excel.

The students will do the math correctly, but they won’t get answers that make any sense. A chance for you to talk with them about how data still has to pass the sniff test. Why doesn’t the data make sense? They can find the answer back on the Census website.

Once they’ve made the correct calculations (how many meters are in a mile anyway?), you can talk with them about how you still need to find the story in the data. Even though their calculations have added value to the data — essentially refining raw ore — mere presentation is of marginal value.
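The unit problem can be sketched in a few lines of Python. This assumes, as the parenthetical above hints, that the area column is reported in square meters; the county figures are hypothetical and the function name is invented for the example.

```python
# One mile is exactly 1,609.344 meters, so square it for area.
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2


def density_per_sq_mile(population, area_sq_meters):
    """People per square mile, given an area measured in square meters."""
    area_sq_miles = area_sq_meters / SQ_METERS_PER_SQ_MILE
    return population / area_sq_miles


# Hypothetical county: 50,000 people on 1.5 billion square meters of land.
population = 50_000
area_sq_meters = 1_500_000_000

naive = population / area_sq_meters  # what students get first
converted = density_per_sq_mile(population, area_sq_meters)

print(f"people per square meter: {naive:.7f}")
print(f"people per square mile:  {converted:.1f}")
```

The first number is a tiny fraction of a person per square meter, which is mathematically correct but fails the sniff test. The second is the figure a reader would actually recognize.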

You can top off the conversation by coming back to language, and that journalistic aspiration for precision and objectivity. What does “rural” mean anyway? What does the dictionary say? Is it an abstract concept or something you can measure? How (many different ways) does the Census measure it? How is it different than the USDA’s definition? Which is better? Why?

This is a project that could take several weeks as a module in a college class, or as a MOOC or quick conference or newsroom workshop. Its strength is its scope and flexibility. Just like a good journalist.