I sit on the board of the CRA-W, a subcommittee of the Computing Research Association dedicated to supporting and growing the population of female researchers in Computer Science careers. The CRA-W is an action-oriented committee; members design and run a variety of programs aimed at helping undergrad through professional women succeed as researchers. For example, I co-chair the Career Mentoring Workshop, a 2-day conference for senior grad students and early-career professionals.
At the CRA-W Board meeting earlier this year, there was a lot of discussion about evaluation metrics. Many of these programs are funded through National Science Foundation grants, particularly through its Broadening Participation in Computing (BPC) program, and recent management changes in BPC have resulted in pressure to measure the effectiveness of CRA-W's programs.
Coming from an industry background, I find this an entirely reasonable request. After all, if NSF is run like a business, say, like a venture capital firm, and it's providing seed money to organizations such as CRA-W to produce results, we ought to be able to document what we've achieved in exchange for its investment.
However, measuring the effectiveness of mentoring programs is extremely difficult. We can't run a controlled lab study: take a set of undergrads, run half of them through a mentoring program and have half be the control group, keep everything else in their lives the same, and see which group has a higher conversion rate to professional CS researchers many years later.
So, how can we solve this problem? In HCI we face similar problems of evaluation. For example, I'm working on a research system right now whose primary value would only show up over time, as it becomes more customized to its user and learns that user's work habits. The effectiveness of such a system depends on the real-world life situations in which it is used, which can't be easily simulated in a controlled lab environment.
I wonder: can we use lessons from HCI to help evaluate other complex systems, such as mentoring programs? If you were to design an evaluation for a mentoring program, what would you measure, and how would you determine whether the program was a success? I'd love to hear your ideas.