Crowdsourced software development (CSD) offers a series of specified tasks to
a large crowd of trustworthy software workers. Topcoder is a leading platform
to manage the whole process of CSD. While CSD is increasingly accepted as a
realistic option for software development, a preliminary analysis of Topcoder’s
software crowd worker behavior reveals an alarming task-quitting rate of 82.9%. In
addition, a substantial number of tasks do not receive any successful
submission.
In this paper, we report on a methodology to improve the efficiency of
CSD. We apply massive data analytics and machine learning to (i) perform a
comparative analysis of alternative techniques for predicting the likelihood of
winners and quitters for each task, (ii) significantly reduce the amount of
non-succeeding development effort in registered but inappropriate tasks, (iii)
identify and rank the most qualified registered workers for each task, and (iv)
provide reliable prediction of tasks at risk of receiving no successful submission.
Our results and analysis show that the Random Forest (RF) based predictive
technique performs best among the alternative techniques studied. Using RF to
recommend tasks to workers substantially reduces the amount of non-succeeding
development effort. On average, over a period of 30 days, the
savings are 3.5 and 4.6 person-days per registered task for experienced and
inexperienced workers, respectively. For the task-related recommendations of workers, we can
accurately recommend at least one actual winner among the top-ranked workers;
in particular, 94.07% of the time among the top-2 recommended workers for each
task. Finally, we can predict, with more than 80% F-measure, the tasks likely
to receive no submission, thus triggering timely corrective actions from CSD
platforms or task requesters.
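
The top-k recommendation accuracy reported above can be read as the fraction of tasks for which at least one actual winner appears among the k highest-ranked workers. The following is a minimal sketch of that metric, not the implementation used in this work. The function name `top_k_hit_rate`, the task and worker identifiers, and the scores are hypothetical; the scores stand in for per-worker winning probabilities such as those a Random Forest classifier might produce.

```python
from typing import Dict, Set


def top_k_hit_rate(scores: Dict[str, Dict[str, float]],
                   winners: Dict[str, Set[str]],
                   k: int = 2) -> float:
    """Fraction of tasks with at least one actual winner among the
    k workers ranked highest by predicted winning probability."""
    if not scores:
        return 0.0
    hits = 0
    for task, worker_scores in scores.items():
        # Rank registered workers by predicted score, highest first.
        ranked = sorted(worker_scores, key=worker_scores.get, reverse=True)
        # Count the task as a hit if any actual winner is in the top k.
        if winners.get(task, set()) & set(ranked[:k]):
            hits += 1
    return hits / len(scores)


# Hypothetical scores and outcomes, for illustration only.
scores = {
    "task_a": {"w1": 0.9, "w2": 0.4, "w3": 0.1},
    "task_b": {"w1": 0.2, "w2": 0.3, "w3": 0.8},
}
winners = {"task_a": {"w2"}, "task_b": {"w1"}}

rate = top_k_hit_rate(scores, winners, k=2)
print(f"top-2 hit rate: {rate:.2f}")
```

In this toy data the winner of `task_a` is ranked second and is counted as a hit, while the winner of `task_b` is ranked third and is missed at k = 2.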