IBM's Project CodeNet Aimed at Teaching AI to Code

By AI News

May 11, 2021

IBM has announced Project CodeNet, a large dataset that aims to help teach AI how to understand and even write code. The open-source dataset, hosted on GitHub, contains about 500 million lines of code across 14 million examples and spans 55 programming languages, including Python, C++, Java, Go, COBOL, and Pascal.

CodeNet is intended to improve AI's understanding of how code is written and checked, which could lead to better tools that help programmers do both faster.

"Given its wealth of programs written in a multitude of languages, we believe Project CodeNet can serve as a benchmark dataset for source-to-source translation and do for AI and code what the ImageNet dataset did years ago for computer vision," IBM said. "Project CodeNet specifically can drive algorithmic innovation . . . to make a more significant dent in machine understanding of code as opposed to machine processing of code."

