Amazon open-sources its Topical Chat data set of over 4.7 million words


Way back in April, Amazon announced its intention to make the Topical Chat data set, a collection of crowdsourced human conversations, available to teams competing in the annual Alexa Prize Socialbot Grand Challenge. It made good on that promise today with the release on GitHub of more than 235,000 utterances containing over 4,700,000 words, which it asserts will support “high-quality” and “repeatable” dialogue systems research. (VentureBeat)