Language models
The language models are in iARPA format and use Witten-Bell smoothing. They were created with the IRSTLM Toolkit. All models are lowercased.
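The sketch below illustrates interpolated Witten-Bell smoothing for a bigram model. It is not IRSTLM's implementation: IRSTLM handles higher n-gram orders and sentence boundaries, and it writes the result in iARPA format, none of which is shown here. The function name `train_witten_bell_bigram` and the toy corpus are hypothetical. Each probability mixes the observed bigram count with a unigram estimate, weighted by T(h), the number of distinct words seen after the context h: P(w | h) = (c(h, w) + T(h) * P(w)) / (c(h) + T(h)).

```python
from collections import Counter, defaultdict


def train_witten_bell_bigram(tokens):
    """Return p(w, h): an interpolated Witten-Bell bigram estimate (illustrative sketch)."""
    unigram = Counter(tokens)
    total = sum(unigram.values())
    bigram = Counter(zip(tokens, tokens[1:]))

    context_count = Counter()       # c(h): number of bigrams whose first word is h
    followers = defaultdict(set)    # distinct word types observed after h
    for (h, w), c in bigram.items():
        context_count[h] += c
        followers[h].add(w)

    def prob(w, h):
        p_uni = unigram[w] / total              # maximum-likelihood unigram (lower order)
        c_h = context_count[h]
        if c_h == 0:
            return p_uni                        # context never seen as a history: use the unigram
        t = len(followers.get(h, ()))           # T(h): number of distinct continuations of h
        return (bigram[(h, w)] + t * p_uni) / (c_h + t)

    return prob


# Hypothetical toy corpus, lowercased to mirror the lowercased models.
corpus = "The cat sat on the mat the cat ran"
tokens = corpus.lower().split()
p = train_witten_bell_bigram(tokens)
print(p("cat", "the"))
```

Because T(h) grows with the variety of words that follow h, contexts with many different continuations reserve more probability mass for the lower-order estimate. Over the training vocabulary, the probabilities for any seen context still sum to 1.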
Raw 2- and 3-gram frequencies of the corpus prim-5.0-public-all are also available.