Train a weighted finite-state transducer
This takes an existing WFST and data and splits states in an entropy reduce way to produced a new WFST that better models the given data.