Tokens, POS, phonemes
Input is tokenized, part-of-speech-tagged, and mapped to ARPAbet phonemes. No semantic embedding step. No vector store. Symbols stay symbols all the way down.
- Tokenizer: NLTK-derived, lossless on whitespace
- POS tagger: rule + lookup hybrid
- Phonemic mapper: CMU dict + heuristic fallback
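
As a rough illustration of how the three stages could fit together, here is a minimal sketch. It is not the actual implementation: the regex tokenizer stands in for the NLTK-derived one, and `TOY_PRONUNCIATIONS`, `TOY_TAG_LEXICON`, `LETTER_FALLBACK`, and every function name are hypothetical. The tags loosely follow Penn Treebank conventions, with `SP` and `PUNCT` as illustration-only placeholders, and the tiny dictionaries stand in for the full CMU Pronouncing Dictionary and a real tag lexicon.

```python
import re

# Hypothetical miniature stand-ins. A real system would load the full CMU
# Pronouncing Dictionary and a much larger tag lexicon.
TOY_PRONUNCIATIONS = {
    "the": ["DH", "AH0"],
    "cat": ["K", "AE1", "T"],
    "sat": ["S", "AE1", "T"],
}
TOY_TAG_LEXICON = {"the": "DT", "cat": "NN", "sat": "VBD"}

# Crude letter-to-phoneme table, used only when a word is not in the dictionary.
LETTER_FALLBACK = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K", "y": "Y", "z": "Z",
}


def tokenize(text):
    """Split into whitespace runs, words, and single punctuation marks.

    Every character lands in exactly one token, so joining the tokens
    reproduces the input exactly (lossless on whitespace).
    """
    return re.findall(r"\s+|\w+|[^\w\s]", text)


def tag(token):
    """Lookup first, then suffix rules, then a default noun tag."""
    if token.isspace():
        return "SP"
    lower = token.lower()
    if lower in TOY_TAG_LEXICON:
        return TOY_TAG_LEXICON[lower]
    if not token[0].isalnum():
        return "PUNCT"
    if token.isdigit():
        return "CD"
    if lower.endswith("ly"):
        return "RB"
    if lower.endswith("ing"):
        return "VBG"
    if lower.endswith("ed"):
        return "VBD"
    return "NN"


def phonemes(token):
    """Dictionary lookup, with a letter-by-letter heuristic as fallback."""
    lower = token.lower()
    if lower in TOY_PRONUNCIATIONS:
        return TOY_PRONUNCIATIONS[lower]
    return [LETTER_FALLBACK[ch] for ch in lower if ch in LETTER_FALLBACK]


def analyze(text):
    """Run all three stages and keep each token as a plain symbolic triple."""
    return [(tok, tag(tok), phonemes(tok)) for tok in tokenize(text)]
```

Calling `analyze("The cat sat quickly.")` yields one `(token, tag, phonemes)` triple per token, whitespace included. Known words resolve through the lookup tables, while an unknown word such as `quickly` falls through to the suffix rule and the letter heuristic. Nothing is ever converted to a vector; each stage only attaches more symbols to the token.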