I'd like to speak on this matter a bit, since lip syncing was some of the first code I examined when the source was released. Essentially, lip syncing works by monitoring the amplitude (loudness) of whatever audio sample is currently playing. At some fixed interval (500ms? 200ms? 100ms? I don't remember exactly), the engine samples the audio and picks one of the FACE_TALK0-4 animations based on how loud that chunk is. It's primitive, but it works.

If I remember right, the lip sync code makes this easy to hook into: you just set the entity up to be playing a sound effect, and the engine apparently does the rest. See how taunts do it.
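To make the idea concrete, here's a rough sketch of that amplitude-to-animation mapping. This is NOT the actual engine code; the function names, thresholds, and the assumption of 16-bit signed mono PCM are all mine, and the FACE_TALK0-4 values are just stand-ins for the real animation enum.

```c
/* Hypothetical sketch of amplitude-driven lip sync selection. */
#include <stdio.h>
#include <math.h>

enum FaceAnim { FACE_TALK0, FACE_TALK1, FACE_TALK2, FACE_TALK3, FACE_TALK4 };

/* Average absolute amplitude of one polling interval's worth of samples,
   normalized to 0.0 - 1.0. */
static float ChunkLoudness(const short *samples, int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; i++)
        sum += fabs((double)samples[i]);
    return (float)(sum / count / 32768.0);
}

/* Map loudness to one of the five talk animations: near-silence keeps the
   mouth closed (FACE_TALK0), louder chunks pick a wider mouth shape.
   Thresholds are made up for illustration. */
static enum FaceAnim PickTalkAnim(float loudness)
{
    if (loudness < 0.02f) return FACE_TALK0;
    if (loudness < 0.10f) return FACE_TALK1;
    if (loudness < 0.25f) return FACE_TALK2;
    if (loudness < 0.50f) return FACE_TALK3;
    return FACE_TALK4;
}

int main(void)
{
    /* Fake two chunks of audio: a quiet tone and a loud one. */
    short quiet[512], loud[512];
    for (int i = 0; i < 512; i++) {
        quiet[i] = (short)(1000 * sin(i * 0.1));
        loud[i]  = (short)(20000 * sin(i * 0.1));
    }

    printf("quiet chunk -> FACE_TALK%d\n", PickTalkAnim(ChunkLoudness(quiet, 512)));
    printf("loud chunk  -> FACE_TALK%d\n", PickTalkAnim(ChunkLoudness(loud, 512)));
    return 0;
}
```

In the real thing you'd imagine this running once per polling interval on whatever sound the entity is emitting, with the chosen animation applied to the face until the next poll.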