------------What is this?------------

This is a portable offline program that voices characters' replies in oobabooga using a Tacotron2 and HifiGAN model pair. Tacotron models are very expressive and fast; inference usually takes less than a second. Synthesis starts as soon as the first sentence of a response is complete, so you can expect uninterrupted speech during generation. Text between brackets or asterisks is ignored. Most of the code is taken from the Colab inference notebooks in /ppp/'s main doc.

------------How to use it?------------

Step 1: Unzip to a new folder
Step 2: Download models for your characters (guide below)
Step 3: Place the models in the corresponding folders and edit config.txt so that it contains the full filenames of the models
Step 4: Launch oobabooga

Now launch START.bat, wait for a new Chrome window to open, and continue the dialogue there. It should start pronouncing new messages.
Sometimes the window doesn't open or it doesn't start pronouncing. Just restart the program.

How to get models guide:

All models are trained by /ppp/ anons. You will need 2 files: a Tacotron model and a HifiGAN model. Place them in tacotron_models and hifigan_models respectively.

Download Tacotron2 models:
https://docs.google.com/document/d/17VAnMQI4NJzu7UXZALs14AFvhpw8wvbLdA9HrA2xLus/edit
You don't want models marked with MMI; they won't work. You would usually want the one with the lowest listed loss.

Download HifiGAN models:
The HifiGAN model is less important; for example, Rarity sounds just fine with Twilight's HifiGAN model.
Rarity: https://drive.google.com/uc?id=1-cQw1xpZpxZAh5NwfKm_HN5Ga3PP-zi-
Twilight: https://drive.google.com/uc?id=1GwDJVYZG1IGKFILjD3sF-11vJ0vN2yiK
Fluttershy: https://drive.google.com/uc?id=1Cwy_VafEucmSjyrpXc62IIk4CuhMb4Tn
Maud: https://drive.google.com/uc?id=1qpgI41wNXFcH-iKq1Y42JlBC9j0je8PW
Trixie: https://drive.google.com/uc?id=1YDcUDvkEwjBCPRzDsDEflMJJTXPhBFTk
If there is no HifiGAN model for your character, use Twilight's; it will still sound very good.
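
For the curious: the text handling described under "What is this?" (skipping text between brackets or asterisks, and speaking each sentence as soon as it is complete) could look roughly like the sketch below. This is not the program's actual code. The function names, the choice of which bracket types count, and the sentence-ending rule are all assumptions made for illustration.

```python
import re

# Assumption: "brackets" means (), [] and {}; the real script may differ.
_SKIP = re.compile(r"\([^)]*\)|\[[^\]]*\]|\{[^}]*\}|\*[^*]*\*")
# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace or end of text.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def clean_text(text):
    """Drop bracketed/asterisked spans and collapse the leftover whitespace."""
    return re.sub(r"\s+", " ", _SKIP.sub(" ", text)).strip()


def pop_complete_sentences(buffer):
    """Split finished sentences off a growing reply.

    Returns (sentences, tail): sentences are cleaned and ready to be spoken,
    tail is the unfinished remainder to keep until more text arrives.
    """
    sentences = []
    start = 0
    for match in _SENTENCE_END.finditer(buffer):
        sentence = clean_text(buffer[start:match.end()])
        # Skip chunks that contain nothing pronounceable after cleaning.
        if any(ch.isalnum() for ch in sentence):
            sentences.append(sentence)
        start = match.end()
    return sentences, buffer[start:]


if __name__ == "__main__":
    done, rest = pop_complete_sentences("Hello there! *waves* How are you? I am")
    print(done)
    print(repr(rest))
```

Each returned sentence would be handed to the synthesizer right away, while the tail is kept and extended as the reply keeps generating. The sketch does not handle a sentence break that falls inside an asterisk or bracket span.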
----------------------------------------------------------------------------------

Additional info:

GateThreshold parameter in config.txt:
Takes values from 0 to 1 and sets how certain the model must be that the utterance is finished before it stops generating. The default value (0.25) works fine for me. If a model ends phrases abruptly, you may raise this value. Setting it too high may give you never-ending silence at the end.

Script info (for developers):
This uses 32 kHz audio synthesis: audio is first synthesized at 22 kHz and then upscaled. The entire HifiGAN pipeline is inside the corresponding module for ease of use (yes, a single object loads 2 models and does all transforms in the forward pass). Once again, it's taken from Colab and reworked for ease of use; I doubt anyone would want to change the pipeline.
I can't credit everyone because I forgot where I got the repos from.
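
To illustrate the GateThreshold setting described above, here is a toy sketch of a stop check. It assumes the decoder produces, for every frame, a probability that the utterance is finished, and stops once that probability exceeds the threshold. The function name, the max_steps cap, and the sample numbers are made up for illustration and are not taken from this program's code.

```python
def count_decoder_steps(stop_probs, gate_threshold=0.25, max_steps=1000):
    """Return how many frames are generated before decoding stops.

    stop_probs: per-frame probability (0..1) that the utterance is finished.
    """
    steps = 0
    for prob in stop_probs:
        steps += 1
        if prob > gate_threshold:
            # The model is confident enough that the phrase is over.
            break
        if steps >= max_steps:
            # Hypothetical safety cap so generation cannot run forever.
            break
    return steps


if __name__ == "__main__":
    # Made-up stop probabilities that rise towards the end of a phrase.
    probs = [0.01, 0.02, 0.10, 0.30, 0.70, 0.95]
    print(count_decoder_steps(probs, gate_threshold=0.25))  # 4
    print(count_decoder_steps(probs, gate_threshold=0.90))  # 6
```

With the lower threshold the loop stops earlier, which matches the advice above: raise the value if phrases get cut off, and lower it if you get trailing silence.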