Speech Recognition

Intro

First, we need to install the fastaudio module.

reticulate::py_install('fastaudio',pip = TRUE)
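
To confirm that the installation worked, one option is to ask reticulate whether the module can be imported. This is only a sketch: it assumes reticulate is installed and pointed at the Python environment that `py_install` used, and the variable name `has_fastaudio` is made up for illustration.

```r
# NA means "could not check" (reticulate itself is not installed).
has_fastaudio <- NA
if (requireNamespace("reticulate", quietly = TRUE)) {
  # TRUE if the active Python environment can import fastaudio
  has_fastaudio <- reticulate::py_module_available("fastaudio")
}
message("fastaudio available: ", has_fastaudio)
```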

Dataset

Download the TensorFlow Speech Commands dataset (2.3 GB). The code below assumes it has been extracted into a SPEECHCOMMANDS directory:

library(fastai)
library(magrittr)  # for the %>% pipe used below

commands_path = "SPEECHCOMMANDS"
audio_files = get_audio_files(commands_path)
length(audio_files$items)
# [1] 105835
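
To get a feel for how the files are organised, here is a rough base-R sketch that lists audio files recursively and tallies them by parent folder, which is the value `parent_label` uses as the class further down. It does not use `get_audio_files`; the demo directory and file names are invented so that the example runs without the real download.

```r
# Build a tiny stand-in tree (hypothetical names) instead of the real dataset.
root <- file.path(tempdir(), "SPEECHCOMMANDS_demo")
for (label in c("yes", "no")) {
  dir.create(file.path(root, label), recursive = TRUE, showWarnings = FALSE)
}
file.create(file.path(root, "yes", c("a.wav", "b.wav")))
file.create(file.path(root, "no", "c.wav"))

# Recursively collect .wav files, then take each file's parent folder as its label.
wav_files <- list.files(root, pattern = "\\.wav$", recursive = TRUE,
                        full.names = TRUE)
labels <- basename(dirname(wav_files))
counts <- table(labels)
print(counts)
```

Pointing `root` at the real dataset directory should give a per-word clip count, which is a quick way to spot class imbalance before training.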

Preprocess

Prepare the dataset and load it into a data loader. Each clip is resized to 4000 ms and converted to a mel spectrogram on a decibel scale:

# transformer that turns audio into a mel spectrogram in decibels
DBMelSpec = SpectrogramTransformer(mel=TRUE, to_db=TRUE)
a2s = DBMelSpec()
# resize every clip to 4000 ms
crop_4000ms = ResizeSignal(4000)
tfms = list(crop_4000ms, a2s)

auds = DataBlock(blocks = list(AudioBlock(), CategoryBlock()),  
                 get_items = get_audio_files, 
                 splitter = RandomSplitter(),
                 item_tfms = tfms,
                 get_y = parent_label)

audio_dbunch = auds %>% dataloaders(commands_path, item_tfms = tfms, bs = 20)
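
As an illustration of what resizing a clip to a fixed duration involves, here is a simplified base-R sketch. It is not fastaudio's implementation: it assumes a 16 kHz sample rate, always pads with zeros at the end, and always crops from the start, whereas `ResizeSignal` may place the padding or the crop differently. The function name `resize_signal` is hypothetical.

```r
# Pad with trailing zeros or truncate so the signal lasts exactly duration_ms.
resize_signal <- function(x, duration_ms, sample_rate = 16000) {
  target <- as.integer(round(sample_rate * duration_ms / 1000))
  if (length(x) >= target) {
    x[seq_len(target)]                 # crop: keep the first `target` samples
  } else {
    c(x, numeric(target - length(x)))  # pad: append zeros
  }
}

# A one-second 440 Hz tone stretched to 4000 ms by zero padding.
one_second <- sin(2 * pi * 440 * seq(0, 1, length.out = 16000))
resized <- resize_signal(one_second, 4000)
length(resized)  # 64000
```

Giving every clip the same length means every spectrogram has the same width, so the items can be stacked into a batch.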

View a batch:

audio_dbunch %>% show_batch(figsize = c(15, 8.5), nrows = 3, ncols = 3, max_n = 9, dpi = 180)

Model

Before fitting, change the model's first layer from 3 input channels to 1, since the spectrograms have a single channel:

torch = torch()
nn = nn()

learn = Learner(audio_dbunch, xresnet18(pretrained = FALSE), nn$CrossEntropyLoss(), metrics=accuracy)

# set the first convolution's input channels from 3 to 1
learn$model[0][0][['in_channels']] %f% 1L
# keep one input channel of the weights, then restore the channel axis
new_weight_shape <- torch$nn$parameter$Parameter(
  (learn$model[0][0]$weight %>% narrow('[:,1,:,:]'))$unsqueeze(1L))

# assign the new weights with %f%
learn$model[0][0][['weight']] %f% new_weight_shape
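
The slice-and-unsqueeze step can be pictured with a plain R array. This is a sketch with made-up weights: the shape (32 filters, 3 input channels, 3x3 kernels) is an assumption for illustration and is not read from the model.

```r
set.seed(1)
# Hypothetical first-layer weights: filters x input channels x height x width.
w <- array(rnorm(32 * 3 * 3 * 3), dim = c(32, 3, 3, 3))

# Python's index 1 is the second channel, which is index 2 in R.
# drop = FALSE keeps the singleton channel axis, the role unsqueeze(1)
# plays after the slice in the code above.
w_single <- w[, 2, , , drop = FALSE]
dim(w_single)  # 32 1 3 3
```

The result still has four dimensions, with a channel axis of size 1, which is the shape a convolution with one input channel expects.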

Add callbacks

Weights and biases can be saved and visualized on wandb.ai:

# log in the first time, then remove this line
login("API_key_from_wandb_dot_ai")
init(project='R')

wandb: Currently logged in as: henry090 (use `wandb login --relogin` to force relogin)
wandb: Tracking run with wandb version 0.10.8
wandb: Syncing run macabre-zombie-2
wandb: ⭐️ View project at https://wandb.ai/henry090/speech_recognition_from_R
wandb: 🚀 View run at https://wandb.ai/henry090/speech_recognition_from_R/runs/2sjw3juv
wandb: Run data is saved locally in wandb/run-20201030_224503-2sjw3juv
wandb: Run `wandb off` to turn off syncing.

Conclusion

Now we can train our model:

learn %>% fit_one_cycle(3, lr_max=slice(1e-2), cbs = list(WandbCallback()))

epoch   train_loss   valid_loss   accuracy   time 
------  -----------  -----------  ---------  -----
WandbCallback requires use of "SaveModelCallback" to log best model
0       0.590236     0.728817     0.787121   04:18 
WandbCallback was not able to get prediction samples -> wandb.log must be passed a dictionary
1       0.288492     0.310335     0.908490   04:19 
2       0.182899     0.196792     0.941088   04:10
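
For intuition about what `fit_one_cycle` does with `lr_max`, here is a rough base-R sketch of a one-cycle learning-rate schedule: a cosine warm-up to `lr_max` followed by a cosine decay. The defaults (`div = 25`, `div_final = 1e5`, `pct_start = 0.25`) are assumed from the Python fastai library and may not match the installed version; the helper names are hypothetical.

```r
# Cosine interpolation from `start` (pos = 0) to `end` (pos = 1).
sched_cos <- function(start, end, pos) {
  start + (1 + cos(pi * (1 - pos))) * (end - start) / 2
}

# Learning rate at training progress `pos` in [0, 1].
one_cycle_lr <- function(pos, lr_max, div = 25, div_final = 1e5,
                         pct_start = 0.25) {
  if (pos < pct_start) {
    # warm-up phase: lr_max / div up to lr_max
    sched_cos(lr_max / div, lr_max, pos / pct_start)
  } else {
    # decay phase: lr_max down to lr_max / div_final
    sched_cos(lr_max, lr_max / div_final,
              (pos - pct_start) / (1 - pct_start))
  }
}

pos <- seq(0, 1, length.out = 5)
lrs <- vapply(pos, one_cycle_lr, numeric(1), lr_max = 1e-2)
print(signif(lrs, 3))
```

Under these assumptions the rate starts at `lr_max / 25`, peaks at `lr_max` a quarter of the way through training, and then decays to a very small value by the end.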

See the dashboard here:

https://wandb.ai/henry090/speech_recognition_from_R/runs/2sjw3juv?workspace=user-henry090
