Over the past weekend, I hacked together a pipeline to automatically transcribe bass tab from any song.
The main motivation behind this was my obsession with learning obscure reggae basslines from old Jamaican music. After stumbling upon a Hacker News post on Spleeter a couple days ago, I decided to build a prototype and it turned out to be a huge success.
Here are the stages of the pipeline:
Raise pitch by 1 octave using `sox $in $out pitch 1200 bass -30 100 gain 10`. This step raises the original track by 12 semitones and removes some overtones.
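From Python, this stage could be wrapped as a small helper. A minimal sketch: the file names below are placeholders, and actually running the command requires sox on your PATH.

```python
def octave_up_command(in_path: str, out_path: str) -> list[str]:
    """Build the sox invocation from the post: shift up 1200 cents
    (12 semitones), cut the low end below 100 Hz, add 10 dB of gain."""
    return ["sox", in_path, out_path,
            "pitch", "1200", "bass", "-30", "100", "gain", "10"]

# Hypothetical file names for illustration:
cmd = octave_up_command("track.wav", "track_up.wav")
# subprocess.run(cmd, check=True)  # uncomment to actually run sox
```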
Melody tracking. This step produces a frequency estimate and an associated confidence for each 100 ms frame. Since the role of the bass is to accentuate chord tones, notes are usually played one at a time; a monophonic instrument is a lot easier to track than a polyphonic one (guitar, piano) where several notes sound simultaneously. Melody tracking is an active research area in the field of Music Information Retrieval. After 10 hours of reading, I settled on crepe, whose deep learning approach is apparently better than state-of-the-art heuristics-based methods such as pYIN and Melodia.
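To illustrate what this stage's output looks like downstream, here is a hedged sketch of post-processing a frame-wise (frequency, confidence) track: drop low-confidence frames and convert the surviving frequencies to MIDI note numbers. The 0.5 threshold is my assumption, not a value from the post.

```python
import math

def hz_to_midi(f_hz: float) -> float:
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)

def filter_frames(freqs, confs, threshold=0.5):
    """Keep only frames whose pitch confidence clears the threshold,
    returning (frame_index, midi_pitch) pairs."""
    return [(i, hz_to_midi(f))
            for i, (f, c) in enumerate(zip(freqs, confs))
            if c >= threshold]

frames = filter_frames([110.0, 220.0, 440.0], [0.9, 0.3, 0.8])
# frames -> [(0, 45.0), (2, 69.0)]  (the 220 Hz frame is dropped)
```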
Note tracking. This step produces a sequence of notes, each with a frequency, start time, and duration. Unfortunately crepe doesn't do note tracking (unlike pYIN), and I couldn't find any deep learning model that does. I studied the source code of Tony and used a variant of its state-space-based algorithm.
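A much simpler heuristic than Tony's state-space tracker can show what this stage produces: group consecutive frames whose pitch stays within half a semitone into (pitch, start, duration) notes. This is only a toy stand-in for the actual algorithm; the hop size and tolerance are assumptions.

```python
def track_notes(midi, hop=0.1, tol=0.5):
    """Segment a frame-wise pitch track into (midi_pitch, start, duration)
    notes. Frames are `hop` seconds apart; None marks an unvoiced frame.
    Consecutive frames within `tol` semitones of each other join one note."""
    notes, run, start = [], [], None
    for i, m in enumerate(list(midi) + [None]):  # sentinel flushes the last run
        if m is not None and run and abs(m - run[-1]) <= tol:
            run.append(m)
            continue
        if run:
            # take the middle frame's pitch, snapped to the nearest semitone
            pitch = round(sorted(run)[len(run) // 2])
            notes.append((pitch, start * hop, len(run) * hop))
        run, start = ([m], i) if m is not None else ([], None)
    return notes
```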
Fretboard arrangement. This step converts notes to positions on the fretboard. First, the sequence of notes is split into "sentences" wherever the gap between notes exceeds 1.5 standard deviations of the gap length. Then a custom algorithm uses dynamic programming to minimize the biomechanical cost between consecutive notes (finger movement, with penalties for open strings and high frets). On average the same pitch has 3 possible positions on the fretboard, so a naive graph search over a tree with branching factor 3 would take O(3^n) time; dynamic programming reduces this to time linear in the number of notes.
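The dynamic programming core can be sketched like this. The tuning, fret count, and cost weights below are illustrative assumptions (not the post's actual values), and the sentence-splitting step is omitted.

```python
# Standard 4-string bass tuning: MIDI pitches of the open strings (E1 A1 D2 G2).
OPEN_STRINGS = [28, 33, 38, 43]
NUM_FRETS = 20

def positions(pitch):
    """All (string, fret) spots that sound the given MIDI pitch."""
    return [(s, pitch - o) for s, o in enumerate(OPEN_STRINGS)
            if 0 <= pitch - o <= NUM_FRETS]

def cost(prev, cur):
    """Toy biomechanical cost: penalize open strings and high frets,
    plus fret-hand travel from the previous position."""
    s, f = cur
    c = 2.0 * (f == 0) + 0.3 * max(f - 7, 0)
    if prev is not None:
        c += abs(f - prev[1]) + abs(s - prev[0])
    return c

def arrange(pitches):
    """Viterbi-style DP over candidate positions: O(n * k^2) for k
    candidates per note, instead of O(k^n) naive enumeration."""
    layers = [positions(p) for p in pitches]
    best = {pos: (cost(None, pos), [pos]) for pos in layers[0]}
    for layer in layers[1:]:
        nxt = {}
        for pos in layer:
            prev, (c, path) = min(
                best.items(), key=lambda kv: kv[1][0] + cost(kv[0], pos))
            nxt[pos] = (c + cost(prev, pos), path + [pos])
        best = nxt
    return min(best.values())[1]
```

Each note keeps only the cheapest path into each candidate position, which is what collapses the exponential search space.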
Plotting, rendering the video, and uploading to YouTube. YouTube's API has quota limits, so I can't upload more than 3 videos a day.
The pipeline runs in under 5 minutes for a 4-minute song, but 90% of that time is spent in video generation (which is written in Python).
Although this pipeline is still rough around the edges, it generally produces sensible results. The arrangement algorithm uses a dynamic threshold and should take more things into account, such as musical conventions and right-hand techniques. I believe the results may be good enough to use as a training set for a neural net tuned for note tracking (maybe fine-tune the weights from crepe?). Having done my (limited) research, I don't think there's a deep learning model that does note tracking, so you could definitely write a research paper on this.
The source code for this project is here.