Devs Fixed

Resolved: Is the waveform the “raw” audio data? (Pytorch)

By Isaac Tonny on 17/06/2022 Issue

Question:

I am using PyTorch in an audio deep learning project. I am using the torchaudio.load method to load the waveform and sample rate. Now, my question is, is the waveform considered the “raw” audio data? Is it the PCM data? If not, then how can I get PCM data from .ogg format?

Answer:

Solution


Yes, it’s raw data.
Read on for the explanation. If you already know sampling theory and how sound is generated, skip to the last section.

PCM


PCM (pulse-code modulation) is a fancy name for the way a continuous-time wave is represented inside a computer. You can learn more in any introductory course or book on digital signal processing, such as Chapter 3 of The Scientist and Engineer’s Guide to Digital Signal Processing.
Briefly, a computer can only represent finite quantities, so you need to take discrete samples in time (sampling) and round each sample to one of a finite set of amplitude levels (quantization).
When you load an audio file, this process has already been done for you.
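To make sampling and quantization concrete, here is a minimal sketch in plain Python. The function name `sample_and_quantize`, the 440 Hz tone and the 16-bit depth are assumptions chosen for illustration; they do not come from the question or from torchaudio.

```python
import math

def sample_and_quantize(freq_hz, sample_rate, duration_s, bits=16):
    """Sample a sine wave and quantize it to signed integers (PCM-style)."""
    max_level = 2 ** (bits - 1) - 1                 # 32767 for 16-bit audio
    num_samples = int(round(sample_rate * duration_s))
    pcm = []
    for n in range(num_samples):
        t = n / sample_rate                         # sampling: discrete time instants
        amplitude = math.sin(2 * math.pi * freq_hz * t)  # continuous value in [-1, 1]
        pcm.append(round(amplitude * max_level))    # quantization: nearest integer level
    return pcm

# Hypothetical example: 10 ms of a 440 Hz tone at 16 kHz.
pcm = sample_and_quantize(freq_hz=440.0, sample_rate=16000, duration_s=0.01)
print(len(pcm))  # 160 samples
```

Each entry of `pcm` is one amplitude value at one time instant, which is exactly what "raw" means in the sections that follow.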

Raw data


If you connect a speaker and play a wave, the membrane oscillates following the amplitude of that wave at every instant. This is the “raw” audio: a signal that contains the amplitude at each “time” instant. If you can “see” the wave changing with no discontinuity from left to right when you plot your data, it is very likely a raw vector.
What is non-raw data, then? Every compression algorithm transforms the input vector with some mathematical function so that it occupies less space, but the result can no longer be understood just by looking at it, because the stored values no longer represent an amplitude over time. If you played the compressed data through a speaker as if it were samples, you wouldn’t get the original sound, only noise.
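The difference between raw and compressed data can be sketched with the standard library. Note the assumption: `zlib` is a general-purpose lossless compressor used here only as a stand-in. It is not an audio codec and works very differently from Ogg Vorbis, but it shows the same point, namely that the compressed bytes are not amplitudes and must be decoded before they are samples again.

```python
import math
import struct
import zlib

sample_rate = 16000
# One second of a 440 Hz tone as 16-bit integer samples (raw amplitudes).
samples = [round(32767 * math.sin(2 * math.pi * 440 * n / sample_rate))
           for n in range(sample_rate)]

# Raw PCM bytes: two bytes per sample, little-endian signed 16-bit.
raw_bytes = struct.pack(f"<{len(samples)}h", *samples)

# Compressed bytes: smaller, but no longer "amplitude over time".
compressed = zlib.compress(raw_bytes)

# Decoding restores the raw samples.
decoded = list(struct.unpack(f"<{len(samples)}h", zlib.decompress(compressed)))

print(len(raw_bytes), len(compressed), decoded == samples)
```

Only `samples` and `decoded` can be plotted as a waveform or sent to a speaker; `compressed` is just an opaque byte string until it is decoded.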

PyTorch


In the example you provided from the PyTorch documentation, we can clearly see that the plot represents raw data sampled at 16 kHz.
To be sure, we still need to rule out two possibilities:
  • torchaudio.load could still return some sort of compressed object
  • the raw data could be generated by plot_waveform at plotting time

The waveform variable is 54400 samples long and is sampled at 16 kHz. This means it represents 54400*(1/16000) seconds, which is exactly 3.4 s.
The plot also spans 3.4 seconds, which tells us that the waveform variable returned by the load function already holds the raw data. The same applies to an .ogg file: torchaudio.load decodes it, so the tensor you get back contains the PCM samples (as floating-point values by default).
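The duration check above is simple enough to write down as code. This is a small sketch; `duration_seconds` is a hypothetical helper, and the numbers are the ones quoted from the documentation example.

```python
def duration_seconds(num_samples, sample_rate):
    """Length of a raw signal in seconds: number of samples divided by sample rate."""
    return num_samples / sample_rate

# With torchaudio you would typically take these from the loaded tensor, e.g.
#   waveform, sample_rate = torchaudio.load(path)
#   num_samples = waveform.shape[-1]   # waveform has shape [channels, time]
duration = duration_seconds(54400, 16000)
print(duration)  # 3.4
```

If the tensor held compressed data, its length would not match the plotted duration in this way.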


audio python pytorch