AudioFlux: A C/C++ library for audio and music analysis (github.com)
jcelerier 30 days ago [-]
It would be nice to have a comparison with any of the many C++ MIR (music information retrieval) libraries in the wild:

- https://essentia.upf.edu/

- https://github.com/marsyas/marsyas

- https://github.com/ircam-ismm/pipo

- https://github.com/flucoma/flucoma-core/tree/main/include/al...

BrannonKing 30 days ago [-]
If a person wanted to transcribe sheet music from recorded audio, do you know which library and features would be the best starting point?
bravura 30 days ago [-]
I have had mixed luck with this model, which is supposedly state-of-the-art: https://github.com/magenta/mt3

What kind of music are you trying to transcribe?

Feel free to email me.

bckr 30 days ago [-]
Start with source separation using demucs
Foobar8568 30 days ago [-]
Personally, I'm looking more for a way to convert scans of old sheet music into proper ones; I haven't seen anything that works properly.
adrianh 30 days ago [-]
Try our newish music scanning feature at Soundslice:

https://www.soundslice.com/sheet-music-scanner/

atoav 30 days ago [-]
Check out librosa as well
dsego 30 days ago [-]
It's also for Python; I just discovered it a few days ago. This is the website: https://audioflux.top/
bravura 30 days ago [-]
If this is supposed to be used for deep-learning, shouldn't all the transforms be GPU-accelerated torch functions?
tgv 30 days ago [-]
By the looks of it, those functions extract features (like frequency peaks). You do that once for a sound. The output could function as input for an NN, in which case it would be a tokenizer for sound.
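
To make the "extract a feature once per sound" idea concrete, here is a minimal sketch in plain Python (standard library only) that finds the strongest frequency in a frame with a naive DFT. The function name `dominant_frequency` and the test tone are illustrative assumptions, not audioFlux's API, and a real library would use an FFT rather than this O(n^2) loop.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin, ignoring DC.

    Naive O(n^2) DFT for illustration only; real libraries use an FFT.
    """
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    # Only bins 1 .. n/2 are needed for a real-valued signal.
    for k in range(1, n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


# Hypothetical input: 256 samples of a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
peak_hz = dominant_frequency(tone, sample_rate)
```

The resulting number (or a vector of such peaks per frame) is the kind of precomputed feature that could then be fed to a network.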
bravura 30 days ago [-]
Given what I've seen in audio ML research:

1) Tuning hyperparameters of your audio preprocessing is a pain if it's a preprocessed CPU step. You have to redo preprocessing every time you want to tune your audio feature hyperparams

2) It's quite common to use torchaudio spectrograms, etc. purely because they are faster (I can link to a handful of recent high-impact audio ML github repos if you like)

3) If you use nnAudio, you can actually backprop the STFT or mel filters and tune them if you like. With that said, this is not so commonplace.

4) Sometimes the audio is GENERATED by a GPU. For example, in a neural vocoder, you decode the audio from a mel to a waveform. Then, you compute the loss over the true versus predicted audio mel spectrograms. You can't do this with these C++ features. (Again, I can link a handful of recent high-impact audio ML github repos if you like.)

Again, I just don't get it.

aa-jv 29 days ago [-]
>Again, I just don't get it.

The point is, ship it.

Seriously, nobody is lugging a GPU around to interact with their most frequently used micro-computing platform, their headphones, which right now, already represent a new and extraordinary era of "accelerated component" market expansion.

The 7 microphones in your earpiece, and the 6 speakers pushing air into your head, are not quite as close to the GPU, as they need to be, perhaps .. but they already have a DSP, and there is already a silicon battle going on among the vendors.

>You can't do this with these C++ features.

Yes, and I think the point in the end, is to use AI to write better C++ code, and design better, cheaper, smarter silicon, as always (and actually ship it) ..

codetrotter 30 days ago [-]
> I can link to a handful of recent high-impact audio ML github repos if you like

Yes please :D

bravura 30 days ago [-]
For instance:

https://github.com/descriptinc/descript-audio-codec/blob/mai...

https://github.com/NVIDIA/BigVGAN/blob/main/loss.py#L23

https://arxiv.org/pdf/2210.13438 (the github repo doesn't include training, just inference)

It is INCREDIBLY common to use multi-scale spectral loss as the audio distance / objective measure in audio generation. They have some issues (i.e. they aren't always well correlated with human perception) but they are the known-current-best.
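
As a rough illustration of what a multi-scale spectral loss computes, here is a simplified sketch in plain Python (standard library only). It is not the code from the repos above: those run on GPU tensors and typically use mel spectrograms, whereas this version uses a naive DFT, linear-frequency magnitudes, and made-up window sizes. The names `magnitude_frames` and `multi_scale_spectral_loss` are hypothetical.

```python
import cmath
import math


def magnitude_frames(signal, win, hop):
    """Hann-windowed magnitude spectra of a signal (naive DFT, no padding)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / win) for i in range(win)]
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(win)]
        spectrum = []
        for k in range(win // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * i / win)
                for i, x in enumerate(chunk)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def multi_scale_spectral_loss(reference, estimate, window_sizes=(16, 32, 64)):
    """Mean absolute log-magnitude difference, averaged over window sizes."""
    eps = 1e-7  # avoids log(0)
    total = 0.0
    for win in window_sizes:
        ref = magnitude_frames(reference, win, win // 4)
        est = magnitude_frames(estimate, win, win // 4)
        diffs = [
            abs(math.log(r + eps) - math.log(e + eps))
            for ref_frame, est_frame in zip(ref, est)
            for r, e in zip(ref_frame, est_frame)
        ]
        total += sum(diffs) / len(diffs)
    return total / len(window_sizes)


# Hypothetical signals: a 440 Hz tone and a detuned copy, both at 8 kHz.
sr = 8000
target = [math.sin(2 * math.pi * 440 * t / sr) for t in range(512)]
detuned = [math.sin(2 * math.pi * 466 * t / sr) for t in range(512)]

loss_same = multi_scale_spectral_loss(target, target)
loss_diff = multi_scale_spectral_loss(target, detuned)
```

Comparing at several window sizes trades time resolution against frequency resolution, which is why a single STFT setting is usually not enough. In training, the same computation would have to be differentiable and run on the generated audio, which is the point about needing GPU-side features.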

tgv 30 days ago [-]
Backpropping filter coefficients sounds clever, but can't you just do that on any layer that takes a spectrum as input?
bravura 30 days ago [-]
Backpropping filter coefficients is clever, but it hasn't really caught on much. Google also tried with LEAF (https://github.com/google-research/leaf-audio) to have a learnable audio filterbank.

Anyway, in audio ML what is very common is:

a) Futzing with the way you do feature extraction on the input. (Oh, maybe I want CQT for this task or a different scale Mel etc)

b) Doing feature extraction on generated audio output, and constructing loss functions from generated audio features.

So, as I said, I don't exactly see the utility of this library for deep learning.

With that said, it is definitely nice to have really high speed low latency audio algorithms in C++. I just wouldn't market it as "useful for deep learning" because

a) during training, you need more flexibility than non-GPU methods without backprop

b) if you are doing "deep learning" then your inferred model will presumably be quite large, and there will be a million other things you'll need to optimize to get real-time inference or inference on CPUs to work well.

This is just my gut reaction. It seems like a solid project; I just question the one selling point of "useful for deep learning", that's all.

Severian 30 days ago [-]
Are there resources you would recommend reading regarding ML and audio?
bravura 30 days ago [-]
This is a really broad topic. I began studying it about 5 years ago.

Can you start by suggesting what task you want to do? I'll throw out some suggestions, but you can say something different. Also, you are welcome to email me (email in HN profile):

* Voice conversion / singing voice conversion

* Transcription of audio to MIDI

* Classification / tagging of audio scene

* Applying some effect / cleanup to audio

* Separating audio into different instruments

etc

The really quick summary of audio ML as a topic is:

* Often people treat audio ML as vision ML, by using spectrogram representations of audio. Nonetheless, 1D models are sometimes just as good if not better, but they require very specific familiarity with the audio domain.

* Audio distance measures (loss functions) are pretty crappy and not well-correlated with human perception. You can say the same thing about vision distance measures, but a lot more research has gone into vision models so we have better heuristics around vision stuff. With that said, multi-scale log mel spectrogram isn't that terrible.

* Audio has a handful of little gotchas around padding, windowing, etc.

* DSP is a black art and DSP knowledge has high ROI versus just being dumb and black boxy about everything.
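
One small example of the padding gotcha, as a sketch in plain Python: the number of STFT frames you get depends on whether the signal is padded so that frames are centred on their timestamps (the default in librosa and `torch.stft`, as far as I know) or not. The helper `frame_count` and the numbers below are illustrative assumptions.

```python
def frame_count(num_samples, win, hop, center):
    """Number of analysis frames for a given framing convention."""
    if center:
        # Pad win // 2 samples on each side so frames are centred.
        num_samples += 2 * (win // 2)
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


# Hypothetical setup: 1 second at 16 kHz, 1024-sample window, hop of 256.
unpadded = frame_count(16000, 1024, 256, center=False)  # 59 frames
centred = frame_count(16000, 1024, 256, center=True)    # 63 frames
```

Two feature extractors that disagree on this convention produce arrays of different lengths for the same audio, which tends to surface as an off-by-a-few-frames shape mismatch much later in a pipeline.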

gecko39 30 days ago [-]
I'm considering doing some ML stuff for a mobile DJ app.. like beat/bpm detection, instrument / vocal separation etc. Have you seen anything recent that might be efficient enough to run on a mobile device and process a track in a reasonable amount of time ( less than song length ) ?
Severian 30 days ago [-]
I may not email as it isn't a serious pursuit, but more curiosity. Thank you for the invitation! My current fascination is in separation and classification. And modular synthesis where I guess DSP stuff comes about if translating into the digital domain.
aa-jv 29 days ago [-]
A GPU is useful, but DSPs are also still useful. For example, there is a compelling case for having frameworks such as AudioFlux, JUCE and others around, in order to support portability and competitive realtime analysis, which is important in this domain, where such things as Qualcomm's ADK and others are quite literally being put inside people's ears...

Not to say that big-AI shouldn't have audio analysis as a compelling sphere of application, but more that, until the chips arrive, in-ear AI is less of a specification/requirement, than in-ear DSP.

We don't need AI to isolate discrete audio components and do things with them, in-Ear. Offline/big-AI, however, is still compelling. But we don't yet have GPU neckbands ..

herogary 30 days ago [-]
Maybe for the convenience of mobile usage?
nesarkvechnep 30 days ago [-]
What's this C/C++ language?
n4r9 30 days ago [-]
Some preliminary analysis suggests that if C is an integer greater than 1, C/C++ will always evaluate to 1 [0].

[0] https://www.programiz.com/online-compiler/9fkHTct0Mybpu

gpderetta 29 days ago [-]
Actually it could be 2 or 1. IIRC the order of evaluation of operator parameters is unspecified (this used to be UB, now is merely unspecified).

/extremelypedantic

pjmlp 30 days ago [-]
It is the use of English grammar rules to mean C and C++. Naturally, not everyone was that great in English classes, especially those who never attended WG21 and WG14 meetings or worked for said companies, and who enjoy being pedantic online about it.

To make it easier for those skipping English classes:

"A forward dash can be used to state alternatives. A sentence that uses a forward slash in this way can be read to mean that any or all of the stated words could apply."

https://www.thesaurus.com/e/grammar/slash/

And from the world of reference C and C++.

"The C/C++ Users Journal"

https://en.wikipedia.org/wiki/C/C%2B%2B_Users_Journal

"Visual Studio C/C++ IDE and Compiler for Windows"

https://visualstudio.microsoft.com/vs/features/cplusplus/

Random job post from Microsoft,

"Perform software development in C/C++, Python, and other languages."

https://jobs.careers.microsoft.com/global/en/job/1752991/Pri...

Random job post from Apple,

"Develop/maintain bit-accurate function C/C++ model for hardware verification

Develop/maintain cycle-approximate perf C/C++ model for performance analysis - Analyze model

Excellent C/C++ programming skills"

https://jobs.apple.com/en-us/details/200448639/graphics-mode...

Random job post from Google,

"4 years of experience coding with one or more programming languages (e.g., Java, C/C++, Python)"

https://www.google.com/about/careers/applications/jobs/resul...

Random job post from NVidia,

"Strong C/C++ programming skills"

https://nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAExternalCar...

More examples from WG14, WG21 members, C and C++ compiler vendors can be provided.

DragonStrength 30 days ago [-]
A library can be written in one or the other. Note most of your "evidence" is job postings, which are generally written by non-technical folks who often confuse JavaScript with Java. But no, there is no "C/C++" language or library. There are skills which help you in both. There is code that compiles with both compilers. There is no WG21 for C/C++.

Yes, Visual Studio supports both C and C++, but those are, in fact, two different languages.

You'll struggle to find the links you promised at the end because C and C++ are run by two different groups, meaning you won't be able to link us to single sources.

pjmlp 29 days ago [-]
Of course there is no "C/C++" language, only people who failed English grammar class and have yet to update their English parser and semantic analysis.

From Herb Sutter, a name whose relevance to WG21 you might know, I hope.

"Keynote: Safety, Security, Safety and C / C++ - C++ Evolution"

https://www.youtube.com/watch?v=EB7yR-1317k

DragonStrength 29 days ago [-]
Context is important in English. Your example is, again, a different context.

A library is one or the other. Talking about safe systems languages where C and C++ share memory safety issues is very different from promoting a library. Thanks for the opportunity to clarify here. You’re confusing context with lack of technical precision.

pjmlp 29 days ago [-]
Nope, I just have better things to do in life than to be pissed off on the Internet when people use English grammar rules accordingly.

We both made it quite clear where we stand, so there is hardly any value in pointing out uses of the C/C++ expression by other key WG14 and WG21 members, papers or products.

DragonStrength 27 days ago [-]
I only jumped in when I saw your inaccurate, condescending post. I'd hate for people to be misled by your confidence in making a pretty simple English mistake. Context always matters in English and precision matters in technical discussions.

"C/C++" has meaning in some contexts and reveals ignorance when used out of context. The post title here uses it incorrectly, but yes, there are ways to use it correctly. We disagree on that because you can't tell the difference between the two. So be it, but the actual explanation of usage is there for others who do care if they are perceived as non-technical in technical environments.

otabdeveloper4 29 days ago [-]
It's C when compiled with the gcc or clang compiler.
troymc 30 days ago [-]
1/++
rossant 30 days ago [-]
or 1++ if you got your C operator precedence wrong
iExploder 30 days ago [-]
It means you are using C++ but only the features that don't suck, so probably 10% of the language; the rest is plain C.
dekken_ 30 days ago [-]
It's C and Python, not C++
textlapse 30 days ago [-]
All squares are rectangles I guess.
dsego 30 days ago [-]
C can be used in C++ code, no?
epcoa 30 days ago [-]
It is true that there is C code that is conforming C++ code. However, I would say if you’re using a C compiler with “extern C” in the headers for C++ linker compatibility (as this library does), then saying C++ is about as misleading as saying a Rust library is C++, as you can link to that too.

As far as compatibility and “history” the languages are different enough now. There are both: features in C that do not exist in C++, and code that is conforming C that would be UB in C++. Saying C/C++ (for real) is usually a dumb target when it’s better to pick one and settle with that.

If it’s C, just say so. Everyone knows what extern C is, you don’t need to confuse.

leonardohn 30 days ago [-]
Even Pascal is closer to C than C++ is, yet historically people use this term implying they are very close.
Galanwe 30 days ago [-]
Something very close, but that's not what you would expect for something that markets itself as a C++ library IMHO. Especially in 2024, most people would hope (or assume) that "C++" means "C++ 11" at least.

Definitely doesn't count as _lying_, but still underwhelming.

bregma 30 days ago [-]
Yes. And C can also be used with Python and Rust. That does not make this a Rust library.
dsego 30 days ago [-]
Right, but C++ started as an extension of C and is mostly compatible and historically you could compile C with the C++ compiler. I don't think it's a good comparison.
codetrotter 30 days ago [-]
Zig can compile C. That makes this C/C++/Zig library. Right? :^)
jcelerier 30 days ago [-]
> historically you could compile C with the C++ compiler.

not any C, only the C++-compatible subset.

    int* foo = malloc(sizeof(int)); 
has never worked in C++, for instance, while it's valid C. Code that worked is code that people actually made an effort to express in a way compatible with a C++ compiler.
shultays 29 days ago [-]

  #ifdef __cplusplus
      #include <iostream>
      #define print() int main(){std::cout << "Hello world! -- from C++" << std::endl;}
  #elif (defined __STDC__) || (defined __STDC_VERSION__)
      #include <stdio.h>
      #define print() int main(){printf("Hello world! -- from C\n");}
  #else
  import builtins
  print = lambda : builtins.print("Hello world! -- from Python")
  #endif
  
  print()
Some Python code works in C and C++ as well, but people don't group them together and call it Python/C/C++.
dsego 28 days ago [-]
You must admit that C/Python doesn't quite have the same cachet as C/C++. C & C++ also share the same name, C++ was born as a derivative of C (with classes), they have the same syntax, logical constructs etc. Python is not even a systems language.
dekken_ 30 days ago [-]
Depends. Not all C is C++; e.g., there is no `restrict` keyword in C++ yet (even though lots of C++ compilers support `__restrict__`, it's not in the spec).
BrannonKing 30 days ago [-]
So are they going for feature parity with librosa? I think that would be great.
gosub100 30 days ago [-]
Can this be used for audio fingerprinting?
zombot 29 days ago [-]
How can a Python library support iOS?
adamnemecek 29 days ago [-]
It’s a cpp library with python bindings.
leonardohn 30 days ago [-]
[flagged]
morning-coffee 30 days ago [-]
Do you have one in safe Rust? See, we've only just met, and I don't know how you handle your ptr/len arguments in C just yet. ;)