I understand this provides a way to interact with time-series (ts) data via natural language, but is there any benefit to this over tool calling to a library that uses signal processing and/or rule-based algos (or machine learning if the data is noisy/variable)?
For example, you ask an off-the-shelf LLM to analyze your ECG data. The LLM uses a tool to call out to your ECG ts analysis library. The library iterates over the data and finds stats & ECG events. It returns something like "Average heart rate: 60bpm, AFib detected at <time>, etc...". The LLM has all the info it needs to give an accurate analysis at a fraction of computational cost.
On top of that, this requires a large annotated dataset and a pre-trained model. And correct me if I'm wrong, but I don't think it's possible to have a "general" model that handles arbitrary time series data: a model trained on ECG data would not be compatible with stock market data, and there isn't a way to have one model that understands both.
manquer 120 days ago [-]
You couldn't run that on the edge, though.
The point is to run it reliably on the edge. Nobody sane would want their heart rate monitor to run via the cloud, with the uptime and reliability that come with any remote service, plus the extra challenges of LLM inference.
The goal would be to run on the edge alongside the standard rules-based detection these machines already have, adding the advanced pattern detection that LLMs can provide, both to reduce alert fatigue and to detect new classes of complex patterns that these sensors typically don't.
copypaper 120 days ago [-]
> advanced pattern detection... detect new classes of complex patterns
This sounds great and all, but it's wishful thinking. There isn't anything here showing that it can find meaningful patterns beyond existing solutions (i.e. the standard rules-based detection/machine learning mentioned above).
What they've essentially done is take a dataset in which each report was "annotated with a report string (generated by cardiologist or automatic interpretation by ECG-device)" [1] and use it with a series of templates (i.e. questions to ask the LLM) from the ECG-QA paper [2] to fine-tune a model, achieving 65% accuracy with pattern recognition alone and 85% accuracy with pattern + clinical context (i.e. patient history).
The 42 template questions they used (as mentioned in section 4.1 of the paper) can each be evaluated deterministically via code and retrieved via a tool call for any LLM to parse. And I'd argue that the results would be the same, if not better, at a fraction of the cost. Doing calculations like this on time series data is very quick, a couple of ms at most. I don't see why this couldn't be run on the edge.
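To make that concrete, here is a minimal sketch of the kind of deterministic check such a tool call could run. It assumes R-peak timestamps have already been extracted by the library; the function names are illustrative, not from the paper:

    import numpy as np

    def average_heart_rate_bpm(r_peak_times_s):
        # Mean heart rate from R-peak timestamps (in seconds).
        rr = np.diff(r_peak_times_s)   # seconds between consecutive beats
        return 60.0 / rr.mean()

    def rr_irregularity(r_peak_times_s):
        # Coefficient of variation of RR intervals; a high value is one
        # crude flag for irregular rhythms (not a clinical AFib detector).
        rr = np.diff(r_peak_times_s)
        return rr.std() / rr.mean()

    peaks = np.arange(0.0, 10.0, 1.0)      # one beat per second
    print(average_heart_rate_bpm(peaks))   # 60.0
    print(rr_irregularity(peaks))          # 0.0

The LLM only needs the returned numbers, not the raw waveform.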
Plus, Table 9 shows this thing takes a minimum of 7 GB of RAM with a 270M-parameter model and ~15-20 GB for a 1B model. I don't see how this could be run on the edge considering most phones have 6-8 GB of RAM.
[1]: https://physionet.org/content/ptb-xl/1.0.3/ [2]: https://arxiv.org/pdf/2306.15681
Thank you for the detailed breakdown with reference.
I just wanted to show the motivation for this line of research into building fine-tuned, lightweight foundation models like this; I didn't mean to imply that this paper already achieves those goals.
The tech and hardware are not yet ready, as you point out, both in terms of performance and what they can actually do today, but the key thing to be excited about is that the gap looks closable within the next few years with the right funding.
SebastianSosa 120 days ago [-]
I understand this provides a conversational interface for interacting with internet-scale data (ChatGPT), but is there any benefit to this over searching Google, clicking the top link (avoiding the ad), clicking "accept cookies", reading the header, scrolling down, X-ing out of the premium-subscription popup, reading the rest of the article, and repeating for the next 4 links?
Ok bro.
copypaper 120 days ago [-]
> Limitations: ... Finally, while we report strong results on individual datasets, we have not yet demonstrated generalization to unseen data, an essential step toward general TSLMs.
From the paper itself...
Imagine you asked ChatGPT a question but it could only give you answers from a single blog.
let_tim_cook_ 120 days ago [-]
"Stanford Repo Released Sep 31, 2025" Seems like something sampled from a distribution with non-zero probability that the day after Sep 30, 2025 would is the 31st....
rjakob 120 days ago [-]
Thanks for the note. Ironically, the post is about models built to understand time.
lomase 120 days ago [-]
They fixed it already.
Animats 120 days ago [-]
The underlying work is something called "Flamingo".[1] This is a system for understanding interleaved text and images in sequence. So it can process two "modalities" that are both sequential. This new work seems to put some kind of time token in one "modality" channel, leading to more awareness of time.
(The web site is too cute. Applying a left-to-right gradient on text is a bit much.)
[1] https://arxiv.org/pdf/2204.14198
A fun litmus test for it would be to de-trend the S&P 500 into its individual components and identify and rank the contributions of all 500 stocks. But that alone would not get it a job at Rentec or the NSA.
Unlike most commercial & medical applications where signals are stationary with white (uncorrelated) noise, the NSA & Rentec mostly deal with non-stationary signals with regime changes and correlated noise, which can't be denoised without loss of information.
The idea is not so much to predict the next stock price tick or to decipher an intercepted signal (most likely encrypted anyway), but rather to detect "regime changes", i.e. the quickest detection of a change of pattern in non-stationary signals. Then the detected pattern is matched to known trading patterns for a particular stock, or to expected spy activities.
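For a concrete sense of what "quickest detection" means here, a one-sided CUSUM detector is the textbook baseline. A toy sketch (the threshold and drift values are arbitrary, chosen just for this synthetic example):

    import numpy as np

    def cusum_detect(x, target_mean, threshold=8.0, drift=0.5):
        # One-sided CUSUM: accumulate deviations above target_mean (less a
        # drift allowance) and flag the first index where the cumulative
        # sum crosses the threshold.
        s = 0.0
        for i, v in enumerate(x):
            s = max(0.0, s + (v - target_mean - drift))
            if s > threshold:
                return i
        return None

    rng = np.random.default_rng(1)
    # Regime shift at t=200: mean jumps from 0 to 2.
    x = np.concatenate([rng.normal(0, 1, 200), rng.normal(2, 1, 200)])
    print(cusum_detect(x, target_mean=0.0))  # typically flags a handful of samples after 200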
brandonb 120 days ago [-]
This is very cool! From the paper, this technique seems to work well for question answering in time-series.
In medical AI, IMO, the most exciting work is detecting disease signals too subtle for humans: for example, estimating ejection fraction from an ECG (which cardiologists can't do, but algorithms can, and this has been tested in RCTs: https://www.nature.com/articles/s41591-021-01335-4).
Since OpenTSLM tokenizes time-series into an LLM embedding space, would that process prevent capturing such subtle signals? Or could the approach be extended to handle that use case?
RealLast 120 days ago [-]
OpenTSLM models are made precisely to capture these subtle signals.
That was one of the original motivations. The model integrates the raw time series data via cross-attention, with concrete time series representations learned by a raw time series encoder.
brandonb 120 days ago [-]
Can you explain how? If I'm understanding the paper right, the timeseries encoding is a Conv1D and the cross-attention layer is constrained to output the token space of a pre-trained LLM. My naive expectation is these constraints would make the model less expressive / fine-tunable to pick up on these types of subtle signals.
But obviously ML is an empirical field, so if you found that a constrained architecture worked well in practice, that's an interesting result in its own right.
RealLast 120 days ago [-]
Sure! There is more after the 1D conv: another transformer that encodes further features of the time series. The LLM can then basically query this encoder for information, which lets it capture more subtle patterns. In a way it's similar to how some vision-language models work.
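For intuition, here is a toy PyTorch sketch of that general pattern: a Conv1D front end, a small transformer encoder, and the LLM's hidden states cross-attending into the encoder output. The dimensions and layer counts are made up; this is not the OpenTSLM code:

    import torch
    import torch.nn as nn

    class TimeSeriesEncoder(nn.Module):
        # Conv1D front end + transformer encoder over the resulting patches.
        def __init__(self, d_model=256, n_layers=2):
            super().__init__()
            self.conv = nn.Conv1d(1, d_model, kernel_size=16, stride=8)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

        def forward(self, x):                       # x: (batch, length)
            h = self.conv(x.unsqueeze(1))           # (batch, d_model, patches)
            return self.encoder(h.transpose(1, 2))  # (batch, patches, d_model)

    # The LLM's hidden states attend to the encoder output via cross-attention:
    cross_attn = nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)
    llm_hidden = torch.randn(1, 32, 256)            # stand-in for LLM token states
    ts_feats = TimeSeriesEncoder()(torch.randn(1, 1000))
    fused, _ = cross_attn(query=llm_hidden, key=ts_feats, value=ts_feats)
    print(fused.shape)  # torch.Size([1, 32, 256])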
aerugo_ 115 days ago [-]
What I really want to know is: how do LLMs understand time series at all? Admittedly, even the best LLMs are not fantastic at analyzing time series and tabular data, but they also don't completely suck at it. Why is that? They seem better at it than my intuition tells me they should be.
In my opinion we need a multi-modal model that is great at both tabular datasets and text analysis. Most analytical work in economics, policy, public health, medicine, etc. requires crosschecking between both. Current-gen LLMs are not good enough at generating novel insights by looking at tables and text at the same time. I also haven't found any data on this, so please serve it to me on a plate if I'm wrong.
esafak 120 days ago [-]
Wouldn't it be better to have the model write a script that calls a TS library and give it access to an interpreter to run it? That's how a human would do it. I'm not convinced of the need to bake this into the model. What can you do with native TS capability that you can't by tool calling?
sync 120 days ago [-]
Anthropic is encouraging the "have the model write a script" technique as well. Buried in their latest announcement of the Claude Agent SDK, this stuck with me:
> The Claude Agent SDK excels at code generation—and for good reason. Code is precise, composable, and infinitely reusable, making it an ideal output for agents that need to perform complex operations reliably.
> When building agents, consider: which tasks would benefit from being expressed as code? Often, the answer unlocks significant capabilities.
https://www.anthropic.com/engineering/building-agents-with-t...
Does it actually have a concept of time? Does it understand causality?
esafak 120 days ago [-]
There are papers on that, such as https://arxiv.org/abs/2410.15319. Time series modeling will not bring about an understanding of causality except in a weak sense https://en.wikipedia.org/wiki/Granger_causality. To truly connect a cause and effect you need a graphical model. And automated causal discovery, the hardest part of which is proposing the nodes of the graph, is a nascent field.
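As a quick illustration of that "weak sense": a Granger test only asks whether one series' lags improve prediction of another. A minimal example with statsmodels, where y is built from lagged x by construction:

    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = np.zeros(500)
    for t in range(1, 500):
        y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()  # y lags x by construction

    # The test asks whether the SECOND column helps predict the FIRST.
    data = np.column_stack([y, x])
    results = grangercausalitytests(data, maxlag=2, verbose=False)
    print(results[1][0]["ssr_ftest"])  # (F statistic, p-value, df_denom, df_num)

A significant result here still wouldn't establish true causation, which is the point above.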
RealLast 120 days ago [-]
I think you missed the point. Would you call an image analysis library to describe an image or reason over a sequence of images? Check out some of the plots in the paper to see what these models can do.
esafak 120 days ago [-]
I would if the image analysis library was backed by a VLM. I have not fully read the paper, but couldn't Figure 6 have been done by an LLM writing a script that calls libraries for time series feature extraction and a hypothesis test or whatever? They would do the heavy lifting and return a likelihood ratio or some statistic that is interpretable to an LLM.
pks016 120 days ago [-]
Looks promising! I'll try it once I get home today.
I work with a large amount of audio time series data (not words, and all of it has subtle variation). It would be interesting to see how it compares to traditional statistical methods.
resters 120 days ago [-]
It would be nice if claude code could monitor a time series of my heart rate to realize when it is soiling the bed.
lsh0 120 days ago [-]
fwiw, I'm finding claude2 released a few days ago to be a lot less infuriating
resters 120 days ago [-]
I had stopped using the sonnet model for anything important after some very big goof ups. 4.5 is definitely significantly better.
I've been totally blown away by opus except on a project I'm working on I discovered a few unexpected weaknesses that have cost quite a bit of time.
LudwigNagasena 120 days ago [-]
As I understand it, the model is trained for classification and interpretation of time series data, but have you tried benchmarking it at forecasting? Explanation and recommendations are often deeply intertwined with forecasts, so there must be at least some effect there?
woadwarrior01 119 days ago [-]
Link to their huggingface account with (some of) the model weights. I couldn't find a link to it on their website, white paper, or GitHub.
https://huggingface.co/OpenTSLM
If you view a byte sequence as a time series then I suppose this could be a good file compression algorithm.
lacoolj 120 days ago [-]
Like hitting a thumb tack with a sledge hammer
amelius 120 days ago [-]
It works.
zubairov 120 days ago [-]
This is very cool! Amazing work guys!
llmslave 120 days ago [-]
Guaranteed there are hedge funds with language models that can predict time series. A lot of really good time series research has never been published, and is locked in the head of some guy who lives in a 20 million dollar apartment in NYC.
fogzen 120 days ago [-]
When I worked at an ML hedge fund 6 years ago, t-SNE performed the best and momentum was the feature that best predicted stock movements.
The actual algorithms for predicting price movement were fairly simplistic; most of the work was around strategies for dealing with overfitting and how to execute the trades. Accuracy was around 51-55% (a bit better than a coin toss), so it was a big challenge to actually execute the trades and still make a profit after fees and other nonsense. Finding alpha is what ML is used for, but that's just the first step.
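To see why 51-55% is such a thin edge, a back-of-the-envelope expected value per trade, assuming a symmetric 1% move and 5 bps round-trip costs (both numbers made up for illustration):

    def edge_per_trade(p, move=0.01, cost=0.0005):
        # win p of the time (+move), lose 1-p of the time (-move), always pay cost
        return p * move - (1 - p) * move - cost

    for p in (0.51, 0.53, 0.55):
        print(p, f"{edge_per_trade(p):+.4%}")
    # 0.51 -0.0300%   (loses money at these costs)
    # 0.53 +0.0100%
    # 0.55 +0.0500%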
ttul 120 days ago [-]
This makes intuitive sense to me, because the system you are modeling is wide open and you’re competing against others who have the same information. Achieving much more than 51% accuracy would be extraordinary. But if you get 51% consistently over time, with leverage, you can make a good amount of money.
cwmoore 120 days ago [-]
My experience as well; seemed more accurate while prices were rising.
1980phipsi 120 days ago [-]
One of the difficulties with these models would be backtesting investment strategies. You always need to make sure that you are only using data that would have been available at the time to avoid look-ahead bias.
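The mechanical version of that rule is a point-in-time filter. A toy sketch with pandas (real setups also need as-reported data, since later restatements can leak the future into fundamentals):

    import pandas as pd

    prices = pd.DataFrame(
        {"close": [100.0, 101.0, 99.5, 102.0]},
        index=pd.to_datetime(["2025-01-02", "2025-01-03", "2025-01-06", "2025-01-07"]),
    )

    def visible_as_of(df, ts):
        # Only rows stamped strictly before the decision time are usable.
        return df.loc[df.index < pd.Timestamp(ts)]

    # Deciding a trade on the morning of Jan 6: Jan 6's close must NOT be visible.
    print(visible_as_of(prices, "2025-01-06"))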
reactordev 120 days ago [-]
Can confirm, kdb+ exists… and you'll probably never be able to get your hands on it. There are lots of models that use it. And they are indeed locked inside some guy's head high up in the towers of midtown.
IAmGraydon 120 days ago [-]
kdb+ is no secret, but what does this have to do with anything? It's just a database optimized for time series data and has nothing to do with AI. It's widely used in the financial business, and even for non-financial things like Formula 1 race analysis.
reactordev 120 days ago [-]
Cool. You missed the part where I said there are models using that. Those models are shhhhhhh…
PyTorch is no secret either yet…
The point I'm making is that there are models, based on database stream data, that you'll never get access to even if you had $100M.
IAmGraydon 120 days ago [-]
I see.
fmbb 120 days ago [-]
Why would they use an LLM for this?
observationist 120 days ago [-]
Predicting the future is valuable. If a model can apply the same underlying world model to accurately predicting OHLC series as it does to producing English, then you can interrogate and expand on that underlying world model in complex and very useful ways. Being able to prompt it to describe a scenario, or to uncover hidden influences that wouldn't be apparent from a simple accurate prediction, allows sophistication in the tools: instead of an accurate chart with all sorts of complex indicators, you can get English explication and variations on scenarios.
You can't tell a numbers-only model "ok, with this data, but now you know all the tomatoes in the world have gone rotten and the market doesn't know it yet, what's the best move?" You can use an LLM like that, however, and with RL you can branch and layer strategies dependent on dynamic conditions and private data, for arbitrary outcomes. Deploy such a model at scale, run tens of thousands of simulations iterating through different scenarios, and you can start to apply confidence metrics and complex multiple-degree-of-separation strategies to exploit arbitrage opportunities.
Any one of the big labs could do something like this, including modeling people, demographic samples, distributions of psychological profiles, cultural and current events, and they'd have a manipulation engine to tell them exactly who, when, and where to invest, candidates to support, messages to push and publish.
The fundamental measure of intelligence is how far into the future a system can predict, and across which domains. The broader the domains and the farther into the future, the more intelligent the system, and things like this push the boundaries.
We should probably get around to doing a digital bill of rights, but I suspect it's too late already anyway, and we're full steam ahead to snow crash territory.
mikert89 120 days ago [-]
Automated hypothesis testing, in the form of a search for alpha in the market, is certainly being used right now. An LLM can ask new questions about correlations between assets and run statistical tests on those correlations, in ways that were previously only possible by employing a PhD statistician.
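A minimal version of that loop, with made-up synthetic returns; the hard part an LLM would add is proposing which pairs and transformations to test, plus correcting for the multiple comparisons that screening thousands of pairs creates:

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(42)
    returns_a = rng.normal(0, 0.01, 250)                    # ~1 year of daily returns
    returns_b = 0.6 * returns_a + rng.normal(0, 0.01, 250)  # correlated by construction

    r, p_value = pearsonr(returns_a, returns_b)
    print(f"r={r:.2f}, p={p_value:.2e}")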
wordpad 120 days ago [-]
The emergent ability of LLMs to accurately predict tokens in previously unseen conditions might be more powerful than more rigorous machine-learning extrapolations.
Especially when you throw noisy subjective context at it.
mikepurvis 120 days ago [-]
The “prediction” in this case is I think some approximation of “ingest today’s news and social media buzz as it’s happening and predict what the financial news tomorrow morning will be.”
riku_iki 119 days ago [-]
Hypothetically, an LLM has absorbed lots of world knowledge and can trace deep correlations between various factors.
constantcrying 120 days ago [-]
This isn't (just) time series forecasting; it is about interacting with time series data through natural language.
senorrib 120 days ago [-]
I doubt those are language models.
RealLast 120 days ago [-]
Check it out, they are completely based on Llama and Gemma, outputting text. Models are open-source.
yawnxyz 120 days ago [-]
Would be cool to use this to predict series of passages for directed evolution, e.g. the Appelmans protocol or similar, in phage/host interactions.
qwe----3 120 days ago [-]
“Researchers from Google” (did an internship)
pdntspa 120 days ago [-]
OF COURSE the good stuff is proprietary....
ivape 120 days ago [-]
Seems like MIT?
RealLast 119 days ago [-]
Yep, fully open-source!
ivape 119 days ago [-]
You guys have a discord or anything like that by any chance?
syntaxing 120 days ago [-]
How many parameters does a basic model have?
ivape 120 days ago [-]
You'd be fine tuning a base model, and they suggested 1B and 3B variants, possibly bigger.
t_mann 120 days ago [-]
> Read the White Paper
> A universal TSLM will power proactive healthcare, adaptive robotics, resilient infrastructure, and new forms of human-AI collaboration.
> scientists, engineers, and builders from ETH, Stanford, Harvard, Cambridge, TUM, CDTM, Google, Meta, AWS, and beyond
What's with all this fuss? Why not just upload your paper to arxiv? Time series models are interesting enough, but from the abstract it's not even clear whether they are using transformers or a recurrent architecture like xLSTM - arguably a more intuitive choice for time series - or something else. This website is barely distinguishable from a crypto/DeFi pitch.
RealLast 120 days ago [-]
The full paper is on the website. The arXiv release of the exact same paper is pending. Click the "Read the White Paper" button to get the full paper.
t_mann 120 days ago [-]
[flagged]
dang 120 days ago [-]
Please don't treat people in a hostile fashion when discussing their work on HN. That's the opposite of the kind of community we want here.
"why does the website sound like some of the shadiest investment pitches I've seen" is no way to welcome someone sharing their work in good faith.
ghc 120 days ago [-]
> Few studies use cross-attention to integrate time series into LLMs
I mean, sure, but why would you need a study for that? There's plenty of prior work using cross-attention to integrate time series dynamics into non-LLM transformer models, right? Or maybe I'm assuming that integrating a time series embedding with an LLM is easier than it is.
Looking at the repo, the training data seems extremely health-focused. I guess I would have to tune the model with my own datasets if I want it to answer questions about multi-source sensor data?
orbifold 120 days ago [-]
This is a terrible idea and direction, but that won't stop people from pursuing it, and as soon as they have a critical mass of people reviewing each other it will go on for quite a while. Transformers for time series is one of those things that seems to make sense, but doesn't really.
EGreg 120 days ago [-]
Can you elaborate as to why, actually? What specifically makes this the case?
dschaurecker 120 days ago [-]
Very cool!
iLoveOncall 120 days ago [-]
You don't need specially trained LLMs for this. My team has been successfully using Claude 3.5 for a year to analyze huge time series data sets (close to the max context window), without anything special beyond a prompt describing the task at hand.
nowittyusername 120 days ago [-]
I agree. LLMs are capable of doing this right out of the box if you provide grounding data like the current time and a few other things in the system prompt. It's really odd that this is getting any attention.
Numerous studies, INCLUDING the OpenTSLM paper, have PROVEN they are NOT able to do this out of the box. Did you even check out the results at all? They literally compare OpenTSLM against standard text-only baselines. Gemma3-270M performs better than GPT-4o using tokenized time series alone. Thus, I guess you guys are being ironic.
dang 120 days ago [-]
I understand how annoying it is when people post shallow dismissals of your work on the internet, but please don't give in to the annoyance when replying. It makes the thread worse, and it's against the HN guidelines: https://news.ycombinator.com/newsguidelines.html.
I don't know if this is your work or not, but I appreciate your wanting to defend it...we just need you to do that in a way that doesn't attack others, no matter how wrong they are or you feel they are. Easier said than done of course, but we're all working on it together.
iLoveOncall 120 days ago [-]
An experiment is not a proof.
If this is the level of one of the contributors to the OpenTSLM paper (which you very obviously are), no wonder due diligence wasn't done properly.
rjakob 120 days ago [-]
It’s less about proof and more about demonstrating a new capability that TSLMs enable. To be fair, the paper did test standard LLMs, which consistently underperformed.
@iLoveOncall, can you point to examples where out of the box models achieved good results on multiple time-series? Also, what kind of time-series data did you analyze with Claude 3.5? What exactly did you predict, and how did you assess reasoning capabilities?
RealLast 120 days ago [-]
[flagged]
NwtnsMthd 120 days ago [-]
This sounds very interesting, would you be able to share a little more about your process? What works and what doesn't?
iLoveOncall 120 days ago [-]
Unfortunately not really, but we've found (and used in production for a year) that Claude 3.5 is perfectly capable of identifying anomalies or other points of interest in very large sets of time series data.
Think of 100-200K worth of tokens formatted like this:
<Entity1>-<Entity2> <Dimension> <ISO 8601 time> <value>
<Entity1>-<Entity2> <Dimension> <ISO 8601 time +1> <value>
<Entity1>-<Entity2> <Dimension> <ISO 8601 time +2> <value>
<Entity1>-<Entity2> <Dimension2> <ISO 8601 time> <value>
<Entity1>-<Entity2> <Dimension2> <ISO 8601 time +1> <value>
The only pre-filtering we do is eliminate "obviously non relevant" data, such as series where the value is completely flat the whole time, but this was done to add more data to the context, not because Claude struggled with it (it doesn't).
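A hypothetical reconstruction of that pipeline; the column names and the flatness test are illustrative guesses, not their actual code:

    import pandas as pd

    def to_prompt_lines(df):
        # Render rows as '<entity_pair> <dimension> <iso_time> <value>' lines,
        # dropping completely flat series as described above.
        lines = []
        for (pair, dim), g in df.groupby(["entity_pair", "dimension"]):
            if g["value"].nunique() <= 1:   # "obviously non relevant": flat series
                continue
            for _, row in g.iterrows():
                lines.append(f"{pair} {dim} {row['time'].isoformat()} {row['value']}")
        return "\n".join(lines)

    df = pd.DataFrame({
        "entity_pair": ["svcA-svcB"] * 4,
        "dimension": ["latency_ms", "latency_ms", "error_rate", "error_rate"],
        "time": pd.to_datetime(["2025-01-01T00:00", "2025-01-01T00:01"] * 2),
        "value": [12.0, 85.0, 0.0, 0.0],    # error_rate is flat, so it is dropped
    })
    print(to_prompt_lines(df))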
RLAIF 120 days ago [-]
[dead]
Imad_mkdm 120 days ago [-]
[flagged]
esafak 120 days ago [-]
You know this adds nothing.
posidoli 120 days ago [-]
That is outstanding work and will revolutionize the approaches in this topic!
Y_Y 120 days ago [-]
Bad bot
ivape 120 days ago [-]
Such a blatant bot? Makes you wonder what’s lurking here.