It's really a pity that they do this now. Some of their older papers had actually quite some valuable information, comments, discussions, thoughts, even commented out sections, figures, tables in it. It gave a much better view of how the paper was written over time, or even how the work progressed over time. Sometimes you also see some alternative titles being discussed, which can be quite funny.
> Some of their older papers had actually quite some valuable information, comments, discussions, thoughts, even commented out sections, figures, tables in it.
I think you answered your own question.
Chinjut 249 days ago [-]
What question?
JohnKemeny 249 days ago [-]
I think I read the comment as being sceptical as to why. I withdraw my comment in that form.
toolslive 250 days ago [-]
Maybe papers need to be put under version control.
westurner 249 days ago [-]
FigShare and Zenodo grant (DataCite) DOIs for git commit tags.
Maybe papers need to contain executable test assertions.
enriquto 250 days ago [-]
> Removes all comments from your code (yes, those are visible on arXiv and you do not want them to be).
Why not? I love to peek at .tex file comments, and secretly hope that somebody somewhere is reading mine...
pantalaimon 250 days ago [-]
Those comments might also explain how some cool figure was done
lou1306 250 days ago [-]
Ehh, sometimes you have additional results or insightful remarks that simply don't fit into the page limits. You may want to keep those for yourself and use them for a separate publication rather than give them away.
JohnKemeny 250 days ago [-]
Well, you don't have page limits on arXiv, though.
michaelmior 250 days ago [-]
This is true, but arXiv submissions are often prepared with a target venue in mind that does have page limits.
JohnKemeny 250 days ago [-]
Also true, but the arXiv version often (in my experience) contains the entire paper. Indeed, many conferences ask people to submit the full version to arXiv.
michaelmior 249 days ago [-]
Interesting. I know it frequently happens, but I've never seen a conference explicitly make that request.
JohnKemeny 249 days ago [-]
Here's an example (ICALP):
> Authors are strongly encouraged to also make full versions of their submissions freely accessible in an on-line repository such as ArXiv, HAL, ECCC.
This is from the call for papers a few years ago. The wording has changed in recent CFPs, due to employing (weak) double-blind reviewing.
They still allow uploading to arXiv (with full names and affiliations) despite being anonymous.
lou1306 249 days ago [-]
Aye, but in this context "full version" usually means "a version with more detailed proofs/results related to the paper's contributions", rather than "a version with additional contributions".
generationP 250 days ago [-]
What is the point of concealing tikz source code? It increases the size of the source archive and undermines accessibility.
MengerSponge 250 days ago [-]
And obfuscating "raw simulation data"? It's not pro-research fraud, but it's what a person who was pro-research fraud would prefer.
mattkrause 249 days ago [-]
Agreed that the phrasing is suspicious!
However, it’s pointless or even counterproductive to embed the raw high-resolution data in the paper, because it doesn’t show up in the rendered copy but balloons its size. For a 6.5” (i.e., full-width) figure printed at 300 dpi, you can only show about 1950 points horizontally (6.5 × 300), and realistically a lot less. Upload the raw traces somewhere and add a link.
Source: As a grad student, I stupidly turned a simple poster into a multi-gigabyte monstrosity by embedding lots of raw data. The guy at the print shop was not happy when it crashed his large-format printer!
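To put rough numbers on that, here is a minimal sketch in plain Python. It is not taken from any tool mentioned in this thread; the function names `max_columns` and `decimate_minmax` are made up for illustration. It computes how many pixel columns a figure of a given width offers, then thins a long trace to the minimum and maximum of each bucket, which is one common way to keep spikes visible after decimation.

```python
def max_columns(width_in: float, dpi: int) -> int:
    """Number of pixel columns available across a figure of the given width."""
    return int(width_in * dpi)


def decimate_minmax(ys, n_buckets):
    """Reduce ys to at most 2 * n_buckets values.

    The trace is cut into n_buckets contiguous chunks and only the minimum
    and maximum of each chunk are kept, in their original order.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    n = len(ys)
    if n <= 2 * n_buckets:
        # Already small enough to plot as-is.
        return list(ys)
    out = []
    for b in range(n_buckets):
        lo = b * n // n_buckets
        hi = (b + 1) * n // n_buckets
        chunk = ys[lo:hi]
        i_min = min(range(len(chunk)), key=chunk.__getitem__)
        i_max = max(range(len(chunk)), key=chunk.__getitem__)
        # Emit the two extremes in the order they occur in the data.
        for i in sorted({i_min, i_max}):
            out.append(chunk[i])
    return out


if __name__ == "__main__":
    cols = max_columns(6.5, 300)
    trace = [((i * 37) % 1000) / 1000 for i in range(1_000_000)]  # stand-in data
    thinned = decimate_minmax(trace, cols)
    print(cols, len(trace), len(thinned))
```

Plot the thinned list and publish the full trace separately; the figure should look essentially the same at print resolution, though that depends on the data and the renderer.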
MengerSponge 248 days ago [-]
Same! I've accidentally rendered a PDF monstrosity where every data point was represented in full vector graphic glory. It was absolutely enormous and dumb, because you couldn't tell that from the figure.
Generate high quality graphics, with the limitations of print, digital displays, and attention in mind. Then toss your data up on Zenodo and cite its DOI.
Obfuscating is the wrong word. "Decimate", "project", "render" are all better options, depending on what you mean. Punning render is the most fun of that lot, FWIW.
DominikPeters 250 days ago [-]
It's also nice for other people to reuse and adapt your figure, or include it in beamer presentations.
jpfr 250 days ago [-]
Many researchers learn LaTeX by looking at the idioms used for the papers they really like.
That includes code for Tikz figures.
I hope people will use this tool only to remove the inadvertent disclosure of commented regions and to reduce the file size. But keep the LaTeX source intact otherwise!
WanderPanda 249 days ago [-]
It needs to be intact, the pdf is rendered by the arxiv backend based on the source
semi-extrinsic 249 days ago [-]
You can upload only the PDF on ArXiv. Useful when you for some reason (e.g. client request) publish in certain engineering conferences that only allow Word submissions...
cozzyd 249 days ago [-]
If arXiv detects that it's a LaTeX-generated PDF, it will reject it. Though it's probably possible to launder the LaTeX-generated PDF through ghostscript or something to evade detection (I haven't tried...)
DominikPeters 250 days ago [-]
To remove comments, one can also run, for example `latexpand --empty-comments --keep-includes --expand-bbl document.bbl document.tex > document-arxiv-v1.tex`. Latexpand should come pre-installed with texlive. Without the `--keep-includes` option, it also flattens the tex files into one.
But I'd consider removing comments by hand and leaving any comments that are potentially insightful.
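If you'd rather script it yourself, here is a rough Python sketch of the same idea. It is not how latexpand works internally, and `strip_comment` / `strip_comments` are names invented for this example. It assumes every unescaped `%` starts a comment, so it will get `verbatim` blocks, `\url{...}` arguments containing `%`, and similar cases wrong. It keeps the bare `%` so that line-end spacing behaves as before.

```python
def strip_comment(line: str) -> str:
    """Drop the text of a LaTeX comment on one line, keeping the bare '%'."""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes (covers \% and \\).
            i += 2
            continue
        if ch == "%":
            # Unescaped percent sign: everything after it is comment text.
            return line[: i + 1]
        i += 1
    return line


def strip_comments(tex: str) -> str:
    """Apply strip_comment to every line of a LaTeX source string."""
    return "\n".join(strip_comment(line) for line in tex.split("\n"))


if __name__ == "__main__":
    sample = "We gain 50\\% accuracy. % TODO: double-check this number\n% \\title{Old title}\n\\title{New title}"
    print(strip_comments(sample))
```

Run it on a copy and diff the result against the original before uploading, so you can put back any comments worth keeping.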
sabjut 250 days ago [-]
I wish journals would start accepting Typst[0] files. It is definitely the format of the next decade in my opinion. It's both open source and highly performant.
Sadly existing legacy structures prevent it from gaining the critical mass needed for it to thrive just yet.
Some of those are redundant (arXiv will complain if there are unused files, most commonly from accidentally adding the .bib file). My `make arxiv` target on papers usually just calls latexpand to cull comments and modifies all image includes to not be in a subdirectory (then prepares a tar file with the modified source and all figures).
...and here the tool to quickly inspect comments that were left in the LaTeX
firsching 249 days ago [-]
You can even get sharable links to the comments...
auggierose 250 days ago [-]
Or, don't put your stuff on the arXiv, but put it on zenodo. You also get a DOI, and you can just publish the PDF, not the source. You can even restrict access to the PDF, and create share links with access to it.
evanb 250 days ago [-]
You get a DOI on the arXiv. You can just publish the PDF on the arXiv, but this is a sure sign you are a crackpot.
auggierose 250 days ago [-]
You cannot just publish the PDF, they have checks that make sure that you didn't produce your PDF with LaTeX. There are probably ways to get around that, but why? Just use zenodo instead.
Or just publish on zenodo, without all that fuss. The reasons the ArXiv gives may be good from their point of view, but if you don’t care too much about that but have your own good reasons for not wanting to publish your source, then zenodo is a great and in many respects superior alternative, no questions asked.
auggierose 250 days ago [-]
You mean, like Grisha Perelman?
evanb 249 days ago [-]
There are exceptions to every rule.
auggierose 249 days ago [-]
If you are “sure” I expect 100% correctness.
evanb 249 days ago [-]
See, every rule has an exception.
auggierose 249 days ago [-]
Let's assume that every rule has an exception. Then this rule must have an exception as well, so there is a rule with no exception. That is a contradiction.
So most definitely, there are some rules with no exception. The ones you are sure about should be among them.
frumiousirc 250 days ago [-]
arXiv issues DOIs for submissions.
auggierose 250 days ago [-]
I didn't say otherwise. In fact, the "also" is meant to express exactly that.
bigbacaloa 250 days ago [-]
[dead]
E.g. from https://arxiv.org/abs/1804.09849:
%\title{Sequence-to-Sequence Tricks and Hybrids\\for Improved Neural Machine Translation}
% \title{Mixing and Matching Sequence-to-Sequence Modeling Techniques\\for Improved Neural Machine Translation}
% \title{Analyzing and Optimizing Sequence-to-Sequence Modeling Techniques\\for Improved Neural Machine Translation}
% \title{Frankenmodels for Improved Neural Machine Translation}
% \title{Optimized Architectures and Training Strategies\\for Improved Neural Machine Translation}
% \title{Hybrid Vigor: Combining Traits from Different Architectures Improves Neural Machine Translation}
\title{The Best of Both Worlds: \\Combining Recent Advances in Neural Machine Translation\\ ~}
Also a lot of things in the Attention is all you need paper: https://arxiv.org/abs/1706.03762v1
[0] https://typst.app/
If you disagree with their good reasons https://info.arxiv.org/help/faq/whytex.html to submit the TeX you might be granted an exception.