ArXiv LaTeX Cleaner: Clean the LaTeX code of your paper to submit to ArXiv (github.com)
albertzeyer 2 days ago [-]
It's really a pity that they do this now. Some of their older papers had actually quite some valuable information, comments, discussions, thoughts, even commented out sections, figures, tables in it. It gave a much better view of how the paper was written over time, or even how the work progressed over time. Sometimes you also see alternative titles being discussed, which can be quite funny.

E.g. from https://arxiv.org/abs/1804.09849:

%\title{Sequence-to-Sequence Tricks and Hybrids\\for Improved Neural Machine Translation}
% \title{Mixing and Matching Sequence-to-Sequence Modeling Techniques\\for Improved Neural Machine Translation}
% \title{Analyzing and Optimizing Sequence-to-Sequence Modeling Techniques\\for Improved Neural Machine Translation}
% \title{Frankenmodels for Improved Neural Machine Translation}
% \title{Optimized Architectures and Training Strategies\\for Improved Neural Machine Translation}
% \title{Hybrid Vigor: Combining Traits from Different Architectures Improves Neural Machine Translation}

\title{The Best of Both Worlds: \\Combining Recent Advances in Neural Machine Translation\\ ~}

There are also a lot of such things in the "Attention Is All You Need" paper: https://arxiv.org/abs/1706.03762v1

JohnKemeny 2 days ago [-]
> Some of their older papers had actually quite some valuable information, comments, discussions, thoughts, even commented out sections, figures, tables in it.

I think you answered your own question.

Chinjut 1 days ago [-]
What question?
JohnKemeny 1 days ago [-]
I think I read the comment as being sceptical as to why. I withdraw my comment in that form.
toolslive 2 days ago [-]
Maybe papers need to be put under version control.
westurner 1 days ago [-]
FigShare and Zenodo grant (DataCite) DOIs for git commit tags.

Maybe papers need to contain executable test assertions.

enriquto 2 days ago [-]
> Removes all comments from your code (yes, those are visible on arXiv and you do not want them to be).

Why not? I love to peek at .tex file comments, and secretly hope that somebody somewhere is reading mine...

pantalaimon 2 days ago [-]
Those comments might also explain how some cool figure was done
lou1306 2 days ago [-]
Ehh, sometimes you have additional results or insightful remarks that simply don't fit into the page limits. You may want to keep those for yourself and use them for a separate publication rather than give them away.
JohnKemeny 2 days ago [-]
Well, you don't have page limits on arXiv, though.
michaelmior 2 days ago [-]
This is true, but arXiv submissions are often prepared with a target venue in mind that does have page limits.
JohnKemeny 1 days ago [-]
Also true, but the arXiv version often (in my experience) contains the entire paper. Indeed, many conferences ask people to submit the full version to arXiv.
michaelmior 1 days ago [-]
Interesting. I know it frequently happens, but I've never seen a conference explicitly make that request.
JohnKemeny 1 days ago [-]
Here's an example (ICALP):

> Authors are strongly encouraged to also make full versions of their submissions freely accessible in an on-line repository such as ArXiv, HAL, ECCC.

This is from the call for papers a few years ago. The wording has changed in recent CFPs, due to employing (weak) double-blind reviewing.

They still allow uploading to arXiv (with full names and affiliations) despite being anonymous.

lou1306 1 days ago [-]
Aye, but in this context "full version" usually means "a version with more detailed proofs/results related to the paper's contributions", rather than "a version with additional contributions".
jpfr 2 days ago [-]
Many researchers learn LaTeX by looking at the idioms used for the papers they really like.

That includes code for Tikz figures.

I hope people will use this tool only to remove the inadvertent disclosure of commented regions and to reduce the file size. But keep the LaTeX source intact otherwise!

WanderPanda 1 days ago [-]
It needs to be intact: the PDF is rendered by the arXiv backend based on the source.
semi-extrinsic 1 days ago [-]
You can upload only the PDF on ArXiv. Useful when you for some reason (e.g. client request) publish in certain engineering conferences that only allow Word submissions...
cozzyd 17 hours ago [-]
If arXiv detects that it's a LaTeX-generated PDF, it will reject it. Though it's probably possible to launder the LaTeX-generated PDF through ghostscript or something to evade detection (I haven't tried...)
generationP 2 days ago [-]
What is the point of concealing tikz source code? It increases the size of the source archive and undermines accessibility.
DominikPeters 2 days ago [-]
It's also nice for other people to reuse and adapt your figure, or include it in beamer presentations.
MengerSponge 2 days ago [-]
And obfuscating "raw simulation data"? It's not pro-research fraud, but it's what a person who was pro-research fraud would prefer.
mattkrause 1 days ago [-]
Agreed that the phrasing is suspicious!

However, it’s pointless or even counterproductive to embed the raw high-resolution data in the paper because it doesn’t show up in the rendered copy but balloons its size. For a 6.5” (i.e., full-width) figure printed at 300 dpi, you can only show about 1950 points horizontally, and realistically a lot fewer. Upload the raw traces somewhere and add a link.

Source: As a grad student, I stupidly turned a simple poster into a multi-gigabyte monstrosity by embedding lots of raw data. The guy at the print shop was not happy when it crashed his large-format printer!
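
For anyone who wants to thin a trace before plotting, here is a rough sketch of the idea: work out how many horizontal pixels the figure has (width times resolution), then keep only the minimum and maximum of each pixel-wide bin so the envelope of the signal survives. The function names `max_points` and `minmax_decimate` are made up for illustration, and this is just one simple decimation scheme under those assumptions, not a recommendation of a specific tool.

```python
def max_points(width_in: float, dpi: int) -> int:
    """Number of horizontal pixels available for a figure of this width."""
    return int(width_in * dpi)


def minmax_decimate(samples, n_bins):
    """Reduce samples to at most 2 * n_bins values.

    Each bin contributes its minimum and maximum, in order of appearance,
    so peaks and troughs are still visible after thinning.
    """
    n = len(samples)
    if n <= 2 * n_bins:
        return list(samples)  # already small enough, nothing to do
    out = []
    for i in range(n_bins):
        # Integer arithmetic avoids losing the last sample to rounding.
        chunk = samples[i * n // n_bins:(i + 1) * n // n_bins]
        lo, hi = min(chunk), max(chunk)
        if chunk.index(lo) <= chunk.index(hi):
            out.extend((lo, hi))
        else:
            out.extend((hi, lo))
    return out
```

Usage would be something like `minmax_decimate(trace, max_points(6.5, 300))`, with the full-resolution trace uploaded elsewhere and linked from the paper.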

DominikPeters 2 days ago [-]
To remove comments, one can also run, for example `latexpand --empty-comments --keep-includes --expand-bbl document.bbl document.tex > document-arxiv-v1.tex`. Latexpand should come pre-installed with texlive. Without the `--keep-includes` option, it also flattens the tex files into one.

But I'd consider removing comments by hand and leaving any comments that are potentially insightful.
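
If you want to see what comment stripping involves without a TeX install, here is a minimal sketch in Python. It is not how latexpand is implemented; `strip_comments` is a hypothetical helper. It assumes no verbatim-like environments (a `%` inside `verbatim` or `lstlisting` is not a comment, and this sketch would mangle it). It keeps a bare `%` where a trailing comment was, because a `%` at the end of a line also suppresses the space TeX would otherwise insert for the line break.

```python
import re

# A "%" starts a comment unless it is escaped as "\%". An even number of
# backslashes before it (e.g. "\\%") means the "%" is NOT escaped.
_COMMENT = re.compile(r"(?<!\\)((?:\\\\)*)%.*")


def strip_comments(tex: str) -> str:
    """Remove LaTeX comments, keeping a bare '%' for trailing comments."""
    out = []
    for line in tex.splitlines():
        if line.lstrip().startswith("%"):
            continue  # drop lines that consist only of a comment
        out.append(_COMMENT.sub(r"\1%", line))
    return "\n".join(out)
```

Insightful comments can then be restored by hand before uploading.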

sabjut 2 days ago [-]
I wish journals would start accepting Typst[0] files. It is definitely the format of the next decade in my opinion. It's both open source and highly performant.

Sadly existing legacy structures prevent it from gaining the critical mass needed for it to thrive just yet.

[0] https://typst.app/

whatshisface 1 days ago [-]
They could produce TeX files.
cozzyd 22 hours ago [-]
Some of those are redundant (arXiv will complain if there are unused files, most commonly from accidentally adding the .bib file). My make arxiv target on papers usually just calls latexpand to cull comments and modifies all image includes to not be in a subdirectory (then prepares a tar file with the modified source and all figures).
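
Roughly, the include-flattening and tar steps could look like the sketch below. This is not my actual Makefile; the helper names `flatten_includes` and `build_tarball` and the default `main.tex` are made up for illustration. It assumes every `\includegraphics` path includes its file extension, that paths are relative to the current directory, and that no two figures share a base name.

```python
import io
import posixpath
import re
import tarfile

# Matches \includegraphics with an optional [options] block and a {path}.
_INCLUDE = re.compile(r"(\\includegraphics(?:\[[^\]]*\])?\{)([^}]+)(\})")


def flatten_includes(tex):
    """Return (rewritten_tex, original_paths) with directories stripped."""
    paths = []

    def repl(match):
        paths.append(match.group(2))
        return match.group(1) + posixpath.basename(match.group(2)) + match.group(3)

    return _INCLUDE.sub(repl, tex), paths


def build_tarball(tex, out_path, main_name="main.tex"):
    """Write the flattened source and its figures into one .tar.gz."""
    flat, paths = flatten_includes(tex)
    with tarfile.open(out_path, "w:gz") as tar:
        data = flat.encode("utf-8")
        info = tarfile.TarInfo(main_name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        for path in dict.fromkeys(paths):  # de-duplicate, keep order
            tar.add(path, arcname=posixpath.basename(path))
```

Only files that are actually referenced end up in the archive, which avoids the unused-file complaint.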
auggierose 2 days ago [-]
Or, don't put your stuff on the arXiv, but put it on zenodo. You also get a DOI, and you can just publish the PDF, not the source. You can even restrict access to the PDF, and create share links with access to it.
evanb 2 days ago [-]
You get a DOI on the arXiv. You can just publish the PDF on the arXiv, but this is a sure sign you are a crackpot.
auggierose 2 days ago [-]
You cannot just publish the PDF, they have checks that make sure that you didn't produce your PDF with LaTeX. There are probably ways to get around that, but why? Just use zenodo instead.
evanb 1 days ago [-]
https://info.arxiv.org/help/submit_pdf.html explains all the constraints on direct PDF publication.

If you disagree with their good reasons https://info.arxiv.org/help/faq/whytex.html to submit the TeX you might be granted an exception.

auggierose 1 days ago [-]
Or just publish on zenodo, without all that fuss. The reasons the ArXiv gives may be good from their point of view, but if you don’t care too much about that but have your own good reasons for not wanting to publish your source, then zenodo is a great and in many respects superior alternative, no questions asked.
auggierose 2 days ago [-]
You mean, like Grisha Perelman?
evanb 1 days ago [-]
There are exceptions to every rule.
auggierose 1 days ago [-]
If you are “sure” I expect 100% correctness.
evanb 1 days ago [-]
See, every rule has an exception.
auggierose 1 days ago [-]
Let's assume that every rule has an exception. Then this rule must have an exception as well, so there is a rule with no exception. That is a contradiction.

So most definitely, there are some rules with no exception. The ones you are sure about should be among them.

frumiousirc 2 days ago [-]
arXiv issues DOIs for submissions.
auggierose 2 days ago [-]
I didn't say otherwise. In fact, the "also" is meant to express exactly that.
firsching 2 days ago [-]
https://github.com/mo271/arxiv-comments

...and here is the tool to quickly inspect comments that were left in the LaTeX.

firsching 1 days ago [-]
You can even get sharable links to the comments...