As a scientist, it is important to stay up to date on new work, and there is no better repository than arXiv. The main problem is that current arXiv categories are a bit too broad: I work in robotics, but not haptics or human-robot interaction. These are interesting fields but I don’t need daily updates.
My solution: query the arXiv API! Here’s how to use my script and set up your own query.
In arxivfeed.py
set authors, subjects, and keywords that are interesting to you. The feed will contain all papers by specified authors, and any papers within the subjects with abstracts that match your keywords.
The arxivfeed.py
script saves everything to the file specified in html_file
. I added a GitHub action (in .github/workflows
) to automatically run the script and push the html to my repository. Here is the workflow:
name: arxiv
on:
workflow_dispatch:
schedule:
- cron: "30 9 * * *" #runs at 9:30 UTC everyday
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: checkout repo content
uses: actions/checkout@v4 # checkout the repository content to github runner.
- name: setup python
uses: actions/setup-python@v5
with:
python-version: 3.9 #install the python needed
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install feedparser
- name: execute py script
run: |
python experimental/arxivfeed.py
- name: commit files
run: |
git config --local user.email "action@github.com"
git config --local user.name "GitHub Action"
git add -A
git diff-index --quiet HEAD || (git commit -a -m "updated arxiv" --allow-empty)
- name: push changes
uses: ad-m/github-push-action@master
with:
github_token: $
branch: master
What’s happening is relatively self-explanatory. First, I set it to run at 6:30 AM every day using UTC. Then, I set up the job. The job runs on ubuntu (latest) and does the following steps:
The only step that requires anything more is pushing the changes. To do this, you need to set up a GitHub secret. You can find secrets in the settings tab of the repository. Add a secret for actions called “ACCESS_TOKEN” and paste in a personal access token for your github (you probably should only give it write access to this repo).
That’s it! If you want to test the script go to the actions tab, click on arxiv, and hit “run workflow”.