Synchronize Files

The goal is to safely synchronize files from a remote server into your repository using rsync and automatically open a pull request (PR) if new or updated files exist.

1. Overview

This workflow automates:

  1. Pulling files from a remote server (using rsync).
  2. Detecting changes in your repository.
  3. Creating a Pull Request if new or updated files are found.

Benefits:

  • No manual SSH steps are needed; GitHub Actions runs the sync for you.
  • The entire sync process is auditable in Git history and PRs.
  • Your private SSH key stays out of the repository because it is stored in GitHub Secrets.

2. Prerequisites

  1. SSH Access on Remote Server
    • Your public key must be in ~/.ssh/authorized_keys on the remote server.
  2. GitHub Repository
    • Store the synced files here.
  3. GitHub Secrets:
    • SSH_PRIVATE_KEY: The private key you’ll use to authenticate via SSH.
    • SSH_HOST: The remote server’s domain or IP (e.g., example.com).
    • SSH_USER: The username for SSH login on the remote server.

(Optionally, you can also store the server's host key fingerprint as a secret if you want strict host key checking. By default, scp-action does not verify the remote host key.)
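
If you do want strict host key checking, one approach is to keep a verified known_hosts line in a secret and write it to the runner before connecting. The following is a minimal sketch rather than a definitive setup: pin_host_key is a hypothetical helper name, and SSH_KNOWN_HOSTS is an assumed extra secret that is not part of the list above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Add a verified host key line to a known_hosts file, skipping exact duplicates.
# $1: path to the known_hosts file
# $2: one known_hosts line, e.g. "your-server.com ssh-ed25519 AAAA..."
pin_host_key() {
  local known_hosts_file="$1" host_key_line="$2"
  mkdir -p "$(dirname "$known_hosts_file")"
  touch "$known_hosts_file"
  if ! grep -qxF -- "$host_key_line" "$known_hosts_file"; then
    printf '%s\n' "$host_key_line" >> "$known_hosts_file"
  fi
}

# Example (SSH_KNOWN_HOSTS is an assumed secret holding the verified line):
#   pin_host_key "$HOME/.ssh/known_hosts" "$SSH_KNOWN_HOSTS"
#   ssh -o StrictHostKeyChecking=yes "$SSH_USER@$SSH_HOST" true
```

The line itself can be obtained once with ssh-keyscan and should be compared against the server's real host key before you store it.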


3. Generate & Add SSH Keys

If you don’t already have an SSH key pair for GitHub Actions, follow these steps:

  1. Generate an SSH Key:

    bash
    ssh-keygen -t ed25519 -C "github-actions" -f github_actions_key -N ""
    • Private key: github_actions_key
    • Public key: github_actions_key.pub
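  2. Copy Public Key to Server (run this from the machine where you generated the key, replacing deploy@your-server.com with your own user and host):

    bash
    ssh-copy-id -i github_actions_key.pub deploy@your-server.com
    • If ssh-copy-id is unavailable, append the contents of github_actions_key.pub to ~/.ssh/authorized_keys on the server.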
  3. Add Private Key to GitHub:

    • Go to Repository → Settings → Secrets and variables → Actions.
    • Create a New repository secret named SSH_PRIVATE_KEY, pasting in the entire private key (github_actions_key).
  4. Add SSH_HOST & SSH_USER secrets as well:

    • SSH_HOST: e.g., your-server.com
    • SSH_USER: e.g., ubuntu or deploy
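
Before pasting the private key into GitHub, it can be worth confirming that the two key files actually belong together. The sketch below assumes ssh-keygen is installed locally; key_pair_matches is a hypothetical helper name, and the user and host in the login example are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Return 0 when the private key at $1 and the public key at $2 belong together.
# Only the key type and key material are compared; the trailing comment is ignored.
key_pair_matches() {
  local private_key="$1" public_key="$2"
  local derived stored
  derived="$(ssh-keygen -y -f "$private_key" | awk '{print $1, $2}')"
  stored="$(awk '{print $1, $2}' "$public_key")"
  [ "$derived" = "$stored" ]
}

# Example (file names as generated in step 1):
#   key_pair_matches github_actions_key github_actions_key.pub && echo "pair OK"
# Example login check (user and host are placeholders):
#   ssh -i github_actions_key -o BatchMode=yes deploy@your-server.com true
```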

4. GitHub Actions Workflow

Create (or edit) the file .github/workflows/sync.yml in your repository:

yaml
name: Sync Data from Remote via Rsync

on:
  schedule:
    - cron: '0 */6 * * *'  # Runs every 6 hours
  workflow_dispatch:        # Allows manual trigger

# Optional: avoid overlapping runs if one is still in progress
concurrency: sync-remote-data

jobs:
  sync-and-pr:
    runs-on: ubuntu-latest

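    # Must have write permissions to push commits and create PRs.
    # The repository setting "Allow GitHub Actions to create and approve
    # pull requests" (Settings → Actions → General) must also be enabled.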
    permissions:
      contents: write
      pull-requests: write

    steps:
      - name: Check out repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 0  # Ensures full Git history, important for diffs

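      # NOTE: verify every input below against the README of the scp-action
      # version you pin. appleboy/scp-action is documented for copying files
      # from the runner to a server, and inputs such as rsync, recursive,
      # archive, and args may not be supported.
      - name: Sync files using appleboy/scp-action (rsync)
        uses: appleboy/scp-action@v0.1.6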
        with:
          host: ${{ secrets.SSH_HOST }}
          username: ${{ secrets.SSH_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          source: "/remote/path/"
          target: "./synced_data/"
          rsync: true          # Enables rsync mode
          recursive: true      # Recurse into directories
          archive: true        # Archive mode (preserve attributes)
          overwrite: true      # Overwrite existing files locally
          strip_components: 0  # Adjust if you need to drop leading dirs
          # Pass additional arguments (e.g. --delete) to rsync:
          args: "--delete"

      - name: Configure Git
        run: |
          git config --global user.name "GitHub Action"
          git config --global user.email "github-actions@github.com"

      - name: Check for Changes
        id: changes
        run: |
          git add .
          if git diff --cached --quiet; then
            echo "No changes detected. Skipping commit and PR."
            echo "changed=false" >> "$GITHUB_OUTPUT"
          else
            echo "changed=true" >> "$GITHUB_OUTPUT"
          fi

      - name: Commit Changes
        id: commit_and_push
        if: steps.changes.outputs.changed == 'true'
        run: |
          TIMESTAMP=$(date -u +'%Y%m%d%H%M%S')
          BRANCH_NAME="sync-updates-$TIMESTAMP"
          echo "BRANCH_NAME=$BRANCH_NAME" >> "$GITHUB_ENV"

          # Commit on the checked-out base branch without pushing.
          # create-pull-request moves the commit to BRANCH_NAME and pushes it;
          # the action fails if the PR branch itself is checked out.
          git commit -m "Sync remote data on $(date -u +'%Y-%m-%d %H:%M:%S UTC')"

      - name: Create Pull Request
        if: steps.changes.outputs.changed == 'true'
        uses: peter-evans/create-pull-request@v5
        with:
          branch: ${{ env.BRANCH_NAME }}
          title: "Sync updates from remote server"
          body: "This PR contains updated files synced via rsync."
          labels: "automation"
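
The sync step ultimately comes down to a single rsync invocation over SSH, which can also be run directly from a `run:` step. The following is a minimal sketch under stated assumptions, not the action's own implementation: pull_with_rsync is a hypothetical helper, the key path is an assumption, and SSH_USER and SSH_HOST would need to be mapped from the secrets through `env:`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Mirror SRC into DEST with rsync.
#   -a        archive mode (recursive, preserves permissions and timestamps)
#   -z        compress data in transit
#   --delete  remove files in DEST that no longer exist in SRC
# Any extra arguments (for example -e "ssh -i <key>") are passed through to rsync.
pull_with_rsync() {
  local src="$1" dest="$2"
  shift 2
  mkdir -p "$dest"
  rsync -az --delete "$@" "$src" "$dest"
}

# Example for a workflow `run:` step (the key path is an assumption):
#   pull_with_rsync "$SSH_USER@$SSH_HOST:/remote/path/" ./synced_data/ \
#     -e "ssh -i $HOME/.ssh/github_actions_key"
```

The trailing slash on the source path matters: with it, rsync copies the contents of the directory rather than the directory itself.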

Step-by-Step Explanation

  1. Triggers
    • schedule (every 6 hours) and workflow_dispatch (manual) let you control when syncs run.
  2. appleboy/scp-action
    • We specify rsync: true to use rsync instead of plain scp.
    • source is /remote/path/ (on the remote server).
    • target is ./synced_data/ (in the local repository).
    • args: "--delete" will remove local files that no longer exist remotely, mirroring the remote folder exactly.
  3. Git Configuration & Check
    • We configure the commit name/email for the automated commits.
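    • If no changes are detected, the workflow stops without committing or opening a PR.
  4. Commit, Push, and PR
    • Changes are committed under a timestamped branch name (sync-updates-<timestamp>), and peter-evans/create-pull-request opens a PR titled "Sync updates from remote server" with the automation label.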
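
The change check in step 3 can be tried locally before relying on it in CI. This is a minimal sketch, assuming git is installed and the script runs inside a repository; detect_changes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stage everything under the given path and report whether anything changed.
# Prints "true" or "false". In a workflow, the value could be appended to
# "$GITHUB_OUTPUT" as changed=<value> and used in later `if:` conditions.
detect_changes() {
  local target="${1:-.}"
  git add -- "$target"
  if git diff --cached --quiet; then
    echo "false"
  else
    echo "true"
  fi
}

# Example:
#   if [ "$(detect_changes ./synced_data)" = "true" ]; then echo "open a PR"; fi
```

`git diff --cached --quiet` exits with status 0 when the index matches HEAD, so staging first ensures that new, untracked files are counted as changes too.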

5. Usage Tips & Customization

  1. Filtering / Excluding Files
    • If needed, you can add more args for rsync, such as --exclude "somefile" or --exclude "*.log".
    • Example:
      yaml
      args: "--delete --exclude '*.log' --exclude 'node_modules'"
  2. Large File Handling
    • Consider Git LFS for storing large files.
  3. Branch Reuse
    • If you don’t want a new branch each run, hardcode a branch like sync-updates and configure the PR action to reuse that branch. You’ll avoid multiple open PRs.
  4. SSH Security
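    • By default, appleboy/scp-action does not verify the remote host key (the equivalent of StrictHostKeyChecking=no). For stricter verification, supply the server's host key fingerprint through the action's fingerprint input, or pin the host key in known_hosts in a separate step.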
  5. Schedule Adjustments
    • Modify the cron: '0 */6 * * *' expression for different intervals. For example, '0 3 * * *' runs daily at 3 AM (UTC).
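
Because --delete removes local files, it can help to preview a sync, including any exclude patterns, before enabling it. The sketch below uses rsync's standard --dry-run and --itemize-changes flags; preview_sync is a hypothetical helper name, and the remote path in the example is the placeholder used in this guide.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print what a mirroring sync would change without modifying DEST.
#   --dry-run          report actions only
#   --itemize-changes  one line per file that would be created, updated, or deleted
# Extra arguments such as --exclude patterns are passed through to rsync.
preview_sync() {
  local src="$1" dest="$2"
  shift 2
  rsync -a --delete --dry-run --itemize-changes "$@" "$src" "$dest"
}

# Example:
#   preview_sync "$SSH_USER@$SSH_HOST:/remote/path/" ./synced_data/ \
#     --exclude '*.log' --exclude 'node_modules'
```

Lines that start with `*deleting` in the output mark files that a real run would remove from the local directory.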

6. Testing & Verification

  1. Manual Run
    • Go to your repository’s Actions tab, select “Sync Data from Remote via Rsync,” and click “Run workflow”.
  2. Check the Logs
    • Confirm that the appleboy/scp-action step successfully connected via SSH and copied the files.
  3. Pull Request
    • After a run that found new or updated files, a new Pull Request should appear in the repository, containing the synced changes. If nothing changed, no PR is opened.
  4. Merge or Close
    • Review the changes. If everything is correct, merge the PR. If not, close or make adjustments as needed.

7. Conclusion

Using appleboy/scp-action with rsync provides a clean, secure, and automated way to pull data from a remote server into your GitHub repository. With scheduled triggers, all updates come in as Pull Requests, giving you a clear audit trail and an easy way to review or revert changes.

For further customization or troubleshooting, consult the official appleboy/scp-action documentation or reach out to your DevOps team.