File Transfer Guide¶
Learn efficient methods to transfer files between your local machine and OSC.
Quick Reference¶
| Method | Best For | Speed | Ease of Use |
|---|---|---|---|
| VS Code | Small files (<1 MB), editing | Medium | ⭐⭐⭐⭐⭐ |
| SCP | Single files, medium size | Medium | ⭐⭐⭐⭐ |
| Rsync | Large files (>100 MB), directory sync | Fast | ⭐⭐⭐ |
| SFTP | Interactive browsing | Medium | ⭐⭐⭐⭐ |
| OnDemand | Web uploads (<100 MB) | Medium | ⭐⭐⭐⭐⭐ |
| Git | Code and scripts only | Fast | ⭐⭐⭐⭐ |
flowchart TD
A{What are you\ntransferring?} --> B[Code / scripts]
A --> C[Small files\n< 1 MB]
A --> D[Medium files\n1–100 MB]
A --> E[Large files / dirs\n> 100 MB]
B --> F[Git push/pull]
C --> G[VS Code\ndrag & drop]
D --> H{Single file or\ndirectory?}
H -->|Single file| I[SCP]
H -->|Directory| J[Rsync]
E --> K[Rsync\n--partial --progress]
Method 1: VS Code (Easiest)¶
If you're using Remote Development, VS Code makes file transfer simple.
Drag and Drop¶
- Connect to OSC via Remote-SSH
- Drag files from your local file explorer into VS Code's file browser
- Files upload automatically
Download Files¶
- Right-click any file in VS Code
- Select "Download..."
- Choose destination
Upload Files¶
- Right-click in the file browser
- Select "Upload..."
- Choose files to upload
Method 2: SCP (Secure Copy)¶
SCP ships with OpenSSH and works from the command line.
Basic Usage¶
# Upload file to OSC
scp local_file.txt pitzer:~/
# Upload to specific directory
scp local_file.txt pitzer:~/projects/data/
# Download file from OSC
scp pitzer:~/remote_file.txt ./
# Download to specific directory
scp pitzer:~/remote_file.txt ~/Downloads/
Copy Directories¶
# Upload directory
scp -r local_directory/ pitzer:~/remote_directory/
# Download directory
scp -r pitzer:~/remote_directory/ ./local_directory/
Multiple Files¶
# Upload multiple files
scp file1.txt file2.txt file3.txt pitzer:~/data/
# Download multiple files
scp pitzer:~/data/*.csv ./
With Progress¶
Method 3: Rsync (Recommended for Large Transfers)¶
Rsync is the most efficient method for large files and directories.
Why Rsync?¶
- ✅ Resumes interrupted transfers (with --partial)
- ✅ Only transfers changed files
- ✅ Shows progress
- ✅ Can preserve permissions and timestamps
- ✅ Compression during transfer
Basic Rsync¶
# Upload directory to OSC
rsync -avz --progress local_directory/ pitzer:~/remote_directory/
# Download directory from OSC
rsync -avz --progress pitzer:~/remote_directory/ ./local_directory/
Rsync Options Explained¶
- -a (archive): Preserves permissions, timestamps, symbolic links
- -v (verbose): Show files being transferred
- -z (compress): Compress during transfer
- --progress: Show transfer progress
- -h (human-readable): Show sizes in KB, MB, GB
- --exclude: Exclude files/patterns
- --delete: Delete files in destination not in source
Common Rsync Commands¶
# Sync with human-readable progress
rsync -avzh --progress source/ pitzer:~/destination/
# Exclude certain files
rsync -avz --progress --exclude='*.log' --exclude='.git/' source/ pitzer:~/destination/
# Dry run (see what would be transferred)
rsync -avzn --progress source/ pitzer:~/destination/
# Sync and delete files not in source
rsync -avz --progress --delete source/ pitzer:~/destination/
# Resume interrupted transfer
rsync -avz --progress --partial source/ pitzer:~/destination/
Rsync Exclude Patterns
Create .rsync-exclude file:
Use it:
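The file contents and the command for these two steps are not shown above. Both can be sketched together as follows, assuming the patterns listed suit your project (they are illustrative, not taken from this guide) and using rsync's `--exclude-from` option to read them. The helper name `sync_with_excludes` is hypothetical.

```shell
# Create .rsync-exclude with one pattern per line (illustrative patterns)
cat > .rsync-exclude <<'EOF'
*.log
.git/
__pycache__/
*.pyc
EOF

# Hypothetical helper: sync a directory, skipping anything matched by the file
sync_with_excludes() {
    # $1 = source directory, $2 = destination (e.g. pitzer:~/destination/)
    rsync -avz --progress --exclude-from=.rsync-exclude "$1" "$2"
}

# Example call (commented out so pasting this block does not start a transfer):
# sync_with_excludes source/ pitzer:~/destination/
```

Keeping the patterns in a file avoids repeating a long chain of `--exclude` flags on every command.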
Large Dataset Sync
# For multi-GB datasets
rsync -avzh --progress \
--partial --partial-dir=.rsync-partial \
--exclude='*.log' \
./dataset/ pitzer:~/projects/data/dataset/
Options:
- --partial: Keep partial files if transfer interrupted
- --partial-dir: Store partial files in hidden directory
Method 4: OSC OnDemand (Web Interface)¶
OSC OnDemand includes a built-in file manager for uploading, downloading, and browsing files. See the OnDemand Guide for details.
Method 5: Git (For Code)¶
For code and small files, use Git:
# On local machine
git add .
git commit -m "Update code"
git push origin main
# On OSC
git pull origin main
Best Practices with Git¶
- ✅ Use for code, scripts, configs
- ✅ Use for small data files (< 10 MB)
- ❌ Don't commit large datasets
- ❌ Don't commit binary files (model checkpoints)
- ❌ Don't commit generated files
Use .gitignore:
# Python
__pycache__/
*.pyc
.venv/
# Data
*.csv
*.h5
*.hdf5
*.npy
data/
datasets/
# Models
*.pth
*.ckpt
checkpoints/
Best Practices¶
1. Organize Your Files¶
On OSC:
~/
├── projects/
│ ├── project1/
│ │ ├── code/
│ │ ├── data/
│ │ └── results/
│ └── project2/
├── scratch/ # Temporary large files
└── shared/ # Shared with lab members
2. Use Scratch Space for Large Data¶
# Scratch space (faster, more space, but not backed up)
cd $TMPDIR # Temporary scratch
cd /fs/scratch/ # Persistent scratch (your project)
3. Compress Before Transfer¶
# On local machine
tar -czf dataset.tar.gz dataset/
# -z is omitted here: the archive is already compressed
rsync -av --progress dataset.tar.gz pitzer:~/
# On OSC
ssh pitzer
tar -xzf dataset.tar.gz
Checksum Verification for Large Transfers
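The verification steps are not shown above. A minimal sketch, assuming `sha256sum` is available on both machines (on macOS, `shasum -a 256` is the usual equivalent) and using two hypothetical helpers, `file_sha256` and `hashes_match`:

```shell
# Hypothetical helper: print only the SHA-256 hash of a file
file_sha256() {
    sha256sum "$1" | awk '{print $1}'
}

# Hypothetical helper: succeed only if two non-empty hash strings are identical
hashes_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example workflow (commented out; needs a working connection to pitzer):
# local_hash=$(file_sha256 dataset.tar.gz)
# remote_hash=$(ssh pitzer "sha256sum ~/dataset.tar.gz" | awk '{print $1}')
# if hashes_match "$local_hash" "$remote_hash"; then
#     echo "Checksums match"
# else
#     echo "Checksum mismatch: transfer the file again"
# fi
```

If the two hashes differ, the file was changed or truncated in transit and should be transferred again.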
5. Use .rsyncignore¶
Create project-level .rsyncignore:
Sync command:
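The file contents and sync command are not shown above. Note that rsync does not read `.rsyncignore` on its own; the name is only a convention, and the file has to be passed with `--exclude-from`. A sketch, run from the project root, with illustrative patterns that mirror the .gitignore entries earlier in this guide and a hypothetical helper named `sync_project`:

```shell
# Create a project-level .rsyncignore (patterns are illustrative)
cat > .rsyncignore <<'EOF'
.git/
__pycache__/
*.pyc
.venv/
checkpoints/
EOF

# Hypothetical helper: sync the current project directory using the ignore file
sync_project() {
    # $1 = destination, e.g. pitzer:~/projects/project1/code/
    rsync -avzh --progress --exclude-from=.rsyncignore ./ "$1"
}

# Example call (commented out; needs a working connection to pitzer):
# sync_project pitzer:~/projects/project1/code/
```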
Troubleshooting¶
Transfer Interrupted¶
Rsync resumes automatically:
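The command for this step is not shown above. Rerunning the same rsync command skips files that already arrived intact, and `--partial` keeps a partly transferred file so the next run can continue it. A sketch that automates the rerun with a hypothetical `retry` helper (the attempt count and delay are arbitrary choices):

```shell
# Hypothetical helper: rerun a command until it succeeds or attempts run out
retry() {
    # $1 = max attempts, $2 = seconds to wait between attempts, rest = command
    max=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "Giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (commented out; needs a working connection to pitzer):
# retry 5 30 rsync -avz --progress --partial source/ pitzer:~/destination/
```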
SCP cannot resume an interrupted transfer; use rsync instead.
Permission Denied¶
# Check permissions on OSC
ssh pitzer
ls -la ~/destination/
chmod 755 ~/destination/ # Fix if needed
Slow Transfer Speed¶
Causes:
- Network congestion
- Large number of small files
- Uncompressed transfer
Solutions:
# Compress during transfer
rsync -avz --progress source/ pitzer:~/destination/
# Archive first, then transfer
tar -czf archive.tar.gz source/
# -z is omitted here: the archive is already compressed
rsync -av --progress archive.tar.gz pitzer:~/
Connection Drops During Transfer¶
Use tmux or screen to keep transfers running if your connection drops. For tmux setup and usage, see Remote Development — Using tmux for Persistent Sessions.
Disk Quota Exceeded¶
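No steps are given for this error above. A reasonable first step is to find out what is using the space. The sketch below uses a hypothetical helper, `largest_entries`, built on `du` and GNU `sort -h`; it does not cover OSC's own quota-reporting tools, and it skips hidden files and directories because of the `*` glob.

```shell
# Hypothetical helper: list the largest entries directly inside a directory
largest_entries() {
    # $1 = directory to inspect (defaults to $HOME), $2 = number of entries to show
    dir=${1:-$HOME}
    count=${2:-10}
    du -sh "$dir"/* 2>/dev/null | sort -rh | head -n "$count"
}

# Example: show the five largest entries under ~/projects
# largest_entries ~/projects 5
```

Once the largest directories are known, move big datasets to scratch space or delete generated files you no longer need.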
Next Steps¶
- Set up Remote Development
- Explore Job Submission