# ZINC Database API Reference
## Overview
Complete technical reference for programmatic access to the ZINC database, covering API endpoints, query syntax, parameters, response formats, and advanced usage patterns for ZINC22, ZINC20, and legacy versions.
## Base URLs
### ZINC22 (Current)
- **CartBlanche22 API**: `https://cartblanche22.docking.org/`
- **File Repository**: `https://files.docking.org/zinc22/`
- **Main Website**: `https://zinc.docking.org/`
### ZINC20 (Maintained)
- **API**: `https://zinc20.docking.org/`
- **File Repository**: `https://files.docking.org/zinc20/`
### Documentation
- **Wiki**: `https://wiki.docking.org/`
- **GitHub**: `https://github.com/docking-org/`
## API Endpoints
### 1. Substance Retrieval by ZINC ID
Retrieve compound information using ZINC identifiers.
**Endpoint**: `/substances.txt`
**Parameters**:
- `zinc_id` (required): Single ZINC ID or comma-separated list
- `output_fields` (optional): Comma-separated field names (default: all fields)
**URL Format**:
```
https://cartblanche22.docking.org/substances.txt?zinc_id={ZINC_ID}&output_fields={FIELDS}
```
**Examples**:
Single compound:
```bash
curl "https://cartblanche22.docking.org/substances.txt?zinc_id=ZINC000000000001&output_fields=zinc_id,smiles,catalogs"
```
Multiple compounds:
```bash
curl "https://cartblanche22.docking.org/substances.txt?zinc_id=ZINC000000000001,ZINC000000000002,ZINC000000000003&output_fields=zinc_id,smiles,tranche"
```
Batch retrieval from file:
```bash
# Create file with ZINC IDs (one per line or comma-separated)
curl -X POST "https://cartblanche22.docking.org/substances.txt?output_fields=zinc_id,smiles" \
  -F "zinc_id=@zinc_ids.txt"
```
**Response Format** (TSV):
```
zinc_id smiles catalogs
ZINC000000000001 CC(C)O [vendor1,vendor2]
ZINC000000000002 c1ccccc1 [vendor3]
```
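For scripted use, the request URL and the tab-separated response can be handled with the Python standard library alone. This is a minimal sketch, assuming the endpoint accepts `zinc_id` and `output_fields` as ordinary `?`/`&` query-string parameters and that the response has a header row followed by tab-separated columns, as in the format shown above. `build_substances_url` and `parse_zinc_tsv` are hypothetical helper names.

```python
import csv
import io
from urllib.parse import urlencode

BASE_URL = "https://cartblanche22.docking.org"

def build_substances_url(zinc_ids, output_fields=("zinc_id", "smiles", "catalogs")):
    """Build a GET URL for /substances.txt from a list of ZINC IDs."""
    params = {
        "zinc_id": ",".join(zinc_ids),
        "output_fields": ",".join(output_fields),
    }
    # safe="," leaves the comma separators unencoded; other characters are escaped
    return f"{BASE_URL}/substances.txt?{urlencode(params, safe=',')}"

def parse_zinc_tsv(text):
    """Parse a tab-separated response with a header row into a list of dicts."""
    # QUOTE_NONE: SMILES are never quoted, so every character is taken literally
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)
```

The response body can be fetched with `curl` or `urllib.request.urlopen` and passed straight to `parse_zinc_tsv`.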
### 2. Structure Search by SMILES
Search for compounds by chemical structure with optional similarity thresholds.
**Endpoint**: `/smiles.txt`
**Parameters**:
- `smiles` (required): Query SMILES string (URL-encode if necessary)
- `dist` (optional): Similarity distance threshold (integer, 0-10; default: 0 = exact match)
- `adist` (optional): Alternative distance metric (0-10, default: 0)
- `output_fields` (optional): Comma-separated field names
**URL Format**:
```
https://cartblanche22.docking.org/smiles.txt?smiles={SMILES}&dist={DIST}&adist={ADIST}&output_fields={FIELDS}
```
**Examples**:
Exact structure match:
```bash
curl "https://cartblanche22.docking.org/smiles.txt?smiles=c1ccccc1&output_fields=zinc_id,smiles"
```
Similarity search (distance threshold = 3):
```bash
curl "https://cartblanche22.docking.org/smiles.txt?smiles=CC(C)Cc1ccc(cc1)C(C)C(=O)O&dist=3&output_fields=zinc_id,smiles,catalogs"
```
Broad similarity search:
```bash
curl "https://cartblanche22.docking.org/smiles.txt?smiles=c1ccccc1&dist=5&adist=5&output_fields=zinc_id,smiles,tranche"
```
URL-encoded SMILES (for special characters):
```bash
# Original: CC(=O)Oc1ccccc1C(=O)O
# Encoded: CC%28%3DO%29Oc1ccccc1C%28%3DO%29O
curl "https://cartblanche22.docking.org/smiles.txt?smiles=CC%28%3DO%29Oc1ccccc1C%28%3DO%29O&dist=2"
```
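Rather than encoding by hand, `urllib.parse.urlencode` percent-encodes every SMILES character that is unsafe in a query string. This matters beyond parentheses: `#` (triple bond) would otherwise start a URL fragment, and `+` (positive charge) would be decoded as a space. A minimal sketch, assuming the `?`/`&` query-string syntax; `build_smiles_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://cartblanche22.docking.org"

def build_smiles_url(smiles, dist=0, adist=0, output_fields=("zinc_id", "smiles")):
    """Build a /smiles.txt query URL with a fully percent-encoded SMILES."""
    params = {
        "smiles": smiles,  # urlencode escapes ( ) = # + / \ [ ] and similar
        "dist": dist,
        "adist": adist,
        "output_fields": ",".join(output_fields),
    }
    return f"{BASE_URL}/smiles.txt?{urlencode(params)}"

# Aspirin, the same molecule as in the curl example above
print(build_smiles_url("CC(=O)Oc1ccccc1C(=O)O", dist=2))
```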
**Distance Parameters Interpretation**:
- `dist=0`: Exact match
- `dist=1-3`: Close analogs (high similarity)
- `dist=4-6`: Moderate analogs
- `dist=7-10`: Diverse chemical space
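When an exact query returns nothing, one option (in line with the "try broader similarity search" advice under Troubleshooting) is to raise `dist` one step at a time and stop at the first non-empty result. A minimal sketch: `widening_search` is a hypothetical helper, not part of the ZINC API, and `fetch` is a caller-supplied function that runs one `/smiles.txt` query and returns parsed rows.

```python
import time

def widening_search(smiles, fetch, max_dist=6, delay=1.0):
    """Query with dist = 0, 1, ..., max_dist until some rows come back.

    fetch(smiles, dist) must return a list of result rows (possibly empty).
    Returns (dist_used, rows), or (None, []) if every distance was empty.
    """
    for dist in range(max_dist + 1):
        rows = fetch(smiles, dist)
        if rows:
            return dist, rows
        time.sleep(delay)  # stay polite between successive queries
    return None, []
```

Capping `max_dist` keeps the search within the close-to-moderate analog range listed above instead of drifting into diverse chemical space.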
### 3. Supplier Code Search
Query compounds by vendor catalog numbers.
**Endpoint**: `/catitems.txt`
**Parameters**:
- `catitem_id` (required): Supplier catalog code
- `output_fields` (optional): Comma-separated field names
**URL Format**:
```
https://cartblanche22.docking.org/catitems.txt?catitem_id={SUPPLIER_CODE}&output_fields={FIELDS}
```
**Example**:
```bash
curl "https://cartblanche22.docking.org/catitems.txt?catitem_id=SUPPLIER-12345&output_fields=zinc_id,smiles,supplier_code,catalogs"
```
### 4. Random Compound Sampling
Generate random compound sets with optional filtering by chemical properties.
**Endpoint**: `/substance/random.txt`
**Parameters**:
- `count` (optional): Number of compounds to retrieve (default: 100, max: depends on server)
- `subset` (optional): Filter by predefined subset (e.g., 'lead-like', 'drug-like', 'fragment')
- `output_fields` (optional): Comma-separated field names
**URL Format**:
```
https://cartblanche22.docking.org/substance/random.txt?count={COUNT}&subset={SUBSET}&output_fields={FIELDS}
```
**Examples**:
Random 100 compounds (default):
```bash
curl "https://cartblanche22.docking.org/substance/random.txt"
```
Random lead-like molecules:
```bash
curl "https://cartblanche22.docking.org/substance/random.txt?count=1000&subset=lead-like&output_fields=zinc_id,smiles,tranche"
```
Random drug-like molecules:
```bash
curl "https://cartblanche22.docking.org/substance/random.txt?count=5000&subset=drug-like&output_fields=zinc_id,smiles"
```
Random fragments:
```bash
curl "https://cartblanche22.docking.org/substance/random.txt?count=500&subset=fragment&output_fields=zinc_id,smiles,tranche"
```
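Because the maximum `count` per request depends on the server, a large random set can be assembled from several smaller draws. Independent random draws may return the same compound more than once, so this sketch de-duplicates by ZINC ID. `collect_random` is a hypothetical helper, and `fetch_batch` is a caller-supplied function (for example, one `/substance/random.txt` request followed by TSV parsing) returning `(zinc_id, smiles)` pairs; `max_batches` bounds the loop if new compounds stop appearing.

```python
def collect_random(fetch_batch, total, batch_size=1000, max_batches=50):
    """Gather up to `total` unique compounds from repeated random draws.

    fetch_batch(n) requests n random compounds and returns (zinc_id, smiles) pairs.
    """
    unique = {}
    for _ in range(max_batches):
        if len(unique) >= total:
            break
        n = min(batch_size, total - len(unique))
        for zinc_id, smiles in fetch_batch(n):
            unique.setdefault(zinc_id, smiles)  # keep the first copy of a repeated ID
    return list(unique.items())[:total]
```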
**Subset Definitions**:
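- `fragment`: MW …
```python
def filter_by_properties(df, mw_range, logp_range, max_hbd=None, phase=None):
    """Keep rows whose properties fall inside the given ranges.

    Expects 'mw' and 'logp' columns, plus 'hbd' and 'phase' when
    those optional filters are used.
    """
    mask = (
        (df['mw'] >= mw_range[0]) & (df['mw'] <= mw_range[1])
        & (df['logp'] >= logp_range[0]) & (df['logp'] <= logp_range[1])
    )
    if max_hbd is not None:
        mask &= df['hbd'] <= max_hbd
    if phase is not None:
        mask &= df['phase'] == phase
    return df[mask]

# Example: Get drug-like compounds with specific properties
# advanced_zinc_search() is assumed to return a DataFrame with
# 'mw', 'logp', 'hbd' and 'phase' columns
df = advanced_zinc_search(count=10000, subset='drug-like')
filtered = filter_by_properties(
    df,
    mw_range=(300, 450),
    logp_range=(1.0, 4.0),
    max_hbd=3,
    phase=0
)
```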
## Rate Limiting and Best Practices
### Rate Limiting
ZINC does not publish explicit rate limits, but users should:
- **Avoid rapid-fire requests**: Space out queries by at least 1 second
- **Use batch operations**: Query multiple ZINC IDs in single request
- **Cache results**: Store frequently accessed data locally
- **Off-peak usage**: Perform large downloads during off-peak hours (UTC nights/weekends)
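The batching advice above can be applied by splitting a long ID list into comma-separated chunks and sending one request per chunk. A minimal sketch; the default chunk size of 100 is an arbitrary assumption, since no per-request limit is documented here.

```python
def zinc_id_batches(zinc_ids, batch_size=100):
    """Yield comma-separated strings of at most batch_size ZINC IDs each."""
    ids = [z.strip() for z in zinc_ids if z.strip()]  # drop blanks and stray whitespace
    for start in range(0, len(ids), batch_size):
        yield ",".join(ids[start:start + batch_size])
```

Each yielded string can be used directly as the `zinc_id` value of one `/substances.txt` request.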
### Etiquette
```python
import time

def polite_zinc_query(query_func, *args, delay=1.0, **kwargs):
    """Wrapper to add delay between queries."""
    result = query_func(*args, **kwargs)
    time.sleep(delay)
    return result
```
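To follow the "cache results" advice, responses can be stored on disk keyed by a hash of the request URL, so repeated runs do not query the server again. A minimal sketch: the cache directory name, the `cached_query` helper and the caller-supplied `fetch` function (for example `robust_zinc_query`) are assumptions, and there is no expiry, so clear the directory when fresh data is needed.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path(".zinc_cache")  # hypothetical default location

def cached_query(url, fetch, cache_dir=CACHE_DIR):
    """Return the response for url, calling fetch(url) only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".txt")
    if path.exists():
        return path.read_text(encoding="utf-8")
    text = fetch(url)
    if text:  # never cache empty or failed (None) responses
        path.write_text(text, encoding="utf-8")
    return text
```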
### Error Handling
```python
import subprocess
import time

def robust_zinc_query(url, max_retries=3, timeout=30):
    """
    Query ZINC with retry logic.

    Args:
        url: Full ZINC API URL
        max_retries: Maximum retry attempts
        timeout: Request timeout in seconds

    Returns:
        Query results or None on failure
    """
    for attempt in range(max_retries):
        try:
            result = subprocess.run(
                # --fail makes curl exit non-zero on HTTP 4xx/5xx responses
                ['curl', '-s', '--fail', '--max-time', str(timeout), url],
                capture_output=True,
                text=True,
                check=True
            )
            # Check for empty or error responses
            if not result.stdout or 'error' in result.stdout.lower():
                raise ValueError("Invalid response")
            return result.stdout
        except (subprocess.CalledProcessError, ValueError):
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Retry {attempt + 1}/{max_retries} after {wait_time}s...")
                time.sleep(wait_time)
            else:
                print(f"Failed after {max_retries} attempts")
    return None
```
## Integration with Molecular Docking
### Preparing DOCK6 Libraries
```bash
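# 1. Download a tranche file in MOL2 format
#    (.db2 tranche files are the ligand format for DOCK 3.7/3.8, not DOCK6)
wget https://files.docking.org/zinc22/H05/H05P035M400-0.mol2.gz
# 2. Decompress
gunzip H05P035M400-0.mol2.gz
# 3. Set ligand_atom_file in dock.in to H05P035M400-0.mol2, then run DOCK6
dock6 -i dock.in -o dock.out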
```
### AutoDock Vina Integration
```bash
# 1. Download MOL2 format
wget https://files.docking.org/zinc22/H05/H05P035M400-0.mol2.gz
gunzip H05P035M400-0.mol2.gz
# 2. Convert to PDBQT using the MGLTools prepare_ligand4.py script
#    (it converts one molecule per input file, so split multi-molecule MOL2 files first)
prepare_ligand4.py -l H05P035M400-0.mol2 -o ligands.pdbqt -A hydrogens
# 3. Run Vina
vina --receptor protein.pdbqt --ligand ligands.pdbqt \
  --center_x 25.0 --center_y 25.0 --center_z 25.0 \
  --size_x 20.0 --size_y 20.0 --size_z 20.0
```
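Tranche MOL2 files bundle many molecules, while `prepare_ligand4.py` converts a single molecule per input file. This minimal pure-Python sketch splits a multi-molecule MOL2 file at each `@<TRIPOS>MOLECULE` record header; the `split_mol2` name and the output naming scheme (index plus the molecule name line, e.g. a ZINC ID) are assumptions.

```python
from pathlib import Path

def split_mol2(mol2_path, out_dir):
    """Split a multi-molecule MOL2 file into one file per molecule.

    Returns the written paths in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    blocks = []
    for line in Path(mol2_path).read_text().splitlines(keepends=True):
        if line.startswith("@<TRIPOS>MOLECULE"):
            blocks.append([])  # each MOLECULE record header starts a new molecule
        if blocks:
            blocks[-1].append(line)  # lines before the first header are skipped
    written = []
    for i, block in enumerate(blocks):
        # The line after the header holds the molecule name (e.g. a ZINC ID)
        raw_name = block[1].strip() if len(block) > 1 else ""
        name = "".join(c if c.isalnum() or c in "-_." else "_" for c in raw_name) or "mol"
        path = out_dir / f"{i:06d}_{name}.mol2"  # index prefix keeps names unique
        path.write_text("".join(block))
        written.append(path)
    return written
```

Each resulting file can then be converted with `prepare_ligand4.py` and docked separately.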
### RDKit Integration
```python
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors
import pandas as pd

def _safe(func):
    """Wrap an RDKit function so unparsable molecules (None) yield None."""
    return lambda mol: func(mol) if mol is not None else None

def _embed_3d(mol):
    """Add hydrogens, embed a 3D conformer and MMFF-optimize it (None on failure)."""
    if mol is None:
        return None
    mol = Chem.AddHs(mol)
    if AllChem.EmbedMolecule(mol, randomSeed=42) == -1:
        return None
    AllChem.MMFFOptimizeMolecule(mol)
    return mol

def process_zinc_results(zinc_df):
    """
    Process ZINC results with RDKit.

    Args:
        zinc_df: DataFrame with 'zinc_id' and 'smiles' columns

    Returns:
        DataFrame with calculated properties
    """
    # Convert SMILES to molecules (invalid SMILES become None)
    zinc_df['mol'] = zinc_df['smiles'].apply(Chem.MolFromSmiles)
    # Calculate properties
    zinc_df['mw'] = zinc_df['mol'].apply(_safe(Descriptors.MolWt))
    zinc_df['logp'] = zinc_df['mol'].apply(_safe(Descriptors.MolLogP))
    zinc_df['hbd'] = zinc_df['mol'].apply(_safe(Descriptors.NumHDonors))
    zinc_df['hba'] = zinc_df['mol'].apply(_safe(Descriptors.NumHAcceptors))
    zinc_df['tpsa'] = zinc_df['mol'].apply(_safe(Descriptors.TPSA))
    zinc_df['rotatable'] = zinc_df['mol'].apply(_safe(Descriptors.NumRotatableBonds))
    # Generate 3D conformers (explicit hydrogens are added first)
    zinc_df['mol'] = zinc_df['mol'].apply(_embed_3d)
    return zinc_df

# Save to SDF for docking
def save_to_sdf(zinc_df, output_file):
    """Save molecules to SDF file."""
    writer = Chem.SDWriter(output_file)
    for _, row in zinc_df.iterrows():
        if row['mol'] is not None:
            row['mol'].SetProp('ZINC_ID', str(row['zinc_id']))
            writer.write(row['mol'])
    writer.close()
```
## Troubleshooting
### Common Issues
**Issue**: Empty or no results
- **Solution**: Check SMILES syntax, verify ZINC IDs exist, try broader similarity search
**Issue**: Timeout errors
- **Solution**: Reduce result count, use batch queries, try during off-peak hours
**Issue**: Invalid SMILES encoding
- **Solution**: URL-encode special characters (use `urllib.parse.quote()` in Python)
**Issue**: Tranche files not found
- **Solution**: Verify tranche code format, check file repository structure
### Debug Mode
```python
import subprocess

def debug_zinc_query(url):
    """Print query details for debugging."""
    print(f"Query URL: {url}")
    result = subprocess.run(['curl', '-v', url],
                            capture_output=True, text=True)
    print(f"curl exit code: {result.returncode}")
    print(f"Stderr: {result.stderr}")
    print(f"Stdout length: {len(result.stdout)}")
    print(f"First 500 chars:\n{result.stdout[:500]}")
    return result.stdout
```
## Version Differences
### ZINC22 vs ZINC20 vs ZINC15
| Feature | ZINC22 | ZINC20 | ZINC15 |
|---------|--------|--------|--------|
| Compounds | 230M+ purchasable | Focused on leads | ~750M total |
| API | CartBlanche22 | Similar | REST-like |
| Tranches | Yes | Yes | Yes |
| 3D Structures | Yes | Yes | Yes |
| Status | Current, growing | Maintained | Legacy |
### API Compatibility
Most query patterns work across versions, but URLs differ:
- ZINC22: `cartblanche22.docking.org`
- ZINC20: `zinc20.docking.org`
- ZINC15: `zinc15.docking.org`
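Since only the host changes between versions, the version can be made a parameter. A minimal sketch using the three hosts listed above; `zinc_url` is a hypothetical helper, and whether a given endpoint path exists on every version is an assumption to check against that version's documentation.

```python
from urllib.parse import urlencode

ZINC_HOSTS = {
    "zinc22": "https://cartblanche22.docking.org",
    "zinc20": "https://zinc20.docking.org",
    "zinc15": "https://zinc15.docking.org",
}

def zinc_url(version, endpoint, **params):
    """Build a query URL for the given ZINC version."""
    try:
        host = ZINC_HOSTS[version.lower()]
    except KeyError:
        raise ValueError(f"unknown ZINC version {version!r}") from None
    query = urlencode(params, safe=",")
    return f"{host}/{endpoint.lstrip('/')}" + (f"?{query}" if query else "")
```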
## Additional Resources
- **ZINC Wiki**: https://wiki.docking.org/
- **ZINC22 Documentation**: https://wiki.docking.org/index.php/Category:ZINC22
- **ZINC API Guide**: https://wiki.docking.org/index.php/ZINC_api
- **File Access Guide**: https://wiki.docking.org/index.php/ZINC22:Getting_started
- **Publications**:
  - ZINC22: J. Chem. Inf. Model. 2023
  - ZINC20: J. Chem. Inf. Model. 2020, 60, 6065-6073
- **Support**: Contact via ZINC website or GitHub issues
Source: claude-code-templates (MIT).