[ PROMPT_NODE_26521 ]
Chemical Analysis
[ SKILL_DOCUMENTATION ]
# Chemical Properties and Similarity Analysis
## Overview
DrugBank provides extensive chemical property data including molecular structures, physicochemical properties, and calculated descriptors. This information enables structure-based analysis, similarity searches, and QSAR modeling.
## Chemical Identifiers and Structures
### Available Structure Formats
- **SMILES**: Simplified Molecular Input Line Entry System
- **InChI**: International Chemical Identifier
- **InChIKey**: Hashed InChI for database searching
- **Molecular Formula**: Chemical formula (e.g., C9H8O4)
- **IUPAC Name**: Systematic chemical name
- **Traditional Names**: Common names and synonyms
### Extract Chemical Structures
```python
from drugbank_downloader import get_drugbank_root
def get_drug_structures(drugbank_id):
"""Extract chemical structure representations"""
root = get_drugbank_root()
ns = {'db': 'http://www.drugbank.ca'}
for drug in root.findall('db:drug', ns):
primary_id = drug.find('db:drugbank-id[@primary="true"]', ns)
if primary_id is not None and primary_id.text == drugbank_id:
structures = {}
# Get calculated properties
calc_props = drug.find('db:calculated-properties', ns)
if calc_props is not None:
for prop in calc_props.findall('db:property', ns):
kind = prop.find('db:kind', ns).text
value = prop.find('db:value', ns).text
if kind in ['SMILES', 'InChI', 'InChIKey', 'Molecular Formula', 'IUPAC Name']:
structures[kind] = value
return structures
return {}
# Usage
structures = get_drug_structures('DB00001')
print(f"SMILES: {structures.get('SMILES')}")
print(f"InChI: {structures.get('InChI')}")
```
## Physicochemical Properties
### Calculated Properties
Properties computed from structure:
- **Molecular Weight**: Exact mass in Daltons
- **logP**: Partition coefficient (lipophilicity)
- **logS**: Aqueous solubility
- **Polar Surface Area (PSA)**: Topological polar surface area
- **H-Bond Donors**: Number of hydrogen bond donors
- **H-Bond Acceptors**: Number of hydrogen bond acceptors
- **Rotatable Bonds**: Number of rotatable bonds
- **Refractivity**: Molar refractivity
- **Polarizability**: Molecular polarizability
### Experimental Properties
Measured properties from literature:
- **Melting Point**: Physical melting point
- **Water Solubility**: Experimental solubility data
- **pKa**: Acid dissociation constant
- **Hydrophobicity**: Experimental logP/logD values
### Extract All Properties
```python
def get_all_properties(drugbank_id):
"""Extract all calculated and experimental properties"""
root = get_drugbank_root()
ns = {'db': 'http://www.drugbank.ca'}
for drug in root.findall('db:drug', ns):
primary_id = drug.find('db:drugbank-id[@primary="true"]', ns)
if primary_id is not None and primary_id.text == drugbank_id:
properties = {
'calculated': {},
'experimental': {}
}
# Calculated properties
calc_props = drug.find('db:calculated-properties', ns)
if calc_props is not None:
for prop in calc_props.findall('db:property', ns):
kind = prop.find('db:kind', ns).text
value = prop.find('db:value', ns).text
source = prop.find('db:source', ns)
properties['calculated'][kind] = {
'value': value,
'source': source.text if source is not None else None
}
# Experimental properties
exp_props = drug.find('db:experimental-properties', ns)
if exp_props is not None:
for prop in exp_props.findall('db:property', ns):
kind = prop.find('db:kind', ns).text
value = prop.find('db:value', ns).text
properties['experimental'][kind] = value
return properties
return {}
# Usage
props = get_all_properties('DB00001')
print(f"Molecular Weight: {props['calculated'].get('Molecular Weight', {}).get('value')}")
print(f"logP: {props['calculated'].get('logP', {}).get('value')}")
```
## Lipinski's Rule of Five Analysis
### Rule of Five Checker
```python
def check_lipinski_rule_of_five(drugbank_id):
"""Check if drug satisfies Lipinski's Rule of Five"""
props = get_all_properties(drugbank_id)
calc_props = props.get('calculated', {})
# Extract values
mw = float(calc_props.get('Molecular Weight', {}).get('value', 0))
logp = float(calc_props.get('logP', {}).get('value', 0))
h_donors = int(calc_props.get('H Bond Donor Count', {}).get('value', 0))
h_acceptors = int(calc_props.get('H Bond Acceptor Count', {}).get('value', 0))
# Check rules
rules = {
'molecular_weight': mw <= 500,
'logP': logp <= 5,
'h_bond_donors': h_donors <= 5,
'h_bond_acceptors': h_acceptors <= 10
}
violations = sum(1 for passes in rules.values() if not passes)
return {
'passes': violations <= 1, # Allow 1 violation
'violations': violations,
'rules': rules,
'values': {
'molecular_weight': mw,
'logP': logp,
'h_bond_donors': h_donors,
'h_bond_acceptors': h_acceptors
}
}
# Usage
ro5 = check_lipinski_rule_of_five('DB00001')
print(f"Passes Ro5: {ro5['passes']} (Violations: {ro5['violations']})")
```
### Veber's Rules
```python
def check_veber_rules(drugbank_id):
"""Check Veber's rules for oral bioavailability"""
props = get_all_properties(drugbank_id)
calc_props = props.get('calculated', {})
psa = float(calc_props.get('Polar Surface Area (PSA)', {}).get('value', 0))
rotatable = int(calc_props.get('Rotatable Bond Count', {}).get('value', 0))
rules = {
'polar_surface_area': psa <= 140,
'rotatable_bonds': rotatable = similarity_threshold:
drug_name = drug.find('db:name', ns).text
indication = drug.find('db:indication', ns)
indication_text = indication.text if indication is not None else None
similar_drugs.append({
'drug_id': drug_id,
'drug_name': drug_name,
'similarity': similarity,
'indication': indication_text
})
# Sort by similarity
similar_drugs.sort(key=lambda x: x['similarity'], reverse=True)
return similar_drugs
# Find similar drugs
similar = find_similar_drugs('DB00001', similarity_threshold=0.7)
for drug in similar[:10]:
print(f"{drug['drug_name']}: {drug['similarity']:.3f}")
```
### Batch Similarity Matrix
```python
import numpy as np
import pandas as pd
def create_similarity_matrix(drug_ids):
"""Create pairwise similarity matrix for a list of drugs"""
n = len(drug_ids)
matrix = np.zeros((n, n))
# Get all SMILES
smiles_dict = {}
for drug_id in drug_ids:
structures = get_drug_structures(drug_id)
smiles_dict[drug_id] = structures.get('SMILES')
# Calculate similarities
for i, drug1_id in enumerate(drug_ids):
for j, drug2_id in enumerate(drug_ids):
if i == j:
matrix[i, j] = 1.0
elif i < j: # Only calculate upper triangle
smiles1 = smiles_dict[drug1_id]
smiles2 = smiles_dict[drug2_id]
if smiles1 and smiles2:
sim = calculate_tanimoto_similarity(smiles1, smiles2)
matrix[i, j] = sim if sim is not None else 0
matrix[j, i] = matrix[i, j] # Symmetric
df = pd.DataFrame(matrix, index=drug_ids, columns=drug_ids)
return df
# Create similarity matrix for a set of drugs
drug_list = ['DB00001', 'DB00002', 'DB00003', 'DB00005']
sim_matrix = create_similarity_matrix(drug_list)
```
## Molecular Fingerprints
### Generate Different Fingerprint Types
```python
from rdkit.Chem import MACCSkeys
from rdkit.Chem.AtomPairs import Pairs
from rdkit.Chem.Fingerprints import FingerprintMols
def generate_fingerprints(smiles):
"""Generate multiple types of molecular fingerprints"""
mol = Chem.MolFromSmiles(smiles)
if mol is None:
return None
fingerprints = {
'morgan_fp': AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048),
'maccs_keys': MACCSkeys.GenMACCSKeys(mol),
'topological_fp': FingerprintMols.FingerprintMol(mol),
'atom_pairs': Pairs.GetAtomPairFingerprint(mol)
}
return fingerprints
# Generate fingerprints for a drug
structures = get_drug_structures('DB00001')
fps = generate_fingerprints(structures.get('SMILES'))
```
### Substructure Search
```python
from rdkit.Chem import Fragments
def search_substructure(substructure_smarts):
"""Find drugs containing a specific substructure"""
root = get_drugbank_root()
ns = {'db': 'http://www.drugbank.ca'}
pattern = Chem.MolFromSmarts(substructure_smarts)
if pattern is None:
print("Invalid SMARTS pattern")
return []
matching_drugs = []
for drug in root.findall('db:drug', ns):
drug_id = drug.find('db:drugbank-id[@primary="true"]', ns).text
structures = get_drug_structures(drug_id)
smiles = structures.get('SMILES')
if smiles:
mol = Chem.MolFromSmiles(smiles)
if mol and mol.HasSubstructMatch(pattern):
drug_name = drug.find('db:name', ns).text
matching_drugs.append({
'drug_id': drug_id,
'drug_name': drug_name
})
return matching_drugs
# Example: Find drugs with benzene ring
benzene_drugs = search_substructure('c1ccccc1')
print(f"Found {len(benzene_drugs)} drugs with benzene ring")
```
## ADMET Property Prediction
### Predict Absorption
```python
def predict_oral_absorption(drugbank_id):
"""Predict oral absorption based on physicochemical properties"""
props = get_all_properties(drugbank_id)
calc_props = props.get('calculated', {})
mw = float(calc_props.get('Molecular Weight', {}).get('value', 0))
logp = float(calc_props.get('logP', {}).get('value', 0))
psa = float(calc_props.get('Polar Surface Area (PSA)', {}).get('value', 0))
h_donors = int(calc_props.get('H Bond Donor Count', {}).get('value', 0))
# Simple absorption prediction
good_absorption = (
mw <= 500 and
-0.5 <= logp <= 5.0 and
psa <= 140 and
h_donors <= 5
)
absorption_score = 0
if mw <= 500:
absorption_score += 25
if -0.5 <= logp <= 5.0:
absorption_score += 25
if psa <= 140:
absorption_score += 25
if h_donors <= 5:
absorption_score += 25
return {
'predicted_absorption': 'good' if good_absorption else 'poor',
'absorption_score': absorption_score,
'properties': {
'molecular_weight': mw,
'logP': logp,
'psa': psa,
'h_donors': h_donors
}
}
```
### BBB Permeability Prediction
```python
def predict_bbb_permeability(drugbank_id):
"""Predict blood-brain barrier permeability"""
props = get_all_properties(drugbank_id)
calc_props = props.get('calculated', {})
mw = float(calc_props.get('Molecular Weight', {}).get('value', 0))
logp = float(calc_props.get('logP', {}).get('value', 0))
psa = float(calc_props.get('Polar Surface Area (PSA)', {}).get('value', 0))
h_donors = int(calc_props.get('H Bond Donor Count', {}).get('value', 0))
# BBB permeability criteria (simplified)
bbb_permeable = (
mw <= 450 and
logp <= 5.0 and
psa <= 90 and
h_donors 0.85 = very similar, 0.7-0.85 = similar, <0.7 = different
4. **Rule Application**: Lipinski's Ro5 and Veber's rules are guidelines, not absolute cutoffs
5. **ADMET Prediction**: Use computational predictions as screening, validate experimentally
6. **Chemical Space**: Visualize chemical space to understand drug diversity
7. **Standardization**: Standardize molecules (neutralize, remove salts) before comparison
8. **Performance**: Cache computed fingerprints for large-scale similarity searches
Source: claude-code-templates (MIT). See About Us for full credits.