# gget Workflow Examples
Extended workflow examples that show how to combine several gget modules to accomplish common bioinformatics tasks.
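## Table of Contents
1. [Complete Gene Analysis Pipeline](#complete-gene-analysis-pipeline)
2. [Comparative Structural Biology](#comparative-structural-biology)
3. [Cancer Genomics Analysis](#cancer-genomics-analysis)
4. [Single-Cell Expression Analysis](#single-cell-expression-analysis)
5. [Building Reference Transcriptomes](#building-reference-transcriptomes)
6. [Mutation Impact Assessment](#mutation-impact-assessment)
7. [Drug Target Discovery](#drug-target-discovery)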
---
## Complete Gene Analysis Pipeline
A comprehensive gene analysis, from discovery through functional annotation.
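Step 3 of the pipeline below writes `gget.seq` output to FASTA files. In recent gget releases the Python function appears to return a list of FASTA lines (a header line followed by its sequence) rather than one string; treat that as an assumption and check it against your installed version. The helpers `fasta_records` and `write_fasta` are hypothetical and not part of gget: a minimal, network-free sketch that accepts either form and writes wrapped FASTA records.

```python
# Hypothetical helpers (not part of gget): normalise gget.seq-style output
# (assumed to be a list of FASTA lines, or a single FASTA string) and write
# it to disk as wrapped FASTA records.
from typing import Iterable, List, Optional, Tuple, Union


def fasta_records(lines: Union[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Group FASTA lines into (header, sequence) pairs."""
    if isinstance(lines, str):
        lines = lines.splitlines()  # a single FASTA string is split into lines
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence line found before the first FASTA header")
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))  # flush the last record
    return records


def write_fasta(lines: Union[str, Iterable[str]], path: str, width: int = 60) -> int:
    """Write FASTA records to `path`, wrapping sequences at `width` characters.

    Returns the number of records written.
    """
    records = fasta_records(lines)
    with open(path, "w") as f:
        for header, seq in records:
            f.write(f">{header}\n")
            for start in range(0, len(seq), width):
                f.write(seq[start:start + width] + "\n")
    return len(records)
```

Under that assumption, `write_fasta(nucleotide_seqs, "gaba_receptors_nt.fasta")` could stand in for the manual `open`/`write` calls in Step 3. The 60-character line width is a common FASTA convention, not a gget requirement.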
```python
import gget
import pandas as pd
# Step 1: Search for genes of interest
print("Step 1: Searching for GABA receptor genes...")
search_results = gget.search(["GABA", "receptor", "alpha"],
                             species="homo_sapiens",
                             andor="and")
print(f"Found {len(search_results)} genes")
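# Step 2: Get detailed information
print("\nStep 2: Retrieving detailed information...")
gene_ids = search_results["ensembl_id"].tolist()[:5]  # first 5 genes
gene_info = gget.info(gene_ids, pdb=True)
# Column names can differ between gget versions; inspect gene_info.columns
# if this selection raises a KeyError
print(gene_info[["ensembl_id", "gene_name", "uniprot_id", "description"]])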
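# Step 3: Retrieve sequences
print("\nStep 3: Retrieving sequences...")
nucleotide_seqs = gget.seq(gene_ids)
protein_seqs = gget.seq(gene_ids, translate=True)
# Save sequences. gget.seq returns a list of FASTA lines (headers and
# sequences), so join them into one string before writing.
with open("gaba_receptors_nt.fasta", "w") as f:
    f.write("\n".join(nucleotide_seqs) + "\n")
with open("gaba_receptors_aa.fasta", "w") as f:
    f.write("\n".join(protein_seqs) + "\n")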
# Step 4: Get expression data
print("\nStep 4: Retrieving tissue expression data...")
for gene_name in gene_info["gene_name"]:
    expr_data = gget.archs4(gene_name, which="tissue")
    print(f"\n{gene_name} expression:")
    print(expr_data.head())
# Step 5: Find correlated genes
print("\nStep 5: Finding correlated genes...")
correlated = gget.archs4(gene_info["gene_name"].iloc[0], which="correlation")
correlated_top = correlated.head(20)
print(correlated_top)
# Step 6: Run enrichment analysis on the correlated genes
print("\nStep 6: Running enrichment analysis...")
gene_list = correlated_top["gene_symbol"].tolist()
enrichment = gget.enrichr(gene_list, database="ontology", plot=True)
print(enrichment.head(10))
# Step 7: Get disease associations
print("\nStep 7: Retrieving disease associations...")
for gene_id