DrugBank Downloads Explained (version 5.1.4, 2019-07-02)
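The DrugBank Open Data set is a public-domain dataset that can be used freely in your application or project (including for commercial use). It is released under the Creative Commons CC0 international license.
To the extent possible under law, the person who associated CC0 with DrugBank Open Data has waived all copyright and related or neighboring rights to DrugBank Open Data. Published from: Canada.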
Contents
1. Drug Sequences (using Approved as the example)
2. Protein identifiers (Approved)
3. Target Sequences (Approved)
4. External Links → External Drug Links (Approved)
5. External Links → Target Drug-UniProt Links (Approved)
6. External Links → Enzyme/Carrier/Transporter Drug-UniProt Links (Approved)
7. Structures → Structure External Links (Approved)
8. Complete Database (Full)
1. Drug Sequences (using Approved as the example)
drugbank_approved_drug_sequences.fasta.zip
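Select the options shown in the figure below and download the archive.
The downloaded file looks like this:
Take the record drugbank_drug|DB00002 Cetuximab heavy chain as an example: https://www.drugbank.ca/drugs/DB00002
Its page shows an approved drug of the protein type.
So the Drug Sequences download appears to cover protein-based drugs.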
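To check this across the whole file rather than for a single record, the FASTA headers can be grouped by DrugBank ID. The following is a minimal sketch, assuming every header follows the layout `>drugbank_drug|<DrugBank ID> <description>` seen in the example above. The function name `read_drug_fasta`, the second record, and the residue strings are made up for illustration and are not taken from the real file.

```python
from collections import defaultdict


def read_drug_fasta(lines):
    """Group FASTA records by DrugBank ID.

    Returns {drug_id: [(description, sequence), ...]}.
    Assumes headers look like '>drugbank_drug|DB00002 Cetuximab heavy chain'.
    """
    records = defaultdict(list)
    current = None  # (drug_id, description, list of sequence chunks)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current:
                records[current[0]].append((current[1], "".join(current[2])))
            _tag, _, rest = line[1:].partition("|")
            drug_id, _, description = rest.partition(" ")
            current = (drug_id, description, [])
        elif current:
            current[2].append(line)
    if current:
        records[current[0]].append((current[1], "".join(current[2])))
    return dict(records)


# Placeholder residues, NOT the real sequences; DB99999 is a made-up ID.
SAMPLE = """\
>drugbank_drug|DB00002 Cetuximab heavy chain
AAAA
CCCC
>drugbank_drug|DB99999 Hypothetical peptide
GGGG
"""

records = read_drug_fasta(SAMPLE.splitlines())
for drug_id, chains in records.items():
    print(drug_id, [description for description, _seq in chains])
```

On the real file, `read_drug_fasta(open(path, encoding="utf-8"))` should work the same way, with `path` pointing at the unpacked FASTA file.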
2. Protein identifiers (Approved)
Protein identifiers include external IDs to resources such as UniProt and PDB. These downloads are divided first by protein/compound type (target, transporter, etc.) and second by drug group (approved, illicit, etc.). Each archive contains 2 files: one for all targets/enzymes/transporters/carriers and one with only those marked as pharmacologically active (directly related to the mechanism of action for at least one of the associated drugs). Note that each row in the exported CSV file also includes a concatenated list of DrugBank drug IDs (semicolon-delimited) as the last column.
drugbank_approved_target_polypeptide_ids.csv.zip
all.csv, pharmacologically_active.csv
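Because each archive holds these two CSV files, they can be read straight from the ZIP without unpacking it. A minimal sketch, assuming the members are UTF-8 encoded: `read_csv_from_zip` is a hypothetical helper name, and the demo archive is built in memory with made-up two-column content rather than the real files.

```python
import csv
import io
import zipfile


def read_csv_from_zip(archive, member):
    """Return the rows of one CSV member of a ZIP archive as lists of strings.

    `archive` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            # Assumption: the CSV members are UTF-8 text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


def build_demo_archive():
    """Build a tiny stand-in archive in memory (made-up, truncated columns)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr(
            "all.csv",
            "ID,Name\n2,Histidine decarboxylase\n4,Coagulation factor XIII A chain\n",
        )
        zf.writestr(
            "pharmacologically_active.csv",
            "ID,Name\n4,Coagulation factor XIII A chain\n",
        )
    return buffer.getvalue()


demo = build_demo_archive()
all_rows = read_csv_from_zip(io.BytesIO(demo), "all.csv")
active_rows = read_csv_from_zip(io.BytesIO(demo), "pharmacologically_active.csv")
# Subtract 1 from each length to discount the header row.
print(len(all_rows) - 1, len(active_rows) - 1)
```

For the real download, pass the path of drugbank_approved_target_polypeptide_ids.csv.zip as `archive`.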
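Comparing the two files, all.csv (left) has over 1,000 more rows than pharmacologically_active.csv (right). Note that neither is the full set of proteins; presumably only proteins with at least one associated drug are listed.
- Take ID = 4 (present in both files) as an example: https://www.drugbank.ca/bio_entities/BE0000004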
4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,Humans,DB11300; DB11311; DB11571; DB11572; DB13151
Then look up one of its associated drugs, for example DB11300: https://www.drugbank.ca/drugs/DB11300#targets
The entry matches.
- Take ID = 2 (present only in all.csv) as an example: https://www.drugbank.ca/bio_entities/BE0000002
2,Histidine decarboxylase,HDC,32109,X54297,P19113,DCHS_HUMAN,4E1O,,HDC,HGNC:4855,Humans,DB00114; DB00117
This matches too, which suggests that all.csv stores every protein that has Drug Relations entries.
- Note, however, that the file may be incomplete. For ID = 4, all.csv shows:
4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,Humans,DB01839; DB11300; DB11311; DB11571; DB11572; DB13151
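It does not contain every Drug Relation listed at https://www.drugbank.ca/bio_entities/BE0000004; the following two entries are missing (reason unknown):
From the description above and the file names, we can conclude:
pharmacologically_active.csv contains the drug IDs shown in the figure below (presumably those whose pharmacological action is marked yes), while all.csv should contain both the yes and the unknown entries, although it is not complete.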
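The comparison between the two files can be sketched in a few lines of Python. This is an illustration only: it assumes the protein ID is the first column and the semicolon-delimited drug list is the last column, as in the rows quoted above, and it runs on those quoted rows rather than on the real files. The names `drug_ids_by_protein` and `compare` are hypothetical.

```python
import csv
import io

# Rows copied from the examples above (this sample has no header row).
ALL_CSV = (
    "2,Histidine decarboxylase,HDC,32109,X54297,P19113,DCHS_HUMAN,4E1O,,HDC,"
    "HGNC:4855,Humans,DB00114; DB00117\n"
    "4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,"
    "1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,"
    "Humans,DB01839; DB11300; DB11311; DB11571; DB11572; DB13151\n"
)
ACTIVE_CSV = (
    "4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,"
    "1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,"
    "Humans,DB11300; DB11311; DB11571; DB11572; DB13151\n"
)


def drug_ids_by_protein(csv_text, has_header=False):
    """Map the first column (protein ID) to the set of drug IDs in the last column."""
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # skip the header row of a real export
    return {
        row[0]: {d.strip() for d in row[-1].split(";") if d.strip()}
        for row in reader
        if row
    }


def compare(all_text, active_text, has_header=False):
    """Return (proteins only in all.csv, extra drug IDs per shared protein)."""
    all_map = drug_ids_by_protein(all_text, has_header)
    active_map = drug_ids_by_protein(active_text, has_header)
    only_in_all = sorted(set(all_map) - set(active_map))
    extra_drugs = {}
    for protein_id in sorted(set(all_map) & set(active_map)):
        extra = sorted(all_map[protein_id] - active_map[protein_id])
        if extra:
            extra_drugs[protein_id] = extra
    return only_in_all, extra_drugs


only_in_all, extra_drugs = compare(ALL_CSV, ACTIVE_CSV)
print(only_in_all)   # proteins listed only in all.csv
print(extra_drugs)   # drug IDs that all.csv adds for shared proteins
```

When running this on the real exports, pass `has_header=True` so the column-name row is not treated as a protein.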
3. Target Sequences (Approved)
drugbank_approved_target_polypeptide_sequences.fasta.zip
protein.fasta, gene.fasta
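- These contain the amino acid sequences and the gene sequences, respectively
- Take P19113 as an example and search for it directly
The resulting page looks like this:
The page agrees with the header line in the file, which reads: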
>drugbank_target|P19113 Histidine decarboxylase (DB00114; DB00117)
- The DB IDs in parentheses are the associated drugs
Note that the following two files correspond to each other one-to-one.
ZIP | drugbank_approved_target_polypeptide_ids.csv | drugbank_approved_target_polypeptide_sequences.fasta |
file | all.csv | protein.fasta |
ID | DrugBank ID | UniProt ID |
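The correspondence between the two files can be checked by parsing a FASTA header and comparing it with the matching CSV row. A minimal sketch, assuming the header layout `>drugbank_target|<UniProt ID> <name> (<drug IDs>)` shown above and assuming the UniProt ID sits in the sixth CSV column (index 5), a position inferred only from the sample row for ID = 2. The names `parse_target_header` and `matches_csv_row` are hypothetical.

```python
import csv
import re

# Assumed header layout: >drugbank_target|P19113 Name (DB00114; DB00117)
HEADER_RE = re.compile(
    r"^>drugbank_target\|(?P<uniprot>\S+) (?P<name>.*) \((?P<drugs>[^()]*)\)$"
)

UNIPROT_COLUMN = 5  # assumption: inferred from the sample all.csv row


def parse_target_header(line):
    """Split a target FASTA header into (UniProt ID, name, [drug IDs])."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected header layout: {line!r}")
    drugs = [d.strip() for d in match.group("drugs").split(";") if d.strip()]
    return match.group("uniprot"), match.group("name"), drugs


def matches_csv_row(header_line, csv_line):
    """True if the header and the CSV row agree on UniProt ID and drug IDs."""
    uniprot, _name, drugs = parse_target_header(header_line)
    row = next(csv.reader([csv_line]))
    row_drugs = {d.strip() for d in row[-1].split(";") if d.strip()}
    return row[UNIPROT_COLUMN] == uniprot and row_drugs == set(drugs)


HEADER = ">drugbank_target|P19113 Histidine decarboxylase (DB00114; DB00117)"
CSV_ROW = (
    "2,Histidine decarboxylase,HDC,32109,X54297,P19113,DCHS_HUMAN,4E1O,,HDC,"
    "HGNC:4855,Humans,DB00114; DB00117"
)

parsed = parse_target_header(HEADER)
print(parsed)
print(matches_csv_row(HEADER, CSV_ROW))
```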
4. External Links → External Drug Links (Approved)
drugbank_approved_drug_links.csv.zip
drug links.csv
- Contains 3883 drugs
- Contains the following columns:
DrugBank ID, Name, CAS Number, Drug Type, KEGG Compound ID, KEGG Drug ID, PubChem Compound ID,
PubChem Substance ID, ChEBI ID, PharmGKB ID, HET ID, UniProt ID, UniProt Title, GenBank ID, DPD ID,
RxList Link, Pdrhealth Link, Wikipedia ID, Drugs.com Link, NDC ID, ChemSpider ID, BindingDB ID, TTD ID
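Not every drug is expected to have a value in every external-ID column, so a quick coverage count per column can be useful before joining against another database. A minimal sketch, assuming the CSV header uses exactly the column names listed above: the helper `id_coverage` is hypothetical, and the sample rows use made-up placeholder IDs, not real DrugBank data.

```python
import csv
import io


def id_coverage(csv_text, columns):
    """Count, for each requested column, how many rows have a non-empty value.

    Returns (total_rows, {column: non_empty_count}).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {column: 0 for column in columns}
    total = 0
    for row in reader:
        total += 1
        for column in columns:
            if (row.get(column) or "").strip():
                counts[column] += 1
    return total, counts


# Made-up placeholder rows with only four of the columns, for illustration.
SAMPLE = (
    "DrugBank ID,Name,KEGG Drug ID,ChEBI ID\n"
    "DB90001,Example drug A,D90001,\n"
    "DB90002,Example drug B,,90002\n"
)

total, counts = id_coverage(SAMPLE, ["KEGG Drug ID", "ChEBI ID"])
print(total, counts)
```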
5. External Links → Target Drug-UniProt Links (Approved)
drugbank_approved_target_uniprot_links.csv.zip
uniprot links.csv
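- Take DB00002 as an example, https://www.drugbank.ca/drugs/DB00002:
The file has 12 rows for DB00002, meaning the drug has 12 targets (each listed with its UniProt ID). This agrees with the Targets (12) count shown in the figure above.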
DB00002,Cetuximab,BiotechDrug,P00533,Epidermal growth factor receptor
DB00002,Cetuximab,BiotechDrug,O75015,Low affinity immunoglobulin gamma Fc region receptor III-B
DB00002,Cetuximab,BiotechDrug,P00736,Complement C1r subcomponent
DB00002,Cetuximab,BiotechDrug,P02745,Complement C1q subcomponent subunit A
DB00002,Cetuximab,BiotechDrug,P02746,Complement C1q subcomponent subunit B
DB00002,Cetuximab,BiotechDrug,P02747,Complement C1q subcomponent subunit C
DB00002,Cetuximab,BiotechDrug,P08637,Low affinity immunoglobulin gamma Fc region receptor III-A
DB00002,Cetuximab,BiotechDrug,P09871,Complement C1s subcomponent
DB00002,Cetuximab,BiotechDrug,P12314,High affinity immunoglobulin gamma Fc receptor I
DB00002,Cetuximab,BiotechDrug,P12318,Low affinity immunoglobulin gamma Fc region receptor II-a
DB00002,Cetuximab,BiotechDrug,P31994,Low affinity immunoglobulin gamma Fc region receptor II-b
DB00002,Cetuximab,BiotechDrug,P31995,Low affinity immunoglobulin gamma Fc region receptor II-c
- The first three columns (DrugBank ID, Name, Type) describe the drug
- The last two columns (UniProt ID, UniProt Name) describe the target
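The row count per drug can be reproduced by grouping the file on its first column. A minimal sketch using the twelve DB00002 rows quoted above, assuming the five-column layout just described; `targets_by_drug` is a hypothetical helper name.

```python
import csv
from collections import defaultdict

# The twelve DB00002 rows quoted above.
ROWS = """\
DB00002,Cetuximab,BiotechDrug,P00533,Epidermal growth factor receptor
DB00002,Cetuximab,BiotechDrug,O75015,Low affinity immunoglobulin gamma Fc region receptor III-B
DB00002,Cetuximab,BiotechDrug,P00736,Complement C1r subcomponent
DB00002,Cetuximab,BiotechDrug,P02745,Complement C1q subcomponent subunit A
DB00002,Cetuximab,BiotechDrug,P02746,Complement C1q subcomponent subunit B
DB00002,Cetuximab,BiotechDrug,P02747,Complement C1q subcomponent subunit C
DB00002,Cetuximab,BiotechDrug,P08637,Low affinity immunoglobulin gamma Fc region receptor III-A
DB00002,Cetuximab,BiotechDrug,P09871,Complement C1s subcomponent
DB00002,Cetuximab,BiotechDrug,P12314,High affinity immunoglobulin gamma Fc receptor I
DB00002,Cetuximab,BiotechDrug,P12318,Low affinity immunoglobulin gamma Fc region receptor II-a
DB00002,Cetuximab,BiotechDrug,P31994,Low affinity immunoglobulin gamma Fc region receptor II-b
DB00002,Cetuximab,BiotechDrug,P31995,Low affinity immunoglobulin gamma Fc region receptor II-c
"""


def targets_by_drug(lines):
    """Map each DrugBank ID to the list of UniProt IDs of its targets."""
    targets = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        drug_id, _name, _drug_type, uniprot_id, _uniprot_name = row[:5]
        targets[drug_id].append(uniprot_id)
    return dict(targets)


targets = targets_by_drug(ROWS.splitlines())
print(len(targets["DB00002"]), targets["DB00002"][:3])
```

On the real file the header row would show up as an extra key, so it should be skipped or removed from the result first.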
6. External Links → Enzyme/Carrier/Transporter Drug-UniProt Links (Approved)
drugbank_approved_enzyme/c*/t*_uniprot_links.csv.zip
uniprot links.csv
- Number of Enzyme/Carrier/Transporter entries = 4281 + 2377 + 567 - 6 = 7219
- Number of Target entries = 10364
- Take DB00006 as an example, https://www.drugbank.ca/drugs/DB00006:
In the Enzyme file: DB00006,Bivalirudin,SmallMoleculeDrug,P05164,Myeloperoxidase
In the Target file: DB00006,Bivalirudin,SmallMoleculeDrug,P00734,Prothrombin
So Target and Enzyme/Carrier/Transporter are distinct categories. (Is it enough to look at Target only?)
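The tally and the DB00006 comparison above can be reproduced as follows. This is a sketch under stated assumptions: the three counts and the subtracted 6 are taken as given from the text (the source does not explain the adjustment), and `proteins_for` is a hypothetical helper that reads the five-column link rows.

```python
import csv

# Counts quoted above; the source does not say which file each belongs to
# or why 6 is subtracted, so they are used here exactly as given.
ENTRY_COUNTS = [4281, 2377, 567]
ADJUSTMENT = 6

# The two DB00006 rows quoted above.
TARGET_ROWS = ["DB00006,Bivalirudin,SmallMoleculeDrug,P00734,Prothrombin"]
ENZYME_ROWS = ["DB00006,Bivalirudin,SmallMoleculeDrug,P05164,Myeloperoxidase"]


def proteins_for(drug_id, lines):
    """Return the set of UniProt IDs linked to one drug in a links file."""
    return {
        row[3]
        for row in csv.reader(lines)
        if len(row) >= 5 and row[0] == drug_id
    }


total = sum(ENTRY_COUNTS) - ADJUSTMENT
targets = proteins_for("DB00006", TARGET_ROWS)
enzymes = proteins_for("DB00006", ENZYME_ROWS)

print(total)
print(targets, enzymes, targets.isdisjoint(enzymes))
```

For this drug the two sets do not overlap, which is consistent with treating the roles as separate categories. One example does not show that the files never share a protein.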
7. Structures → Structure External Links (Approved)
drugbank_approved_structure_links.csv.zip
structure links.csv
- 2594 rows
- Contains the following columns:
DrugBank ID, Name, CAS Number, Drug Groups, InChIKey, InChI, SMILES, Formula,
KEGG Compound ID, KEGG Drug ID, PubChem Compound ID, PubChem Substance ID,
ChEBI ID, ChEMBL ID, HET ID, ChemSpider ID, BindingDB ID
8. Complete Database (Full)
drugbank_all_full_database.xml.zip
full database.xml
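The full XML file is large, so streaming it is usually preferable to loading it in one go. The sketch below is an illustration under assumptions that are not stated in this article: that the root element uses the namespace `http://www.drugbank.ca`, that each top-level `drug` element carries a `type` attribute, a `drugbank-id` child marked `primary="true"`, and a `name` child. The helper `iter_drugs` and the miniature sample document are hypothetical; check them against the real file before relying on them.

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://www.drugbank.ca}"  # assumed namespace of the export


def iter_drugs(source):
    """Yield (primary ID, name, type) for each top-level drug element.

    Nested drug elements (for example inside pathways) are ignored by
    tracking the element depth while streaming.
    """
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1  # depth now equals the number of open ancestors
        if depth == 1 and elem.tag == NS + "drug":
            primary = None
            for id_elem in elem.findall(NS + "drugbank-id"):
                if id_elem.get("primary") == "true":
                    primary = id_elem.text
            yield primary, elem.findtext(NS + "name"), elem.get("type")
            elem.clear()  # release the finished record's children


# Hypothetical miniature document, far smaller than the real schema.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<drugbank xmlns="http://www.drugbank.ca">
  <drug type="biotech">
    <drugbank-id primary="true">DB00002</drugbank-id>
    <name>Cetuximab</name>
    <pathways>
      <pathway>
        <drugs>
          <drug>
            <drugbank-id>DB00002</drugbank-id>
            <name>Cetuximab</name>
          </drug>
        </drugs>
      </pathway>
    </pathways>
  </drug>
</drugbank>
"""

drugs = list(iter_drugs(io.BytesIO(SAMPLE_XML)))
print(drugs)
```

For the real file, `source` can be the path of the unpacked full database.xml.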
Other articles for reference:
Several databases used in Drug-Target Interaction prediction (repost)
Note: biointeractions refers to drug-drug interactions