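The DrugBank Open Data set is a public-domain dataset that can be used freely in your application or project, including for commercial purposes. It is released under a Creative Commons CC0 international license.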

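To the extent possible under law, the person who associated CC0 with the DrugBank Open Data has waived all copyright and related or neighboring rights to the DrugBank Open Data. Published from: Canada.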

https://www.drugbank.ca/


Contents

1. Drug Sequences (using Approved as the example)

2. Protein identifiers (Approved)

3. Target sequences (Approved)

4. External Links → External Drug Links (Approved)

5. External Links → Target Drug-UniProt Links (Approved)

6. External Links → Enzyme/Carrier/Transporter Drug-UniProt Links (Approved)

7. Structures → Structure External Links (Approved)

8. Complete Database (Full)


1. Drug Sequences (using Approved as the example)

drugbank_approved_drug_sequences.fasta.zip

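Download this archive from the DrugBank site.

The downloaded file contains entries such as drugbank_drug|DB00002 Cetuximab heavy chain. Take this one as an example: https://www.drugbank.ca/drugs/DB00002

Its page shows that this is an approved drug and that its type is protein.

Therefore, the Drug Sequences download covers protein-based drugs.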
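To make the file layout concrete, here is a minimal sketch of a reader for this FASTA file. The header pattern is an assumption inferred from the single entry quoted above (`drugbank_drug|DB00002 Cetuximab heavy chain`); the function name `read_drug_fasta` is hypothetical, and the residues in the sample are placeholders, not the real Cetuximab sequence.

```python
import io
import re

# Assumed header layout, inferred from the one entry quoted in this post:
#   >drugbank_drug|<DrugBank ID> <free-text name>
HEADER_RE = re.compile(r"^>drugbank_drug\|(DB\d+)\s+(.*)$")


def read_drug_fasta(handle):
    """Yield (drugbank_id, name, sequence) for each record in a FASTA stream."""
    drug_id, name, chunks = None, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if drug_id is not None:
                yield drug_id, name, "".join(chunks)
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError(f"unexpected header: {line!r}")
            drug_id, name = match.group(1), match.group(2)
            chunks = []
        else:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if drug_id is not None:
        yield drug_id, name, "".join(chunks)


# Placeholder residues only; this is NOT the real Cetuximab sequence.
sample = io.StringIO(
    ">drugbank_drug|DB00002 Cetuximab heavy chain\n"
    "AAAA\n"
    "CCCC\n"
)
records = list(read_drug_fasta(sample))
```

For the real file, the same function could be given an open handle on the extracted FASTA file instead of the in-memory sample.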


2. Protein identifiers (Approved)

Protein identifiers include external IDs to resources such as UniProt and PDB. These downloads are divided first by protein/compound type (target, transporter, etc.). Secondly they are divided by drug group (approved, illicit, etc.). Each archive contains 2 files: one for all target/enzyme/transporter/carriers and one with only those marked as pharmacologically active (directly related to the mechanism of action for at least one of the associated drugs). Note that each row in the export CSV file also includes a concatenated list of DrugBank drugs IDs (semi-colon delimited) as the last column.

drugbank_approved_target_polypeptide_ids.csv.zip

all.csv, pharmacologically_active.csv

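all.csv contains 1,000+ more rows than pharmacologically_active.csv. (Note: this is not the full set of proteins; presumably only proteins that have an associated drug are listed.)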

4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,Humans,DB11300; DB11311; DB11571; DB11572; DB13151

Next, look up one of the listed drugs, for example DB11300: https://www.drugbank.ca/drugs/DB11300#targets

The match succeeds.

2,Histidine decarboxylase,HDC,32109,X54297,P19113,DCHS_HUMAN,4E1O,,HDC,HGNC:4855,Humans,DB00114; DB00117

 

The match succeeds again. This suggests that all.csv stores every protein that has a Drug Relations entry.

  • Note, however, that the file may be incomplete. For ID = 4, all.csv shows:

4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,Humans,DB01839; DB11300; DB11311; DB11571; DB11572; DB13151 

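This row does not include all of the Drug Relations listed at https://www.drugbank.ca/bio_entities/BE0000004; two entries are missing (reason unknown).

From the description above and the file names, we can conclude:

pharmacologically_active.csv contains only the drug IDs marked as pharmacologically active, while all.csv should contain both the yes and the unknown entries, although it is still incomplete.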
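The two rows quoted above for protein ID 4 differ by a single drug ID, which is easy to miss by eye. The sketch below splits the last column into a list and diffs the two rows. That the drug IDs sit in the last column, semicolon-delimited, comes from the description quoted earlier; the positions of the protein ID (column 0) and the UniProt ID (column 5) are read off the quoted rows and are an assumption about the file. The function name is hypothetical.

```python
import csv


def parse_polypeptide_row(line):
    """Return (protein_id, uniprot_id, drug_ids) from one CSV data row."""
    fields = next(csv.reader([line.strip()]))
    # Drug IDs are the last column, separated by semicolons.
    drug_ids = [d.strip() for d in fields[-1].split(";") if d.strip()]
    # Columns 0 and 5 are assumed from the rows quoted in this post.
    return fields[0], fields[5], drug_ids


# The two rows for protein ID 4 exactly as quoted in this section.
row_short = (
    "4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,"
    "1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,"
    "Humans,DB11300; DB11311; DB11571; DB11572; DB13151"
)
row_long = (
    "4,Coagulation factor XIII A chain,F13A1,182309,M22001,P00488,F13A_HUMAN,"
    "1EVU; 1EX0; 1F13; 1FIE; 1GGT; 1GGU; 1GGY; 1QRK; 4KTY,,F13A1,HGNC:3531,"
    "Humans,DB01839; DB11300; DB11311; DB11571; DB11572; DB13151 "
)

protein_id, uniprot_id, short_ids = parse_polypeptide_row(row_short)
_, _, long_ids = parse_polypeptide_row(row_long)

# Drug IDs present in the longer row but not in the shorter one.
extra_ids = sorted(set(long_ids) - set(short_ids))
```

Running the same comparison over every protein in both files would show how many drug relations are dropped by the pharmacologically active filter.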


3. Target sequences (Approved)

drugbank_approved_target_polypeptide_sequences.fasta.zip

protein.fasta, gene.fasta

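  • These contain the amino acid sequences and the gene sequences, respectively.
  • Take P19113 as an example and search for it directly.

The resulting page agrees with the header line in the file, which is: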

 >drugbank_target|P19113 Histidine decarboxylase (DB00114; DB00117)

  • The DB IDs in parentheses are the associated drugs.
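The header line above packs three pieces of information into one string. A minimal sketch for pulling them apart follows; the regular expression is an assumption based only on the single header quoted here, and `parse_target_header` is a hypothetical helper name.

```python
import re

# Assumed layout: >drugbank_target|<UniProt ID> <name> (<drug IDs; ...>)
TARGET_HEADER_RE = re.compile(
    r"^>drugbank_target\|(?P<uniprot>\S+)\s+(?P<name>.*?)\s+"
    r"\((?P<drugs>[^()]*)\)\s*$"
)


def parse_target_header(header):
    """Split a target FASTA header into (uniprot_id, name, drug_ids)."""
    match = TARGET_HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unexpected header: {header!r}")
    drug_ids = [d.strip() for d in match.group("drugs").split(";") if d.strip()]
    return match.group("uniprot"), match.group("name"), drug_ids


parsed = parse_target_header(
    ">drugbank_target|P19113 Histidine decarboxylase (DB00114; DB00117)"
)
```

Anchoring the drug list to the last parenthesised group at the end of the line keeps the pattern working even if a protein name itself contains parentheses.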

Note that the following two downloads correspond to each other one-to-one:

  • ZIP: drugbank_approved_target_polypeptide_ids.csv and drugbank_approved_target_polypeptide_sequences.fasta
  • File: all.csv and protein.fasta
  • ID: DrugBank ID (e.g. https://www.drugbank.ca/bio_entities/BE0000002) and UniProt ID
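The correspondence between all.csv and protein.fasta can be checked by joining on the UniProt ID. The sketch below does this for the Histidine decarboxylase entries quoted in this post. It assumes the UniProt ID is column 5 of the CSV and that FASTA headers follow the layout of the one header quoted above; both helper names are hypothetical, and when reading the real CSV the header row would need to be skipped first.

```python
import csv
import re

FASTA_HEADER_RE = re.compile(r"^>drugbank_target\|(\S+)\s.*\(([^()]*)\)\s*$")


def split_ids(text):
    """Turn 'DB00114; DB00117' into a set of IDs."""
    return {part.strip() for part in text.split(";") if part.strip()}


def drugs_by_uniprot_from_csv(data_rows):
    """Map UniProt ID -> drug IDs, given CSV data rows (no header row)."""
    # Column 5 as the UniProt ID is an assumption from the quoted rows.
    return {fields[5]: split_ids(fields[-1]) for fields in csv.reader(data_rows)}


def drugs_by_uniprot_from_fasta(lines):
    """Map UniProt ID -> drug IDs, using only the FASTA header lines."""
    result = {}
    for line in lines:
        match = FASTA_HEADER_RE.match(line.strip())
        if match:
            result[match.group(1)] = split_ids(match.group(2))
    return result


csv_map = drugs_by_uniprot_from_csv([
    "2,Histidine decarboxylase,HDC,32109,X54297,P19113,DCHS_HUMAN,4E1O,,HDC,"
    "HGNC:4855,Humans,DB00114; DB00117",
])
fasta_map = drugs_by_uniprot_from_fasta([
    ">drugbank_target|P19113 Histidine decarboxylase (DB00114; DB00117)",
])

# UniProt IDs whose drug lists disagree between the two files.
mismatches = {uid for uid in csv_map if csv_map[uid] != fasta_map.get(uid)}
```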

4. External Links → External Drug Links (Approved)

drugbank_approved_drug_links.csv.zip

drug links.csv 

  • Contains 3,883 drugs
  • Contains the following columns:

DrugBank ID , Name , CAS Number , Drug Type , KEGG Compound ID , KEGG Drug ID , PubChem Compound ID ,

PubChem Substance ID , ChEBI ID , PharmGKB ID , HET ID , UniProt ID , UniProt Title , GenBank ID , DPD ID ,

RxList Link , Pdrhealth Link , Wikipedia ID , Drugs.com Link , NDC ID , ChemSpider ID , BindingDB ID , TTD ID
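With this many columns, it is convenient to load the file by column name rather than by position. The following is a minimal sketch using `csv.DictReader`; the column names are the ones listed above, `load_drug_links` is a hypothetical helper, and the sample row is hand-written for illustration (its empty cells are placeholders, not values copied from the real file).

```python
import csv
import io


def load_drug_links(handle):
    """Map DrugBank ID -> {column name: value}, dropping empty cells."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    links = {}
    for row in reader:
        # Strip stray whitespace around names and values; skip blank cells.
        cleaned = {
            key.strip(): value.strip()
            for key, value in row.items()
            if key and value and value.strip()
        }
        links[cleaned.pop("DrugBank ID")] = cleaned
    return links


# Hand-written sample using a subset of the listed columns (illustrative only).
sample = io.StringIO(
    "DrugBank ID,Name,CAS Number,Drug Type,KEGG Drug ID\n"
    "DB00002,Cetuximab,,BiotechDrug,\n"
)
links = load_drug_links(sample)
```

For the real data, an open handle on the extracted drug links.csv would replace the in-memory sample.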


5. External Links → Target Drug-UniProt Links (Approved)

drugbank_approved_target_uniprot_links.csv.zip

uniprot links.csv

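The file has 12 rows for DB00002, which means this drug has 12 targets (each given with its UniProt ID). This matches the Targets (12) count shown on the drug's DrugBank page.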

DB00002,Cetuximab,BiotechDrug,P00533,Epidermal growth factor receptor
DB00002,Cetuximab,BiotechDrug,O75015,Low affinity immunoglobulin gamma Fc region receptor III-B
DB00002,Cetuximab,BiotechDrug,P00736,Complement C1r subcomponent
DB00002,Cetuximab,BiotechDrug,P02745,Complement C1q subcomponent subunit A
DB00002,Cetuximab,BiotechDrug,P02746,Complement C1q subcomponent subunit B
DB00002,Cetuximab,BiotechDrug,P02747,Complement C1q subcomponent subunit C
DB00002,Cetuximab,BiotechDrug,P08637,Low affinity immunoglobulin gamma Fc region receptor III-A
DB00002,Cetuximab,BiotechDrug,P09871,Complement C1s subcomponent
DB00002,Cetuximab,BiotechDrug,P12314,High affinity immunoglobulin gamma Fc receptor I
DB00002,Cetuximab,BiotechDrug,P12318,Low affinity immunoglobulin gamma Fc region receptor II-a
DB00002,Cetuximab,BiotechDrug,P31994,Low affinity immunoglobulin gamma Fc region receptor II-b
DB00002,Cetuximab,BiotechDrug,P31995,Low affinity immunoglobulin gamma Fc region receptor II-c

  • The first three columns (DrugBank ID, Name, Type) describe the drug
  • The last two columns (UniProt ID, UniProt Name) describe the target
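The target count can be reproduced by grouping the rows by their first column. The sketch below does that for the twelve DB00002 rows quoted above, using the column order just described; `targets_by_drug` is a hypothetical helper name.

```python
import csv
from collections import defaultdict


def targets_by_drug(data_rows):
    """Map DrugBank ID -> list of target UniProt IDs."""
    grouped = defaultdict(list)
    for row in csv.reader(data_rows):
        # Column 0 is the DrugBank ID, column 3 the target's UniProt ID.
        grouped[row[0]].append(row[3])
    return grouped


rows = """\
DB00002,Cetuximab,BiotechDrug,P00533,Epidermal growth factor receptor
DB00002,Cetuximab,BiotechDrug,O75015,Low affinity immunoglobulin gamma Fc region receptor III-B
DB00002,Cetuximab,BiotechDrug,P00736,Complement C1r subcomponent
DB00002,Cetuximab,BiotechDrug,P02745,Complement C1q subcomponent subunit A
DB00002,Cetuximab,BiotechDrug,P02746,Complement C1q subcomponent subunit B
DB00002,Cetuximab,BiotechDrug,P02747,Complement C1q subcomponent subunit C
DB00002,Cetuximab,BiotechDrug,P08637,Low affinity immunoglobulin gamma Fc region receptor III-A
DB00002,Cetuximab,BiotechDrug,P09871,Complement C1s subcomponent
DB00002,Cetuximab,BiotechDrug,P12314,High affinity immunoglobulin gamma Fc receptor I
DB00002,Cetuximab,BiotechDrug,P12318,Low affinity immunoglobulin gamma Fc region receptor II-a
DB00002,Cetuximab,BiotechDrug,P31994,Low affinity immunoglobulin gamma Fc region receptor II-b
DB00002,Cetuximab,BiotechDrug,P31995,Low affinity immunoglobulin gamma Fc region receptor II-c
""".splitlines()

grouped = targets_by_drug(rows)
target_count = len(grouped["DB00002"])
```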

6. External Links → Enzyme/Carrier/Transporter Drug-UniProt Links (Approved)

drugbank_approved_enzyme/c*/t*_uniprot_links.csv.zip

uniprot links.csv

In the Enzyme file: DB00006,Bivalirudin,SmallMoleculeDrug,P05164,Myeloperoxidase

In the Target file: DB00006,Bivalirudin,SmallMoleculeDrug,P00734,Prothrombin

So Target and Enzyme/Carrier/Transporter are distinct categories. (Perhaps focusing on Target alone is enough?)
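If the link files are combined later, it helps to record which file each drug-protein pair came from. A minimal sketch follows, using the two Bivalirudin rows quoted above; the role labels and the `collect_roles` helper are hypothetical names chosen for illustration.

```python
import csv
from collections import defaultdict


def collect_roles(rows_by_role):
    """Map (drugbank_id, uniprot_id) -> set of roles the pair appears under.

    rows_by_role maps a role label to an iterable of CSV data rows.
    """
    roles = defaultdict(set)
    for role, data_rows in rows_by_role.items():
        for row in csv.reader(data_rows):
            roles[(row[0], row[3])].add(role)
    return roles


roles = collect_roles({
    "target": ["DB00006,Bivalirudin,SmallMoleculeDrug,P00734,Prothrombin"],
    "enzyme": ["DB00006,Bivalirudin,SmallMoleculeDrug,P05164,Myeloperoxidase"],
})
```

Here the same drug maps to two different proteins under two different roles, which is the distinction drawn above.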


7. Structures → Structure External Links (Approved)

drugbank_approved_structure_links.csv.zip

structure links.csv

  • 2,594 rows
  • Contains the following columns:

DrugBank ID , Name , CAS Number , Drug Groups , InChIKey , InChI , SMILES , Formula ,

KEGG Compound ID , KEGG Drug ID , PubChem Compound ID , PubChem Substance ID ,

ChEBI ID , ChEMBL ID , HET ID , ChemSpider ID , BindingDB ID
 


8. Complete Database (Full)

drugbank_all_full_database.xml.zip

full database.xml
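The full XML export is large, so streaming it is usually more practical than loading it whole. The sketch below uses `xml.etree.ElementTree.iterparse` for that. Everything about the schema here is an assumption not confirmed by this post: the `http://www.drugbank.ca` namespace, the top-level `drug` elements with a `type` attribute, and the `drugbank-id` (with `primary="true"`) and `name` children. The sample document is hand-written to match those assumptions, and `iter_drugs` is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET

# Assumed XML namespace; check the root element of the real file.
NS = "{http://www.drugbank.ca}"


def iter_drugs(source):
    """Stream (primary_id, name, type) for each top-level drug element."""
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # depth == 1 means the element just closed is a direct child of the
        # root, which excludes any drug elements nested deeper in a record.
        if depth == 1 and elem.tag == NS + "drug":
            primary_id = None
            for id_elem in elem.findall(NS + "drugbank-id"):
                if id_elem.get("primary") == "true":
                    primary_id = id_elem.text
            yield primary_id, elem.findtext(NS + "name"), elem.get("type")
            elem.clear()  # release the finished record's children


# Hand-written sample following the assumed schema (illustrative only).
sample = io.BytesIO(
    b'<drugbank xmlns="http://www.drugbank.ca">'
    b'<drug type="biotech">'
    b'<drugbank-id primary="true">DB00002</drugbank-id>'
    b"<name>Cetuximab</name>"
    b"</drug>"
    b"</drugbank>"
)
drugs = list(iter_drugs(sample))
```

For the real export, the path to the extracted full database.xml could be passed in place of the in-memory sample, since `iterparse` accepts either a filename or a file object.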


Other articles for reference:

Several databases used in Drug-Target Interaction prediction (repost)


Note: biointeractions refers to drug-drug interactions

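Copyright notice: This is an original article by an anonymous author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.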