NMPFamsDB offers an Application Programming Interface (API) that lets users retrieve database components without using the web interface. Through the API, you can programmatically retrieve subsets of the database, access it from scripts written in languages such as Perl, Python or R, or integrate NMPFamsDB into your own applications or web pages.
The API currently serves results in the JSON format. All API services can be accessed by using both GET and POST requests, unless otherwise noted (see below).
The base URL address of the API is the following:
https://pavlopoulos-lab.org/NMPFamsDB/api/
Each specific method can be used by appending its respective portion of the URL to the base address. Currently, the following methods are offered:
Method | Method Description | Request Type | URL | Arguments |
---|---|---|---|---|
Search Families | Returns a list of NMPFs. Additional arguments can be supplied to perform queries (see the Arguments column). | GET, POST | {base_url}/families | ids: list of NMPF identifiers, separated by commas (e.g. F000872,F106175,F000025); category: dataset category (metagenome, metatranscriptome or mixed); taxon_oids: list of Taxon OIDs, separated by commas (e.g. 2001200001,7000000745); scaffolds: list of Scaffold IDs, separated by commas; ncbi: list of NCBI Taxonomy identifiers, separated by commas (e.g. 10239,229); proteins: list of Protein IDs, separated by commas; structure: whether the family has a 3D structure model or not (y for yes, n for no) |
Family Entry | Retrieves all information on a family, based on the supplied NMPF Family ID (familyId). | GET, POST | {base_url}/families/familyId | familyId: an NMPF ID (e.g. F000872) |
Search 3D Models | Returns a list of NMPFs with available 3D models. Additional arguments can be supplied to perform queries (see the Arguments column). | GET, POST | {base_url}/models | ids: list of NMPF identifiers, separated by commas (e.g. F000872,F106175,F000025); category: dataset category (metagenome, metatranscriptome or mixed); taxon_oids: list of Taxon OIDs, separated by commas (e.g. 2001200001,7000000745); scaffold_ids: list of Scaffold IDs, separated by commas; protein_ids: list of Protein IDs, separated by commas; ncbi: list of NCBI Taxonomy identifiers, separated by commas (e.g. 10239,229) |
3D Model | Retrieves information on an NMPF 3D structure model, based on the supplied NMPF Family ID (familyId). | GET, POST | {base_url}/models/familyId | familyId: an NMPF ID (e.g. F000872) |
Search Datasets | Returns a list of IMG/M datasets. Additional arguments can be supplied to perform queries (see the Arguments column). | GET, POST | {base_url}/datasets | families: list of NMPF identifiers, separated by commas (e.g. F000872,F106175,F000025); taxon_oids: list of Taxon OIDs, separated by commas (e.g. 2001200001,7000000745); scaffolds: list of Scaffold IDs, separated by commas; proteins: list of Protein IDs, separated by commas; ncbi: list of NCBI Taxonomy identifiers, separated by commas (e.g. 10239,229); category: dataset category (metagenome, metatranscriptome or mixed); restricted: whether the dataset’s data usage policy is restricted or not (y for yes, n for no) |
Dataset | Retrieves all information on a dataset, as defined by the supplied Taxon OID (taxonOid). | GET, POST | {base_url}/datasets/taxonOid | taxonOid: a dataset Taxon OID (e.g. 2001200001) |
Search Scaffolds | Returns a list of sequencing scaffolds. Additional arguments can be supplied to perform queries (see the Arguments column). | POST | {base_url}/scaffolds | families: list of NMPF identifiers, separated by commas (e.g. F000872,F106175,F000025); taxon_oids: list of Taxon OIDs, separated by commas (e.g. 2001200001,7000000745); scaffolds: list of Scaffold IDs, separated by commas; proteins: list of Protein IDs, separated by commas; ncbi: list of NCBI Taxonomy identifiers, separated by commas (e.g. 10239,229); category: dataset category (metagenome, metatranscriptome or mixed); restricted: whether the dataset’s data usage policy is restricted or not (y for yes, n for no) |
Scaffold | Retrieves all information on a sequencing scaffold, as defined by the supplied Taxon OID (taxonOid) and Scaffold ID (scaffoldId). | GET, POST | {base_url}/scaffolds/taxonOid/scaffoldId | taxonOid: a dataset Taxon OID (e.g. 2001200001); scaffoldId: a scaffold identifier (e.g. JGI12270J11330_10000002) |
Search Sequences | Returns a list of protein sequences. Additional arguments can be supplied to perform queries (see the Arguments column). | POST | {base_url}/sequences | families: list of NMPF identifiers, separated by commas (e.g. F000872,F106175,F000025); taxon_oids: list of Taxon OIDs, separated by commas (e.g. 2001200001,7000000745); scaffolds: list of Scaffold IDs, separated by commas; proteins: list of Protein IDs, separated by commas; ncbi: list of NCBI Taxonomy identifiers, separated by commas (e.g. 10239,229); category: dataset category (metagenome, metatranscriptome or mixed); restricted: whether the dataset’s data usage policy is restricted or not (y for yes, n for no) |
The Search Families (/families), Search Datasets (/datasets), Search Scaffolds (/scaffolds) and Search 3D Models (/models) methods can be used with no arguments, in which case all database entries are retrieved. Supplying any of the arguments listed in the last column of the table above refines the search accordingly.
GET requests can be executed simply by pasting the complete URL into a browser's address bar. POST requests can be executed using scripts written in, e.g., Perl, Python or R, with command line applications such as cURL, or with a REST API client application such as Postman. In GET requests, arguments are included by appending a question mark (?) to the method URL, followed by name=value pairs separated by ampersands (&). In POST requests, arguments need to be formatted according to the method used to build the request (see examples below).
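The GET convention described above can be sketched in Python using only the standard library. The helper name `build_get_url` is hypothetical, and joining list values with commas is based on the argument format given in the table. The sketch only assembles URLs and does not contact the server.

```python
from urllib.parse import urlencode

BASE_URL = "https://pavlopoulos-lab.org/NMPFamsDB/api"


def build_get_url(method, **params):
    """Return the GET URL for an API method such as 'families' (hypothetical helper)."""
    query = {}
    for name, value in params.items():
        # List-like values become comma-separated strings, as the table describes.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        query[name] = value
    url = f"{BASE_URL}/{method}"
    # safe="," keeps commas literal instead of percent-encoding them as %2C.
    return f"{url}?{urlencode(query, safe=',')}" if query else url


print(build_get_url("families"))
print(build_get_url("families", taxon_oids=[2001200001, 7000000745], structure="n"))
```

Keeping the URL construction in one place makes it easier to reuse the same argument dictionary for a POST request when a query grows too large for GET.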
Note: Although most methods are available as both GET and POST requests, the former are limited with regards to the number of argument parameters (i.e. the URL length) and the size of the response. For large requests, you should use POST instead of GET. Also, note that for reasons of computational efficiency, the Search Scaffolds and Search Sequences methods are only accessible as POST requests.
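One way to apply this note in a script is to choose the request type before sending anything. The sketch below is illustrative only: the function name `choose_request_type` is hypothetical, and the 2000-character threshold is an assumed safety margin, not a limit documented by NMPFamsDB.

```python
from urllib.parse import urlencode

# Search methods that the documentation lists as POST-only.
POST_ONLY = {"scaffolds", "sequences"}
# Assumed safety margin for URL length; not a documented NMPFamsDB limit.
MAX_GET_URL_LENGTH = 2000


def choose_request_type(method, params,
                        base_url="https://pavlopoulos-lab.org/NMPFamsDB/api"):
    """Return 'GET' or 'POST' for a search method name such as 'families'."""
    if method in POST_ONLY:
        return "POST"
    url = f"{base_url}/{method}?{urlencode(params)}"
    return "GET" if len(url) <= MAX_GET_URL_LENGTH else "POST"


# A short query fits comfortably in a URL.
print(choose_request_type("families", {"structure": "y"}))
# Search Scaffolds is POST-only regardless of query size.
print(choose_request_type("scaffolds", {"ncbi": 10239}))
# A long identifier list exceeds the assumed URL length margin.
many_ids = ",".join(f"F{n:06d}" for n in range(1, 1001))
print(choose_request_type("families", {"ids": many_ids}))
```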
To retrieve all NMPF families (other entry types work the same way, e.g. datasets with /datasets):
https://pavlopoulos-lab.org/NMPFamsDB/api/families
To retrieve only families with 3D structure models:
https://pavlopoulos-lab.org/NMPFamsDB/api/families?structure=y
To retrieve only families with 3D structure models that are metatranscriptome-specific:
https://pavlopoulos-lab.org/NMPFamsDB/api/families?structure=y&category=metatranscriptome
To retrieve only the families that are associated with datasets 2001200001 and 7000000745 and do NOT have 3D structure models:
https://pavlopoulos-lab.org/NMPFamsDB/api/families?taxon_oids=2001200001,7000000745&structure=n
To retrieve families containing viral sequences (the NCBI Taxonomy ID for viruses is 10239):
https://pavlopoulos-lab.org/NMPFamsDB/api/families?ncbi=10239
To get the data of a specific family (e.g. F000872):
https://pavlopoulos-lab.org/NMPFamsDB/api/families/F000872
To get a specific dataset (e.g. 7000000745):
https://pavlopoulos-lab.org/NMPFamsDB/api/datasets/7000000745
To get a specific scaffold (e.g. JGI12270J11330_10000002 from 3300000567):
https://pavlopoulos-lab.org/NMPFamsDB/api/scaffolds/3300000567/JGI12270J11330_10000002
Retrieve all families:
curl -X POST https://pavlopoulos-lab.org/NMPFamsDB/api/families
Retrieve a particular family:
curl -X POST https://pavlopoulos-lab.org/NMPFamsDB/api/families/F000872
Retrieve all metatranscriptome-derived, viral (NCBI: 10239) scaffolds:
using the multipart/form-data Content-Type with -F:
curl -X POST -F "category=metatranscriptome" -F "ncbi=10239" https://pavlopoulos-lab.org/NMPFamsDB/api/scaffolds
using the application/x-www-form-urlencoded Content-Type with -d:
curl -X POST -d "category=metatranscriptome&ncbi=10239" https://pavlopoulos-lab.org/NMPFamsDB/api/scaffolds
as JSON data:
curl -X POST https://pavlopoulos-lab.org/NMPFamsDB/api/scaffolds -H 'Content-Type: application/json' -d '{"category":"metatranscriptome","ncbi":10239}'
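For readers who prefer not to depend on cURL, the form-encoded and JSON variants above can be approximated with Python's standard library. This is a sketch that only builds the request objects; the commented `urlopen` call and its timeout value are assumptions about how you might send them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

URL = "https://pavlopoulos-lab.org/NMPFamsDB/api/scaffolds"
params = {"category": "metatranscriptome", "ncbi": 10239}

# Form-encoded body, the counterpart of `curl -d`.
form_request = Request(
    URL,
    data=urlencode(params).encode("utf-8"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# JSON body, the counterpart of the `Content-Type: application/json` example.
json_request = Request(
    URL,
    data=json.dumps(params).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(form_request.get_method(), form_request.data)
print(json_request.get_method(), json_request.data)
# To send either request (assumed timeout of 60 seconds):
# urllib.request.urlopen(form_request, timeout=60)
```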
Example request for mixed metagenome/metatranscriptome (mixed) families, containing viral (10239) sequences and having a 3D structure:
import json

import requests

# define the API method URL
url = "https://pavlopoulos-lab.org/NMPFamsDB/api/families"
# define the search parameters
params = {"category": "mixed", "ncbi": 10239, "structure": "y"}
# perform the POST request, sending the parameters as form data
req = requests.post(url, data=params)
# if the status code is 200 (request succeeded), decode the JSON result and print it
if req.status_code == 200:
    result = req.json()
    print(json.dumps(result, sort_keys=True, indent=4))
else:
    # otherwise report the failure instead of printing an undefined result
    print("Request failed with status code", req.status_code)
All requests return data in the JSON format. An example is shown below:
Search Families Request:
https://pavlopoulos-lab.org/NMPFamsDB/api/families?taxon_oids=7000000745&structure=n&category=metagenome
Response (excerpt):
[
    {
        "ID": "F046432",
        "Category": "Metagenome",
        "SequenceCount": "151",
        "ScaffoldCount": "151",
        "SampleCount": "105",
        "HabitatCount": "11",
        "PDB": "N",
        "PDB_Confidence": null
    },
    {
        "ID": "F095629",
        "Category": "Metagenome",
        "SequenceCount": "105",
        "ScaffoldCount": "105",
        "SampleCount": "94",
        "HabitatCount": "7",
        "PDB": "N",
        "PDB_Confidence": null
    }
]
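Note that in this excerpt the count fields arrive as JSON strings rather than numbers. A minimal sketch of converting them before filtering is given below; the helper name `normalize` and the threshold of 100 samples are illustrative choices, and the input is a copy of the excerpt above rather than a live response.

```python
import json

# Copy of the Search Families response excerpt.
response_text = """
[
  {"ID": "F046432", "Category": "Metagenome", "SequenceCount": "151",
   "ScaffoldCount": "151", "SampleCount": "105", "HabitatCount": "11",
   "PDB": "N", "PDB_Confidence": null},
  {"ID": "F095629", "Category": "Metagenome", "SequenceCount": "105",
   "ScaffoldCount": "105", "SampleCount": "94", "HabitatCount": "7",
   "PDB": "N", "PDB_Confidence": null}
]
"""

COUNT_FIELDS = ("SequenceCount", "ScaffoldCount", "SampleCount", "HabitatCount")


def normalize(family):
    """Return a copy of a family record with its count fields as integers."""
    record = dict(family)
    for field in COUNT_FIELDS:
        record[field] = int(record[field])
    return record


families = [normalize(f) for f in json.loads(response_text)]
# Keep families found in at least 100 samples (an arbitrary example threshold).
widespread = [f["ID"] for f in families if f["SampleCount"] >= 100]
print(widespread)
```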
Example Family Entry request:
https://pavlopoulos-lab.org/NMPFamsDB/api/families/F003346
Response (excerpt):
{
    "ID": "F003346",
    "url": "https://pavlopoulos-lab.org/NMPFamsDB/family?id=F003346",
    "Category": "Metagenome",
    "Sequence Count": 492,
    "Scaffold Count": 484,
    "Dataset Count": 1,
    "Ecosystem Count": 1,
    "Consensus Sequence": "VLERKEGKTGENLKNKELVLEKRCKVRPPLPPMADCEDLMGKYELMSMLRRTTQVEMSVGILRSRFETYPPQQFDLTVLEEDEDVLDPITTLGNHVVASGPRTLEERIESGRDWEQWLASVDVEEEERLVAEAETHLKYAKAWVDSLVGDVDIAPKICPGSTDNK",
    "Structure": {
        "Pivot Sequence": "MADKCEDLMGKYESMLRRTTQVEMSVGILRSRFETYPPQQFDLTVLEEDVLDTPITLGNHVVASGPRTLEERIESGRDWEQWLASVGDVEEEERLVAEAQETAKAWVDSLVGDVDIAPK",
        "Secondary Structure": "CCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCHCCCCCCCCCCEEEECCCCCHHHHHCCCCCHHHHHHHCCCHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCC",
        "Secondary Structure Confidence Score": "97655778776788887654455556665555456997766543344444578755687566468765777654677788888655776889999999999999999987656677999",
        "Disordered Regions": null,
        "3D Model": "N",
        "3D pLDDT": 0
    },
    "Topology": {
        "Signal Peptide": null,
        "Topology": "GLOBULAR",
        "Transmembrane": null,
        "TransmembraneTopology": null
    },
    "Datasets": [
        {
            "Taxon OID": "3300031730",
            "Sample Name": "Populus trichocarpa ectomycorrhiza microbial communities from riparian zone in the Pacific Northwest, United States - 19_EM"
        }
    ],
    "Scaffolds": [
        {
            "Taxon OID": "3300031730",
            "Scaffold ID": "Ga0307516_10045150",
            "Organism": "Predicted Viral",
            "NCBI TaxID": 0
        },
        {
            "Taxon OID": "3300031730",
            "Scaffold ID": "Ga0307516_10133304",
            "Organism": "Predicted Viral",
            "NCBI TaxID": 0
        },
        {
            "Taxon OID": "3300031730",
            "Scaffold ID": "Ga0307516_10202168",
            "Organism": "Predicted Viral",
            "NCBI TaxID": 0
        }
    ]
}
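A Family Entry response can be used to drive follow-up requests to the Scaffold method. The sketch below works on a trimmed copy of the excerpt above and assumes the "Scaffolds" list keeps the field names shown there; the helper names `scaffolds_by_dataset` and `scaffold_urls` are hypothetical.

```python
from collections import defaultdict

BASE_URL = "https://pavlopoulos-lab.org/NMPFamsDB/api"

# Trimmed copy of the Family Entry excerpt; only the fields used here are kept.
family = {
    "ID": "F003346",
    "Scaffolds": [
        {"Taxon OID": "3300031730", "Scaffold ID": "Ga0307516_10045150"},
        {"Taxon OID": "3300031730", "Scaffold ID": "Ga0307516_10133304"},
        {"Taxon OID": "3300031730", "Scaffold ID": "Ga0307516_10202168"},
    ],
}


def scaffolds_by_dataset(entry):
    """Group the scaffold IDs of a family entry by Taxon OID."""
    grouped = defaultdict(list)
    for scaffold in entry.get("Scaffolds", []):
        grouped[scaffold["Taxon OID"]].append(scaffold["Scaffold ID"])
    return dict(grouped)


def scaffold_urls(entry):
    """Build the Scaffold method URL for every scaffold in a family entry."""
    return [
        f"{BASE_URL}/scaffolds/{s['Taxon OID']}/{s['Scaffold ID']}"
        for s in entry.get("Scaffolds", [])
    ]


print(scaffolds_by_dataset(family))
print(scaffold_urls(family)[0])
```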