NMPFamsDB

A database of Novel Metagenome Protein Families

Programmatic Access


NMPFamsDB offers an Application Programming Interface (API) that lets users retrieve database components without using the web interface. Through the API, you can programmatically access subsets of information and retrieve their components, query the database from scripts written in languages such as Perl, Python or R, or integrate NMPFamsDB into your own applications or web pages.

The API currently serves results in the JSON format. All API services can be accessed by using both GET and POST requests, unless otherwise noted (see below).

How it works

The base URL address of the API is the following:

https://pavlopoulos-lab.org/NMPFamsDB/api/

Each method is called by appending its path to the base URL. Currently, the following methods are offered:

Search Families
    Description: Returns a list of NMPFs. Additional arguments can be supplied to perform queries (see Arguments).
    Request Type: GET, POST
    URL: {base_url}/families
    Arguments:
        ids: list of NMPF identifiers, separated by commas (e.g. F000872,F106175,F000025)
        category: dataset category (metagenome, metatranscriptome or mixed)
        taxon_oids: list of Taxon OIDs, separated by commas (e.g. 2001200001,7000000745)
        scaffolds: list of Scaffold IDs, separated by commas
        ncbi: list of NCBI Taxonomy identifiers, separated by commas (e.g. 10239,229)
        proteins: list of Protein IDs, separated by commas
        structure: whether the family has a 3D structure model or not (y for yes, n for no)

Family Entry
    Description: Retrieves all information on a family, based on the supplied NMPF ID (familyId).
    Request Type: GET, POST
    URL: {base_url}/families/familyId
    Arguments:
        familyId: an NMPF ID (e.g. F000872)

Search 3D Models
    Description: Returns a list of NMPFs with available 3D models. Additional arguments can be supplied to perform queries (see Arguments).
    Request Type: GET, POST
    URL: {base_url}/models
    Arguments:
        ids: list of NMPF identifiers, separated by commas (e.g. F000872,F106175,F000025)
        category: dataset category (metagenome, metatranscriptome or mixed)
        taxon_oids: list of Taxon OIDs, separated by commas (e.g. 2001200001,7000000745)
        scaffold_ids: list of Scaffold IDs, separated by commas
        protein_ids: list of Protein IDs, separated by commas
        ncbi: list of NCBI Taxonomy identifiers, separated by commas (e.g. 10239,229)

3D Model
    Description: Retrieves information on an NMPF 3D structure model, based on the supplied NMPF ID (familyId).
    Request Type: GET, POST
    URL: {base_url}/models/familyId
    Arguments:
        familyId: an NMPF ID (e.g. F000872)

Search Datasets
    Description: Returns a list of IMG/M datasets. Additional arguments can be supplied to perform queries (see Arguments).
    Request Type: GET, POST
    URL: {base_url}/datasets
    Arguments:
        families: list of NMPF identifiers, separated by commas (e.g. F000872,F106175,F000025)
        taxon_oids: list of Taxon OIDs, separated by commas (e.g. 2001200001,7000000745)
        scaffolds: list of Scaffold IDs, separated by commas
        proteins: list of Protein IDs, separated by commas
        ncbi: list of NCBI Taxonomy identifiers, separated by commas (e.g. 10239,229)
        category: dataset category (metagenome, metatranscriptome or mixed)
        restricted: whether the dataset's data usage policy is restricted or not (y for yes, n for no)

Dataset
    Description: Retrieves all information on a dataset, as defined by the supplied Taxon OID (taxonOid).
    Request Type: GET, POST
    URL: {base_url}/datasets/taxonOid
    Arguments:
        taxonOid: a dataset Taxon OID (e.g. 2001200001)

Search Scaffolds
    Description: Returns a list of sequencing scaffolds. Additional arguments can be supplied to perform queries (see Arguments).
    Request Type: POST
    URL: {base_url}/scaffolds
    Arguments:
        families: list of NMPF identifiers, separated by commas (e.g. F000872,F106175,F000025)
        taxon_oids: list of Taxon OIDs, separated by commas (e.g. 2001200001,7000000745)
        scaffolds: list of Scaffold IDs, separated by commas
        proteins: list of Protein IDs, separated by commas
        ncbi: list of NCBI Taxonomy identifiers, separated by commas (e.g. 10239,229)
        category: dataset category (metagenome, metatranscriptome or mixed)
        restricted: whether the dataset's data usage policy is restricted or not (y for yes, n for no)

Scaffold
    Description: Retrieves all information on a sequencing scaffold, as defined by the supplied Taxon OID (taxonOid) and Scaffold ID (scaffoldId).
    Request Type: GET, POST
    URL: {base_url}/scaffolds/taxonOid/scaffoldId
    Arguments:
        taxonOid: a dataset Taxon OID (e.g. 2001200001)
        scaffoldId: a scaffold identifier (e.g. JGI12270J11330_10000002)

Search Sequences
    Description: Returns a list of protein sequences. Additional arguments can be supplied to perform queries (see Arguments).
    Request Type: POST
    URL: {base_url}/sequences
    Arguments:
        families: list of NMPF identifiers, separated by commas (e.g. F000872,F106175,F000025)
        taxon_oids: list of Taxon OIDs, separated by commas (e.g. 2001200001,7000000745)
        scaffolds: list of Scaffold IDs, separated by commas
        proteins: list of Protein IDs, separated by commas
        ncbi: list of NCBI Taxonomy identifiers, separated by commas (e.g. 10239,229)
        category: dataset category (metagenome, metatranscriptome or mixed)
        restricted: whether the dataset's data usage policy is restricted or not (y for yes, n for no)
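
The argument lists above can also be checked in a script before a request is sent. The following is a minimal Python sketch under stated assumptions: the helper name check_arguments and the constants are hypothetical (they are not part of the API), and the accepted names and values are simply transcribed from the method list above.

```python
# Argument names accepted by each search method, transcribed from the method list.
COMMON = {"families", "taxon_oids", "scaffolds", "proteins", "ncbi", "category", "restricted"}
SEARCH_ARGUMENTS = {
    "families": {"ids", "category", "taxon_oids", "scaffolds", "ncbi", "proteins", "structure"},
    "models": {"ids", "category", "taxon_oids", "scaffold_ids", "protein_ids", "ncbi"},
    "datasets": set(COMMON),
    "scaffolds": set(COMMON),
    "sequences": set(COMMON),
}
CATEGORIES = {"metagenome", "metatranscriptome", "mixed"}
YES_NO = {"y", "n"}


def check_arguments(method, params):
    """Return a list of problems found in params (empty when none). Hypothetical helper."""
    allowed = SEARCH_ARGUMENTS.get(method)
    if allowed is None:
        return [f"unknown search method: {method}"]
    problems = []
    for name, value in params.items():
        if name not in allowed:
            problems.append(f"{name}: not an argument of /{method}")
        elif name == "category" and value not in CATEGORIES:
            problems.append(f"category: unexpected value {value!r}")
        elif name in ("structure", "restricted") and value not in YES_NO:
            problems.append(f"{name}: expected 'y' or 'n', got {value!r}")
    return problems


print(check_arguments("families", {"structure": "y", "category": "mixed"}))
print(check_arguments("models", {"scaffolds": "JGI12270J11330_10000002"}))
```

The second call reports a problem because Search 3D Models lists scaffold_ids rather than scaffolds.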

The Search Families (/families), Search Datasets (/datasets), Search Scaffolds (/scaffolds) and Search 3D Models (/models) methods can be used with no arguments, in which case all database entries are retrieved. Supplying any of the arguments listed for a method above refines the search accordingly.

GET requests can be executed simply by pasting the complete URL into a browser's address bar. POST requests can be executed from scripts written in, e.g., Perl, Python or R, with command-line tools such as cURL, or with a REST API client such as Postman. In GET requests, arguments are passed as a query string: append a question mark (?) to the method URL, write each argument as name=value, and separate arguments with ampersands (&). In POST requests, arguments are passed in the request body, in the form expected by the tool used to build the request (see examples below).
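
As an illustration of the GET query-string rules, here is a minimal Python sketch that assembles a request URL from a method name and a dictionary of arguments. The function name build_get_url is hypothetical; only the base URL and the argument names come from this page.

```python
from urllib.parse import urlencode

BASE_URL = "https://pavlopoulos-lab.org/NMPFamsDB/api/"


def build_get_url(method, params=None):
    """Append ?name=value&name=value to the method URL (hypothetical helper)."""
    url = BASE_URL + method
    if params:
        # safe="," keeps comma-separated lists readable instead of encoding them as %2C
        url += "?" + urlencode(params, safe=",")
    return url


# Families linked to two datasets that have no 3D structure model
print(build_get_url("families", {"taxon_oids": "2001200001,7000000745", "structure": "n"}))
```

The resulting URL can be pasted into a browser or passed to any HTTP client.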

Note: Although most methods are available as both GET and POST requests, GET requests are limited in the number of arguments they can carry (i.e. by the URL length) and in the size of the response. For large requests, use POST instead of GET. Also note that, for reasons of computational efficiency, the Search Scaffolds and Search Sequences methods are accessible only as POST requests.
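
The choice between GET and POST described in this note could be automated along the following lines. This is only a sketch: the helper name choose_request_type is hypothetical, and the 2000-character threshold is an assumption chosen for illustration, since this page does not state the actual URL length limit. The response size is not taken into account.

```python
from urllib.parse import urlencode

BASE_URL = "https://pavlopoulos-lab.org/NMPFamsDB/api/"
POST_ONLY = {"scaffolds", "sequences"}  # search methods offered only via POST
MAX_GET_URL_LENGTH = 2000  # assumed threshold, not documented by the API


def choose_request_type(method, params=None):
    """Return "GET" or "POST" for a search method and its arguments."""
    if method in POST_ONLY:
        return "POST"
    query = urlencode(params or {}, safe=",")
    url = BASE_URL + method + ("?" + query if query else "")
    # Fall back to POST when the GET URL would become too long
    return "GET" if len(url) <= MAX_GET_URL_LENGTH else "POST"


print(choose_request_type("families", {"structure": "y"}))
print(choose_request_type("scaffolds", {"ncbi": 10239}))
```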


Example GET requests

To retrieve all NMPF families (other entry types work the same way, e.g. /datasets for datasets):

https://pavlopoulos-lab.org/NMPFamsDB/api/families

To retrieve only families with 3D structure models:

https://pavlopoulos-lab.org/NMPFamsDB/api/families?structure=y

To retrieve only families with 3D structure models that are metatranscriptome-specific:

https://pavlopoulos-lab.org/NMPFamsDB/api/families?structure=y&category=metatranscriptome

To retrieve only the families that are associated with datasets 2001200001 and 7000000745 and do NOT have 3D structure models:

https://pavlopoulos-lab.org/NMPFamsDB/api/families?taxon_oids=2001200001,7000000745&structure=n

To retrieve families containing viral sequences (the NCBI Taxonomy ID for viruses is 10239):

https://pavlopoulos-lab.org/NMPFamsDB/api/families?ncbi=10239

To get the data of a specific family (e.g. F000872):

https://pavlopoulos-lab.org/NMPFamsDB/api/families/F000872

To get a specific dataset (e.g. 7000000745):

https://pavlopoulos-lab.org/NMPFamsDB/api/datasets/7000000745

To get a specific scaffold (e.g. JGI12270J11330_10000002 from 3300000567):

https://pavlopoulos-lab.org/NMPFamsDB/api/scaffolds/3300000567/JGI12270J11330_10000002
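
The single-entry URLs above all follow the same pattern: the method path followed by one or two identifiers. A minimal Python sketch of that pattern is given here; the function name entry_url is hypothetical.

```python
from urllib.parse import quote

BASE_URL = "https://pavlopoulos-lab.org/NMPFamsDB/api/"


def entry_url(method, *identifiers):
    """Join the method path and its identifiers with slashes (hypothetical helper)."""
    # quote() percent-encodes any character that is not valid in a URL path segment
    parts = [method] + [quote(str(identifier), safe="") for identifier in identifiers]
    return BASE_URL + "/".join(parts)


print(entry_url("families", "F000872"))
print(entry_url("datasets", 7000000745))
print(entry_url("scaffolds", 3300000567, "JGI12270J11330_10000002"))
```

These calls reproduce the three single-entry example URLs listed above.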


Example POST requests

With cURL

Retrieve all families:

curl -X POST https://pavlopoulos-lab.org/NMPFamsDB/api/families

Retrieve a particular family:

curl -X POST https://pavlopoulos-lab.org/NMPFamsDB/api/families/F000872

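Retrieve all metatranscriptome-derived, viral (NCBI: 10239) scaffolds, passing the arguments as form data with -d:

curl -X POST -d "category=metatranscriptome&ncbi=10239" https://pavlopoulos-lab.org/NMPFamsDB/api/scaffolds
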
With Python

Example request for families of the mixed metagenome/metatranscriptome category (mixed) that contain viral (10239) sequences and have a 3D structure model:

import json

import requests

# Define the API method URL
url = "https://pavlopoulos-lab.org/NMPFamsDB/api/families"
# Define the search parameters (sent as form data in the POST body)
params = {"category": "mixed", "ncbi": 10239, "structure": "y"}
# Perform the request; a status code of 200 means the request succeeded
req = requests.post(url, data=params)
if req.status_code == 200:
    # Decode the JSON response and pretty-print it
    result = req.json()
    print(json.dumps(result, sort_keys=True, indent=4))
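
For environments where the requests package is not available, a comparable POST request can be prepared with the Python standard library alone. The sketch below targets the POST-only Search Scaffolds method; the function name build_post_request is hypothetical, and it assumes the API accepts form-encoded bodies, which is the encoding that requests.post uses in the example above. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://pavlopoulos-lab.org/NMPFamsDB/api/"


def build_post_request(method, params):
    """Prepare (but do not send) a form-encoded POST request (hypothetical helper)."""
    body = urlencode(params, safe=",").encode("ascii")
    return Request(BASE_URL + method, data=body, method="POST")


# Metatranscriptome-derived, viral (NCBI: 10239) scaffolds
request = build_post_request("scaffolds", {"category": "metatranscriptome", "ncbi": 10239})
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))

# To send the request and decode the JSON response:
#   import json
#   from urllib.request import urlopen
#   with urlopen(request, timeout=60) as response:
#       scaffolds = json.load(response)
```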


The response format

All requests return data in the JSON format. An example is shown below:

Example Search Families request:

https://pavlopoulos-lab.org/NMPFamsDB/api/families?taxon_oids=7000000745&structure=n&category=metagenome

Response (excerpt):

[
    {
        "ID": "F046432",
        "Category": "Metagenome",
        "SequenceCount": "151",
        "ScaffoldCount": "151",
        "SampleCount": "105",
        "HabitatCount": "11",
        "PDB": "N",
        "PDB_Confidence": null
    },
    {
        "ID": "F095629",
        "Category": "Metagenome",
        "SequenceCount": "105",
        "ScaffoldCount": "105",
        "SampleCount": "94",
        "HabitatCount": "7",
        "PDB": "N",
        "PDB_Confidence": null
    }
]
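
Note that the counts in this response are JSON strings rather than numbers. A minimal Python sketch for turning such a response into typed records follows. The function name parse_families is hypothetical, the sample text is copied from the excerpt above, and treating "Y" as the value of PDB for families that do have a model is an assumption, since the excerpt only shows "N".

```python
import json

# Two records copied from the response excerpt
SAMPLE_RESPONSE = """
[
    {"ID": "F046432", "Category": "Metagenome", "SequenceCount": "151",
     "ScaffoldCount": "151", "SampleCount": "105", "HabitatCount": "11",
     "PDB": "N", "PDB_Confidence": null},
    {"ID": "F095629", "Category": "Metagenome", "SequenceCount": "105",
     "ScaffoldCount": "105", "SampleCount": "94", "HabitatCount": "7",
     "PDB": "N", "PDB_Confidence": null}
]
"""


def parse_families(text):
    """Convert a Search Families response into records with integer counts."""
    families = []
    for record in json.loads(text):
        families.append({
            "id": record["ID"],
            "category": record["Category"],
            "sequences": int(record["SequenceCount"]),
            "scaffolds": int(record["ScaffoldCount"]),
            "samples": int(record["SampleCount"]),
            "habitats": int(record["HabitatCount"]),
            "has_model": record["PDB"] == "Y",  # assumed: "Y" marks an available model
        })
    return families


families = parse_families(SAMPLE_RESPONSE)
print([family["id"] for family in families])
print(sum(family["sequences"] for family in families))
```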

Example Family Entry request:

https://pavlopoulos-lab.org/NMPFamsDB/api/families/F003346

Response (excerpt):

{
    "ID": "F003346",
    "url": "https://pavlopoulos-lab.org/NMPFamsDB/family?id=F003346",
    "Category": "Metagenome",
    "Sequence Count": 492,
    "Scaffold Count": 484,
    "Dataset Count": 1,
    "Ecosystem Count": 1,
    "Consensus Sequence": "VLERKEGKTGENLKNKELVLEKRCKVRPPLPPMADCEDLMGKYELMSMLRRTTQVEMSVGILRSRFETYPPQQFDLTVLEEDEDVLDPITTLGNHVVASGPRTLEERIESGRDWEQWLASVDVEEEERLVAEAETHLKYAKAWVDSLVGDVDIAPKICPGSTDNK",
    "Structure": {
        "Pivot Sequence": "MADKCEDLMGKYESMLRRTTQVEMSVGILRSRFETYPPQQFDLTVLEEDVLDTPITLGNHVVASGPRTLEERIESGRDWEQWLASVGDVEEEERLVAEAQETAKAWVDSLVGDVDIAPK",
        "Secondary Structure": "CCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCHCCCCCCCCCCEEEECCCCCHHHHHCCCCCHHHHHHHCCCHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCC",
        "Secondary Structure Confidence Score": "97655778776788887654455556665555456997766543344444578755687566468765777654677788888655776889999999999999999987656677999",
        "Disordered Regions": null,
        "3D Model": "N",
        "3D pLDDT": 0
    },
    "Topology": {
        "Signal Peptide": null,
        "Topology": "GLOBULAR",
        "Transmembrane": null,
        "TransmembraneTopology": null
    },
    "Datasets": [
        {
            "Taxon OID": "3300031730",
            "Sample Name": "Populus trichocarpa ectomycorrhiza microbial communities from riparian zone in the Pacific Northwest, United States - 19_EM"
        }
    ],
    "Scaffolds": [
        {
            "Taxon OID": "3300031730",
            "Scaffold ID": "Ga0307516_10045150",
            "Organism": "Predicted Viral",
            "NCBI TaxID": 0
        },
        {
            "Taxon OID": "3300031730",
            "Scaffold ID": "Ga0307516_10133304",
            "Organism": "Predicted Viral",
            "NCBI TaxID": 0
        },
        {
            "Taxon OID": "3300031730",
            "Scaffold ID": "Ga0307516_10202168",
            "Organism": "Predicted Viral",
            "NCBI TaxID": 0
        }
    ]
}
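
A Family Entry response is a nested object, so its Scaffolds list can be summarised with a few lines of Python. The sketch below works on a trimmed copy of the excerpt above that keeps only the fields it uses; the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Trimmed copy of the Family Entry excerpt (only the fields used below)
ENTRY = {
    "ID": "F003346",
    "Scaffolds": [
        {"Taxon OID": "3300031730", "Scaffold ID": "Ga0307516_10045150",
         "Organism": "Predicted Viral", "NCBI TaxID": 0},
        {"Taxon OID": "3300031730", "Scaffold ID": "Ga0307516_10133304",
         "Organism": "Predicted Viral", "NCBI TaxID": 0},
        {"Taxon OID": "3300031730", "Scaffold ID": "Ga0307516_10202168",
         "Organism": "Predicted Viral", "NCBI TaxID": 0},
    ],
}


def scaffolds_by_dataset(entry):
    """Group the scaffold IDs of a family entry by dataset Taxon OID."""
    grouped = defaultdict(list)
    for scaffold in entry.get("Scaffolds", []):
        grouped[scaffold["Taxon OID"]].append(scaffold["Scaffold ID"])
    return dict(grouped)


def organism_counts(entry):
    """Count how many scaffolds are assigned to each organism label."""
    return Counter(scaffold["Organism"] for scaffold in entry.get("Scaffolds", []))


print(scaffolds_by_dataset(ENTRY))
print(organism_counts(ENTRY))
```

Each (Taxon OID, Scaffold ID) pair obtained this way matches the two identifiers required by the Scaffold method.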


© Pavlopoulos Lab, Bioinformatics & Integrative Biology | B.S.R.C. "Alexander Fleming" | Privacy Notice