What Is the UCSC Database? A Detailed UCSC Database Tutorial

https://github.com/Wy2160640/cruzdb

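The UCSC Genome Database is a valuable resource for annotation, regulation, and variation data, and for many other kinds of data covering a growing number of taxa. The cruzdb library aims to make that data simple to use, so that sophisticated analyses are possible without resorting to awkward, error-prone manipulation. As motivation, here is an example of some of its capabilities: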

>>> from cruzdb import Genome
>>> g = Genome(db="hg18")
>>> muc5b = g.refGene.filter_by(name2="MUC5B").first()
>>> muc5b
refGene(chr11:MUC5B:1200870-1239982)
>>> muc5b.strand
'+'

# the first 4 introns
>>> muc5b.introns[:4]
[(1200999L, 1203486L), (1203543L, 1204010L), (1204082L, 1204420L), (1204682L, 1204836L)]

# the first 4 exons.
>>> muc5b.exons[:4]
[(1200870L, 1200999L), (1203486L, 1203543L), (1204010L, 1204082L), (1204420L, 1204682L)]

# note that some of these are not coding because they are < cdsStart
>>> muc5b.cdsStart
1200929L

# the extent of the 5' utr.
>>> muc5b.utr5
(1200870L, 1200929L)

# we can get the (first 4) actual CDS's with:
>>> muc5b.cds[:4]
[(1200929L, 1200999L), (1203486L, 1203543L), (1204010L, 1204082L), (1204420L, 1204682L)]

# the cds sequence from the UCSC DAS server as a list with one entry per cds
>>> muc5b.cds_sequence #doctest: +ELLIPSIS
['atgggtgccccgagcgcgtgccggacgctggtgttggctctggcggccatgctcgtggtgccgcaggcag', ...]

>>> transcript = g.knownGene.filter_by(name="uc001aaa.2").first()
>>> transcript.is_coding
False

# convert a genome coordinate to a local coordinate.
>>> transcript.localize(transcript.txStart)
0L

# or localize to the CDNA position.
>>> print transcript.localize(transcript.cdsStart, cdna=True)
None

Command-line usage

python -m cruzdb hg18 input.bed refGene cpgIslandExt

This annotates the intervals in input.bed using the refGene and cpgIslandExt tables from genome version hg18.

DataFrames

DataFrames are in fashion. We can get one from a table as:

>>> df = g.dataframe('cpgIslandExt')
>>> df.columns #doctest: +ELLIPSIS
Index([chrom, chromStart, chromEnd, name, length, cpgNum, gcNum, perCpg, perGc, obsExp], dtype=object)

All of the above can be repeated with knownGene annotations by changing 'refGene' to 'knownGene'. It can also be done easily for a set of genes.
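The intron, CDS, and 5' UTR values in the MUC5B session follow from simple interval arithmetic on the exon list. The sketch below is not cruzdb's implementation; it is a minimal illustration using the coordinates printed in that session. The helper names `introns_from_exons` and `clip_to_cds` are hypothetical, and `cds_end` is a placeholder because the real cdsEnd is not shown in the text. It runs under Python 2 or 3.

```python
def introns_from_exons(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def clip_to_cds(exons, cds_start, cds_end):
    """Keep only the part of each exon that lies inside [cds_start, cds_end)."""
    clipped = []
    for start, end in exons:
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:  # exons that fall entirely inside a UTR are dropped
            clipped.append((s, e))
    return clipped


# Exon coordinates and cdsStart copied from the MUC5B session.
exons = [(1200870, 1200999), (1203486, 1203543),
         (1204010, 1204082), (1204420, 1204682)]
cds_start = 1200929
# Placeholder only: the transcript end from the repr is used as an upper
# bound because the actual cdsEnd does not appear in the text.
cds_end = 1239982

introns = introns_from_exons(exons)
cds = clip_to_cds(exons, cds_start, cds_end)
utr5 = (exons[0][0], cds_start)  # plus strand: txStart up to cdsStart

print(introns)
print(cds)
print(utr5)
```

Four exons yield three introns, which match the first three introns in the session; the fourth intron there depends on a fifth exon that is not listed. The clipped list reproduces `muc5b.cds[:4]`, with only the first exon shortened to begin at cdsStart.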

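Spatial

k-nearest-neighbor, upstream, and downstream searches are available. Upstream and downstream searches use the strand of the query feature to determine the direction: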

>>> nearest = g.knearest("refGene", "chr1", 9444, 9555, k=6)
>>> up_list = g.upstream("refGene", "chr1", 9444, 9555, k=6)
>>> down_list = g.downstream("refGene", "chr1", 9444, 9555, k=6)

Mirror

The examples above use UCSC's MySQL interface. Any table can now be mirrored from UCSC to a local SQLite database via:

>>> import os
>>> if os.path.exists("/tmp/u.db"): os.unlink('/tmp/u.db')
>>> g = Genome('hg18')
>>> gs = g.mirror(['chromInfo'], 'sqlite:////tmp/u.db')

and then used as:

>>> gs.chromInfo
<class 'cruzdb.sqlsoup.chromInfo'>

Code

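Most of the per-row features are implemented in the Feature class in cruzdb/models.py. If you want to add something to a feature (like the existing feature.utr5), add it there.
The tables are reflected using sqlalchemy and mapped in the __getattr__ method of the Genome class in cruzdb/__init__.py, so a call like: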

genome.knownGene

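calls the __getattr__ method with the table argument set to 'knownGene'. That table is then reflected, and an object whose parent classes are Feature and sqlalchemy's declarative_base is returned.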
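The attribute-lookup pattern described here can be shown without a database. The sketch below is an illustration only, not cruzdb's code: `ToyGenome` and `ToyFeature` are hypothetical stand-ins, the list of known tables replaces real reflection, and no sqlalchemy is involved. It shows how `__getattr__` can turn an unknown attribute name into a class built on demand with a shared parent.

```python
class ToyFeature(object):
    """Hypothetical stand-in for a shared per-row base class."""

    def describe(self):
        return "row of %s" % type(self).__name__


class ToyGenome(object):
    """Hypothetical genome whose table classes are built on first access."""

    def __init__(self, known_tables):
        self._known = set(known_tables)
        self._cache = {}

    def __getattr__(self, table):
        # __getattr__ runs only when normal lookup fails, so ordinary
        # attributes such as _known and _cache never reach this point.
        if table.startswith("_") or table not in self._known:
            raise AttributeError(table)
        if table not in self._cache:
            # Build a class named after the table, with ToyFeature as parent.
            self._cache[table] = type(table, (ToyFeature,), {"table_name": table})
        return self._cache[table]


genome = ToyGenome(["knownGene", "refGene"])
known_gene = genome.knownGene  # triggers __getattr__("knownGene")
print(known_gene.__name__)
print(issubclass(known_gene, ToyFeature))
print(known_gene().describe())
```

Caching the generated class means repeated access to `genome.knownGene` returns the same object, so the expensive step (reflection, in a real setting) would happen once per table.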

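Contributing

Before you start coding, it is polite to grab your own copy of some UCSC tables so as not to overload the UCSC server. You can run something like: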

Genome('hg18').mirror(["refGene", "cpgIslandExt", "chromInfo", "knownGene", "kgXref"], "sqlite:////tmp/hg18.db")

Then the connection would be something like:

g = Genome("sqlite:////tmp/hg18.db")
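SQLAlchemy-style SQLite URLs place the file path after the `sqlite:///` prefix, so an absolute path such as /tmp/hg18.db ends up with four slashes in total, while a relative path has three. The helper below is a hypothetical convenience, not part of cruzdb, that makes the rule explicit:

```python
def sqlite_url(path):
    """Build a SQLAlchemy-style SQLite URL for a database file path.

    Hypothetical helper: the prefix is always 'sqlite:///', and the path
    is appended unchanged, leading slash included.
    """
    return "sqlite:///" + path


absolute_url = sqlite_url("/tmp/hg18.db")  # absolute path: four slashes
relative_url = sqlite_url("hg18.db")       # relative path: three slashes
print(absolute_url)
print(relative_url)
```

Dropping one of the slashes from the absolute form silently turns it into a path relative to the current working directory, which is an easy mistake to make when copying connection strings.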

Reposted from: https://www.cnblogs.com/yahengwang/p/10195614.html

Published by

风君子
