In parsing a large 3 gigabyte file with DCG, efficiency is of importance.
The current version of my lexer is using mostly the or predicate ;/2 but I read that indexing can help.
Indexing is a technique used to quickly select candidate clauses of a predicate for a specific goal. In most Prolog systems, indexing is done (only) on the first argument of the head. If this argument is instantiated to an atom, integer, float or compound term with functor, hashing is used to quickly select all clauses where the first argument may unify with the first argument of the goal. SWI-Prolog supports just-in-time and multi-argument indexing. See section 2.18.
Can someone give an example of using indexing for lexing and possibly explain how it improves efficiency?
Details
Note: I changed some of the names before coping the source code into this question. If you find a mistake feel free to edit it here or leave me a comment and I will gladly fix it.
Currently my lexer/tokenizer (based on mzapotoczny/prolog-interpreter parser.pl) is this
% N.B.
% Since the lexer uses "" for values, the double_quotes flag has to be set to `chars`.
% If double_quotes flag is set to `code`, the the values with "" will not be matched.
:- use_module(library(pio)).
:- use_module(library(dcg/basics)).
:- set_prolog_flag(double_quotes,chars).
lexer(Tokens) -->
white_space,
(
( ":", !, { Token = tokColon }
; "(", !, { Token = tokLParen }
; ")", !, { Token = tokRParen }
; "{", !, { Token = tokLMusta}
; "}", !, { Token = tokRMusta}
; "\\", !, { Token = tokSlash}
; "->", !, { Token = tokImpl}
; "+", !, { Token = tokPlus }
; "-", !, { Token = tokMinus }
; "*", !, { Token = tokTimes }
; "=", !, { Token = tokEqual }
; "<", !, { Token = tokLt }
; ">", !, { Token = tokGt }
; "_", !, { Token = tokUnderscore }
; ".", !, { Token = tokPeriod }
; "/", !, { Token = tokForwardSlash }
; ",", !, { Token = tokComma }
; ";", !, { Token = tokSemicolon }
; digit(D), !,
number(D, N),
{ Token = tokNumber(N) }
; letter(L), !, identifier(L, Id),
{ member((Id, Token), [ (div, tokDiv),
(mod, tokMod),
(where, tokWhere)]),
!
; Token = tokVar(Id)
}
; [_],
{ Token = tokUnknown }
),
!,
{ Tokens = [Token | TokList] },
lexer(TokList)
; [],
{ Tokens = [] }
).
white_space -->
[Char], { code_type(Char, space) }, !, white_space.
white_space -->
"--", whole_line, !, white_space.
white_space -->
[].
whole_line --> "\n", !.
whole_line --> [_], whole_line.
digit(D) -->
[D],
{ code_type(D, digit) }.
digits([D|T]) -->
digit(D),
!,
digits(T).
digits([]) -->
[].
number(D, N) -->
digits(Ds),
{ number_chars(N, [D|Ds]) }.
letter(L) -->
[L], { code_type(L, alpha) }.
alphanum([A|T]) -->
[A], { code_type(A, alnum) }, !, alphanum(T).
alphanum([]) -->
[].
alphanum([]).
alphanum([H|T]) :- code_type(H, alpha), alphanum(T).
identifier(L, Id) -->
alphanum(As),
{ atom_codes(Id, [L|As]) }.
Here are some helper predicates used for development and testing.
read_file_for_lexing_and_user_review(Path) :-
open(Path,read,Input),
read_input_for_user_review(Input), !,
close(Input).
read_file_for_lexing_and_performance(Path,Limit) :-
open(Path,read,Input),
read_input_for_performance(Input,0,Limit), !,
close(Input).
read_input(Input) :-
at_end_of_stream(Input).
read_input(Input) :-
\+ at_end_of_stream(Input),
read_string(Input, "\n", "\r\t ", _, Line),
lex_line(Line),
read_input(Input).
read_input_for_user_review(Input) :-
at_end_of_stream(Input).
read_input_for_user_review(Input) :-
\+ at_end_of_stream(Input),
read_string(Input, "\n", "\r\t ", _, Line),
lex_line_for_user_review(Line),
nl,
print('Press spacebar to continue or any other key to exit: '),
get_single_char(Key),
process_user_continue_or_exit_key(Key,Input).
read_input_for_performance(Input,Count,Limit) :-
Count >= Limit.
read_input_for_performance(Input,_,_) :-
at_end_of_stream(Input).
read_input_for_performance(Input,Count0,Limit) :-
% print(Count0),
\+ at_end_of_stream(Input),
read_string(Input, "\n", "\r\t ", _, Line),
lex_line(Line),
Count is Count0 + 1,
read_input_for_performance(Input,Count,Limit).
process_user_continue_or_exit_key(32,Input) :- % space bar
nl, nl,
read_input_for_user_review(Input).
process_user_continue_or_exit_key(Key) :-
Key \= 32.
lex_line_for_user_review(Line) :-
lex_line(Line,TokList),
print(Line),
nl,
print(TokList),
nl.
lex_line(Line,TokList) :-
string_chars(Line,Code_line),
phrase(lexer(TokList),Code_line).
lex_line(Line) :-
string_chars(Line,Code_line),
phrase(lexer(TokList),Code_line).
read_user_input_for_lexing_and_user_review :-
print('Enter a line to parse or just Enter to exit: '),
nl,
read_string(user, "\n", "\r", _, String),
nl,
lex_line_for_user_review(String),
nl,
continue_user_input_for_lexing_and_user_review(String).
continue_user_input_for_lexing_and_user_review(String) :-
string_length(String,N),
N > 0,
read_user_input_for_lexing_and_user_review.
continue_user_input_for_lexing_and_user_review(String) :-
string_length(String,0).
read_user_input_for_lexing_and_user_review/0
allows a user to enter a string at the terminal for lexing and review the tokens.
read_file_for_lexing_and_user_review/1
Reads a file for lexing and review the tokens for each line one line at a time.
read_file_for_lexing_and_performance/2
Reads a file for lexing with a limit on the number of lines to lex. This is for use with gathering basic performance statistics to measure efficiency. Meant to be used with time/1.
Examples of debug and testing predicates.
?- read_user_input_for_lexing_and_user_review.
'Enter a line to parse or just Enter to exit: '
|: ID GRAA_HUMAN Reviewed; 262 AA.
"ID GRAA_HUMAN Reviewed; 262 AA."
[tokVar('ID'),tokVar('GRAA'),tokUnderscore,tokVar('HUMAN'),tokVar('Reviewed'),tokSemicolon,tokNumber(262),tokVar('AA'),tokPeriod]
'Enter a line to parse or just Enter to exit: '
|:
...
?- read_file_for_lexing_and_user_review('C:/example_lines.dat').
"ID GRAA_HUMAN Reviewed; 262 AA."
[tokVar('ID'),tokVar('GRAA'),tokUnderscore,tokVar('HUMAN'),tokVar('Reviewed'),tokSemicolon,tokNumber(262),tokVar('AA'),tokPeriod]
'Press spacebar to continue or any other key to exit: '
"AC P12544; A4PHN1; Q6IB36;"
[tokVar('AC'),tokVar('P12544'),tokSemicolon,tokVar('A4PHN1'),tokSemicolon,tokVar('Q6IB36'),tokSemicolon]
'Press spacebar to continue or any other key to exit: '
"DT 01-OCT-1989, integrated into UniProtKB/Swiss-Prot."
[tokVar('DT'),tokNumber(1),tokMinus,tokVar('OCT'),tokMinus,tokNumber(1989),tokComma,tokVar(integrated),tokVar(into),tokVar('UniProtKB'),tokForwardSlash,tokVar('Swiss'),tokMinus,tokVar('Prot'),tokPeriod]
'Press spacebar to continue or any other key to exit: '
"//"
[tokForwardSlash,tokForwardSlash]
'Press spacebar to continue or any other key to exit: '
true.
?- time(read_file_for_lexing_and_performance('C:/example_lines.dat',1000000)).
% 815,414,350 inferences, 63.094 CPU in 64.251 seconds (98% CPU, 12923853 Lips)
true.
Related question
How to see all of the answers in SWI-Prolog without pressing spacebar? Specifically see the section on details.
Real world example data. (UniProt Knowledgebase Structure of a sequence entry)
File format
ID GRAA_HUMAN Reviewed; 262 AA.
AC P12544; A4PHN1; Q6IB36;
DT 01-OCT-1989, integrated into UniProtKB/Swiss-Prot.
DT 11-JAN-2011, sequence version 2.
DT 07-JUN-2017, entry version 186.
DE RecName: Full=Granzyme A;
DE EC=3.4.21.78;
DE AltName: Full=CTL tryptase;
DE AltName: Full=Cytotoxic T-lymphocyte proteinase 1;
DE AltName: Full=Fragmentin-1;
DE AltName: Full=Granzyme-1;
DE AltName: Full=Hanukkah factor;
DE Short=H factor;
DE Short=HF;
DE Flags: Precursor;
GN Name=GZMA; Synonyms=CTLA3, HFSP;
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
OC Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;
RN [1]
RP NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM ALPHA), AND VARIANT THR-121.
RC TISSUE=T-cell;
RX PubMed=3257574; DOI=10.1073/pnas.85.4.1184;
RA Gershenfeld H.K., Hershberger R.J., Shows T.B., Weissman I.L.;
RT "Cloning and chromosomal assignment of a human cDNA encoding a T cell-
RT and natural killer cell-specific trypsin-like serine protease.";
RL Proc. Natl. Acad. Sci. U.S.A. 85:1184-1188(1988).
RN [2]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM ALPHA), AND VARIANT
RP THR-121.
RA Ebert L., Schick M., Neubert P., Schatten R., Henze S., Korn B.;
RT "Cloning of human full open reading frames in Gateway(TM) system entry
RT vector (pDONR201).";
RL Submitted (JUN-2004) to the EMBL/GenBank/DDBJ databases.
RN [3]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX PubMed=15372022; DOI=10.1038/nature02919;
RA Schmutz J., Martin J., Terry A., Couronne O., Grimwood J., Lowry S.,
RA Gordon L.A., Scott D., Xie G., Huang W., Hellsten U., Tran-Gyamfi M.,
RA She X., Prabhakar S., Aerts A., Altherr M., Bajorek E., Black S.,
RA Branscomb E., Caoile C., Challacombe J.F., Chan Y.M., Denys M.,
RA Detter J.C., Escobar J., Flowers D., Fotopulos D., Glavina T.,
RA Gomez M., Gonzales E., Goodstein D., Grigoriev I., Groza M.,
RA Hammon N., Hawkins T., Haydu L., Israni S., Jett J., Kadner K.,
RA Kimball H., Kobayashi A., Lopez F., Lou Y., Martinez D., Medina C.,
RA Morgan J., Nandkeshwar R., Noonan J.P., Pitluck S., Pollard M.,
RA Predki P., Priest J., Ramirez L., Retterer J., Rodriguez A.,
RA Rogers S., Salamov A., Salazar A., Thayer N., Tice H., Tsai M.,
RA Ustaszewska A., Vo N., Wheeler J., Wu K., Yang J., Dickson M.,
RA Cheng J.-F., Eichler E.E., Olsen A., Pennacchio L.A., Rokhsar D.S.,
RA Richardson P., Lucas S.M., Myers R.M., Rubin E.M.;
RT "The DNA sequence and comparative analysis of human chromosome 5.";
RL Nature 431:268-274(2004).
RN [4]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM ALPHA), AND VARIANT
RP THR-121.
RC TISSUE=Blood;
RX PubMed=15489334; DOI=10.1101/gr.2596504;
RG The MGC Project Team;
RT "The status, quality, and expansion of the NIH full-length cDNA
RT project: the Mammalian Gene Collection (MGC).";
RL Genome Res. 14:2121-2127(2004).
RN [5]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA] OF 1-72, NUCLEOTIDE SEQUENCE [MRNA]
RP OF 1-34 (ISOFORM BETA), ALTERNATIVE PROMOTER USAGE, AND INDUCTION.
RX PubMed=17180578; DOI=10.1007/s10038-006-0099-9;
RA Ruike Y., Katsuma S., Hirasawa A., Tsujimoto G.;
RT "Glucocorticoid-induced alternative promoter usage for a novel 5'
RT variant of granzyme A.";
RL J. Hum. Genet. 52:172-178(2007).
RN [6]
RP NUCLEOTIDE SEQUENCE [GENOMIC DNA] OF 1-23.
RA Goralski T.J., Krensky A.M.;
RT "The upstream region of the human granzyme A locus contains both
RT positive and negative transcriptional regulatory elements.";
RL Submitted (NOV-1995) to the EMBL/GenBank/DDBJ databases.
RN [7]
RP PROTEIN SEQUENCE OF 29-53.
RX PubMed=3047119;
RA Poe M., Bennett C.D., Biddison W.E., Blake J.T., Norton G.P.,
RA Rodkey J.A., Sigal N.H., Turner R.V., Wu J.K., Zweerink H.J.;
RT "Human cytotoxic lymphocyte tryptase. Its purification from granules
RT and the characterization of inhibitor and substrate specificity.";
RL J. Biol. Chem. 263:13215-13222(1988).
RN [8]
RP PROTEIN SEQUENCE OF 29-40, AND CHARACTERIZATION.
RX PubMed=3262682;
RA Hameed A., Lowrey D.M., Lichtenheld M., Podack E.R.;
RT "Characterization of three serine esterases isolated from human IL-2
RT activated killer cells.";
RL J. Immunol. 141:3142-3147(1988).
RN [9]
RP PROTEIN SEQUENCE OF 29-39, AND CHARACTERIZATION.
RX PubMed=3263427;
RA Kraehenbuhl O., Rey C., Jenne D.E., Lanzavecchia A., Groscurth P.,
RA Carrel S., Tschopp J.;
RT "Characterization of granzymes A and B isolated from granules of
RT cloned human cytotoxic T lymphocytes.";
RL J. Immunol. 141:3471-3477(1988).
RN [10]
RP FUNCTION AS SET PROTEASE.
RX PubMed=11555662; DOI=10.1074/jbc.M108137200;
RA Beresford P.J., Zhang D., Oh D.Y., Fan Z., Greer E.L., Russo M.L.,
RA Jaju M., Lieberman J.;
RT "Granzyme A activates an endoplasmic reticulum-associated caspase-
RT independent nuclease to induce single-stranded DNA nicks.";
RL J. Biol. Chem. 276:43285-43293(2001).
RN [11]
RP FUNCTION AS SET PROTEASE.
RX PubMed=12628186; DOI=10.1016/S0092-8674(03)00150-8;
RA Fan Z., Beresford P.J., Oh D.Y., Zhang D., Lieberman J.;
RT "Tumor suppressor NM23-H1 is a granzyme A-activated DNase during CTL-
RT mediated apoptosis, and the nucleosome assembly protein SET is its
RT inhibitor.";
RL Cell 112:659-672(2003).
RN [12]
RP FUNCTION, AND INTERACTION WITH APEX1.
RX PubMed=12524539; DOI=10.1038/ni885;
RA Fan Z., Beresford P.J., Zhang D., Xu Z., Novina C.D., Yoshida A.,
RA Pommier Y., Lieberman J.;
RT "Cleaving the oxidative repair protein Ape1 enhances cell death
RT mediated by granzyme A.";
RL Nat. Immunol. 4:145-153(2003).
RN [13]
RP FUNCTION AS SET PROTEASE.
RX PubMed=16818237; DOI=10.1016/j.molcel.2006.06.005;
RA Chowdhury D., Beresford P.J., Zhu P., Zhang D., Sung J.S., Demple B.,
RA Perrino F.W., Lieberman J.;
RT "The exonuclease TREX1 is in the SET complex and acts in concert with
RT NM23-H1 to degrade DNA during granzyme A-mediated cell death.";
RL Mol. Cell 23:133-142(2006).
RN [14]
RP GLYCOSYLATION [LARGE SCALE ANALYSIS] AT ASN-170.
RC TISSUE=Liver;
RX PubMed=19159218; DOI=10.1021/pr8008012;
RA Chen R., Jiang X., Sun D., Han G., Wang F., Ye M., Wang L., Zou H.;
RT "Glycoproteomics analysis of human liver tissue by combination of
RT multiple enzyme digestion and hydrazide chemistry.";
RL J. Proteome Res. 8:651-661(2009).
RN [15]
RP 3D-STRUCTURE MODELING OF 29-262.
RX PubMed=3237717; DOI=10.1002/prot.340040306;
RA Murphy M.E.P., Moult J., Bleackley R.C., Gershenfeld H.,
RA Weissman I.L., James M.N.G.;
RT "Comparative molecular model building of two serine proteinases from
RT cytotoxic T lymphocytes.";
RL Proteins 4:190-204(1988).
RN [16]
RP X-RAY CRYSTALLOGRAPHY (2.4 ANGSTROMS) OF 29-262 IN COMPLEX WITH A
RP TRIPEPTIDE CMK INHIBITOR.
RX PubMed=12819769; DOI=10.1038/nsb944;
RA Bell J.K., Goetz D.H., Mahrus S., Harris J.L., Fletterick R.J.,
RA Craik C.S.;
RT "The oligomeric structure of human granzyme A is a determinant of its
RT extended substrate specificity.";
RL Nat. Struct. Biol. 10:527-534(2003).
RN [17]
RP X-RAY CRYSTALLOGRAPHY (2.5 ANGSTROMS) OF 29-262 IN COMPLEX WITH
RP SUBSTRATE.
RX PubMed=12819770; DOI=10.1038/nsb945;
RA Hink-Schauer C., Estebanez-Perpina E., Kurschus F.C., Bode W.,
RA Jenne D.E.;
RT "Crystal structure of the apoptosis-inducing human granzyme A dimer.";
RL Nat. Struct. Biol. 10:535-540(2003).
CC -!- FUNCTION: Abundant protease in the cytosolic granules of cytotoxic
CC T-cells and NK-cells which activates caspase-independent cell
CC death with morphological features of apoptosis when delivered into
CC the target cell through the immunological synapse. It cleaves
CC after Lys or Arg. Cleaves APEX1 after 'Lys-31' and destroys its
CC oxidative repair activity. Cleaves the nucleosome assembly protein
CC SET after 'Lys-189', which disrupts its nucleosome assembly
CC activity and allows the SET complex to translocate into the
CC nucleus to nick and degrade the DNA. {ECO:0000269|PubMed:11555662,
CC ECO:0000269|PubMed:12524539, ECO:0000269|PubMed:12628186,
CC ECO:0000269|PubMed:16818237}.
CC -!- CATALYTIC ACTIVITY:
CC Reaction=Hydrolysis of proteins, including fibronectin, type IV
CC collagen and nucleolin. Preferential cleavage: -Arg-|-Xaa-,
CC -Lys-|-Xaa- >> -Phe-|-Xaa- in small molecule substrates.;
CC EC=3.4.21.78;
CC -!- SUBUNIT: Homodimer; disulfide-linked. Interacts with APEX1.
CC {ECO:0000269|PubMed:12524539, ECO:0000269|PubMed:12819769,
CC ECO:0000269|PubMed:12819770}.
CC -!- SUBCELLULAR LOCATION: Isoform alpha: Secreted. Cytoplasmic
CC granule.
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative promoter usage; Named isoforms=2;
CC Name=alpha;
CC IsoId=P12544-1; Sequence=Displayed;
CC Name=beta;
CC IsoId=P12544-2; Sequence=VSP_038571, VSP_038572;
CC -!- INDUCTION: Dexamethasone (DEX) induces expression of isoform beta
CC and represses expression of isoform alpha. The alteration in
CC expression is mediated by binding of glucocorticoid receptor to
CC independent promoters adjacent to the alternative first exons of
CC isoform alpha and isoform beta. {ECO:0000269|PubMed:17180578}.
CC -!- SIMILARITY: Belongs to the peptidase S1 family. Granzyme
CC subfamily. {ECO:0000255|PROSITE-ProRule:PRU00274}.
CC -!- CAUTION: Exons 1a and 1b of the sequence reported in
CC PubMed:17180578 are of human origin, however exon 2 shows strong
CC similarity to the rat sequence. {ECO:0000305}.
CC -!- WEB RESOURCE: Name=Atlas of Genetics and Cytogenetics in Oncology
CC and Haematology;
CC URL="http://atlasgeneticsoncology.org/Genes/GZMAID51130ch5q11.html";
DR EMBL; M18737; AAA52647.1; -; mRNA.
DR EMBL; CR456968; CAG33249.1; -; mRNA.
DR EMBL; AC091977; -; NOT_ANNOTATED_CDS; Genomic_DNA.
DR EMBL; BC015739; AAH15739.1; -; mRNA.
DR EMBL; AB284134; BAF56159.1; -; mRNA.
DR EMBL; U40006; AAD00009.1; -; Genomic_DNA.
DR CCDS; CCDS3965.1; -. [P12544-1]
DR PIR; A31372; A31372.
DR RefSeq; NP_006135.1; NM_006144.3.
DR UniGene; Hs.90708; -.
DR PDB; 1HF1; Model; -; A=29-262.
DR PDB; 1OP8; X-ray; 2.50 A; A/B/C/D/E/F=29-262.
DR PDB; 1ORF; X-ray; 2.40 A; A=29-262.
DR PDBsum; 1HF1; -.
DR PDBsum; 1OP8; -.
DR PDBsum; 1ORF; -.
DR ProteinModelPortal; P12544; -.
DR SMR; P12544; -.
DR BioGrid; 109256; 17.
DR IntAct; P12544; 4.
DR STRING; 9606.ENSP00000274306; -.
DR ChEMBL; CHEMBL4307; -.
DR MEROPS; S01.135; -.
DR iPTMnet; P12544; -.
DR PhosphoSitePlus; P12544; -.
DR BioMuta; GZMA; -.
DR DMDM; 317373360; -.
DR MaxQB; P12544; -.
DR PaxDb; P12544; -.
DR PeptideAtlas; P12544; -.
DR PRIDE; P12544; -.
DR DNASU; 3001; -.
DR Ensembl; ENST00000274306; ENSP00000274306; ENSG00000145649. [P12544-1]
DR GeneID; 3001; -.
DR KEGG; hsa:3001; -.
DR UCSC; uc003jpm.4; human. [P12544-1]
DR CTD; 3001; -.
DR DisGeNET; 3001; -.
DR GeneCards; GZMA; -.
DR HGNC; HGNC:4708; GZMA.
DR HPA; HPA054134; -.
DR MIM; 140050; gene.
DR neXtProt; NX_P12544; -.
DR OpenTargets; ENSG00000145649; -.
DR PharmGKB; PA29086; -.
DR eggNOG; KOG3627; Eukaryota.
DR eggNOG; COG5640; LUCA.
DR GeneTree; ENSGT00760000118895; -.
DR HOGENOM; HOG000251820; -.
DR HOVERGEN; HBG013304; -.
DR InParanoid; P12544; -.
DR KO; K01352; -.
DR OMA; KEFPYPC; -.
DR OrthoDB; EOG091G0AH5; -.
DR PhylomeDB; P12544; -.
DR TreeFam; TF333630; -.
DR BRENDA; 3.4.21.78; 2681.
DR SIGNOR; P12544; -.
DR EvolutionaryTrace; P12544; -.
DR GeneWiki; GZMA; -.
DR GenomeRNAi; 3001; -.
DR PRO; PR:P12544; -.
DR Proteomes; UP000005640; Chromosome 5.
DR Bgee; ENSG00000145649; Expressed in 150 organ(s), highest expression level in leukocyte.
DR CleanEx; HS_GZMA; -.
DR Genevisible; P12544; HS.
DR GO; GO:0005576; C:extracellular region; IEA:UniProtKB-SubCell.
DR GO; GO:0001772; C:immunological synapse; TAS:UniProtKB.
DR GO; GO:0005634; C:nucleus; TAS:UniProtKB.
DR GO; GO:0042803; F:protein homodimerization activity; IDA:UniProtKB.
DR GO; GO:0004252; F:serine-type endopeptidase activity; IDA:UniProtKB.
DR GO; GO:0006915; P:apoptotic process; TAS:UniProtKB.
DR GO; GO:0019835; P:cytolysis; IEA:UniProtKB-KW.
DR GO; GO:0006955; P:immune response; TAS:UniProtKB.
DR GO; GO:0043392; P:negative regulation of DNA binding; IDA:UniProtKB.
DR GO; GO:0032078; P:negative regulation of endodeoxyribonuclease activity; IDA:UniProtKB.
DR GO; GO:0051354; P:negative regulation of oxidoreductase activity; IDA:UniProtKB.
DR GO; GO:0043065; P:positive regulation of apoptotic process; IDA:UniProtKB.
DR GO; GO:0051603; P:proteolysis involved in cellular protein catabolic process; IDA:UniProtKB.
DR CDD; cd00190; Tryp_SPc; 1.
DR InterPro; IPR009003; Peptidase_S1_PA.
DR InterPro; IPR001314; Peptidase_S1A.
DR InterPro; IPR001254; Trypsin_dom.
DR InterPro; IPR018114; TRYPSIN_HIS.
DR InterPro; IPR033116; TRYPSIN_SER.
DR Pfam; PF00089; Trypsin; 1.
DR PRINTS; PR00722; CHYMOTRYPSIN.
DR SMART; SM00020; Tryp_SPc; 1.
DR SUPFAM; SSF50494; SSF50494; 1.
DR PROSITE; PS50240; TRYPSIN_DOM; 1.
DR PROSITE; PS00134; TRYPSIN_HIS; 1.
DR PROSITE; PS00135; TRYPSIN_SER; 1.
PE 1: Evidence at protein level;
KW 3D-structure; Alternative promoter usage; Apoptosis;
KW Complete proteome; Cytolysis; Direct protein sequencing;
KW Disulfide bond; Glycoprotein; Hydrolase; Polymorphism; Protease;
KW Reference proteome; Secreted; Serine protease; Signal; Zymogen.
FT SIGNAL 1 26 In isoform alpha.
FT PROPEP 27 28 Activation peptide (in isoform alpha).
FT {ECO:0000269|PubMed:3047119,
FT ECO:0000269|PubMed:3262682,
FT ECO:0000269|PubMed:3263427}.
FT /FTId=PRO_0000027393.
FT CHAIN 29 262 Granzyme A.
FT /FTId=PRO_0000027394.
FT DOMAIN 29 259 Peptidase S1. {ECO:0000255|PROSITE-
FT ProRule:PRU00274}.
FT ACT_SITE 69 69 Charge relay system.
FT ACT_SITE 114 114 Charge relay system.
FT ACT_SITE 212 212 Charge relay system.
FT CARBOHYD 170 170 N-linked (GlcNAc...) asparagine.
FT {ECO:0000269|PubMed:19159218}.
FT DISULFID 54 70
FT DISULFID 148 218
FT DISULFID 179 197
FT DISULFID 208 234
FT VAR_SEQ 1 17 Missing (in isoform beta).
FT {ECO:0000303|PubMed:17180578}.
FT /FTId=VSP_038571.
FT VAR_SEQ 18 23 LLLIPE -> MTKGLR (in isoform beta).
FT {ECO:0000303|PubMed:17180578}.
FT /FTId=VSP_038572.
FT VARIANT 121 121 M -> T (in dbSNP:rs3104233).
FT {ECO:0000269|PubMed:15489334,
FT ECO:0000269|PubMed:3257574,
FT ECO:0000269|Ref.2}.
FT /FTId=VAR_024291.
FT CONFLICT 33 34 NE -> DT (in Ref. 5; no nucleotide
FT entry). {ECO:0000305}.
FT CONFLICT 36 36 T -> V (in Ref. 5; no nucleotide entry).
FT {ECO:0000305}.
FT CONFLICT 47 47 S -> K (in Ref. 5; no nucleotide entry).
FT {ECO:0000305}.
FT CONFLICT 49 52 DRKT -> KPDS (in Ref. 5; no nucleotide
FT entry). {ECO:0000305}.
FT CONFLICT 62 62 D -> N (in Ref. 5; no nucleotide entry).
FT {ECO:0000305}.
FT CONFLICT 71 72 NL -> IP (in Ref. 5; no nucleotide
FT entry). {ECO:0000305}.
FT STRAND 43 47 {ECO:0000244|PDB:1ORF}.
FT STRAND 49 51 {ECO:0000244|PDB:1ORF}.
FT STRAND 53 60 {ECO:0000244|PDB:1ORF}.
FT STRAND 63 66 {ECO:0000244|PDB:1ORF}.
FT STRAND 77 81 {ECO:0000244|PDB:1ORF}.
FT STRAND 83 87 {ECO:0000244|PDB:1ORF}.
FT STRAND 93 95 {ECO:0000244|PDB:1ORF}.
FT STRAND 97 102 {ECO:0000244|PDB:1ORF}.
FT TURN 108 110 {ECO:0000244|PDB:1ORF}.
FT STRAND 116 122 {ECO:0000244|PDB:1ORF}.
FT STRAND 127 130 {ECO:0000244|PDB:1ORF}.
FT STRAND 147 154 {ECO:0000244|PDB:1ORF}.
FT STRAND 156 160 {ECO:0000244|PDB:1ORF}.
FT STRAND 167 174 {ECO:0000244|PDB:1ORF}.
FT HELIX 176 179 {ECO:0000244|PDB:1ORF}.
FT TURN 182 189 {ECO:0000244|PDB:1ORF}.
FT STRAND 195 199 {ECO:0000244|PDB:1ORF}.
FT STRAND 215 218 {ECO:0000244|PDB:1ORF}.
FT STRAND 221 228 {ECO:0000244|PDB:1ORF}.
FT STRAND 241 245 {ECO:0000244|PDB:1ORF}.
FT HELIX 248 258 {ECO:0000244|PDB:1ORF}.
SQ SEQUENCE 262 AA; 28999 MW; FD773628BA6F301B CRC64;
MRNSYRFLAS SLSVVVSLLL IPEDVCEKII GGNEVTPHSR PYMVLLSLDR KTICAGALIA
KDWVLTAAHC NLNKRSQVIL GAHSITREEP TKQIMLVKKE FPYPCYDPAT REGDLKLLQL
MEKAKINKYV TILHLPKKGD DVKPGTMCQV AGWGRTHNSA SWSDTLREVN ITIIDRKVCN
DRNHYNFNPV IGMNMVCAGS LRGGRDSCNG DSGSPLLCEG VFRGVTSFGL ENKCGDPRGP
GVYILLSKKH LNWIIMTIKG AV
//
AA (Amino Acid) codes