I have a Java program that has to parse a Python setup.py file to extract information from it. I sort of have something working, but I've hit a wall. I'm starting with a simple raw file; once I get that working, I'll worry about stripping out the noise I don't want, so that it can handle an actual file.
So here's my grammar:
grammar SetupPy ;
file_input: (NEWLINE | setupDeclaration)* EOF;
setupDeclaration : 'setup' '(' method ')';
method : setupRequires testRequires;
setupRequires : 'setup_requires' '=' '[' LISTVAL* ']' COMMA;
testRequires : 'tests_require' '=' '[' LISTVAL* ']' COMMA;
WS: [ \t\n\r]+ -> skip ;
COMMA : ',' -> skip ;
LISTVAL : SHORT_STRING ;
UNKNOWN_CHAR
: .
;
fragment SHORT_STRING
: '\'' ( STRING_ESCAPE_SEQ | ~[\\\r\n\f'] )* '\''
| '"' ( STRING_ESCAPE_SEQ | ~[\\\r\n\f"] )* '"'
;
/// stringescapeseq ::= "\" <any source character>
fragment STRING_ESCAPE_SEQ
: '\\' .
| '\\' NEWLINE
;
fragment SPACES
: [ \t]+
;
NEWLINE
 : ( {atStartOfInput()}? SPACES
   | ( '\r'? '\n' | '\r' | '\f' ) SPACES?
   )
   {
     String newLine = getText().replaceAll("[^\r\n\f]+", "");
     String spaces = getText().replaceAll("[\r\n\f]+", "");
     int next = _input.LA(1);
     if (opened > 0 || next == '\r' || next == '\n' || next == '\f' || next == '#') {
       // If we're inside a list or on a blank line, ignore all indents,
       // dedents and line breaks.
       skip();
     }
     else {
       emit(commonToken(NEWLINE, newLine));
       int indent = getIndentationCount(spaces);
       int previous = indents.isEmpty() ? 0 : indents.peek();
       if (indent == previous) {
         // skip indents of the same size as the present indent-size
         skip();
       }
       else if (indent > previous) {
         indents.push(indent);
         emit(commonToken(Python3Parser.INDENT, spaces));
       }
       else {
         // Possibly emit more than 1 DEDENT token.
         while(!indents.isEmpty() && indents.peek() > indent) {
           this.emit(createDedent());
           indents.pop();
         }
       }
     }
   }
 ;
And here is my current test file (as I said, stripping the noise from a normal file is the next step):
setup(
    setup_requires=['pytest-runner'],
    tests_require=['pytest', 'unittest2'],
)
Where I am stuck is how to tell ANTLR that setup_requires and tests_require contain arrays. I want the values of those arrays, regardless of whether someone used single quotes, double quotes, put each value on a separate line, or any combination of these. I don't have a clue how to pull that off. Can I get some help, please? Maybe an example or two?
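For reference, the behaviour being asked for can be pinned down in plain Java, independent of ANTLR and without regex. The sketch below is a hand-written character scanner, not a grammar; the class name `SetupPyLists` and its method names are hypothetical, and it makes simplifying assumptions: triple-quoted strings are not handled, and a backslash escape simply keeps the following character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a rough sketch of the desired extraction, not an ANTLR solution.
final class SetupPyLists {

    /** Returns the string values of the first `keyword = [...]` in source, or an empty list. */
    static List<String> extract(String source, String keyword) {
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (c == '#') {                          // comment: skip to end of line
                while (i < source.length() && source.charAt(i) != '\n') i++;
            } else if (c == '\'' || c == '"') {      // string outside a list: skip it
                i = readString(source, i, null);
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < source.length() && Character.isJavaIdentifierPart(source.charAt(i))) i++;
                if (source.substring(start, i).equals(keyword)) {
                    int j = skipSpace(source, i);
                    if (j < source.length() && source.charAt(j) == '=') {
                        j = skipSpace(source, j + 1);
                        if (j < source.length() && source.charAt(j) == '[') {
                            return readList(source, j + 1);
                        }
                    }
                }
            } else {
                i++;                                 // any other character is noise
            }
        }
        return new ArrayList<>();
    }

    /** Collects string literals until the closing ']'; commas and line breaks are ignored. */
    private static List<String> readList(String s, int start) {
        List<String> values = new ArrayList<>();
        int i = start;
        while (i < s.length() && s.charAt(i) != ']') {
            char c = s.charAt(i);
            if (c == '\'' || c == '"') {
                StringBuilder value = new StringBuilder();
                i = readString(s, i, value);
                values.add(value.toString());
            } else if (c == '#') {
                while (i < s.length() && s.charAt(i) != '\n') i++;
            } else {
                i++;
            }
        }
        return values;
    }

    /** Reads a single- or double-quoted string starting at `start`; returns the index after it. */
    private static int readString(String s, int start, StringBuilder out) {
        char quote = s.charAt(start);
        int i = start + 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == quote) return i + 1;
            if (c == '\n') return i;                 // unterminated string: give up at end of line
            if (c == '\\' && i + 1 < s.length()) {   // simplification: keep the escaped character
                c = s.charAt(i + 1);
                i++;
            }
            if (out != null) out.append(c);
            i++;
        }
        return i;
    }

    private static int skipSpace(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        return i;
    }

    public static void main(String[] args) {
        String src = "setup(\n    setup_requires=['pytest-runner'],\n"
                   + "    tests_require=['pytest', \"unittest2\"],\n)\n";
        System.out.println(extract(src, "setup_requires"));
        System.out.println(extract(src, "tests_require"));
    }
}
```

The point of the sketch is only to state the expected input and output: quote style and line breaks inside the brackets should not matter, and strings or `=` signs elsewhere in the file should be ignored.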
Things to note:
- No, I can't use Jython and just run the file.
- Regex isn't an option due to the huge variation in developer styles for this file.
And of course, after this issue, I still need to figure out how to strip the noise from a normal file. I tried using the Python3 grammar to do this, but as a novice at ANTLR, I was overwhelmed by it. I couldn't figure out how to write the rules to pull the values, so I decided to try a far simpler grammar, and quickly hit another wall.
Edit: here is an actual setup.py file that it will eventually have to parse, keeping in mind that setup_requires and tests_require may or may not be present, and may or may not appear in that order.
# -*- coding: utf-8 -*-
from __future__ import with_statement
from setuptools import setup

def get_version(fname='mccabe.py'):
    with open(fname) as f:
        for line in f:
            if line.startswith('__version__'):
                return eval(line.split('=')[-1])

def get_long_description():
    descr = []
    for fname in ('README.rst',):
        with open(fname) as f:
            descr.append(f.read())
    return '\n\n'.join(descr)

setup(
    name='mccabe',
    version=get_version(),
    description="McCabe checker, plugin for flake8",
    long_description=get_long_description(),
    keywords='flake8 mccabe',
    author='Tarek Ziade',
    author_email='tarek@ziade.org',
    maintainer='Ian Cordasco',
    maintainer_email='graffatcolmingov@gmail.com',
    url='https://github.com/pycqa/mccabe',
    license='Expat license',
    py_modules=['mccabe'],
    zip_safe=False,
    setup_requires=['pytest-runner'],
    tests_require=['pytest'],
    entry_points={
        'flake8.extension': [
            'C90 = mccabe:McCabeChecker',
        ],
    },
    classifiers=[
        'Development Status :: 5 - Production/Stable',
        'Environment :: Console',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Topic :: Software Development :: Libraries :: Python Modules',
        'Topic :: Software Development :: Quality Assurance',
    ],
)
While trying to debug and simplify, I realized I don't need to find the method, just the values, so I'm playing with this grammar:
grammar SetupPy ;
file_input: (ignore setupRequires ignore | ignore testRequires ignore )* EOF;
setupRequires : 'setup_requires' '=' '[' dependencyValue* (',' dependencyValue)* ']';
testRequires : 'tests_require' '=' '[' dependencyValue* (',' dependencyValue)* ']';
dependencyValue: LISTVAL;
ignore : UNKNOWN_CHAR? ;
LISTVAL: SHORT_STRING;
UNKNOWN_CHAR: . -> channel(HIDDEN);
fragment SHORT_STRING: '\'' ( STRING_ESCAPE_SEQ | ~[\\\r\n\f'] )* '\''
| '"' ( STRING_ESCAPE_SEQ | ~[\\\r\n\f"] )* '"';
fragment STRING_ESCAPE_SEQ
: '\\' .
| '\\'
;
It works great for the simple file and even handles the out-of-order issue, but it doesn't work on the full file. It gets hung up on the equals sign in this line:
def get_version(fname='mccabe.py'):
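One plausible reading of the failure, assuming standard ANTLR 4 behaviour for combined grammars: literals written in parser rules (`'='`, `'['`, `']'`, `','`, `'setup_requires'`, `'tests_require'`) become implicit lexer tokens on the default channel, and LISTVAL is on the default channel too, while UNKNOWN_CHAR is sent to HIDDEN, so the `ignore` rule never actually receives a token. The toy model below is plain Java, not ANTLR; the class `VisibleTokens` is hypothetical and only imitates which pieces of a line the parser would see under that assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (an assumption-laden sketch, not ANTLR): which tokens stay on the default channel?
final class VisibleTokens {

    // Literals that appear in the parser rules of the simplified grammar.
    private static final String[] LITERALS = {
        "setup_requires", "tests_require", "=", "[", "]", ","
    };

    /** Returns the tokens a parser would see; every other character is treated as hidden. */
    static List<String> visibleTokens(String src) {
        List<String> visible = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            int end = matchString(src, i);           // models LISTVAL
            if (end == -1) end = matchLiteral(src, i);
            if (end != -1) {
                visible.add(src.substring(i, end));
                i = end;
            } else {
                i++;                                 // models UNKNOWN_CHAR -> channel(HIDDEN)
            }
        }
        return visible;
    }

    /** Matches a single-line quoted string at `start`; returns the end index or -1. */
    private static int matchString(String s, int start) {
        char quote = s.charAt(start);
        if (quote != '\'' && quote != '"') return -1;
        int i = start + 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == quote) return i + 1;
            if (c == '\r' || c == '\n' || c == '\f') return -1;
            if (c == '\\') {
                if (i + 1 >= s.length()) return -1;
                i += 2;                              // backslash plus the escaped character
            } else {
                i++;
            }
        }
        return -1;
    }

    /** Matches one of the grammar literals at `start`; returns the end index or -1. */
    private static int matchLiteral(String s, int start) {
        for (String literal : LITERALS) {
            if (s.startsWith(literal, start)) return start + literal.length();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(visibleTokens("def get_version(fname='mccabe.py'):"));
    }
}
```

Under this model the line above reduces to the two tokens `=` and `'mccabe.py'`, and the first of them arrives where `file_input` can only accept `setup_requires`, `tests_require`, or EOF, which would match the error on the equals sign.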