General File utilities¶
For some repetitive file actions, we have included wrappers into the framework to ease the process and avoid code duplication.
The valuea_framework.file contains these wrappers.
valuea_framework.file.excel.reader¶
The excel reader utilizes both xlrd and openpyxl to support old (Excel 97) and new (Excel 2010) format files (xls, xlsx).
In input/sample.xlsx we have created a test excel file containing
two sheets, using the code below we are going to output all rows in sheet1.
1import valuea_framework.file.excel
2
3
4for sheetname, record in valuea_framework.file.excel.reader(
5 filename='input/sample.xlsx',
6 header_column=0,
7 yield_sheetname=True,
8 heading_translate=lambda x:x.upper(),
9 file_type='xlsx',
10 sheetname=['sheet1']):
11 print (sheetname, record)
12
The reader() itself returns a regular iterator, which accepts the following parameters:
filename: the filename or file handle to useheader_column: which row to use to retrieve headings (default 0)yield_sheetname: return sheetname as well as record (default True)heading_translate:function to apply when generating fieldnames (default None)file_type: file type, parser to use (default ‘xlsx’)sheetname: sheet name or names to extract, when None iterate all sheets (default None)
This example should output something similar to:
('Sheet1', OrderedDict([(u'HEADER1', u'A'), (u'HEADER2', 5L)]))
('Sheet1', OrderedDict([(u'HEADER1', u'B'), (u'HEADER2', 4L)]))
('Sheet1', OrderedDict([(u'HEADER1', u'C'), (u'HEADER2', 3L)]))
('Sheet1', OrderedDict([(u'HEADER1', u'D'), (u'HEADER2', 2L)]))
('Sheet1', OrderedDict([(u'HEADER1', u'E'), (u'HEADER2', 1L)]))
valuea_framework.file.flatfile.CSVReader¶
The file input/sample.csv is a simple comma separated file for
demonstration purposes.
To read this file, use the following code
1import valuea_framework.file.flatfile
2
3for record in valuea_framework.file.flatfile.CSVReader(filename='input/sample.csv').\
4 reader(heading=True, heading_translate=lambda x: x.strip().upper()):
5 print (record)
6
This creates a reader object CSVReader() which accepts a filename or handle as constructor, optionally an
encoding (encoding) can be supplied. Next the reader() is used to parse the actual results.
Parameters for the reader are the following:
filename: the filename or file handle to useheading: does the file contain a heading, when there is onedictitems will be returned, otherwiselistitemsheading_translate:function to apply when generating fieldnames (default None)limit: limit the number of result rows (default None)skip_lines: skip number of top lines before processing (default None)heading_cols: static list of fieldnames, in case the heading doesn’t supply this (default None)
The example should output a similar result to this:
{'HEADING2': '1', 'HEADING1': 'a'}
{'HEADING2': '2', 'HEADING1': 'b'}
{'HEADING2': '3', 'HEADING1': 'c'}
{'HEADING2': '4', 'HEADING1': 'd'}
valuea_framework.file.flatfile.XMLReader¶
The XMLReader class adds a simple wrapper to read an xml file and yields results in an flat structure, our example below
creates a reader which parses input/sample.xml and replaces
the tags in sample.tagB.tagB1.records for easier access.
Since all nodes within the xml structure are yielded, we need to filter the ones we require explicitly, hence the
if record['__path__'].startswith('sample.tagB.tagB1.records').
1import valuea_framework.file.flatfile
2
3for record in valuea_framework.file.flatfile.XMLReader(
4 filename='input/sample.xml',
5 root_tagname='sample.tagB.tagB1.records'):
6 if record['__path__'].startswith('sample.tagB.tagB1.records'):
7 print (record)
Below the result output, notice the internal keys __path__ and __path_no_ns__ which shows the internal path,
the record.fieldA tags contain the data we searched.
{'__path__': 'sample.tagB.tagB1.records.record', '__path_no_ns__': 'sample.tagB.tagB1.records.record', 'record.fieldA': 'A'}
{'__path__': 'sample.tagB.tagB1.records.record', '__path_no_ns__': 'sample.tagB.tagB1.records.record', 'record.fieldA': 'B'}
{'__path__': 'sample.tagB.tagB1.records.record', '__path_no_ns__': 'sample.tagB.tagB1.records.record', 'record.fieldA': 'C'}