General File utilities

For some repetitive file actions, we have included wrappers into the framework to ease the process and avoid code duplication.

The valuea_framework.file contains these wrappers.

valuea_framework.file.excel.reader

The excel reader utilizes both xlrd and openpyxl to support old (Excel 97) and new (Excel 2010) format files (xls, xlsx).

In input/sample.xlsx we have created a test excel file containing two sheets, using the code below we are going to output all rows in sheet1.

lib_xlsx_reader.py
 1import valuea_framework.file.excel
 2
 3
 4for sheetname, record in valuea_framework.file.excel.reader(
 5        filename='input/sample.xlsx',
 6        header_column=0,
 7        yield_sheetname=True,
 8        heading_translate=lambda x:x.upper(),
 9        file_type='xlsx',
10        sheetname=['sheet1']):
11    print (sheetname, record)
12

The reader() itself returns a regular iterator, which accepts the following parameters:

  • filename : the filename or file handle to use

  • header_column: which row to use to retrieve headings (default 0)

  • yield_sheetname: return sheetname as well as record (default True)

  • heading_translate:function to apply when generating fieldnames (default None)

  • file_type: file type, parser to use (default ‘xlsx’)

  • sheetname: sheet name or names to extract, when None iterate all sheets (default None)

This example should output something similar to:

('Sheet1', OrderedDict([(u'HEADER1', u'A'), (u'HEADER2', 5L)]))
('Sheet1', OrderedDict([(u'HEADER1', u'B'), (u'HEADER2', 4L)]))
('Sheet1', OrderedDict([(u'HEADER1', u'C'), (u'HEADER2', 3L)]))
('Sheet1', OrderedDict([(u'HEADER1', u'D'), (u'HEADER2', 2L)]))
('Sheet1', OrderedDict([(u'HEADER1', u'E'), (u'HEADER2', 1L)]))

valuea_framework.file.flatfile.CSVReader

The file input/sample.csv is a simple comma separated file for demonstration purposes.

To read this file, use the following code

lib_csv_reader.py
1import valuea_framework.file.flatfile
2
3for record in valuea_framework.file.flatfile.CSVReader(filename='input/sample.csv').\
4        reader(heading=True, heading_translate=lambda x: x.strip().upper()):
5    print (record)
6

This creates a reader object CSVReader() which accepts a filename or handle as constructor, optionally an encoding (encoding) can be supplied. Next the reader() is used to parse the actual results.

Parameters for the reader are the following:

  • filename : the filename or file handle to use

  • heading: does the file contain a heading, when there is one dict items will be returned, otherwise list items

  • heading_translate:function to apply when generating fieldnames (default None)

  • limit: limit the number of result rows (default None)

  • skip_lines: skip number of top lines before processing (default None)

  • heading_cols: static list of fieldnames, in case the heading doesn’t supply this (default None)

The example should output a similar result to this:

{'HEADING2': '1', 'HEADING1': 'a'}
{'HEADING2': '2', 'HEADING1': 'b'}
{'HEADING2': '3', 'HEADING1': 'c'}
{'HEADING2': '4', 'HEADING1': 'd'}

valuea_framework.file.flatfile.XMLReader

The XMLReader class adds a simple wrapper to read an xml file and yields results in an flat structure, our example below creates a reader which parses input/sample.xml and replaces the tags in sample.tagB.tagB1.records for easier access. Since all nodes within the xml structure are yielded, we need to filter the ones we require explicitly, hence the if record['__path__'].startswith('sample.tagB.tagB1.records').

lib_xml_reader.py
1import valuea_framework.file.flatfile
2
3for record in valuea_framework.file.flatfile.XMLReader(
4        filename='input/sample.xml',
5        root_tagname='sample.tagB.tagB1.records'):
6    if record['__path__'].startswith('sample.tagB.tagB1.records'):
7        print (record)

Below the result output, notice the internal keys __path__ and __path_no_ns__ which shows the internal path, the record.fieldA tags contain the data we searched.

{'__path__': 'sample.tagB.tagB1.records.record', '__path_no_ns__': 'sample.tagB.tagB1.records.record', 'record.fieldA': 'A'}
{'__path__': 'sample.tagB.tagB1.records.record', '__path_no_ns__': 'sample.tagB.tagB1.records.record', 'record.fieldA': 'B'}
{'__path__': 'sample.tagB.tagB1.records.record', '__path_no_ns__': 'sample.tagB.tagB1.records.record', 'record.fieldA': 'C'}