Python Cool Libraries Tour: The Inseparable Couple Libraries (04)



2024-11-06 13:38:10



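In Excel, "couple keys" is not an official term. It is an informal, figurative name for shortcuts that are commonly used in pairs. The best-known pair is Ctrl+C and Ctrl+V.

1. Ctrl+C: copy. After you select a cell, a row, a column, or an entire worksheet, pressing Ctrl+C places the content on the system clipboard, ready to be pasted.
2. Ctrl+V: paste. After copying with Ctrl+C, pressing Ctrl+V pastes the clipboard content at another location in Excel. The two operations usually happen back to back, so Ctrl+C and Ctrl+V behave like a "couple" that always appears together.

Excel has many other shortcuts that help users work more efficiently, but most of them do not form such a fixed pairing.

Today, however, the focus is not on "couple keys" but on Python's "couple libraries": the two third-party libraries xlrd and xlwt.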

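I. Origin of the xlrd library

xlrd is a Python library for reading Excel files. In its name, "xl" presumably stands for Excel and "rd" for read (this breakdown of the name is only a guess). Its developer is John Machin.

Development of xlrd began in 2005 as an open-source Python project (download: xlrd official download page).

Because Excel files matter so much in data processing and analysis, xlrd filled a gap in Python's handling of them: it lets users read the contents of Excel files from Python and continue with further data manipulation and analysis.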

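II. Strengths and weaknesses of xlrd

1. Strengths

1-1. Support for several Excel file formats

xlrd reads `.xls` files in every version and `.xlsx` files in older versions (1.2.0 and earlier), so with a suitable version installed, users can read data stored in either format.

1-2. Efficiency

xlrd is written in pure Python, not C. It is still reasonably fast for typical workloads, and options such as on_demand and ragged_rows (described in the open_workbook help text later in this article) can reduce memory use for large files.

1-3. Open source

xlrd is fully open source (released under a BSD-style licence). Its source code is available on platforms such as GitHub, so anyone can modify and extend it to fit their own needs.

1-4. Simple to use

xlrd offers a small, direct API for getting cell values, row counts, column counts, and so on, which keeps reading data from Excel files simple.

1-5. Good compatibility

xlrd runs on many Python versions. For example, xlrd 1.2.0 supports Python 2.7 and Python 3.4 or later (3.0-3.3 are excluded), which gives users a wide range of choices.

2. Weaknesses

2-1. Limited support for the .xlsx format

In versions after xlrd 1.2.0 (that is, from xlrd 2.0 onward, released around the end of 2020), xlrd no longer reads `.xlsx` files. This limits its usefulness for files produced by current versions of Excel, which are mostly `.xlsx`.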
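Since the right reader depends on the file format, it can help to check what kind of file you have before opening it. The following is a minimal sketch, not xlrd's own code: the signature values are copied from the help(xlrd) output shown later in this article, while the helper names sniff_excel_kind and sniff_excel_file are hypothetical.

```python
# Signature bytes as listed in the DATA section of help(xlrd) for xlrd 2.0.1
XLS_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound document (.xls)
ZIP_SIGNATURE = b"PK\x03\x04"                        # ZIP container (used by .xlsx)
PEEK_SIZE = 8                                        # number of leading bytes to inspect


def sniff_excel_kind(head: bytes) -> str:
    """Hypothetical helper: classify a file by its first bytes."""
    head = head[:PEEK_SIZE]
    if head.startswith(XLS_SIGNATURE):
        return "xls"
    if head.startswith(ZIP_SIGNATURE):
        # A ZIP file is not necessarily an .xlsx workbook, hence the hedge
        return "zip-based (possibly xlsx)"
    return "unknown"


def sniff_excel_file(path: str) -> str:
    """Hypothetical helper: read the first PEEK_SIZE bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_excel_kind(f.read(PEEK_SIZE))


if __name__ == "__main__":
    print(sniff_excel_kind(XLS_SIGNATURE + b"rest of file"))
    print(sniff_excel_kind(ZIP_SIGNATURE + b"\x14\x00\x06\x00"))
    print(sniff_excel_kind(b"id,name\n1,Tom\n"))
```

With a result of "xls" you can hand the file to xlrd 2.x; for a ZIP-based file you would switch to another library such as openpyxl.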

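2-2. Narrow feature set

xlrd concentrates on reading data from Excel files and offers no way to write or modify them. For tasks that involve writing or editing, it has to be combined with other libraries such as `openpyxl` or `xlwt`.

2-3. Infrequent updates and maintenance

Because xlrd only covers reading, and because `.xlsx` has largely replaced `.xls`, its scope of use keeps shrinking, and it is updated and maintained comparatively rarely.

2-4. Reliance on other libraries

xlrd by itself does not cover a complete Excel workflow. Reading `.xlsx` files or writing any output requires additional libraries, which adds complexity and some uncertainty to projects that use it.

In short, xlrd is efficient, open source, and easy to use for reading Excel files, but it has drawbacks in `.xlsx` support, feature scope, and maintenance frequency. Weigh these against your own requirements before choosing it.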

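III. xlrd version notes

The Python versions that xlrd supports depend on the xlrd release. The main releases are described below.

1. xlrd 1.2.0

1-1. Requires Python 2.7, or Python 3.4 and later (3.0-3.3 are not supported).
1-2. This version can read xlsx files. It is widely used because it handles small to medium-sized Excel files with good performance.

2. xlrd 2.0.1

2-1. Requires Python 2.7, or Python 3.6 and later (3.0-3.5 are not supported).
2-2. This version reads only the legacy xls format, because support for xlsx was removed starting with xlrd 2.0.

3. xlrd3 (not an official name)

xlrd3 is an open-source fork of xlrd that provides support for xlsx files. Note that xlrd3 is not an official xlrd release (download: GitHub - Dragon2fly/xlrd3).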
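The version notes above can be turned into a small guard in your own code. This is only a sketch under the assumptions stated in this section (the 1.x line reads xlsx, 2.0 and later do not); parse_version and xlrd_reads_xlsx are hypothetical helper names, and the function deliberately takes a version string so that it runs without xlrd installed.

```python
def parse_version(version: str) -> tuple:
    """Turn a string such as '2.0.1' into (2, 0, 1); non-digit characters are ignored."""
    parts = []
    for piece in version.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)


def xlrd_reads_xlsx(version: str) -> bool:
    """Assumption from this article: releases before 2.0 read xlsx, 2.0 and later do not."""
    return parse_version(version) < (2, 0)


if __name__ == "__main__":
    # In real code the string would come from xlrd.__version__
    for v in ("1.2.0", "2.0.1"):
        print(v, "reads xlsx:", xlrd_reads_xlsx(v))
```

Comparing tuples of integers avoids the classic string-comparison trap in which '10.0' sorts before '2.0'.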

IV. How to learn xlrd well?

1. Listing xlrd's attributes and methods

Use print() together with dir() to list all attributes and methods of the xlrd package (the output below comes from xlrd 2.0.1):

import xlrd
print(dir(xlrd))
# ['Book', 'FILE_FORMAT_DESCRIPTIONS', 'FMLA_TYPE_ARRAY', 'FMLA_TYPE_CELL', 'FMLA_TYPE_COND_FMT', 'FMLA_TYPE_DATA_VAL',
# 'FMLA_TYPE_NAME', 'FMLA_TYPE_SHARED', 'Operand', 'PEEK_SIZE', 'Ref3D', 'XLDateError', 'XLRDError', 'XLS_SIGNATURE',
# 'XL_CELL_BLANK', 'XL_CELL_BOOLEAN', 'XL_CELL_DATE', 'XL_CELL_EMPTY', 'XL_CELL_ERROR', 'XL_CELL_NUMBER', 'XL_CELL_TEXT', 'ZIP_SIGNATURE',
# '__VERSION__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__',
# '__spec__', '__version__',
# 'biff_text_from_num', 'biffh', 'book', 'cellname', 'cellnameabs', 'colname', 'compdoc', 'count_records', 'decompile_formula',
# 'dump', 'dump_formula', 'empty_cell', 'error_text_from_code', 'evaluate_name_formula', 'formatting', 'formula', 'info',
# 'inspect_format', 'oBOOL', 'oERR', 'oNUM', 'oREF', 'oREL', 'oSTRG', 'oUNK', 'okind_dict', 'open_workbook', 'open_workbook_xls',
# 'os', 'pprint', 'rangename3d', 'rangename3drel', 'sheet', 'sys', 'timemachine', 'xldate', 'xldate_as_datetime', 'xldate_as_tuple', 'zipfile']
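A raw dir() listing mixes constants, functions, submodules, and dunder names, so it is often easier to read after some filtering. The sketch below shows one way to do that. To stay runnable without xlrd it uses a stand-in module object filled with a few names taken from the listing above; public_names and split_constants are hypothetical helpers, and you would pass the real xlrd module in practice.

```python
import types


def public_names(module):
    """Return the names from dir(module) that do not start with an underscore."""
    return [name for name in dir(module) if not name.startswith("_")]


def split_constants(names):
    """Separate ALL_CAPS names (usually constants) from everything else."""
    constants = [n for n in names if n.isupper()]
    others = [n for n in names if not n.isupper()]
    return constants, others


if __name__ == "__main__":
    # Stand-in for the xlrd package, holding a few names from the listing above
    fake_xlrd = types.ModuleType("fake_xlrd")
    fake_xlrd.Book = object
    fake_xlrd.PEEK_SIZE = 8
    fake_xlrd.XL_CELL_TEXT = 1
    fake_xlrd.open_workbook = lambda filename=None: None

    constants, others = split_constants(public_names(fake_xlrd))
    print("constants:", constants)
    print("others:", others)
```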

2. Getting xlrd's help information

Use the help() function, that is help(xlrd), to print xlrd's help information:

Help on package xlrd:

NAME
    xlrd

DESCRIPTION
    # Copyright (c) 2005-2012 Stephen John Machin, Lingfo Pty Ltd
    # This module is part of the xlrd package, which is released under a
    # BSD-style licence.

PACKAGE CONTENTS
    biffh
    book
    compdoc
    formatting
    formula
    info
    sheet
    timemachine
    xldate

FUNCTIONS
    count_records(filename, outfile=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)
        For debugging and analysis: summarise the file's BIFF records.
        ie: produce a sorted file of ``(record_name, count)``.

        :param filename: The path to the file to be summarised.
        :param outfile: An open file, to which the summary is written.

    dump(filename, outfile=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, unnumbered=False)
        For debugging: dump an XLS file's BIFF records in char & hex.

        :param filename: The path to the file to be dumped.
        :param outfile: An open file, to which the dump is written.
        :param unnumbered: If true, omit offsets (for meaningful diffs).

    inspect_format(path=None, content=None)
        Inspect the content at the supplied path or the :class:`bytes` content provided
        and return the file's type as a :class:`str`, or ``None`` if it cannot
        be determined.

        :param path:
          A :class:`string <str>` path containing the content to inspect.
          ``~`` will be expanded.

        :param content:
          The :class:`bytes` content to inspect.

        :returns:
           A :class:`str`, or ``None`` if the format cannot be determined.
           The return value can always be looked up in :data:`FILE_FORMAT_DESCRIPTIONS`
           to return a human-readable description of the format found.

    open_workbook(filename=None, logfile=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>, verbosity=0, use_mmap=True, file_contents=None, encoding_override=None, formatting_info=False, on_demand=False, ragged_rows=False, ignore_workbook_corruption=False)
        Open a spreadsheet file for data extraction.

        :param filename: The path to the spreadsheet file to be opened.

        :param logfile: An open file to which messages and diagnostics are written.

        :param verbosity: Increases the volume of trace material written to the
                          logfile.

        :param use_mmap:

          Whether to use the mmap module is determined heuristically.
          Use this arg to override the result.

          Current heuristic: mmap is used if it exists.

        :param file_contents:

          A string or an :class:`mmap.mmap` object or some other behave-alike
          object. If ``file_contents`` is supplied, ``filename`` will not be used,
          except (possibly) in messages.

        :param encoding_override:

          Used to overcome missing or bad codepage information
          in older-version files. See :doc:`unicode`.

        :param formatting_info:

          The default is ``False``, which saves memory.
          In this case, "Blank" cells, which are those with their own formatting
          information but no data, are treated as empty by ignoring the file's
          ``BLANK`` and ``MULBLANK`` records.
          This cuts off any bottom or right "margin" of rows of empty or blank
          cells.
          Only :meth:`~xlrd.sheet.Sheet.cell_value` and
          :meth:`~xlrd.sheet.Sheet.cell_type` are available.

          When ``True``, formatting information will be read from the spreadsheet
          file. This provides all cells, including empty and blank cells.
          Formatting information is available for each cell.

          Note that this will raise a NotImplementedError when used with an
          xlsx file.

        :param on_demand:

          Governs whether sheets are all loaded initially or when demanded
          by the caller. See :doc:`on_demand`.

        :param ragged_rows:

          The default of ``False`` means all rows are padded out with empty cells so
          that all rows have the same size as found in
          :attr:`~xlrd.sheet.Sheet.ncols`.

          ``True`` means that there are no empty cells at the ends of rows.
          This can result in substantial memory savings if rows are of widely
          varying sizes. See also the :meth:`~xlrd.sheet.Sheet.row_len` method.

        :param ignore_workbook_corruption:

          This option allows to read corrupted workbooks.
          When ``False`` you may face CompDocError: Workbook corruption.
          When ``True`` that exception will be ignored.

        :returns: An instance of the :class:`~xlrd.book.Book` class.

DATA
    FILE_FORMAT_DESCRIPTIONS = {'xls': 'Excel xls', 'xlsb': 'Excel 2007 xl...
    FMLA_TYPE_ARRAY = 4
    FMLA_TYPE_CELL = 1
    FMLA_TYPE_COND_FMT = 8
    FMLA_TYPE_DATA_VAL = 16
    FMLA_TYPE_NAME = 32
    FMLA_TYPE_SHARED = 2
    PEEK_SIZE = 8
    XLS_SIGNATURE = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
    XL_CELL_BLANK = 6
    XL_CELL_BOOLEAN = 4
    XL_CELL_DATE = 3
    XL_CELL_EMPTY = 0
    XL_CELL_ERROR = 5
    XL_CELL_NUMBER = 2
    XL_CELL_TEXT = 1
    ZIP_SIGNATURE = b'PK\x03\x04'
    __VERSION__ = '2.0.1'
    biff_text_from_num = {0: '(not BIFF)', 20: '2.0', 21: '2.1', 30: '3', ...
    empty_cell = empty:''
    error_text_from_code = {0: '#NULL!', 7: '#DIV/0!', 15: '#VALUE!', 23: ...
    oBOOL = 3
    oERR = 4
    oNUM = 2
    oREF = -1
    oREL = -2
    oSTRG = 1
    oUNK = 0
    okind_dict = {-2: 'oREL', -1: 'oREF', 0: 'oUNK', 1: 'oSTRG', 2: 'oNUM'...

VERSION
    2.0.1

FILE
    e:\python_workspace\pythonproject\lib\site-packages\xlrd\__init__.py
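The XL_CELL_* constants in the DATA section are the cell type codes that xlrd reports alongside each cell value. As an illustration of how they might be used, here is a sketch that turns a (ctype, value) pair into a friendlier Python value. The numeric codes and the three error texts are copied from the help output above; describe_cell is a hypothetical helper, and the value conventions (numbers as floats, booleans as 0/1) are my working assumptions rather than something this listing states.

```python
# Values copied from the DATA section of the help(xlrd) output above
XL_CELL_EMPTY, XL_CELL_TEXT, XL_CELL_NUMBER = 0, 1, 2
XL_CELL_DATE, XL_CELL_BOOLEAN, XL_CELL_ERROR, XL_CELL_BLANK = 3, 4, 5, 6

# Only the error codes that are visible in the (truncated) help output
ERROR_TEXT = {0: "#NULL!", 7: "#DIV/0!", 15: "#VALUE!"}


def describe_cell(ctype, value):
    """Hypothetical helper: map a (cell type, raw value) pair to a readable value."""
    if ctype in (XL_CELL_EMPTY, XL_CELL_BLANK):
        return None
    if ctype == XL_CELL_TEXT:
        return str(value)
    if ctype == XL_CELL_NUMBER:
        # Assumption: numbers arrive as floats; show whole numbers as int
        number = float(value)
        return int(number) if number.is_integer() else number
    if ctype == XL_CELL_BOOLEAN:
        return bool(value)
    if ctype == XL_CELL_ERROR:
        return ERROR_TEXT.get(value, f"<error code {value}>")
    if ctype == XL_CELL_DATE:
        # Dates are serial numbers; a real conversion also needs the workbook's date mode
        return f"<date serial {value}>"
    raise ValueError(f"unknown cell type: {ctype}")


if __name__ == "__main__":
    samples = [(1, "Tom"), (2, 3.0), (2, 2.5), (4, 1), (5, 7), (0, "")]
    for ctype, value in samples:
        print(ctype, repr(value), "->", repr(describe_cell(ctype, value)))
```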

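3. Usage in detail

3-10. xlrd.biffh.unpack_unicode function

3-10-1. Syntax

xlrd.biffh.unpack_unicode(data, pos, lenlen=2)

3-10-2. Parameters

3-10-2-1. data (required): a bytes object holding BIFF data from an Excel file.

3-10-2-2. pos (required): an integer giving the offset in data at which the string (starting with its length prefix) begins.

3-10-2-3. lenlen (optional): the size of the length prefix in bytes. The default is 2, meaning the length is stored as a two-byte integer; some records use a one-byte prefix (lenlen=1).

3-10-3. Purpose

Decodes a Unicode string from BIFF data. This function is normally not called by user code; other parts of xlrd use it internally while reading an Excel file.

3-10-4. Return value

The decoded Unicode string (str).

3-10-5. Notes

Roughly, the function works as follows:

3-10-5-1. Starting at pos, read lenlen bytes as the length of the string (a little-endian count of characters, not bytes).
3-10-5-2. Use that length to read the corresponding bytes from data and decode them. In xlrd's source, a one-byte option flag that follows the length prefix decides whether the characters are stored as 8-bit "compressed" text or as UTF-16LE.
3-10-5-3. Return the decoded string.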
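The steps above can be sketched with the standard library's struct module. This is my own approximation of the BIFF8 string layout as I understand it (character count, then an option byte whose lowest bit selects UTF-16LE), not a copy of xlrd's implementation; pack_string and unpack_unicode_sketch are hypothetical names, and pack_string exists only to build test data.

```python
import struct


def pack_string(text, lenlen=2):
    """Hypothetical helper: build <length prefix><option byte><characters>."""
    try:
        body, flags = text.encode("latin_1"), 0x00    # "compressed": 1 byte per character
        nchars = len(body)
    except UnicodeEncodeError:
        body, flags = text.encode("utf_16_le"), 0x01  # uncompressed: 2 bytes per code unit
        nchars = len(body) // 2
    # 'B' is a 1-byte and 'H' a 2-byte unsigned integer; '<' means little-endian
    return struct.pack("<" + "BH"[lenlen - 1], nchars) + bytes([flags]) + body


def unpack_unicode_sketch(data, pos, lenlen=2):
    """Decode one length-prefixed string starting at offset pos."""
    (nchars,) = struct.unpack("<" + "BH"[lenlen - 1], data[pos:pos + lenlen])
    if nchars == 0:
        return ""
    pos += lenlen
    flags = data[pos]
    pos += 1
    if flags & 0x08:  # rich-text run count (2 bytes), skipped in this sketch
        pos += 2
    if flags & 0x04:  # size of phonetic/extended data (4 bytes), skipped in this sketch
        pos += 4
    if flags & 0x01:
        return data[pos:pos + 2 * nchars].decode("utf_16_le")
    return data[pos:pos + nchars].decode("latin_1")


if __name__ == "__main__":
    for sample in ("Hello, world!", "你好, xlrd"):
        packed = pack_string(sample)
        print(len(packed), "bytes ->", unpack_unicode_sketch(packed, 0))
```

The option byte is the reason a plain "length times two bytes of UTF-16" reading does not work for every string: text that fits in 8 bits per character is stored with one byte per character.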

3-10-6. Usage

# 10. xlrd.biffh.unpack_unicode (simplified simulation, not the library's own code)
def unpack_unicode(data, pos, lenlen=2):
    # Make sure there is enough data to read the length prefix
    if len(data) < pos + lenlen:
        raise ValueError("Data is too short to read length prefix.")
    # Read the length prefix (character count, little-endian)
    length = int.from_bytes(data[pos:pos + lenlen], byteorder='little')
    # Make sure there is enough data to read the whole string
    if len(data) < pos + lenlen + length * 2:
        raise ValueError("Data is too short to read the entire string.")
    # Slice out the UTF-16LE payload and decode it
    string_data = data[pos + lenlen:pos + lenlen + length * 2]
    return string_data.decode('utf-16le')

if __name__ == '__main__':
    # Simulated BIFF data: a 2-byte little-endian length prefix (13 characters)
    # followed by "Hello, world!" encoded as UTF-16LE
    text = "Hello, world!"
    data = len(text).to_bytes(2, 'little') + text.encode('utf-16le')
    # The string starts at offset 0, so pos points at the length prefix
    pos = 0
    # Call the simulated unpack_unicode function
    try:
        result = unpack_unicode(data, pos)
        print(result)  # Output: Hello, world!
    except ValueError as e:
        print(f"An error occurred: {e}")

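3-11. xlrd.biffh.unpack_unicode_update_pos function

3-11-1. Syntax

xlrd.biffh.unpack_unicode_update_pos(data, pos, lenlen=2, known_len=None)

3-11-2. Parameters

3-11-2-1. data (required): a bytes object containing the binary data of the Unicode string to parse.

3-11-2-2. pos (required): an integer, the current offset within data.

3-11-2-3. lenlen (optional): an integer giving the size of the length prefix in bytes. The default is 2.

3-11-2-4. known_len (optional): an integer or None (the default). If known_len is given, the function uses it as the string length instead of reading a length prefix from data, which is useful when the length is already known from elsewhere. If known_len is None, the function reads the length prefix from data to determine the string length.

3-11-3. Purpose

Parses a Unicode string from the BIFF (Binary Interchange File Format) structure of an Excel file and also reports the updated read position.

3-11-4. Return value

3-11-4-1. The parsed Unicode string (str).

3-11-4-2. The updated pos value (int), that is, the index in data of the first byte after the string.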
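Returning the new position is what makes it convenient to read several strings stored back to back: each call's returned pos becomes the next call's starting point. The sketch below demonstrates that loop with a simplified stand-in (a character count followed directly by UTF-16LE text, without the option byte of real BIFF8 strings); read_prefixed_utf16, read_all_strings, and encode are hypothetical names.

```python
def read_prefixed_utf16(data, pos, lenlen=2):
    """Simplified stand-in: little-endian character count, then UTF-16LE text.

    Returns (text, new_pos), where new_pos is the offset just past the string.
    """
    nchars = int.from_bytes(data[pos:pos + lenlen], "little")
    start = pos + lenlen
    end = start + 2 * nchars
    if end > len(data):
        raise ValueError("data is too short for the declared length")
    return data[start:end].decode("utf-16le"), end


def read_all_strings(data, lenlen=2):
    """Read consecutive strings by feeding each returned position into the next call."""
    pos, result = 0, []
    while pos < len(data):
        text, pos = read_prefixed_utf16(data, pos, lenlen)
        result.append(text)
    return result


def encode(text):
    """Hypothetical helper that builds test data in the simplified layout."""
    raw = text.encode("utf-16le")
    return (len(raw) // 2).to_bytes(2, "little") + raw


if __name__ == "__main__":
    buffer = b"".join(encode(s) for s in ["Sheet1", "Name", "Price"])
    print(read_all_strings(buffer))  # ['Sheet1', 'Name', 'Price']
```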

3-11-5. Notes

None.

3-11-6. Usage

# 11. xlrd.biffh.unpack_unicode_update_pos (simplified simulation, not the library's own code)
def unpack_unicode_update_pos(data, pos, lenlen=2, known_len=None):
    if known_len is not None:
        # The caller already knows the length, so there is no prefix to read or skip
        length = known_len
    else:
        # Make sure there is enough data to read the length prefix
        if len(data) < pos + lenlen:
            raise ValueError("Data is too short to read length prefix.")
        # Read the length prefix (character count, little-endian) and step over it
        length = int.from_bytes(data[pos:pos + lenlen], byteorder='little')
        pos += lenlen
    # Position of the first byte after the string
    new_pos = pos + length * 2
    # Make sure there is enough data to read the whole string
    if len(data) < new_pos:
        raise ValueError("Data is too short to read the entire string.")
    # Decode the UTF-16LE payload
    string_value = data[pos:new_pos].decode('utf-16le')
    # Return the decoded string together with the new position
    return string_value, new_pos

if __name__ == '__main__':
    # Simulated BIFF data: 2-byte little-endian length prefix + UTF-16LE text
    text = "Hello, world!"
    data = len(text).to_bytes(2, 'little') + text.encode('utf-16le')
    # Start at the length prefix
    pos = 0
    # Call the simulated unpack_unicode_update_pos function
    try:
        string_value, new_pos = unpack_unicode_update_pos(data, pos)
        print(string_value)  # Output: Hello, world!
        print(f"Updated position: {new_pos}")  # Output: Updated position: 28
    except ValueError as e:
        print(f"An error occurred: {e}")

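3-12. xlrd.biffh.biff_count_records function

3-12-1. Syntax

xlrd.biffh.biff_count_records(mem, stream_offset, stream_len, fout=sys.stdout)

Note that the dir(xlrd) listing above contains count_records but not biff_count_records: the function lives in the xlrd.biffh module, and the top-level xlrd.count_records(filename, outfile) is the convenient entry point for a file on disk.

3-12-2. Parameters

3-12-2-1. mem (required): a buffer in memory (for example a bytes object) holding the binary content of the Excel file.

3-12-2-2. stream_offset (required): the offset in mem at which reading of BIFF records starts.

3-12-2-3. stream_len (required): the length of the BIFF stream to read, counted from stream_offset.

3-12-2-4. fout (optional): a file object or similar output stream to which the function writes its output. The default is sys.stdout.

3-12-3. Purpose

Walks the given BIFF stream (starting at stream_offset in mem and spanning stream_len bytes) and tallies how many records of each type it contains.

3-12-4. Return value

None. The tally is written to fout as a sorted list of record names with their counts, rather than being returned.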
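To make the tallying idea concrete, here is a sketch that walks a byte stream of BIFF-style records, each beginning with a 4-byte header (a 2-byte record id and a 2-byte payload length, both little-endian), and counts records per type. It is an illustration written for this article, not xlrd's code: the name table is deliberately tiny, and tally_records, write_summary, and make_record are hypothetical helpers.

```python
import struct
import sys
from collections import Counter

# A few BIFF record ids; real files contain many more record types
RECORD_NAMES = {0x0809: "BOF", 0x000A: "EOF", 0x00FC: "SST"}


def tally_records(mem, stream_offset, stream_len):
    """Count records per type in mem[stream_offset:stream_offset + stream_len]."""
    pos = stream_offset
    end = stream_offset + stream_len
    tally = Counter()
    while end - pos >= 4:
        # Header: record id and payload length, both unsigned 16-bit little-endian
        rec_id, length = struct.unpack("<HH", mem[pos:pos + 4])
        tally[RECORD_NAMES.get(rec_id, "Unknown_0x%04X" % rec_id)] += 1
        pos += 4 + length  # skip the header and the payload
    return tally


def write_summary(tally, fout=sys.stdout):
    """Write one 'count name' line per record type, sorted by name."""
    for name, count in sorted(tally.items()):
        fout.write("%8d %s\n" % (count, name))


def make_record(rec_id, payload=b""):
    """Hypothetical helper that builds one record for the demo stream."""
    return struct.pack("<HH", rec_id, len(payload)) + payload


if __name__ == "__main__":
    stream = (make_record(0x0809, b"\x00" * 8) + make_record(0x00FC, b"abc")
              + make_record(0x1234) + make_record(0x1234, b"xy")
              + make_record(0x000A))
    write_summary(tally_records(stream, 0, len(stream)))
```

Because every header states the payload length, the walker can jump from record to record without understanding the payloads, which is what makes such a summary cheap to produce.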

3-12-5. Notes

None.

3-12-6. Usage

# 12. biff_count_records (simplified simulation, not real BIFF parsing)
import sys

def biff_count_records(mem, stream_offset, stream_len, fout=None):
    """
    Simulate reading data from memory and "counting" BIFF records.
    Note: this function does not parse the real BIFF format.
    :param mem: bytes containing the pretend BIFF data
    :param stream_offset: offset in mem at which reading starts
    :param stream_len: number of bytes to read from stream_offset
    :param fout: output stream; None means standard output
    :return: the number of pretend "records"
    """
    # Fall back to standard output when no stream is given
    if fout is None:
        fout = sys.stdout
    # Slice out the requested part of the buffer
    data = mem[stream_offset:stream_offset + stream_len]
    # Pretend that every newline byte marks one record
    # (only an illustration, not a real BIFF record marker)
    count = data.count(b'\n')
    # Write the summary to the output stream (sys.stdout must not be closed here)
    fout.write(f"Number of BIFF-like records: {count}\n")
    return count

if __name__ == '__main__':
    fake_biff_data = (b'This is not real BIFF data\nbut we pretend it is\n'
                      b'with newline characters\nas record separators\n')
    stream_offset = 0
    stream_len = len(fake_biff_data)
    # The function itself writes the summary line to standard output
    record_count = biff_count_records(fake_biff_data, stream_offset, stream_len)
    # The returned value can be used afterwards as well
    print(f"Returned count: {record_count}")

V. Recommended reading

1. Python Foundations Tour

2. Python Functions Tour

3. Python Algorithms Tour

4. Python Magic Tour

5. Blog home page
