xlsx
Use this skill any time a spreadsheet file is the primary input or output. This means any task where the user wants to: open, read, edit, or fix an existing .xlsx, .xlsm, .csv, or .tsv file (e.g., adding columns, computing formulas, formatting, charting, cleaning messy data); create a new spreadsheet from scratch or from other data sources; or convert between tabular file formats. Trigger especially when the user references a spreadsheet file by name or path — even casually (like "the xlsx in my downloads") — and wants something done to it or produced from it. Also trigger for cleaning or restructuring messy tabular data files (malformed rows, misplaced headers, junk data) into proper spreadsheets. The deliverable must be a spreadsheet file. Do NOT trigger when the primary deliverable is a Word document, HTML report, standalone Python script, database pipeline, or Google Sheets API integration, even if tabular data is involved.
Requirements for Outputs
All Excel files
Professional Font
Zero Formula Errors
Preserve Existing Templates (when updating templates)
Financial models
Color Coding Standards
Unless otherwise stated by the user or existing template
Industry-Standard Color Conventions
Number Formatting Standards
Required Format Rules
Formula Construction Rules
Assumptions Placement
Formula Error Prevention
Documentation Requirements for Hardcodes
- "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]"
- "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]"
- "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity"
- "Source: FactSet, 8/20/2025, Consensus Estimates Screen"
XLSX creation, editing, and analysis
Overview
A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.
Important Requirements
LibreOffice Required for Formula Recalculation: You can assume LibreOffice is installed for recalculating formula values using the scripts/recalc.py script. The script automatically configures LibreOffice on first run, including in sandboxed environments where Unix sockets are restricted (handled by scripts/office/soffice.py)
Reading and analyzing data
Data analysis with pandas
For data analysis, visualization, and basic operations, use pandas which provides powerful data manipulation capabilities:
import pandas as pdRead Excel
df = pd.read_excel('file.xlsx') # Default: first sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dictAnalyze
df.head() # Preview data
df.info() # Column info
df.describe() # StatisticsWrite Excel
df.to_excel('output.xlsx', index=False)Excel File Workflows
CRITICAL: Use Formulas, Not Hardcoded Values
Always use Excel formulas instead of calculating values in Python and hardcoding them. This ensures the spreadsheet remains dynamic and updateable.
❌ WRONG - Hardcoding Calculated Values
# Bad: Calculating in Python and hardcoding result
total = df['Sales'].sum()
sheet['B10'] = total # Hardcodes 5000Bad: Computing growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth # Hardcodes 0.15Bad: Python calculation for average
avg = sum(values) / len(values)
sheet['D20'] = avg # Hardcodes 42.5✅ CORRECT - Using Excel Formulas
# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'Good: Average using Excel function
sheet['D20'] = '=AVERAGE(D2:D19)'This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.
Common Workflow
python scripts/recalc.py output.xlsx- The script returns JSON with error details
- If
status is errors_found, check error_summary for specific error types and locations- Fix the identified errors and recalculate again
- Common errors to fix:
-
#REF!: Invalid cell references-
#DIV/0!: Division by zero-
#VALUE!: Wrong data type in formula-
#NAME?: Unrecognized formula nameCreating new Excel files
# Using openpyxl for formulas and formatting
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignmentwb = Workbook()
sheet = wb.active
Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append(['Row', 'of', 'data'])Add formula
sheet['B2'] = '=SUM(A1:A10)'Formatting
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')Column width
sheet.column_dimensions['A'].width = 20wb.save('output.xlsx')
Editing existing Excel files
# Using openpyxl to preserve formulas and formatting
from openpyxl import load_workbookLoad existing file
wb = load_workbook('existing.xlsx')
sheet = wb.active # or wb['SheetName'] for specific sheetWorking with multiple sheets
for sheet_name in wb.sheetnames:
sheet = wb[sheet_name]
print(f"Sheet: {sheet_name}")Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2) # Insert row at position 2
sheet.delete_cols(3) # Delete column 3Add new sheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'wb.save('modified.xlsx')
Recalculating formulas
Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided scripts/recalc.py script to recalculate formulas:
python scripts/recalc.py <excel_file> [timeout_seconds]Example:
python scripts/recalc.py output.xlsx 30The script:
Formula Verification Checklist
Quick checks to ensure formulas work correctly:
Essential Verification
Common Pitfalls
pd.notna()/ in formulas (#DIV/0!)Formula Testing Strategy
Interpreting scripts/recalc.py Output
The script returns JSON with error details:
{
"status": "success", // or "errors_found"
"total_errors": 0, // Total error count
"total_formulas": 42, // Number of formulas in file
"error_summary": { // Only present if errors found
"#REF!": {
"count": 2,
"locations": ["Sheet1!B5", "Sheet1!C10"]
}
}
}Best Practices
Library Selection
Working with openpyxl
data_only=True to read calculated values: load_workbook('file.xlsx', data_only=True)data_only=True and saved, formulas are replaced with values and permanently lostread_only=True for reading or write_only=True for writingWorking with pandas
pd.read_excel('file.xlsx', dtype={'id': str})pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])pd.read_excel('file.xlsx', parse_dates=['date_column'])Code Style Guidelines
IMPORTANT: When generating Python code for Excel operations:
For Excel files themselves: