Using python's difflib to find text file difference and capturing output in excel

Vyom

The Power of x480
Staff member
Admin
Attached with this post are three files:
1. diff_tool.py: Python file, which uses difflib to find difference between two text files.
2. my_shopping_list.txt: Text file 1
3. friends_shopping_list.txt: Text file 2
4. diff.html: HTML file, which is the output that lists the differences.

If all files are kept in same folder, here's the command that executes the script:
Bash:
python diff_tool.py my_shopping_list.txt friends_shopping_list.txt --html diff.html

Now my requirement is that I want the output to be in excel. When I open the HTML file in excel, it translates to table very beautifully, but all highlights are gone.
So I want to find a way to retain that formatting. If I can convert the HTML file into Excel file right from within python, that would be the dream!
 

Attachments

  • my_shopping_list.txt
    16 bytes · Views: 123
  • friends_shopping_list.txt
    23 bytes · Views: 112

patkim

Cyborg Agent
I could not find .py file in the list of attachments, but I downloaded one sample script that uses difflib from GitHub (hopefully should be similar) and executed the script with the specified command. This is what I get.

python-output.png




I am neither Excel expert nor I know anything in Python but I can give you some idea why formatting is possibly lost.
Excel does not have any native capability to show background highlight for single or part of text selectively in a cell. It can do it for the entire cell as a whole.
It can do it selectively for the text forecolor but not the background color. Hence from the html only that formatting is lost which Excel does not support. I guess that’s the way it is.

There might be any third-party tools or add-ons that might be adding such a capability to Excel but I am not aware of any if there are.
See below snapshot with sample data both html and Excel view. Only foreground formatting is retained in Excel when it’s imported into it.

excel-color.png


MS Word supports such formatting though.
 
Last edited:
OP
Vyom

Vyom

The Power of x480
Staff member
Admin
Not sure why the post removed the .py and .html file. Trying to attach again.
Edit: Alright, so it seems both files are not supported. Attaching zip of all 4 files.
 

Attachments

  • Test_Files.zip
    1.7 KB · Views: 114
OP
Vyom

Vyom

The Power of x480
Staff member
Admin
I am neither Excel expert nor I know anything in Python but I can give you some idea why formatting is possibly lost.
Excel does not have any native capability to show background highlight for single or part of text selectively in a cell. It can do it for the entire cell as a whole.
It can do it selectively for the text forecolor but not the background color. Hence from the html only that formatting is lost which Excel does not support. I guess that’s the way it is.
Yes, seems you have hit the nail. Looks like color of select text inside an Excel cell isn't supported. So I will have to forego this functionality and think of something else.
Thanks for looking into it.

Edit: Although, the font color can be modified of specific words inside a cell. So maybe the highlights can be coverted to font color!
 

khalil1210

In the zone
I have modified the code to generate excel directly. You can check it

Code:
diff_tool.py my_shopping_list.txt friends_shopping_list.txt diff

this will generate two files

you can comment out code which is not required

diff_cell.xlsx
diff_individual.xlsx
 

Attachments

  • Test_Files.zip
    2.8 KB · Views: 127
OP
Vyom

Vyom

The Power of x480
Staff member
Admin
Thanks for the attempt @khalil1210 , but I actually wanted the output provided in HTML to convert to Excel, as it is.
But I found a way to do that, and it's working for me to convert html to excel! it strips all formatting though, but I don't require formatting now.
I found the solution of converting html to excel here: Python | Convert an HTML table into excel - GeeksforGeeks
 
Top Bottom