Data analysts often turn to add-ins when they encounter complicated computations in Excel. Here we examine and compare some common add-ins in terms of deployment process, development efficiency, application fluidity and, particularly, computational capability; esProc performs better than the others. See Looking for the Best Excel Add-ins for details.
Excel provides convenient methods for simple computations, including easy-to-use functions and intuitive buttons and menus. But Excel alone struggles with complicated computations and certain special ones, so it offers an add-in interface through which an external application can be connected, letting that application's language or script help Excel handle those computations.
Now I’ll investigate a number of common Excel add-ins and evaluate their computational capabilities.
One of the earliest Excel add-ins, Excel DNA lets you create dynamic library functions for Excel using C#, F#, VB.net, etc.
First, you need to write a user-defined function. Below is an example written in C#, modeled on the samples on the official Excel DNA website. MyFunction is the name of the user-defined function.
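Such a function, reconstructed along the lines of the official Excel DNA samples (the class name and description text here are illustrative), looks like this:

```csharp
using ExcelDna.Integration;

public static class MyFunctions
{
    // Exposed to Excel as MyFunction once the add-in is loaded
    [ExcelFunction(Description = "A sample Excel DNA function")]
    public static string MyFunction(string name)
    {
        return "Hello " + name;
    }
}
```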
The code needs to be compiled as a dynamic library to be used in Excel.
Then you configure the relationship between the user-defined function and the add-in. The following DnaSample.dna file configures the add-in’s name ("My name") and the corresponding dynamic library Mylibrary.dll, which includes multiple user-defined functions.
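A minimal DnaSample.dna along those lines might look like this (the Name and Path values follow the text above; RuntimeVersion is an assumption):

```xml
<DnaLibrary Name="My name" RuntimeVersion="v4.0">
  <ExternalLibrary Path="Mylibrary.dll" />
</DnaLibrary>
```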
Finally, configure the new add-in in Excel; MyFunction can then be called from a cell just like a built-in function.
Since a compiled program is ready to execute and is closely integrated with Excel, execution is smooth and fast. So Excel DNA's biggest advantage is its high fluidity.
As the name implies, the add-in can make full use of the facilities the Windows DNA architecture offers, such as development languages and tools, integration with Excel, interactive debugging and so on.
In theory, languages like C#, F# and VB.NET are universal and all-powerful. But I noted that the sample programs on their websites are all about string output, which is hardly a showcase of strength. So why is that?
Because their capabilities exist only on paper.
Those languages lack class libraries for structured data computations. Hardcoding is needed even for the most basic calculations. The extremely roundabout code isn’t suitable for handling complicated computations.
Besides, C#, F# and VB.NET are compiled languages rather than interpreted ones. Compiled languages require users to maintain a compilation environment so the algorithm can be recompiled whenever it changes. But configuring a Windows compilation environment is complex and hard to learn; these languages have a high technological threshold. This means Excel DNA is better suited to professional programmers who use it as an interface than to the data analysts who would use it directly for desktop analysis.
Other add-ins, such as the Java-based JINX, also lack class libraries for structured data computations, so JINX is unsuitable for data computations, too. Even Excel VBA, the spreadsheet tool's built-in scripting facility, is no better in terms of expressive power (which means it isn't suitable for computing data either). But as it requires no integration or compilation, it is more competitive than Excel DNA and JINX.
JINX, for its part, has great interface management ability: it can access the Excel menu bar, buttons and pop-up dialogs with simpler syntax and define an add-in interface in a JS file, which is much more convenient than VBA.
Unfortunately, interface management isn't the key aspect of a data computing add-in, so it deserves little attention here.
A standard data computing add-in should have class libraries for structured computations, like PyXLL. PyXLL is a Python-based add-in, and Python's Pandas library provides exactly such structured computation class libraries.
PyXLL doesn't need hardcoding to implement simple algorithms, such as grouping and aggregation over a specified area. Here's one example: select a batch of employee records from an Excel worksheet, pass them to a user-defined function groupEmp, perform the grouping and aggregation in PyXLL, and return the result.
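Stripped of PyXLL's registration details (in PyXLL the function would carry the @xl_func decorator), the pandas core of such a groupEmp function might look like this; the column names NAME, DEPT and SALARY are assumptions for illustration:

```python
import pandas as pd

def groupEmp(rows):
    # rows: a 2-D list from the worksheet, first row holding the headers
    df = pd.DataFrame(rows[1:], columns=rows[0])
    # Core: group by department and average the salaries
    res = df.groupby("DEPT", as_index=False)["SALARY"].mean()
    # Hand the result back to Excel as a 2-D list, headers included
    return [res.columns.tolist()] + res.values.tolist()
```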
The core code occupies only one line; the rest is basically routine. The program is succinct.
Of course the cooperation of multiple functions, instead of a single basic function, is needed to perform complicated and special computations. The problem is that it’s not convenient to do that in PyXLL.
Here's an example of standardizing and then grouping and aggregating data. Based on an Excel table recording housing unit styles (columns A-E), the user-defined function will group records by STYLE and BEDROOMS and calculate averages over the SQFEET, BATHS and PRICE columns. You need to remove the $ sign from each value of the string-type PRICE column and convert it to a numeric type for the computation.
The user-defined function for implementing the algorithm is as follows (only core code is shown):
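A sketch of that function in plain pandas (PyXLL wiring omitted; column names follow the description above, and prices are assumed to look like "$100,000"):

```python
import pandas as pd

def groupUnit(rows):
    df = pd.DataFrame(rows[1:], columns=rows[0])
    # Pre-processing: strip "$" and thousands separators from the
    # string-typed PRICE column, then convert it to a numeric type
    df["PRICE"] = (
        df["PRICE"].astype(str)
        .str.replace("$", "", regex=False)
        .str.replace(",", "", regex=False)
        .astype(float)
    )
    # Core: group by STYLE and BEDROOMS, average the three measure columns
    res = df.groupby(["STYLE", "BEDROOMS"], as_index=False)[
        ["SQFEET", "BATHS", "PRICE"]
    ].mean()
    return [res.columns.tolist()] + res.values.tolist()
```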
Only one line is for grouping, but six lines for pre-processing. That’s a little complicated.
Another example is splitting one row into multiple rows. Column A stores IDs and column B stores the corresponding List values. Some List values have multiple members separated by spaces. The user-defined function needs to split them by the spaces and match each member to its ID.
The user-defined function for implementing the algorithm is as follows:
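One plausible shape for that function in plain pandas (PyXLL wiring omitted; before pandas gained DataFrame.explode, the flattening had to be done by hand, which is what makes the core code clumsy):

```python
import numpy as np
import pandas as pd

def splitRows(rows):
    df = pd.DataFrame(rows[1:], columns=rows[0])
    # Split each LIST value into its space-separated members
    members = df["LIST"].str.split(" ")
    # Repeat each ID once per member, then flatten the member lists
    res = pd.DataFrame({
        "ID": df["ID"].values.repeat(members.str.len().to_numpy()),
        "LIST": np.concatenate(members.to_list()),
    })
    return [res.columns.tolist()] + res.values.tolist()
```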
The core code is complicated. PyXLL is not good at handling complicated or special computations.
PyXLL has one more problem: Excel has to call an external interpreter to parse the Python script, which results in very low fluidity and a poor user experience. Yet low fluidity isn't unique to PyXLL; it is common to all script add-ins that require external interpreters, such as xlwings, BERT and RExcel. xlwings is another Python-based add-in and so shares the same pros and cons as PyXLL. Both BERT and RExcel are R-based; the R language is designed for scientific modeling algorithms and its structured computation class library is less specialized, so both add-ins have weaker computing ability and lower fluidity than PyXLL.
The biggest advantage of interpreted languages is that they support immediate execution without compilation and are easy to modify and maintain.
esProc is a professional data computing engine that offers an Excel add-in for writing scripts in its SPL language. Similar to PyXLL, it boasts a wealth of structured computation functions to implement simple algorithms effortlessly. To group and aggregate records in the specified area, for example, you just need the following script (groupEmp.dfx):
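A minimal sketch of groupEmp.dfx, assuming a script parameter named area that receives the selected range and the field names NAME/DEPT/SALARY (E() converts a sequence whose first row is a header into a table sequence):

```
   A
1  =E(area)
2  =A1.groups(DEPT;avg(SALARY):avg_salary)
3  return A2
```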
The core code is in A2, and it's concise. We can then call the user-defined function from an Excel cell; the syntax is =dfx("groupEmp",A1:D20).
esProc handles other basic algorithms in a simple and easy way (Here only the core code is shown):
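For instance, with a table sequence in A1, each of these common operations is a single call: select() filters, sort() sorts (the minus sign means descending), and id() removes duplicates. These one-liners are illustrative and the field names are assumed:

```
=A1.select(SALARY>10000)
=A1.sort(-SALARY)
=A1.id(DEPT)
```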
Now we can see that the standard of evaluating an add-in’s computational ability is how well it does in handling complicated or special computations.
Compared with PyXLL, esProc has an edge in this aspect.
It’s much easier and simpler in converting data to standard format and then grouping it in esProc than in PyXLL:
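A sketch of the esProc version, assuming SPL's replace() and number() functions for the PRICE cleanup and the same field names as in the PyXLL example:

```
   A
1  =E(area)
2  =A1.run(PRICE=number(replace(PRICE,"$","")))
3  =A2.groups(STYLE,BEDROOMS;avg(SQFEET),avg(BATHS),avg(PRICE))
4  return A3
```

Cleanup and grouping are one line each, with no intermediate DataFrame juggling.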
It’s extremely simple to split one row into multiple rows in esProc:
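A sketch: news() generates one record per member of the split list, pairing it with the outer record's ID (field names ID and LIST as described above):

```
   A
1  =E(area)
2  =A1.news(LIST.split(" ");ID,~:LIST)
3  return A2
```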
A more complicated example is calculating installments. An Excel worksheet records loan information, including loan ID, amount, number of terms (in months) and annual interest rate:
The user-defined function needs to get the detailed data for each term, including payable amount, interest, principal and principal balance. The expected result in a new worksheet is as follows:
It’s rather convenient to get this done in esProc:
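A sketch of the script, assuming field names LoanID, LoanAmt, Term and Rate: derive() adds the monthly rate and the fixed monthly payment, then news() expands each loan into one record per term while the running variable t tracks the remaining principal:

```
   A
1  =E(area)
2  =A1.derive(Rate/100/12:mRate, LoanAmt*mRate*power(1+mRate,Term)/(power(1+mRate,Term)-1):mPayment)
3  =A2.news((t=LoanAmt),Term; LoanID, mPayment:payment, t*mRate:interest, payment-interest:principal, t=t-principal:principalBalance)
4  return A3
```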
Despite its powerful computational ability, esProc is still an add-in whose execution depends on an external interpreter, the JVM. So the fluidity problem remains.
Is there a way to both retain the computational capability and run the application smoothly?
esProc successfully achieves that by replacing the user-defined function with the clipboard!
For example, suppose we want to find the students whose scores rank in the top 3 in each subject. Column A contains student names, and columns B-D contain scores for math, English and physics respectively. The eligible students should be appended below the existing scores.
The source data:
Select the valid cells, copy them to the clipboard with Ctrl+C, and then execute the following esProc script:
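The script would run along these lines (a sketch: clipboard() reads or writes the clipboard as text, import@t() parses it with the first row as field names, and top(-3;x) picks the records with the three highest values of x; the column names are assumed):

```
   A
1  =clipboard().import@t()
2  =A1.top(-3;Math).(Name)
3  =A1.top(-3;English).(Name)
4  =A1.top(-3;Physics).(Name)
5  =clipboard(3.(A2(~)+"\t"+A3(~)+"\t"+A4(~)).concat("\n"))
```

A5 rebuilds three tab-separated rows of names and puts them back on the clipboard, ready to paste.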
After the script is executed, just select cell B11 and press Ctrl+V; the data in the clipboard is pasted into B11-D13. This gets the same result as the user-defined function does.
With esProc, you can always use the clipboard instead of writing a user-defined function, except in certain cases, such as when multiple areas are involved in the computation.
Through the clipboard, you get both a fluid computing process and powerful computing ability, without the complicated deployment of an add-in. For data analysts, that's really convenient.
In theory, all add-ins, including PyXLL, could solve the fluidity problem through the clipboard, provided they add the corresponding functions (getting data from the clipboard and converting it into structured data inside Excel) in future versions.
In a nutshell, a fluid add-in tends to have weak computing ability, and one with strong computing ability tends to be less fluid. Yet with the clipboard method making up for its fluidity limitation, esProc is the most suitable add-in for data analysts.
From the ITPUB blog: http://blog.itpub.net/31543054/viewspace-2716433/. Please credit the source when reposting.