Part 1 of this 3-part series looks at the fundamental problems of translating basic MATLAB functions to C.

DSP DesignLine

Part 2 looks at complex functions and the role C interface constraints play in the translation process. It will be published Thursday, December 6. For a related product story, see MATLAB-to-C tool get major upgrade.

Most signal processing and communication projects nowadays at some point require translating MATLAB code into equivalent C code. Goals and requirements of the resulting C code can be diverse. Examples include:


While C code requirements vary widely, some translation pitfalls are common across all applications and teams. Many of these pitfalls derive from the fact that MATLAB is essentially an interpreted language. Consequently, it does not require a priori knowledge of variables: MATLAB does not know the type, shape or dimensions of a variable, or even whether a variable or function exists, until execution.

Let us list some of the most common or bothersome pitfalls:

  • MATLAB indexes array elements starting with 1 whereas C indexing starts with 0
  • MATLAB is column-major whereas C is row-major. This means that consecutive data in memory belong to the same column in MATLAB but to the same row in C.
  • MATLAB is inherently a vector-based representation. This can make the translation challenging. In particular:
    • Even the simplest vector operation must be replaced with a loop construct
    • Operators (such as times ‘*’) in MATLAB perform different operations depending on the type of the operands
    • MATLAB includes very simple and powerful vector operations, such as the concatenation « [] » and colon « x(:) » operators or the « end » keyword, which can be quite hard to map to C
  • MATLAB supports « polymorphism » whereas C does not. That is, you can write a generic function in MATLAB that processes different types of input parameters. In C, each parameter has one given type, which cannot change.
  • MATLAB supports dynamic extension and resizing of arrays, whereas C requires dynamically sized storage to be managed explicitly using malloc/free.
  • MATLAB draws on a rich set of libraries that are not available in C. Implementing such functions requires writing new code. Sometimes pre-existing libraries and functions are available for your target platform, but integrating them into your application can be problematic. In addition to licensing issues (such as using GPL code in a commercial application), interfacing with these libraries is non-trivial.
  • MATLAB supports reusing the same variable for different contents (different types). C does not, as each variable has one unique type.
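The array-sizing bullet describes work a hand translation must do explicitly. As a minimal sketch (the « dvec » type and its functions are hypothetical names, not part of any library), the C code below does what MATLAB does silently when a statement such as « x(end+1) = v » grows an array: it tracks the length, reallocates storage as needed, and leaves freeing the memory to the caller.

```c
#include <stdlib.h>

/* Hypothetical growable vector of doubles, sketching what MATLAB does
   implicitly when a statement such as x(end+1) = v extends an array. */
typedef struct {
    double *data;
    size_t  len;   /* number of elements in use */
    size_t  cap;   /* number of elements allocated */
} dvec;

/* Append v, doubling the allocation when it is full.
   Returns 0 on success, -1 if realloc fails (the vector is left unchanged). */
int dvec_push(dvec *x, double v)
{
    if (x->len == x->cap) {
        size_t new_cap = x->cap ? 2 * x->cap : 4;
        double *p = realloc(x->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        x->data = p;
        x->cap  = new_cap;
    }
    x->data[x->len++] = v;
    return 0;
}

/* Release the storage; C has no garbage collector to do this for us. */
void dvec_free(dvec *x)
{
    free(x->data);
    x->data = NULL;
    x->len = x->cap = 0;
}
```

A caller starts from « dvec x = {NULL, 0, 0}; », calls « dvec_push » once per new element, and must eventually call « dvec_free ». Doubling the capacity keeps the average cost of an append constant, at the price of some unused memory.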

As we will see, the process of writing C code from MATLAB is trickier than it sounds.

Preliminary step – What are my inputs?
When converting MATLAB to C, the first thing you need is a description of your function’s input parameters. This is because MATLAB, being an interpreted language, does not require declaring the data type of any variable.

Complicating matters, MATLAB supports the following features that have no direct equivalent in C:

  • The ability to have a variable number of parameters
  • Data types such as « cell arrays », arrays whose cells can each hold data of a different type

Typically, information about input parameters can be found in comments embedded at the top of the MATLAB function, or in a report written by the MATLAB code implementer.

Conversion Examples
With the input parameters defined, we can now turn to the MATLAB code itself. To illustrate some basic issues with translating MATLAB to C, we will show three examples, which we will translate by hand.

Example 1 – Simple(?) MATLAB code
Let’s start with a simple example. The MATLAB code below takes in a vector ‘x’ and returns, sorted in ascending order, all but the first two values that are greater than a given threshold.

In MATLAB, this is an easy feat:


We can run it on a simple case:


Even such a trivial example, which does not make use of any advanced matrix or mathematical features of MATLAB, presents a number of problems for the C implementer:


An example implementation, as written in syntactically correct C, is shown below (where the « my_sort » function still has to be implemented):


We have only scratched the surface, but clearly, there is nothing obvious about translating such simple MATLAB code into C.
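To make these problems concrete, here is one possible hand translation, assuming the MATLAB reads something like « y = x(x > thresh); y = sort(y(3:end)); » (the original listing may differ). The function name « example1 » and the caller-allocated output buffer are assumptions; « my_sort » is implemented here on top of the C library’s qsort.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort callback: ascending order of doubles. (a > b) - (a < b) avoids
   the rounding and overflow pitfalls of returning a - b. */
static int cmp_double_asc(const void *pa, const void *pb)
{
    double a = *(const double *)pa;
    double b = *(const double *)pb;
    return (a > b) - (a < b);
}

/* my_sort: sort n doubles in place, ascending, using the C library qsort. */
static void my_sort(double *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_double_asc);
}

/* Hypothetical translation of:  y = x(x > thresh); y = sort(y(3:end));
   x, n  : input vector and its length (only known at run time in MATLAB)
   y     : caller-allocated output with room for n elements (worst case),
           because C cannot grow the result the way MATLAB does
   return: number of elements written to y */
size_t example1(const double *x, size_t n, double thresh, double *y)
{
    size_t found = 0;   /* elements of x above thresh seen so far */
    size_t count = 0;   /* elements written to y */
    for (size_t i = 0; i < n; i++) {
        if (x[i] > thresh) {
            /* y(3:end): skip the first two matches (MATLAB index 3 is C index 2) */
            if (found >= 2)
                y[count++] = x[i];
            found++;
        }
    }
    my_sort(y, count);
    return count;
}
```

Even this short version has to decide on the output buffer size, the index shift from 3 to 2, and how an empty result is reported (a return value of 0), none of which the MATLAB code spells out.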

Example 2: Polymorphism
MATLAB does not require knowing the type of the input parameters. This means that one single function can be called with different kinds of arguments. Moreover, the times operator ‘*’ in MATLAB can perform a matrix product, an inner product (which yields a squared norm) or a multiplication of every element by a scalar, depending on the operands, as shown below.


In this case, the function « polymorph » performs two very different computations to compute ‘a’ and ‘b’:

  • for ‘a’, it computes the squared norm of the vector ‘x_complex_matrix’ (its inner product with itself), which is 30.
  • for ‘b’, it computes the sum of the elements of ‘x_complex_matrix’ multiplied by a real scalar (‘x_real_scalar’). The result is 12 - 6i.

We can run this function in MATLAB:


When translating this behavior to C, you are likely to want to write two completely different functions for « polymorph ».


Here is an example of how this function would look in C, assuming we know that the input matrix has 10 elements.
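As a sketch only, assuming that ‘a’ is computed as « x'*x » (conjugate transpose times vector) and ‘b’ as « sum(x*s) » with a real scalar s, the two specializations could be written with C99 complex.h as follows; the function names are hypothetical.

```c
#include <complex.h>

#define POLY_N 10  /* input length fixed at 10 elements, as in the text */

/* Hypothetical variant 1: a = x' * x for a complex column vector x.
   MATLAB's ' is the conjugate transpose, so this is sum(|x_k|^2),
   the squared 2-norm; the result is real. */
double polymorph_inner(const double complex x[POLY_N])
{
    double acc = 0.0;
    for (int k = 0; k < POLY_N; k++)
        acc += creal(x[k]) * creal(x[k]) + cimag(x[k]) * cimag(x[k]);
    return acc;
}

/* Hypothetical variant 2: b = sum(x * s) for a complex vector x and a
   real scalar s. Here '*' simply scales every element. */
double complex polymorph_scale_sum(const double complex x[POLY_N], double s)
{
    double complex acc = 0.0;
    for (int k = 0; k < POLY_N; k++)
        acc += x[k] * s;
    return acc;
}
```

The return types already differ (real versus complex), which is one reason a single C function cannot mirror the single MATLAB function.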



Example 3: Dealing with unintended array expansion
One of our favorite (and real-life) examples is the unintentional expansion of a pre-defined array in MATLAB.

The story goes like this: the MATLAB designer thinks that array ‘x’ is 14 elements long. He declares it that way and never checks whether this is actually the case. The C implementer, taking his word for it, declares ‘x’ to be ‘double[14]’.

After countless simulations and weekends spent debugging the mismatches between the MATLAB code and the C code, it turns out that x was actually 15 elements long (in some exceptional case) and that the C code was accessing memory beyond the end of the array, and therefore working with incorrect values.

MATLAB, indeed, lets you write statements such as:


If N turns out to be 15 at some point, you are not going to get any hint: « x » is going to be automatically extended to 15 elements. Not so in C, with the consequences mentioned above.
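A defensive translation can at least refuse to overrun the array instead of silently corrupting memory. The sketch below assumes a MATLAB loop of the form « for k = 1:N, x(k) = f(k); end »; the function name « fill_x » and the placeholder value standing in for f(k) are hypothetical.

```c
#include <stddef.h>

#define X_LEN 14  /* the size the MATLAB designer believed x had */

/* Hypothetical defensive translation of
       for k = 1:N, x(k) = f(k); end
   MATLAB would silently grow x if N > 14. Instead of writing past the
   end of a fixed-size C array, refuse the operation and report it.
   Returns 0 on success, -1 if N exceeds the declared capacity. */
int fill_x(double x[X_LEN], size_t N)
{
    if (N > X_LEN)
        return -1;                  /* MATLAB would have extended x here */
    for (size_t k = 0; k < N; k++)  /* MATLAB k = 1..N maps to C k = 0..N-1 */
        x[k] = (double)(k + 1);     /* placeholder for f(k) */
    return 0;
}
```

The check does not make the C code agree with MATLAB, but it turns weeks of chasing corrupted values into an immediate, visible error.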

Part 2 of this 3-part series looks at complex functions and the role C interface constraints play in the translation process.

Complex functions and C interfaces

In the first part of this series, we showed how basic MATLAB operations can be translated into C code. This section focuses on translating more complex MATLAB functions or toolbox functions into C, and on integrating C algorithms derived from MATLAB into your application.

Implementing complex MATLAB functions
Of the many MATLAB functions used to develop algorithms, most have no equivalent in the C language or its standard library. For example, C has nothing like « find » (which returns the indices of the nonzero elements of an array and is often used to extract elements), and the standard library’s generic qsort is only a starting point for « sort » (which sorts array elements and can also return the sorting indices).
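For illustration, a minimal C sketch of the single-output form « k = find(x) » on a real vector follows; « find_nonzero » is a hypothetical name, and MATLAB options such as « find(x, n) » or row/column outputs are not covered. Note that it returns zero-based indices, an issue discussed later in this part.

```c
#include <stddef.h>

/* Minimal sketch of MATLAB's  k = find(x)  for a real vector:
   writes the positions of the nonzero elements of x into idx and
   returns how many were found. Indices are zero-based (C style); add 1
   to each one before handing them back to MATLAB code.
   idx must have room for n entries, the worst case.
   NaN compares unequal to 0.0, so it counts as nonzero. */
size_t find_nonzero(const double *x, size_t n, size_t *idx)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] != 0.0)
            idx[count++] = i;
    }
    return count;
}
```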

Some of these functions are fairly well known and can be found in C libraries or on the Internet. For example, it is easy to find sorting algorithms. Nevertheless, finding the most efficient implementation for MATLAB’s sort function can be quite an adventure, as noted in this article.

In addition, you may not be able to use the functions you find due to license restrictions. These include restrictions on derivative works of copyrighted material (such as MATLAB files) or of open-source (GPL) files. In such cases, you may need to carefully and independently recode the algorithms by hand for your commercial application.

When you implement a MATLAB function in a language such as C, you may find it difficult to replicate MATLAB’s results. For example, image processing applications may use functions such as « imresize » to resize an image. When implementing such functions, it is easy to end up with results that are a fraction of a pixel off from MATLAB’s results. (Even MATLAB R2006b and R2007a produce different results for « imresize » because they implement it differently.) To replicate MATLAB’s results, you may need to recalibrate your algorithm in C, which can add significant delay to your project.

Some functions have direct counterparts in C. For real scalar data, C’s division operator and standard math library functions such as sqrt, cos and sin can be used directly. Things get more complicated once you add complex numbers and matrices to the mix.

For example, the operation « a/b » may look like a simple divide operator in MATLAB.

When the operands are complex scalars, the operation is slightly more complicated in C:
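As an illustrative sketch in ANSI C, which has no built-in complex type, the divide below uses a hypothetical « cplx » struct and Smith’s algorithm; Smith’s method is one common choice and is not necessarily what MATLAB uses internally.

```c
#include <math.h>

/* A complex scalar as an ANSI C program might represent it (hypothetical type). */
typedef struct { double re, im; } cplx;

/* q = n / d for complex scalars, using Smith's algorithm. It scales by
   the larger component of d so that |d|^2 is never formed explicitly;
   the textbook formula (n * conj(d)) / |d|^2 can overflow or underflow
   for very large or very small d. */
cplx cplx_div(cplx n, cplx d)
{
    cplx q;
    if (fabs(d.re) >= fabs(d.im)) {
        double r   = d.im / d.re;
        double den = d.re + d.im * r;
        q.re = (n.re + n.im * r) / den;
        q.im = (n.im - n.re * r) / den;
    } else {
        double r   = d.re / d.im;
        double den = d.re * r + d.im;
        q.re = (n.re * r + n.im) / den;
        q.im = (n.im * r - n.re) / den;
    }
    return q;
}
```

For example, dividing 1 + 2i by 3 + 4i gives 0.44 + 0.08i. A single character of MATLAB has become a branch, four multiplications and several divisions.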


If b is a matrix, « a/b » becomes a « matrix divide, » which is much more complicated to implement. A « matrix divide » requires solving a linear system of equations, that is, finding ‘x’ such that « x*b=a ». This is typically performed with matrix factorizations such as the QR decomposition. An example of this approach is shown in Implementing matrix inversions in fixed-point hardware.
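For a row vector a and a square, real matrix b, the idea can be sketched in a few dozen lines. Note that this sketch swaps in Gaussian elimination with partial pivoting (an LU-style method) for the QR approach mentioned above, handles only real data, and uses hypothetical names; it expects b in MATLAB’s column-major layout and a caller-provided n*n workspace.

```c
#include <math.h>
#include <stddef.h>

/* Solve x * b = a (MATLAB: x = a / b) for a 1-by-n row vector a and an
   n-by-n real matrix b stored column-major. Works on the equivalent
   system b' * x' = a'. work must hold n*n doubles.
   Returns 0 on success, -1 if a zero pivot is found (b singular). */
int mrdivide_row(const double *a, const double *b, size_t n,
                 double *x, double *work)
{
    /* Reading a column-major b row by row yields its transpose, so a
       plain copy gives b' in row-major order: work[i*n + j] = b'(i, j). */
    for (size_t k = 0; k < n * n; k++)
        work[k] = b[k];
    for (size_t i = 0; i < n; i++)
        x[i] = a[i];

    /* Forward elimination with partial pivoting. */
    for (size_t c = 0; c < n; c++) {
        size_t piv = c;
        for (size_t r = c + 1; r < n; r++)
            if (fabs(work[r * n + c]) > fabs(work[piv * n + c]))
                piv = r;
        if (work[piv * n + c] == 0.0)
            return -1;
        if (piv != c) {                       /* swap rows c and piv */
            for (size_t j = 0; j < n; j++) {
                double t = work[c * n + j];
                work[c * n + j] = work[piv * n + j];
                work[piv * n + j] = t;
            }
            double tx = x[c]; x[c] = x[piv]; x[piv] = tx;
        }
        for (size_t r = c + 1; r < n; r++) {  /* eliminate below the pivot */
            double f = work[r * n + c] / work[c * n + c];
            for (size_t j = c; j < n; j++)
                work[r * n + j] -= f * work[c * n + j];
            x[r] -= f * x[c];
        }
    }

    /* Back substitution on the upper-triangular system. */
    for (size_t i = n; i-- > 0; ) {
        double s = x[i];
        for (size_t j = i + 1; j < n; j++)
            s -= work[i * n + j] * x[j];
        x[i] = s / work[i * n + i];
    }
    return 0;
}
```

For example, with b = [1 2; 3 4] stored column-major as {1, 3, 2, 4} and a = [4 6], the function returns x = [1 1], since [1 1]*b = [4 6].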

Integrating with other functions and interface definition
When translating a MATLAB algorithm to C, you may be able to re-use existing code or leverage functions available in libraries. Your task then is to integrate the C code derived from your MATLAB algorithm with existing source code and libraries. To make this integration successful you need to make sure the interfaces of your different functions are compatible. In particular, you need to pay special attention to conventions on how complex and multi-dimensional variables are laid out in memory, whether data are passed by reference or by value, and whether indices start from 1 or 0.

Indices start with 1 in MATLAB and 0 in C
One difference between MATLAB and C is that MATLAB uses one-based indexing whereas C uses zero-based indexing: the first element of an array has index 1 in MATLAB and index 0 in C.

You need to be aware of this when communicating index values between C and MATLAB functions, otherwise you may end up addressing elements that are off by one in your final C code.


Consecutive array elements are stored in columns in MATLAB and in rows in C
MATLAB uses column-major order, which means that column elements are stored next to each other in memory. In contrast, C uses row-major order, which means that row elements are stored next to each other. (See Wikipedia for details on the two array layouts.) Array layout is critical for correctly passing arrays between C and MATLAB functions. It also matters when accessing a specific data element using a single subscript ((*a)[i] instead of a[i][j]), and when traversing an array in memory order to avoid cache misses.

If your NxM MATLAB array is implemented as an NxM array in C, you need to watch out for single subscripts. In the example below, the 7th element of the array cst in MATLAB is cst(7) that stores the value « 4 ». The 7th element of the array cst in C is cst[7-1] that stores the value « 7 ». To access the value « 4 » in the C code, you would need to access cst[4-1]. As a result, when arrays are addressed with single-subscript expressions, you might have to add some extra computation in order to access the right location within your array.
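The index arithmetic can be wrapped in a small helper. The sketch below assumes, consistently with the values quoted above, that cst is the 2x4 MATLAB array [1 2 3 4; 5 6 7 8] stored row by row in C; that array and the helper’s name are assumptions.

```c
#include <stddef.h>

/* Map a MATLAB single subscript k (one-based, column-major) of an
   N-by-M array onto the zero-based offset of the same element in a C
   array that stores the N-by-M array row by row (row-major). */
size_t matlab_linear_to_c_rowmajor(size_t k, size_t N, size_t M)
{
    size_t row = (k - 1) % N;   /* zero-based row    */
    size_t col = (k - 1) / N;   /* zero-based column */
    return row * M + col;
}
```

With N = 2 and M = 4, MATLAB’s cst(7) maps to C offset 3, which holds 4, whereas the naive cst[7-1] holds 7.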



Some algorithms such as symmetrical filters do not depend on the orientation of the data, which means they produce the same outputs when traversing data row by row or column by column. In this case, it is not necessary to recompute single subscript expressions in your C code.

If you want the data in the C code to be located at the same positions as in MATLAB, your other option is to map the NxM array in MATLAB into an MxN array in C. In the example below, the 7th element of the array is cst(7) in MATLAB and cst_rot[7-1] in C, and both store the value « 4 ». This requires rotating (transposing) the data when interfacing with other C code that expects data to be stored in row-major order. An example of this approach is shown here.



Rotating data into and out of your function requires extra memory and copying. In many cases, it is more efficient to translate MATLAB’s column-major code into a row-major C code.
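A minimal sketch of that rotation (a transpose copy) from MATLAB’s column-major layout into a row-major C buffer follows; the function name is hypothetical, and the second buffer is precisely the extra memory the paragraph above refers to.

```c
#include <stddef.h>

/* Convert an N-by-M array from MATLAB's column-major layout (src) into
   C's row-major layout (dst). Element (i, j), zero-based, lives at
   src[i + j*N] and is written to dst[i*M + j]. src and dst must not
   overlap. */
void colmajor_to_rowmajor(const double *src, double *dst, size_t N, size_t M)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < M; j++)
            dst[i * M + j] = src[i + j * N];
}
```

For example, MATLAB’s [1 2 3; 4 5 6] is stored as {1, 4, 2, 5, 3, 6} and comes out as {1, 2, 3, 4, 5, 6}. Every call costs a full pass over the data, on the way in and again on the way out.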

Dealing with Complex numbers
Support for complex data is built into MATLAB. For example, the result of an FFT is simply stored into a variable: « y=fft(x) ». The implementation of the complex data type itself is abstracted away from the programmer. In contrast, ANSI C (C89) does not have a built-in data type for complex numbers; complex types were only added in the C99 standard.

Typically, there are two ways of implementing complex arrays:

  • Separate or split complex: the real part and imaginary part are stored into two separate arrays (e.g. separate a_real[0] and a_imag[0])
  • Interleaved: the real and imaginary parts of a given array element are stored next to each other in memory (e.g. interleaved a[0] for the real part and a[1] for the imaginary part)

Much like with row-major vs. column-major, the decision on which implementation to use depends on how data is accessed in the algorithm and most importantly on how data is communicated to external functions.

MATLAB stores real and imaginary parts in separate arrays. This makes calls to the « real » and « imag » functions trivial to implement. On the other hand, many programming environments (including C99 and Fortran) and common libraries (Intel MKL, LAPACK, BLAS, FFTW, etc.) use the interleaved complex data type.

Translating split complex data into interleaved complex data requires extra memory and copies. If your algorithm often calls functions that are only available with interleaved complex data, it is more efficient to translate MATLAB’s split complex arrays into interleaved complex arrays in C.
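The conversion itself is a simple copy loop, sketched below with hypothetical function names. The interleaved layout matches C99 double complex arrays, where each element is stored as its real part followed by its imaginary part.

```c
#include <stddef.h>

/* Split complex -> interleaved: out[2k] = re[k], out[2k+1] = im[k].
   out must have room for 2*n doubles. */
void split_to_interleaved(const double *re, const double *im,
                          double *out, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        out[2 * k]     = re[k];
        out[2 * k + 1] = im[k];
    }
}

/* Interleaved -> split, e.g. for results returned by such a library. */
void interleaved_to_split(const double *in, double *re, double *im, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        re[k] = in[2 * k];
        im[k] = in[2 * k + 1];
    }
}
```

Each library call wrapped this way pays for two copies, which is why choosing interleaved storage throughout the C code can pay off.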

Passing arrays by reference
MATLAB programmers typically do not have to worry about whether data are passed by reference or by value. Internally, all arrays in MATLAB are kept as « references » as long as their values remain unchanged. A local copy of the array is created as soon as a value within the array changes, in order to prevent unexpected side-effects.

To replicate these semantics in C, you need to copy data explicitly. For example, an array that is the input of a function may be passed by reference only if that array is not modified inside the function; otherwise the function must work on a copy.

MATLAB doesn’t explicitly support parameter passing by reference. However, an implicit convention is to use the same parameter names for both the input and output of the function to perform « in-place operations, » as illustrated in this blog. In C code, this can be accomplished by passing a pointer to the input parameter.
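The three cases can be sketched in C as follows (all function names are hypothetical): a const pointer for an input that is only read, an explicit copy when the function modifies its input, and a non-const pointer for the in-place idiom.

```c
#include <stddef.h>
#include <string.h>

/* Read-only input: const documents (and lets the compiler enforce) that
   x is not modified, so passing a pointer is safe, just as MATLAB shares
   an unmodified array without copying it. */
double sum_vec(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t k = 0; k < n; k++)
        s += x[k];
    return s;
}

/* y = scale(x, g) where scale modifies its input internally: MATLAB
   would give the function its own copy, so the C version first copies
   x into the caller-provided buffer y (y must not overlap x). */
void scale_copy(const double *x, size_t n, double g, double *y)
{
    memcpy(y, x, n * sizeof *x);
    for (size_t k = 0; k < n; k++)
        y[k] *= g;
}

/* The in-place idiom x = scale(x, g): the caller's array is modified
   through the pointer and no copy is made. */
void scale_inplace(double *x, size_t n, double g)
{
    for (size_t k = 0; k < n; k++)
        x[k] *= g;
}
```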

Interfacing with C libraries
The considerations presented in the previous sections apply to library functions as well as to functions that you implement yourself. For example, FFTW, LAPACK or the Intel Math Kernel Library all implement complex data as interleaved data. Data in LAPACK are passed in column-major format. FFTW, on the other hand, supports both row- and column-major formats.

Most libraries have specific memory allocation requirements on what data must be pre-allocated before the function is called, and freed after the function returns. The LAPACK library has the special property that all input, output and intermediate (workspace) arrays must be pre-allocated by the caller of the library function and passed by reference. This way, no dynamic memory allocation happens within the called function.

Some libraries even use specific memory allocators. For example, the FFTW library defines its own « fftw_malloc » function to guarantee that the returned pointer obeys the alignment restrictions imposed by the FFTW algorithm implementation.

In addition to constraints on the interfaces, the actual mapping between MATLAB functions and library functions is also non-trivial. Even though MATLAB makes use of both the LAPACK and FFTW libraries internally, these libraries do not have equivalent functions for operations such as matrix divide « mrdivide » or the FFT-based FIR filtering « fftfilt, » which is part of the signal-processing toolbox.

Part 3: Code generation and verification

Part 3 of this 3-part series examines the verification process and makes the case for automatic C generation.

In the first two parts of this series, we navigated the waters of MATLAB-to-C translation and encountered some of the Charybdis and Scylla monsters that await the unsuspecting engineer along the way. By now it should be plain that translating MATLAB to C, while perhaps short of a Homeric odyssey, is anything but an easy task.

In this third and last part, we will explore verification and emphasize how automatic C code generation solves most of the issues described so far. We will illustrate this approach with Catalytic MCS (MATLAB to C Synthesis).

Making sure the C is correct
Verifying that the generated C code matches the original MATLAB code is both extremely important and challenging.

Thankfully, various techniques exist for such verification. A popular albeit tedious method consists of using files to pass input and output data, and comparing the results between the MATLAB code and C code. A more systematic method consists of turning the generated C code into a compiled MEX-file that can be called directly from MATLAB like any other function.

The mex command can be used from the MATLAB command line to compile C code into a MEX-file. The mex command uses the MEX API, which provides functions to pass data between your C code and MATLAB. The MEX-file guide provides more information on how to write your own MEX-files.

Turning your C model into a MEX-file allows you to rerun your original MATLAB simulation with no change and check that the results are identical. However, you need to write a MEX wrapper function, which passes all the data you need to exchange between MATLAB and C. The wrapper relies on fairly arcane functions provided by the MEX API. This task is also error-prone: crashes, segmentation violations or incorrect results can easily occur if the MEX wrapper does not allocate and access the data properly.

The case for automation
Automating the generation of C code from the MATLAB file can help overcome these problems. Conceptually, an automated MATLAB-to-C converter offers the following advantages:

  • The generated code is correct by construction
  • The MATLAB and C code are always in-sync
  • You can spend more time in MATLAB developing and tuning your algorithms rather than writing and debugging low-level C-code
  • Updates to the algorithms result in automatic, immediate update of the C code

However, automatic translation of MATLAB code to C code raises a few important questions:

  • How powerful is the set of MATLAB commands that can be translated to C? The solution loses much of its appeal if it requires the algorithm designer to restrict his style drastically. The designer must then heavily rewrite his MATLAB code, a time-consuming and error prone process. Furthermore, the code loses a lot of its flexibility.
  • How efficient is the generated code? Are advanced techniques for memory management and C-code optimization available?
  • Does the generated C-code easily interface with the existing code for your application?
  • How readable is the resulting C code? Is the C-code understandable and maintainable? Does the generator retain the structure of the original MATLAB file?
  • Can the code generator target specific libraries for a given application?

We will review all these questions when looking at Catalytic MCS. Note, however, that code generation from MATLAB isn’t a new concept, although it has received limited attention.

The AccelDSP tool from Xilinx targets the FPGA market with Verilog/VHDL code generation from MATLAB. (For a related how-to article, see Generate FPGA designs from M-code.) This tool relies heavily on library modules to obtain the highest quality results out of a quantized design.

More recently, The MathWorks introduced a restricted subset of the MATLAB language called « Embedded MATLAB, » for which Real-Time Workshop can generate C code. The subset is aimed at embedded designers, who do not perform algorithm exploration and do not mind restricting their MATLAB style for an embedded target. In particular, Embedded MATLAB avoids many of the issues described in the previous two parts of this series by requiring that the MATLAB code be rewritten in the Embedded MATLAB subset.

In the remainder of this article, we’ll demonstrate how you can use Catalytic MCS to generate high quality, flexible C code from existing MATLAB code, which was written by algorithm experts without consideration for matrix size or type declaration.

Our Initial Example
Let’s get back to our initial, simple example from part 1:


This is a typical example of the power of MATLAB: you need not think about memory allocation, matrix size, naming or declaration of variables—MATLAB takes care of these.



Figure 1. Catalytic MCS GUI and automatic C-code generation.
Catalytic MCS accepts this code as-is and translates it to the C code shown in Figure 1. (A confession: the C code shown in Part 1 was generated by Catalytic MCS.)

We still need to show how MCS helps solve the core issues mentioned earlier. Let’s revisit that now.

Polymorphism
In MATLAB, polymorphism implies that a function can behave differently depending upon the type of the input parameters.

Consider, for example, the following trivial function:


The C code for this function would be completely different depending on whether var1 and var2 are both matrices or var1 is a matrix and var2 a scalar.

Catalytic MCS automatically analyzes the M-code and extracts the type (real, complex, etc.) and shape (scalar, row, column, matrix, etc.) of variables. Based on this analysis, MCS generates C code that works for all possible configurations. This ensures that the C code behaves identically to the MATLAB code.

The default generated code, shown in Figure 2 on the left, includes many options to account for the different possible sizes of var1 and var2. MCS analyzes the « polymorph » function and computes either an element-by-element multiply (if var1 or var2 is a scalar) or a matrix multiply. MCS analysis can also identify that the multiply operates on real floating-point data, not complex data.

If you specify the input variables var1 and var2 to be 2×5 and 5×4 in size, respectively, MCS generates the much simpler code shown on the right.



Figure 2. The MCS GUI and two possible versions of the C code generated from the MATLAB file: on the left, the most general C code, which works for all types of var1 and var2 matrices; on the right, a specialized version for fixed-size matrices.
MCS also optimizes the generated code. For example, if a parameter to a function is a constant, MCS uses compiler optimization techniques such as constant propagation and automatic function specialization. Note that you need not constrain all sizes and types to generate tighter code: doing so for your input variables is usually sufficient, because the MCS analysis propagates such information automatically. Also, if your code requires dynamic sizes, a range of sizes (instead of fixed sizes) can be specified.
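To see why the general version in Figure 2 must contain several code paths, consider a hand-written sketch of « y = var1 * var2 » for real, column-major operands whose sizes are only known at run time. This is not MCS output; the function name and the caller-provided result buffer are assumptions.

```c
#include <stddef.h>

/* Hand-written sketch of the run-time dispatch a generic translation of
   y = a * b needs when operand sizes are unknown (real data, column-major
   storage as in MATLAB). y must be large enough for the result;
   *ry and *cy receive its size. Returns 0 on success, -1 if the inner
   dimensions disagree (MATLAB would raise an error). */
int mtimes_real(const double *a, size_t ra, size_t ca,
                const double *b, size_t rb, size_t cb,
                double *y, size_t *ry, size_t *cy)
{
    if (ra * ca == 1) {                     /* scalar * array */
        for (size_t k = 0; k < rb * cb; k++)
            y[k] = a[0] * b[k];
        *ry = rb; *cy = cb;
        return 0;
    }
    if (rb * cb == 1) {                     /* array * scalar */
        for (size_t k = 0; k < ra * ca; k++)
            y[k] = a[k] * b[0];
        *ry = ra; *cy = ca;
        return 0;
    }
    if (ca != rb)
        return -1;
    for (size_t j = 0; j < cb; j++)         /* matrix product */
        for (size_t i = 0; i < ra; i++) {
            double s = 0.0;
            for (size_t k = 0; k < ca; k++)
                s += a[i + k * ra] * b[k + j * rb];
            y[i + j * ra] = s;
        }
    *ry = ra; *cy = cb;
    return 0;
}
```

With sizes fixed at 2×5 and 5×4, only the final triple loop, with constant bounds, would be needed.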

Vector statements
MATLAB excels at representing matrices and matrix operations. However, implementing matrix operations and concatenations in C is challenging. MCS automatically generates the necessary loops and indexing for vector, matrix or N-dimensional array operations. As shown in the previous examples, MCS also uses 0-based indexing as required for C code.

MCS allocates the memory needed to perform concatenations and self-concatenations such as:


MCS also optimizes the generated C code. The following example illustrates one such optimization:

Consider the two simple vector statements:


In this case, creating a temporary matrix x and then computing the exponential of x would be inefficient. It would result in extra memory use, two distinct loops, and potential performance degradation due to cache misses. Instead, MCS performs an advanced « loop fusion » optimization, using only one loop and converting the temporary matrix « x » into a simple scalar temporary « itemp_0. » This is illustrated in Figure 3.



Figure 3. Example of loop fusion performed by Catalytic MCS.
This case is trivial, but it illustrates that the optimizations MCS performs automatically would require careful thought from a C implementer.

Verifying your C code and catching unintentional array extensions
As stated earlier, the best way to check the behavior of your C code is to run it in the MATLAB context using the MEX interface. Writing the interface code by hand is tedious and error-prone. However, MCS automates this generation. MCS further guarantees that the code is correct by construction, but you can verify this yourself.

In addition, the generated code can automatically catch unexpected errors at run-time. Let’s look at a case where the MATLAB code mistakenly writes beyond an array boundary:


This function extends x if N is greater than 15.

MCS accepts the code but provides (on request) debugging information, which points to the exact problem when you execute the generated C code. In the screenshot below (Figure 4), MCS is used to generate the C-code and automatically compile it into a MEX-file for verification, directly from the MATLAB command line.


Figure 4. Debugging support in MCS.
Much like other compilers, which provide both « debug » and « optimized » modes, MCS provides a « safe » mode for debugging and a « fast » mode for optimization.

By default, MCS generates « safe » code, which points out any potential problem (array extension, out-of-bound array access, incompatible sizes in operations, etc.) at run-time. This is invaluable for debugging your design. Once the model has been verified and behaves correctly, MCS can be invoked using the « -fast » command-line option to generate « fast » code. « Fast » code bypasses error checking and is tight and fast.
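The safe/fast distinction can be imitated by hand with a preprocessor switch. This is a hypothetical illustration, not MCS’s actual output: the macro CHECKED_INDEX, the FAST_CODE flag and the function names are made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef FAST_CODE
/* "Fast" build: no checking at all. */
#define CHECKED_INDEX(i, len) (i)
#else
/* "Safe" build: abort with a message on any out-of-bounds access.
   The reported index is one-based, as a MATLAB user would expect. */
static size_t checked_index(size_t i, size_t len, const char *file, int line)
{
    if (i >= len) {
        fprintf(stderr, "%s:%d: index %zu exceeds array length %zu\n",
                file, line, i + 1, len);
        abort();
    }
    return i;
}
#define CHECKED_INDEX(i, len) checked_index((i), (len), __FILE__, __LINE__)
#endif

/* Sum of x(1:N) for a 15-element array x. */
double sum_first(const double x[15], size_t N)
{
    double s = 0.0;
    for (size_t k = 0; k < N; k++)
        s += x[CHECKED_INDEX(k, 15)];
    return s;
}
```

Compiling with -DFAST_CODE removes the check entirely, so the same source serves for debugging and for production.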

Interfacing the generated code
One critical aspect of MCS generated C code is the ease of integrating it with external C code and libraries.

For instance, most existing C code uses row-major layout, meaning that consecutive memory locations are on the same row (as opposed to column). Straightforward code generation from MATLAB yields column-major code. You can introduce an additional transposition at the interface to get around this (as explained here), but this costs extra memory and copying. MCS can generate either row-major or column-major code. This allows data to be passed smoothly and directly between the generated code and the code it needs to be integrated with (such as libraries, C applications, and other MATLAB blocks).

For example, we are used to thinking about VGA resolution as 640×480. However, C code typically stores such an image as 480×640 (that is, 480 lines and 640 columns), and not 640×480. This highlights the importance of carefully considering the choice of your interface.

Figure 5 shows two pieces of code generated by MCS for the same MATLAB source file: the piece on the left uses column-major format, the right one uses row-major. If this code needs to be interfaced with external C code, you would require the row-major version.



Figure 5. Example of column-major code (left) and row-major code (right) generated from the same MATLAB source file.
Another often neglected topic is the interfacing of code with complex numbers. MCS also provides options to automatically split a complex variable into two separate variables holding the real and imaginary parts, or to interleave real and imaginary parts in arbitrary order.

Having control over the memory layout and interfaces of multi-dimensional arrays and complex data dramatically simplifies the integration of the C code generated from MATLAB within an application or with existing libraries. With these interfacing options, MCS eases integration with libraries such as FFTW (for discrete Fourier and cosine transforms), LAPACK (for linear algebra operations), or target-specific ones.

Conclusion
Automatic C code generation with MCS from normal, unconstrained MATLAB code can help bypass many pitfalls during the translation process.

Possible applications include:

  • Running C simulations on a general purpose processor
  • Targeting embedded applications
  • Accelerating simulations
  • Getting a parameterizable, golden C model as a reference for implementation

In general, when considering an automatic translation tool for MATLAB to C, the following may be helpful:

  • Consider whether or not a lot of pre-existing MATLAB code exists. If legacy code is unavailable and you are writing new code, restricted MATLAB support will hurt you less than having to rewrite a lot of existing MATLAB code to fit the tool’s restrictions.
  • Ensure that MATLAB coding style restrictions do not prevent you from describing your algorithm efficiently and conveniently.
  • Assess the quality of the generated C code. Beyond that, assess how well the tool answers your needs in terms of interfacing (for example, generating row-major code for easy integration with external C code).

Preview of a better way…
This part highlighted some of the pitfalls in developing MATLAB library functions in C and interfacing these functions with external functions or libraries.

Part 3 of this article will present how Catalytic MCS can automatically generate C code from MATLAB that is easy to integrate with other C functions and that interfaces common libraries automatically. In addition, the Catalytic Function Library makes it possible to automatically generate functionally-equivalent C code for many MATLAB functions.

About the authors
Marc Barberis leads the applications group at Catalytic Inc. Prior to Catalytic, he held several positions in wireless design and system-level simulation tools. His interests include the physical layer for 3G and digital receiver algorithms. Marc holds an MS in EE from Ecole Nationale Superieure des Telecommunications de Paris. He can be reached at marc@catalyticinc.com.

Luc Semeria is product manager at Catalytic Inc. Prior to Catalytic, he held several positions at Synopsys Inc. His research interests include compilers, EDA tools, computer architecture, and DSP algorithms. Luc holds a Ph.D. in Electrical Engineering from Stanford University. He can be reached at luc@catalyticinc.com.
