Example Development of fffFillBuffer()
The following is an example of how the `fpFillBuffer()` function in `foliosr` could be developed. The example demonstrates how the code changes as limitations of the implementation are identified. With each implementation, code revisions are shown in bold.
Implementation 1—fpFillBuffer() Function

    /*****************************************************************
     * Function: fffFillBuffer()
     * Summary: Read fff input from stream and parse into kvtoken.h codes
     *****************************************************************/
    int pascal _export fffFillBuffer(
        void *pCFContext,
        BYTE *pcBuf,
        UINT *pnBufOut,
        int *pnPercentDone,
        UINT cbBufOutMax )
    {
        BOOL bRetVal;
        TPfffGlobals *pContext = (TPfffGlobals *)pCFContext;

        pContext->pcBufOut = pcBuf;
        fffReadSourceFile(pContext);
        bRetVal = fffProcessBuffer(pContext, pcBuf);
        *pnPercentDone = (int)(pContext->unTotalBytesProcessed * (UINT)100 / pContext->unFileSize);
        *pnBufOut = (UINT)(pContext->pcBufOut - pcBuf);
        return (bRetVal ? KVERR_Success : KVERR_General);
    }

The parameters of `fffFillBuffer()` are as follows:

Parameter | In/Out | Description |
---|---|---|
`pCFContext` | In | A pointer to the context structure of the custom reader. |
`pcBuf` | In/Out | A pointer to the token output buffer. |
`pnBufOut` | Out | A pointer to the number of bytes written to the output buffer. |
`pnPercentDone` | Out | A pointer to the percentage complete. |
`cbBufOutMax` | In | The maximum number of bytes that the token output buffer can hold. |

Structure of Implementation 1
- The local variable `pContext` is set to the `pCFContext` void pointer, cast to a pointer to the global context structure for the reader. This provides access to all members of this structure.
- After setting the `pContext` variable, a call is made to read the source file.
- Next, a call is made to `fffProcessBuffer()`. The second parameter in the call is a pointer to the token output buffer. If this call fails, usually because of memory allocation errors, it returns `FALSE`.
- The percentage complete is calculated.
- The number of bytes written to the token output buffer is calculated. This is based on the value of `pContext->pcBufOut`, which is increased each time a token is written to the buffer.
- The function returns to the structured access layer.
- Subsequent calls to `fffFillBuffer()` are made by the structured access layer until the percentage complete is 100.
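
The calling pattern described in the last two steps can be sketched from the caller's side. This is a minimal illustration only: `DemoContext`, `demo_fill_buffer()`, and `demo_drain()` are hypothetical stand-ins, not part of the SDK, and the real structured access layer is not shown in this document. The toy fill function simply copies bytes instead of producing tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: the real context type and return codes come from the SDK. */
enum { DEMO_OK = 0, DEMO_ERR = 1 };

typedef struct {
    const unsigned char *src;  /* whole "file" held in memory */
    size_t size;               /* total bytes in the source */
    size_t done;               /* bytes consumed so far */
} DemoContext;

/* Toy fill-buffer function: copies up to out_max bytes and reports progress. */
static int demo_fill_buffer(void *ctx_v, unsigned char *out, unsigned *n_out,
                            int *pct, unsigned out_max)
{
    DemoContext *ctx = (DemoContext *)ctx_v;
    size_t left, n;

    assert(ctx && out && n_out && pct);
    left = ctx->size - ctx->done;
    n = left < out_max ? left : out_max;

    memcpy(out, ctx->src + ctx->done, n);
    ctx->done += n;
    *n_out = (unsigned)n;
    *pct = ctx->size ? (int)(ctx->done * 100 / ctx->size) : 100;
    return DEMO_OK;
}

/* Caller side: supply an empty buffer on every call until 100% is reported.
 * Returns the number of calls made, or -1 on error. */
static int demo_drain(DemoContext *ctx, unsigned char *sink, size_t sink_max,
                      size_t *total)
{
    unsigned char buf[8];  /* deliberately tiny output buffer */
    int pct = 0, calls = 0;

    *total = 0;
    while (pct < 100) {
        unsigned n = 0;
        if (demo_fill_buffer(ctx, buf, &n, &pct, sizeof buf) != DEMO_OK)
            return -1;
        if (*total + n > sink_max)
            return -1;
        memcpy(sink + *total, buf, n);
        *total += n;
        calls++;
    }
    return calls;
}
```

With a 20-byte source and the 8-byte buffer above, `demo_drain()` makes three calls (8, 8, and 4 bytes) before the reported percentage reaches 100.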
Problems with Implementation 1
- There is a limit to the size of the token output buffer, typically 4 KB. If `fffProcessBuffer()` generates a token stream larger than this, there is a memory overflow. If `fffProcessBuffer()` generates a small token stream and the entire file has not been read, the output token buffer is underutilized.
- It might not be possible to process the entire input buffer from the source file because of boundary conditions. An example of a "boundary condition" is when the input buffer terminates part way through a control sequence in the original document. Another file read operation is required before the complete control sequence can be parsed.
- This function might be interrupted by other calls from the structured access layer to process headers, footers, footnotes, and endnotes, or to retrieve the document summary information. This can cause values of variables in the global context to change, and the source file to be repositioned.
Implementation 2—Processing a Large Token Stream
Implementation 2 addresses the problem of processing a token stream that is larger than the output buffer size limit.

    /*****************************************************************
     * Function: fffFillBuffer()
     * Summary: Read fff input from stream and parse into kvtoken.h codes
     *****************************************************************/
    int pascal _export fffFillBuffer(
        void *pCFContext,
        BYTE *pcBuf,
        UINT *pnBufOut,
        int *pnPercentDone,
        UINT cbBufOutMax )
    {
        BOOL bRetVal = TRUE;
        TPfffGlobals *pContext = (TPfffGlobals *)pCFContext;

        pContext->pcBufOut = pcBuf;
        pContext->cbBufOutMax = 9 * cbBufOutMax / 10;

        /* Process the portion of the fff file that is in the input buffer but do
         * not return from the fffFillBuffer() function unless the output buffer is
         * at least 90% full. If any of the memory allocations fail during the
         * execution of fffProcessBuffer(), bRetVal will be set to FALSE, resulting
         * in this conversion failing "gracefully".
         */
        do
        {
            if( pContext->bBufOutFull )
            {
                pContext->bBufOutFull = FALSE;
            }
            else
            {
                fffReadSourceFile(pContext);
            }
            bRetVal = fffProcessBuffer(pContext, pcBuf);
            *pnPercentDone = (int)(pContext->unTotalBytesProcessed * (UINT)100 / pContext->unFileSize);
        } while( bRetVal && !pContext->bBufOutFull && *pnPercentDone < 100 );

        *pnBufOut = (UINT)(pContext->pcBufOut - pcBuf);
        return (bRetVal ? KVERR_Success : KVERR_General);
    }

Structure of Implementation 2
- `pContext->cbBufOutMax` is set to 90% of `cbBufOutMax`. This value is used in `fffProcessBuffer()` to monitor how full the token output buffer becomes as the source file is processed.
- When the source file input buffer has been processed, `fffProcessBuffer()` returns, and the percentage complete is calculated.
- If the token output buffer is not filled to a value greater than `pContext->cbBufOutMax`, `pContext->bBufOutFull` remains set to `FALSE`, and if the percentage complete is less than 100, the `do-while` loop is re-entered without returning from this function to the structured access layer. There is another call to `fffReadSourceFile()`, followed by `fffProcessBuffer()`.
- When the token output buffer is filled to a value greater than `pContext->cbBufOutMax`, `pContext->bBufOutFull` is set to `TRUE`. In this case, the `do-while` loop ends, the number of bytes written to the token output buffer is calculated, and control returns to the structured access layer.
- The structured access layer continues to make calls to `fffFillBuffer()` until the entire source file is processed.
- Each time the structured access layer calls `fffFillBuffer()`, another empty token output buffer is provided for the custom reader to use.
- If the previous call to `fffFillBuffer()` exited because the previous token output buffer exceeded allowable capacity, `pContext->bBufOutFull` is reset to `FALSE` and no call is made to read the next buffer from the input source file.
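
The body of `fffProcessBuffer()` is not shown in this document, so the following is only a sketch of how a token writer might track the 90% threshold and raise the "buffer full" flag. The names `DemoOut`, `demo_out_init()`, and `demo_put_token()` are hypothetical. The sketch assumes that no single token is larger than the 10% headroom left above the soft limit, and it asserts that assumption rather than handling it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output-side state, loosely mirroring pcBufOut, cbBufOutMax,
 * and bBufOutFull in the reader's context structure. */
typedef struct {
    unsigned char *out_start;  /* start of the caller's buffer */
    unsigned char *out_pos;    /* next free byte */
    size_t hard_max;           /* real capacity of the buffer */
    size_t soft_limit;         /* 90% of the real capacity */
    int out_full;              /* set once the soft limit is exceeded */
} DemoOut;

static void demo_out_init(DemoOut *o, unsigned char *buf, size_t hard_max)
{
    o->out_start = buf;
    o->out_pos = buf;
    o->hard_max = hard_max;
    o->soft_limit = 9 * hard_max / 10;
    o->out_full = 0;
}

/* Append one token. Returns 1 if the caller may keep writing, 0 if it
 * should stop and hand the buffer back. */
static int demo_put_token(DemoOut *o, const void *tok, size_t len)
{
    size_t used = (size_t)(o->out_pos - o->out_start);

    /* Assumption: the headroom above the soft limit absorbs the last token. */
    assert(used + len <= o->hard_max);

    memcpy(o->out_pos, tok, len);
    o->out_pos += len;
    if (used + len > o->soft_limit)
        o->out_full = 1;
    return !o->out_full;
}
```

With a 100-byte buffer and 8-byte tokens, eleven tokens (88 bytes) stay at or below the soft limit of 90, and the twelfth (96 bytes) sets the flag while still fitting inside the buffer.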
Problems with Implementation 2
- It might not be possible to process the entire input buffer from the source file because of boundary conditions.
- This function might be interrupted by other calls from the structured access layer to process headers, footers, footnotes, or endnotes, or to retrieve the document summary information. This can cause values of variables in the global context to change, and the source file to be repositioned.
Boundary Conditions
A boundary condition can result from many situations arising from input file processing. For example, the input buffer might end with an incomplete command. In Folio flat files, this could be an incomplete element. In other word processing documents, a boundary condition might result from an incomplete control sequence, a split double-byte character, or a partial UTF-7 or UTF-8 sequence. These can be handled jointly by `fffProcessBuffer()`, which must detect the boundary condition, and `fffReadSourceFile()`.
The following example shows partial code used in `fffReadSourceFile()`:

    /****************************************************************
     *
     * Function: fffReadSourceFile()
     *
     ***************************************************************/
    int pascal fffReadSourceFile(TPfffGlobals *pContext)
    {
        int nBytes;

        /* Transfer remaining data to beginning of buffer prior to next read.
         * memmove() is used because the source and destination lie in the
         * same buffer and can overlap. */
        if( pContext->nResidualBytes )
        {
            memmove(pContext->cInputBuf, pContext->pcBufIn, pContext->nResidualBytes);
        }

        /* Read from file, without over-writing any text from the previous buffer */
        nBytes = (*pContext->pIO->kwReadFunc)(pContext->pIO,
                     pContext->cInputBuf + pContext->nResidualBytes,
                     BUFFERSIZE - pContext->nResidualBytes);

        /* Update input buffer control parameters */
        pContext->unTotalBytesRead += (UINT)nBytes;
        pContext->pcBufIn = pContext->cInputBuf;
        pContext->pcBufInMax = pContext->pcBufIn + pContext->nResidualBytes + nBytes;
        pContext->nResidualBytes = 0;
        return nBytes;
    }

If `fffProcessBuffer()` is unable to process the entire input source file buffer, it sets the value for `pContext->nResidualBytes`. When the next call to `fffReadSourceFile()` is made, any residual bytes are copied to the beginning of the input source file buffer, and the number of bytes to be read is reduced to make sure that this buffer does not overflow.
A good way to test the code for boundary conditions is to vary the value of `BUFFERSIZE` and make sure that the results remain consistent.
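
The residual-byte technique and the buffer-size test can be tried on a toy format. The sketch below is not the `foliosr` code: `DemoReader`, `demo_read()`, `demo_process()`, and `demo_count_elements()` are hypothetical names, the input is an in-memory string, and the "parser" only counts complete `<...>` elements. An element split across two reads is carried over as residual bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF_MAX 64

typedef struct {
    const char *src;       /* in-memory "file" */
    size_t src_len, src_pos;
    char buf[DEMO_BUF_MAX];
    size_t buf_size;       /* plays the role of BUFFERSIZE */
    size_t in_pos;         /* next unparsed byte in buf */
    size_t in_end;         /* one past the last valid byte in buf */
    size_t residual;       /* unparsed bytes to carry into the next read */
} DemoReader;

/* Move leftover bytes to the front of the buffer, then top it up. */
static size_t demo_read(DemoReader *r)
{
    size_t room, n;

    if (r->residual)
        memmove(r->buf, r->buf + r->in_pos, r->residual);
    room = r->buf_size - r->residual;
    n = r->src_len - r->src_pos;
    if (n > room)
        n = room;
    memcpy(r->buf + r->residual, r->src + r->src_pos, n);
    r->src_pos += n;
    r->in_pos = 0;
    r->in_end = r->residual + n;
    r->residual = 0;
    return n;
}

/* Count complete "<...>" elements; leave a trailing partial one as residual. */
static int demo_process(DemoReader *r)
{
    int count = 0;

    while (r->in_pos < r->in_end) {
        char *start = r->buf + r->in_pos;
        size_t avail = r->in_end - r->in_pos;

        if (*start == '<') {
            char *close = memchr(start, '>', avail);
            if (!close) {            /* boundary condition: element is cut off */
                r->residual = avail;
                break;
            }
            count++;
            r->in_pos += (size_t)(close - start) + 1;
        } else {
            r->in_pos++;             /* plain text byte */
        }
    }
    return count;
}

/* Parse the whole input with the given buffer size. */
static int demo_count_elements(const char *text, size_t buf_size)
{
    DemoReader r;
    int total = 0;

    assert(buf_size >= 1 && buf_size <= DEMO_BUF_MAX);
    memset(&r, 0, sizeof r);
    r.src = text;
    r.src_len = strlen(text);
    r.buf_size = buf_size;
    for (;;) {
        size_t n = demo_read(&r);
        total += demo_process(&r);
        if (n == 0)   /* no new bytes: input exhausted, or element exceeds buffer */
            break;
    }
    return total;
}
```

Calling `demo_count_elements("a<b>cd<ef>g<hij>", size)` with sizes 5, 7, and 64 gives the same count each time, which is the consistency check described above. An element that never closes, or that is longer than the buffer, is simply dropped in this sketch.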
NOTE: With `fffReadSourceFile()`, the source file can be read by calls to retrieve header or footer information. If this occurs, the value for `pContext->unTotalBytesRead` is incorrect.
Implementation 3—Interrupting Structured Access Layer Calls
Implementation 3 addresses the problem of boundary conditions and interrupting calls from the structured access layer.

    /****************************************************************************
     * Function: fffFillBuffer()
     * Summary: Read fff input from stream and parse into kvtoken.h codes
     ****************************************************************************/
    int pascal _export fffFillBuffer(
        void *pCFContext,
        BYTE *pcBuf,
        UINT *pnBufOut,
        int *pnPercentDone,
        UINT cbBufOutMax )
    {
        double dTotalBytesProcessed, dFileSize;
        BOOL bRetVal = TRUE;
        TPfffGlobals *pContext = (TPfffGlobals *)pCFContext;

        pContext->pcBufOut = pcBuf;
        pContext->cbBufOutMax = 9 * cbBufOutMax / 10;

        /* Process the portion of the fff file that is in the input buffer but do
         * not return from the fffFillBuffer() function unless the output buffer is
         * at least 90% full. If any of the memory allocations fail during the
         * execution of fffProcessBuffer(), bRetVal will be set to FALSE, resulting
         * in this conversion failing "gracefully".
         */
        do
        {
            if( pContext->bBufOutFull )
            {
                pContext->bBufOutFull = FALSE;
            }
            else
            {
                fffReadSourceFile(pContext);
            }
            bRetVal = fffProcessBuffer(pContext, pcBuf);

            if( pContext->bHeaderCompleted )
            {
                *pnPercentDone = 100;
                pContext->bHeaderCompleted = FALSE;
            }
            else if( pContext->bFooterCompleted )
            {
                *pnPercentDone = 100;
                pContext->bFooterCompleted = FALSE;
            }
            else
            {
                if( pContext->unTotalBytesProcessed >= pContext->unFileSize )
                {
                    *pnPercentDone = 100;
                }
                else if( pContext->unFileSize < FFF_MAX_ULONG )
                {
                    *pnPercentDone = (int)(pContext->unTotalBytesProcessed * (UINT)100 / pContext->unFileSize);
                }
                else
                {
                    dTotalBytesProcessed = pContext->unTotalBytesProcessed;
                    dFileSize = pContext->unFileSize;
                    *pnPercentDone = (int)(dTotalBytesProcessed * 100 / dFileSize);
                }
            }
        } while( bRetVal && !pContext->bBufOutFull && *pnPercentDone < 100 );

        *pnBufOut = (UINT)(pContext->pcBufOut - pcBuf);
        return (bRetVal ? KVERR_Success : KVERR_General);
    }

Structure of Implementation 3
- The most significant change in Implementation 3 is the addition of the code that checks whether the processing of the header or footer is complete. The variables `pContext->bHeaderCompleted` and `pContext->bFooterCompleted` are set to `TRUE` in `fffProcessBuffer()` when a header or footer is processed and the end of that portion of the document is reached.
- The other piece of code added in Implementation 3 is unique to `foliosr`. Folio files can be 50 MB or larger, so multiplying the number of bytes processed by 100 can overflow a 32-bit unsigned integer and produce an inaccurate percentage complete. If the file size is not less than `FFF_MAX_ULONG`, which is defined as `(UINT)(0xFFFFFFFF / 0x64)`, doubles are used for that calculation.
- Prior to returning, the token output buffer is as full as possible and never overflows. The minimum number of calls is made.
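
The overflow that motivates the switch to doubles can be reproduced in isolation. The sketch below assumes that `UINT` is a 32-bit unsigned type and uses `uint32_t` to make that explicit. The names `demo_percent_naive()`, `demo_percent_safe()`, and `DEMO_MAX_SAFE` are hypothetical; `DEMO_MAX_SAFE` plays the role of `FFF_MAX_ULONG`. The zero-size check is an addition that the original code does not have.

```c
#include <assert.h>
#include <stdint.h>

/* Largest file size for which done * 100 cannot wrap (done < size). */
#define DEMO_MAX_SAFE ((uint32_t)(0xFFFFFFFFu / 100u))

/* Naive form, as in Implementations 1 and 2: the product wraps
 * modulo 2^32 when done is large. */
static int demo_percent_naive(uint32_t done, uint32_t size)
{
    assert(size != 0);
    return (int)((uint32_t)(done * (uint32_t)100) / size);
}

/* Guarded form, following the structure of Implementation 3. */
static int demo_percent_safe(uint32_t done, uint32_t size)
{
    if (size == 0 || done >= size)
        return 100;
    if (size < DEMO_MAX_SAFE)
        return (int)((uint32_t)(done * (uint32_t)100) / size);
    return (int)((double)done * 100.0 / (double)size);
}
```

For a 100,000,000-byte file that is half processed, `demo_percent_naive(50000000, 100000000)` returns 7 because the product 5,000,000,000 wraps to 705,032,704, whereas `demo_percent_safe()` returns 50.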