Silly question...

I have a yxdb file.

The file is the output of the PDF to Text tool.

The output file contains 1700 rows grouped in 53 pages.

The relevant fields are Pg, Row, Text.

Pg is page number.  Row is the row number relative to a given page.  Each page averages 30-40 lines.

I want to process each page individually -- text processing (parsing etc.)

Batch Macro (using Pg as the control value) OR Iterative Macro?

I assume batch macro, but I am trying to find an excuse to use an Iterative Macro (if it makes sense).

Hi @hellyars ,

If you want to deal with the text of one [Pg] as a single chunk, I would concatenate the [Text] with a Summarize tool, Group By [Pg], and do the text processing on the resulting Concat_Text field.
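For readers who think in code, the group-and-concatenate step above can be sketched in Python. This is only an illustration of the idea, not what the Summarize tool does internally; the function name `concat_text_by_page`, the newline separator, and the sample rows are assumptions, and only the field names Pg, Row, and Text come from the question.

```python
from collections import defaultdict


def concat_text_by_page(rows):
    """Group rows by Pg and join their Text values in Row order.

    rows: iterable of dicts with keys "Pg", "Row", "Text".
    Returns a dict mapping each page number to one concatenated string.
    """
    lines_by_page = defaultdict(list)
    # Sort so that lines within a page are joined in their original order.
    for row in sorted(rows, key=lambda r: (r["Pg"], r["Row"])):
        lines_by_page[row["Pg"]].append(row["Text"])
    # The newline separator is an assumption; any delimiter would work.
    return {pg: "\n".join(lines) for pg, lines in lines_by_page.items()}


# Made-up sample data standing in for the PDF to Text output.
sample_rows = [
    {"Pg": 1, "Row": 2, "Text": "second line"},
    {"Pg": 1, "Row": 1, "Text": "first line"},
    {"Pg": 2, "Row": 1, "Text": "only line"},
]
pages = concat_text_by_page(sample_rows)
```

Each value in `pages` can then be parsed as one block of text, with no macro involved.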

If I did not need a macro, I would keep the whole process in one standard workflow.

(Partly because I am not so good at macros, but also for the readability of the workflow.)

If there is any reason you have to use a macro, please ignore this post.

Went with a batch macro with a control parameter set to page number.

Image-based PDFs are always a mess.  Especially documents that have not changed in decades.

A macro makes it cleaner and easier to troubleshoot sections of a larger workflow.

Alteryx PDF to Text tool did an awesome job.

Compared favorably to AWS Textract and fewer steps moving files around.

But there's always that last 5% that has to be manually tweaked to work.

Hi @hellyars

Just to add a simple rule that you can use to determine whether to use an iterative or batch macro.

If you know the number of iterations before you start the macro AND the inputs don't change between iterations, use a Batch macro.  Otherwise use an Iterative macro.

In your case, you know the number of iterations (53) AND the Inputs don't change between iterations so a batch macro is the way to go with 53 records in the control parameter.
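As a rough analogy in Python (not Alteryx itself), a batch macro behaves like a loop over a list of control values that is fixed before the loop starts, with the same input data on every pass. The names `run_batch` and `process_page`, and the line-counting step, are hypothetical placeholders for whatever parsing each page needs.

```python
def process_page(page_rows):
    """Hypothetical per-page step: count the non-empty lines on the page."""
    return sum(1 for row in page_rows if row["Text"].strip())


def run_batch(rows, control_values):
    """Run process_page once per control value.

    The list of control values is known up front and the input rows are
    never modified between passes, which is the batch-macro situation.
    """
    results = {}
    for pg in control_values:
        page_rows = [row for row in rows if row["Pg"] == pg]
        results[pg] = process_page(page_rows)
    return results


# Made-up sample data; the real file would have 53 distinct Pg values.
batch_rows = [
    {"Pg": 1, "Row": 1, "Text": "Header"},
    {"Pg": 1, "Row": 2, "Text": "   "},
    {"Pg": 2, "Row": 1, "Text": "Body"},
    {"Pg": 2, "Row": 2, "Text": "Footer"},
]
control_values = sorted({row["Pg"] for row in batch_rows})
batch_results = run_batch(batch_rows, control_values)
```

The number of passes equals `len(control_values)`, which is known before the first pass runs.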

A typical use case for an Iterative macro is calling an API that implements paging.  Many of these return a value in the response that gives you the URL of the next page or an offset to add to a start value.  When there are no more pages, the API doesn't return the next page value.  You can't use a batch macro because you don't know how many pages are in the data until you call the first one.
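The paging pattern can be sketched in Python as a loop that only learns whether to continue from the previous response. Everything here is made up for illustration: `FAKE_API` stands in for a real service, and the `"next"` key is an assumed name for the next-page link, since real APIs each name it differently.

```python
# A fake paged API: each response may carry the URL of the next page.
FAKE_API = {
    "/items?page=1": {"items": ["a", "b"], "next": "/items?page=2"},
    "/items?page=2": {"items": ["c"], "next": "/items?page=3"},
    "/items?page=3": {"items": ["d"]},  # no "next" key: this is the last page
}


def fetch(url):
    """Stand-in for an HTTP call; looks the URL up in the fake API."""
    return FAKE_API[url]


def fetch_all(start_url):
    """Follow next-page links until a response no longer contains one."""
    items = []
    calls = 0
    url = start_url
    while url is not None:
        response = fetch(url)
        calls += 1
        items.extend(response["items"])
        # The stop condition depends on data returned by this iteration.
        url = response.get("next")
    return items, calls


all_items, call_count = fetch_all("/items?page=1")
```

The number of calls is only known after the loop ends, which is why a fixed list of control values cannot drive it.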

The other typical iterative use case is where you're building some kind of hierarchy of indeterminate depth.  Here the inputs change between iterations, because you pass all the records in the first iteration and pull out the top-level parents.  For the next iteration you either remove the parent records from the input or mark them in some way so they are ignored.  You keep on iterating until all the records are either removed from the input or marked.
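A minimal Python sketch of that hierarchy case, assuming each record is an `(id, parent_id)` pair with `None` as the parent of a top-level record. The function name `assign_levels` and the sample org chart are hypothetical. The error for orphaned or cyclic records is an addition not mentioned above, included so the loop cannot run forever.

```python
def assign_levels(records):
    """Assign a depth level to every record in a parent/child hierarchy.

    records: list of (record_id, parent_id) tuples; parent_id is None for
    top-level records. Returns a dict mapping record_id to its level.
    """
    levels = {}
    remaining = list(records)
    level = 0
    while remaining:
        # Pull out records whose parent was resolved in an earlier pass
        # (or that have no parent at all on the first pass).
        resolved = [
            (rid, pid) for rid, pid in remaining
            if pid is None or pid in levels
        ]
        if not resolved:
            raise ValueError(f"orphaned or cyclic records: {remaining!r}")
        for rid, _ in resolved:
            levels[rid] = level
        # The input shrinks between iterations: resolved records are removed.
        remaining = [rec for rec in remaining if rec[0] not in levels]
        level += 1
    return levels


org = [("ceo", None), ("cto", "ceo"), ("dev", "cto"), ("cfo", "ceo")]
org_levels = assign_levels(org)
```

The loop runs once per level of depth, and that depth is not known until the data has been walked.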

As an aside, you can always convert a batch macro into an iterative one by embedding the stop logic into it.  In your case it would be stopping when [Engine.IterationNumber] = number of pages - 1, since the iteration number starts at 0.  I wouldn't recommend doing this though, since the logic becomes more complex and harder to maintain.
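To show what that embedded stop logic amounts to, here is a small Python analogy. It assumes a 0-based iteration counter and at least one page; `run_iterative` and its arguments are hypothetical names, not Alteryx features.

```python
def run_iterative(pages, process):
    """Process pages one per iteration, stopping on an explicit condition.

    Assumes pages is non-empty. The counter is 0-based, so the last
    iteration is the one where iteration == len(pages) - 1.
    """
    results = []
    iteration = 0
    while True:
        results.append(process(pages[iteration]))
        if iteration == len(pages) - 1:  # embedded stop logic
            break
        iteration += 1
    return results, iteration


iter_results, last_iteration = run_iterative(["p1", "p2", "p3"], str.upper)
```

Compared with a plain loop over a fixed list, the counter and stop test are extra moving parts to get wrong, which is the maintenance cost described above.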

Dan