Workflow Type: Galaxy
v2.0: Renamed several output datasets in the workflow
Associated Tutorial
This workflow is part of the tutorial Text-Mining Differences in Chinese Newspaper Articles, available in the Galaxy Training Network (GTN).
Features
- Includes a Galaxy Workflow Report
- Uses Galaxy Workflow Comments
Thanks to...
Workflow Author(s): Daniela Schneider
Tutorial Author(s): Daniela Schneider
Tutorial Contributor(s): Björn Grüning, Daniela Schneider, Saskia Hiltemann, Teresa Müller, Helena Rasche
Funder(s): German Competence Center Cloud Technologies for Data Management and Processing, Ministry of Science, Research and Arts, German Network for Bioinformatics Infrastructure Service, Training, Cooperations & Cloud Computing
Inputs
| ID | Name | Description | Type |
|---|---|---|---|
| Input censored text | #main/Input censored text | Upload the censored text containing replacement characters like ‘×’. | |
| Input uncensored text | #main/Input uncensored text | Upload the uncensored text without replacement characters. | |
Steps
| ID | Name | Description |
|---|---|---|
| 2 | Preprocessing of censored text | This step uses regular expressions to delete all whitespace (`\s`) and to place each character on its own line (`\1\n`). The result is a cleaned text with exactly one character per line. Tool: `toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.5+galaxy3` |
| 3 | Preprocessing of uncensored text | This step uses regular expressions to delete all whitespace (`\s`) and to place each character on its own line (`\1\n`). The result is a cleaned text with exactly one character per line. Tool: `toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.5+galaxy3` |
| 4 | Comparison with diff - user version | The diff tool compares the two cleaned texts. This version creates an HTML file that colour-codes the differences between the texts as additions (green) or deletions (red). Tool: `toolshed.g2.bx.psu.edu/repos/bgruening/diff/diff/3.10+galaxy1` |
| 5 | Comparison with diff - computer version | The diff tool compares the two cleaned texts. This raw output is used for the further steps of the analysis but is less intuitive to read, which is why step 4 produces a more visual HTML version of the same comparison. Tool: `toolshed.g2.bx.psu.edu/repos/bgruening/diff/diff/3.10+galaxy1` |
| 6 | Select only censored lines | This step selects all lines from the raw diff output that contain the censorship symbol ×. Tool: `Grep1` |
| 7 | Compute | This step unifies the formatting and adds any missing columns, which occur when a line extracted in the previous step is empty in the second text. This ensures that every line has the expected number of columns, so that the following steps run smoothly. Tool: `toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/2.1+galaxy0` |
| 8 | Cut | This step selects only column 9, which contains the uncensored characters from the second text. The result is a single column with one Chinese character per row. This allows the word cloud to scale characters by frequency: characters that appear more often are drawn bigger, making the results evident at first sight. Tool: `Cut1` |
| 9 | Datamash | This step counts how often each character appears in the table from the previous step. Tool: `toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.9+galaxy0` |
| 10 | Generate a word cloud | This step shows which characters were censored in the first text. The bigger a character, the more often it appears among the censored characters. Tool: `toolshed.g2.bx.psu.edu/repos/bgruening/wordcloud/wordcloud/1.9.6+galaxy0` |
| 11 | Sort | This step sorts the counts from the most frequent to the least frequent character. Tool: `toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sort_header_tool/9.5+galaxy3` |
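The regular-expression preprocessing of steps 2 and 3 can be approximated in Python as below. This is a minimal sketch, assuming the whole text is processed in one pass; the Galaxy `tp_replace_in_line` tool applies its patterns line by line, and the function name `one_char_per_line` is hypothetical.

```python
import re


def one_char_per_line(text: str) -> str:
    """Approximate the two replacements of steps 2 and 3 (hypothetical helper)."""
    # Delete every whitespace character matched by \s (spaces, tabs, line breaks).
    no_space = re.sub(r"\s", "", text)
    # Capture each remaining character and write it back followed by a newline (\1\n).
    return re.sub(r"(.)", r"\1\n", no_space)


print(one_char_per_line("中国 ××\n新闻").splitlines())
# ['中', '国', '×', '×', '新', '闻']
```

Putting one character per line turns a line-based diff into a character-level comparison, which is what makes the later steps possible.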
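For a local preview of a colour-coded comparison similar to the one from step 4, Python's standard `difflib.HtmlDiff` produces a side-by-side HTML table. This swaps in a different tool than the Galaxy diff wrapper, so layout and colours differ: by default `HtmlDiff` marks added lines green, deleted lines red and changed lines yellow. The helper name `html_diff` is hypothetical.

```python
import difflib


def html_diff(censored: list[str], uncensored: list[str]) -> str:
    """Return a complete HTML page comparing two one-character-per-line texts."""
    # make_file renders a full HTML document with a side-by-side diff table.
    return difflib.HtmlDiff().make_file(
        censored, uncensored, fromdesc="censored", todesc="uncensored"
    )


html = html_diff(["中", "国", "×"], ["中", "国", "政"])
# Write the page out and open it in a browser to inspect the differences.
with open("diff.html", "w", encoding="utf-8") as fh:
    fh.write(html)
```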
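Steps 5 to 8 (raw diff, selecting lines with ×, padding missing columns, cutting out the uncensored column) together recover which characters hide behind the censorship symbol. The sketch below reaches the same kind of result by a different route, so treat it as an illustration rather than a re-implementation: it aligns the two character sequences with `difflib.SequenceMatcher` and keeps the uncensored characters from every replaced block whose censored side consists only of ×. It does not reproduce the column layout of the Galaxy diff output, and `censored_characters` is a hypothetical name.

```python
import difflib


def censored_characters(
    censored: list[str], uncensored: list[str], mask: str = "×"
) -> list[str]:
    """Return the uncensored characters that were replaced by the mask symbol."""
    hidden: list[str] = []
    matcher = difflib.SequenceMatcher(a=censored, b=uncensored, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A 'replace' block whose censored side is only mask symbols
        # corresponds to a censored passage; keep the original characters.
        if tag == "replace" and all(ch == mask for ch in censored[i1:i2]):
            hidden.extend(uncensored[j1:j2])
    return hidden


print(censored_characters(list("中国×新闻××了"), list("中国政新闻审查了")))
# ['政', '审', '查']
```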
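Steps 9 and 11 (counting with Datamash, then sorting) amount to a frequency table in descending order. A minimal sketch with `collections.Counter`, assuming the input is a list of uncensored characters such as the single column produced by the Cut step; `count_and_sort` is a hypothetical name.

```python
from collections import Counter


def count_and_sort(chars: list[str]) -> list[tuple[str, int]]:
    """Count each character and sort from most to least frequent."""
    # Counter groups identical characters and counts them (like a group-by count);
    # most_common() returns (character, count) pairs in descending order of count.
    return Counter(chars).most_common()


table = count_and_sort(["政", "审", "政", "查", "政", "审"])
# [('政', 3), ('审', 2), ('查', 1)]
for char, count in table:
    print(f"{char}\t{count}")  # tab-separated, like a Galaxy tabular dataset
```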
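The word cloud in step 10 draws frequently censored characters larger. The function below only illustrates that idea with a simple linear mapping from count to font size. It is a hypothetical scheme, not the algorithm used by the Galaxy word cloud tool, and `min_size`/`max_size` are arbitrary illustrative parameters.

```python
def font_sizes(
    counts: dict[str, int], min_size: float = 10, max_size: float = 100
) -> dict[str, float]:
    """Map each character's count linearly onto [min_size, max_size]."""
    lo, hi = min(counts.values()), max(counts.values())
    sizes: dict[str, float] = {}
    for ch, n in counts.items():
        if hi == lo:
            # All counts are equal, so every character gets the largest size.
            sizes[ch] = max_size
        else:
            sizes[ch] = min_size + (n - lo) * (max_size - min_size) / (hi - lo)
    return sizes


print(font_sizes({"政": 3, "审": 2, "查": 1}))
# {'政': 100.0, '审': 55.0, '查': 10.0}
```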
Outputs
| ID | Name | Description | Type |
|---|---|---|---|
| output_graphic | #main/output_graphic | n/a | |
Creators and Submitter
Creators: Not specified
Activity
Views: 2938 Downloads: 394 Runs: 0
Created: 2nd Jun 2025 at 11:01
Last updated: 11th May 2026 at 14:58
Annotated Properties
Scientific disciplines: Computer Science
Attributions: None