Topic: Delete "_expr3"

Hi.

I'd like to get rid of "_expr3".

The following types of files are affected:

"expr1 (expr3).ext" should become "expr1 ().ext"
"expr1 (expr2)_expr3.ext1.ext2" should become "expr1 (expr2).ext1.ext2"

The problem is that there are 2 different kinds of files, with 1 and with 2 extensions.

How can I tell Siren to delete "_expr3"?

Thanks,

Re: Delete "_expr3"

Hello,

geohei wrote:

"expr1 (expr3).ext" should become "expr1 ().ext"
"expr1 (expr2)_expr3.ext1.ext2" should become "expr1 (expr2).ext1.ext2"

I assume there is no underscore in front of "expr3" in the first example (not a typo).
I am not sure I understand the extension problem.

If you work on the file name (%f), an expression like this one should work:
%f("_expr3")("expr3")

The two string deletions are chained:
- on the file name without path, delete all the "_expr3"
- on the resulting string (no "_expr3") delete all the "expr3".

The reverse order, %f("expr3")("_expr3"),
won't do the job. After the first deletion has removed every "expr3", only the "_" (underscore) of "_expr3" remains, so the second deletion has nothing left to match.
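A minimal sketch in Python of why the order matters. It assumes, as described above, that Siren's %f("text") deletes every occurrence of the literal text; Python's str.replace with an empty replacement plays that role here, and delete_all is a hypothetical helper name.

```python
def delete_all(name: str, text: str) -> str:
    """Remove every occurrence of `text` from `name` (stand-in for Siren's %f("text"))."""
    return name.replace(text, "")

names = ["expr1 (expr3).ext", "expr1 (expr2)_expr3.ext1.ext2"]

# Correct order: delete "_expr3" first, then the bare "expr3".
good = [delete_all(delete_all(n, "_expr3"), "expr3") for n in names]

# Reverse order: "expr3" goes first and leaves a stray "_" behind.
bad = [delete_all(delete_all(n, "expr3"), "_expr3") for n in names]

print(good)  # ['expr1 ().ext', 'expr1 (expr2).ext1.ext2']
print(bad)   # ['expr1 ().ext', 'expr1 (expr2)_.ext1.ext2']
```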

Best regards

Re: Delete "_expr3"

Hi all.

Georges, is 'expr3' always the same expression, or is it always in the same place but with different words???
In the first case, Rémi's solution would work.
In the second case, a RegEx would work.

4 (edited by geohei 2007-12-23 09:45:43)

Re: Delete "_expr3"

Sorry guys, I messed up my initial posting. The task looks like this:

"expr1 (expr2)_expr3.ext1"      should become "expr1 (expr2).ext1"
"expr1 (expr2)_expr3.ext1.ext2" should become "expr1 (expr2).ext1.ext2"

- In other words ... expr3 varies (it is not a constant string). Everything after the underscore (which is unique in the filename base) needs to be disregarded.

- This also applies when a second extension (.ext2) is appended.

Sorry again ...

Thanks,

Re: Delete "_expr3"

I don't think it matters how many extensions you have, since we match from the left.
Let's try to break the string 'expr1 (expr2)_expr3.ext1' into RegEx parts:

expr1 (expr2)_expr3.ext1

1) Find any character, one or more times   .+    expr1                until the next part (here the space) is found
2) Find a space                            \s
3) Find an opening bracket                 \(    (
4) Find any character, one or more times   .+    expr2                until the next part (here the closing bracket) is found
5) Find a closing bracket                  \)    )
6) Find any character, one or more times   .+    _expr3               until the next part (here the dot) is found
7) Find a dot                              \.    .
8) Find any character, one or more times   .+    ext1 or ext1.ext2

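Note: to get a proper result you may have to use a ?-char to match non-greedily ==>   .+?   (for example, with a greedy .+ in step 6, a name with two extensions puts ".ext1" into part 6 as well)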


That would be an expression like:   .+\s\(.+\).+\..+   (or   .+?\s\(.+?\).+?\..+   to match non-greedily)

Now we group all parts with () for back-referencing:   (.+\s)(\(.+\))(.+)(\..+)   => for easier reading =>   (.+\s)  (\(.+\))  (.+)  (\..+)
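To check what each group captures, here is a minimal sketch in Python. It assumes that Python's re module treats these simple constructs (.+, \s, escaped brackets) the same way Siren's RegEx engine does; that is an assumption, not something stated in this thread.

```python
import re

# Stefan's grouped expression: (.+\s)  (\(.+\))  (.+)  (\..+)
pattern = re.compile(r"(.+\s)(\(.+\))(.+)(\..+)")

m = pattern.match("expr1 (expr2)_expr3.ext1")
print(m.groups())  # ('expr1 ', '(expr2)', '_expr3', '.ext1')
```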

I use this expression on the whole filename ==> %f

The "frame skeleton" for RegEx search and replace in Siren 2 is    ==>  (s///g)   ==> (s/search/replace/g)   ==>  %f(s///g)

So we use this expression in Siren  ==> %f(s/(.+\s)(\(.+\))(.+)(\..+)//g)      => for better reading =>  %f(s/    (.+\s)  (\(.+\))  (.+)  (\..+)    //g)


Now you can refer back to the contents captured by the groups by using \1, \2 and so on.
==> put these between the last two slashes, right before the g)  ==>  %f(s/search/\1\2\4/g)

Try adding those \1 \2 \3 \4 references yourself to see what happens ;-)

In the end you should use a RegEx like  ==> %f(s/(.+\s)(\(.+\))(.+)(\..+)/\1\2\4/g)   because you want to drop part three (\3)
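The same substitution, sketched in Python with re as a stand-in for Siren (re.sub replaces every match by default, much like Siren's g flag). Trying it on both kinds of names shows the catch that comes up later in this thread: with the greedy .+ in group 3, a second extension pulls ".ext1" into the dropped group. The lazy variant (.+?) in group 3 is an illustration added here, not part of the post.

```python
import re

greedy = r"(.+\s)(\(.+\))(.+)(\..+)"    # the expression posted above
lazy = r"(.+\s)(\(.+\))(.+?)(\..+)"     # variant: group 3 made lazy (not from the post)

names = ["expr1 (expr2)_expr3.ext1", "expr1 (expr2)_expr3.ext1.ext2"]

for n in names:
    # \1\2\4 keeps groups 1, 2 and 4 and drops group 3
    print(re.sub(greedy, r"\1\2\4", n), "|", re.sub(lazy, r"\1\2\4", n))
# expr1 (expr2).ext1 | expr1 (expr2).ext1
# expr1 (expr2).ext2 | expr1 (expr2).ext1.ext2
```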


HTH?

Re: Delete "_expr3"

Hi, re-reading your post

> Everything after the underscore (which is unique in the filename base) needs to be disregarded.

you could also search for everything up to an underscore, then everything up to a dot, and then the rest,
and then leave out everything from the underscore up to the dot.

That's your challenge for the festive holidays big_smile




And, oh, you wrote "Everything after the underscore", so should the underscore itself not be dropped???

Re: Delete "_expr3"

@Stefan

RegEx ... good stuff, but somewhat difficult without your explanations.

Can you confirm that:
(.+)(\s) considers everything before the LAST space and excludes the space
(.+\s) considers everything before the LAST space and includes the space

I interpreted your explanation (Find a space \s) as "find the first space". In fact, it seems to be more like "find the last space". Please correct me if I'm wrong.


Why is the "/g" at the end of RegEx (s/search/replace/g)?


Taking up the idea from your second post (everything up to the underscore, then the dot and the rest) doesn't work, since the search for the atoms ("_" and ".") starts from the right ?!?!

The result would be:
%f(s/(.+)(\_.+)(\..+)/\1\3/g)

"expr1 (expr2)_expr3.ext1"      -> "expr1 (expr2).ext1"
"expr1 (expr2)_expr3.ext1.ext2" -> "expr1 (expr2).ext2"

.ext1 is missing in line 2.
.ext1 is (for some reason?) part of \2 (_expr3.ext1)

How come ...?

Thanks,

8 (edited by Stefan 2007-12-24 01:18:54)

Re: Delete "_expr3"

Quick answer on the way:

If your expression finds the LAST occurrence instead of the first ... you have to learn what GREEDY means big_smile
That's how most RegEx engines work ... match as much as possible.
So have you tried to limit the match by using a question mark (?)?
I don't know why your test finds the LAST one ... maybe your real filenames differ from the examples you gave us?
But try something like .+? instead of .+



> Can you confirm that:
> (.+)(\s) considers everything before the LAST space and excludes the space
> (.+\s) considers everything before the LAST space and includes the space
No big_smile (and a little bit YES, because you may have seen such a result ... but without understanding where it came from. So NO, that's not how it works; it's only a side effect.)


See:
. finds one character or sign
+ finds one or more of whatever is to the left of the + sign
So .+ finds any character/sign, as long as there is at least one of them.

Example:
. finds an 'a', or one 'b', or one 'C', or one '-', or one '3', or ... but only ONE sign of any kind. "Any sign" or character, because you don't know which character/sign will be in the filename.

If you want to find more than ONE, you search for ANY character/sign and use a quantifier like * or +
So .+ finds ANY ... not just ONE item but as many as possible ... until the RegEx engine finds the next expression you search for.
As I said, most RegEx engines are greedy and find more than you want.
So use the ?-modifier to search "lazily"
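A small Python illustration of greedy versus lazy, using Python's re module as a stand-in for Siren's engine and a hypothetical name with several blanks:

```python
import re

name = "a b c.txt"  # hypothetical name with several blanks

greedy = re.match(r".+\s", name).group(0)   # .+ grabs everything, then backs off to the LAST blank
lazy = re.match(r".+?\s", name).group(0)    # .+? takes as little as possible: stops at the FIRST blank

print(repr(greedy))  # 'a b '
print(repr(lazy))    # 'a '
```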


The only difference between (.+)(\s) and (.+\s) is that the first splits the match into two groups instead of using one group for back-referencing.
Using two groups is good if you want to leave one group's content out of the replacement.
With two groups you can choose which content you want: group one \1, group two \2, or both \1\2.
With one group you simply have no choice ;-)
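A quick Python check (re as a stand-in for Siren, hypothetical name) that both forms match exactly the same text and differ only in what the groups capture. Both also end at the last blank, which is the greedy side effect asked about above.

```python
import re

name = "a b c.txt"  # hypothetical name

two_groups = re.match(r"(.+)(\s)", name)
one_group = re.match(r"(.+\s)", name)

# Both match exactly the same text ...
print(two_groups.group(0) == one_group.group(0))  # True
# ... but the captured pieces differ.
print(two_groups.groups())  # ('a b', ' ')
print(one_group.groups())   # ('a b ',)
```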


See these links for an overview of RegEx syntax:
http://www.regular-expressions.info/reference.html
http://scarabee.software.free.fr/forum/ … php?id=131


--------------------------------------


> Why is the "/g" at the end of RegEx (s/search/replace/g)?
g means global: search not only for the first occurrence in a string but for all of them. See the Siren help too.
You may play with this modifier, as well as with all the other RegEx metacharacters, to find the right expression for you.
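Python's re module has no g flag. The closest analogue, offered here as an assumption about how it maps onto Siren, is the count argument of re.sub: the default (count=0) replaces every occurrence, count=1 only the first. The name below is hypothetical.

```python
import re

name = "a_b_c.txt"  # hypothetical name with two underscores

all_replaced = re.sub(r"_", "-", name)           # every occurrence, like s/_/-/g
first_only = re.sub(r"_", "-", name, count=1)    # first occurrence only, like s/_/-/

print(all_replaced)  # a-b-c.txt
print(first_only)    # a-b_c.txt
```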

Damn big_smile it's now longer than I wanted to write big_smile

So do you need more help? Should I post the answer?


--------------------------------

> %f(s/(.+)(\_.+)(\..+)/\1\3/g)

You don't have to escape the underscore, since it is not a RegEx metacharacter. Those would be [\^$.|?*+(){
Simply use (_.+)


Good work so far Georges. Do you want me to present the solution or do you want "learning by doing"?   You choose.

Re: Delete "_expr3"

Here is a link about this topic:

Mastering Regular Expressions => http://www.oreilly.com/catalog/regex/chapter/ch04.html

It's about the technical background of RegEx, heavy stuff big_smile. I wish my English were better :-(

Re: Delete "_expr3"

Stefan wrote:

If your expression finds the LAST occurrence instead of the first ... you have to learn what GREEDY means big_smile
That's how most RegEx engines work ... match as much as possible.
So have you tried to limit the match by using a question mark (?)?
I don't know why your test finds the LAST one ... maybe your real filenames differ from the examples you gave us?
But try something like .+? instead of .+

OK, here is one of the real files ...

%f(s/(.+)(\s)/\1 1###2 \2/g)
11;14 (Channel1 - 30112007 1330 169 MP2192g MP2192e AC320384g)_+0+0+160.mpg.abc
11;14 (Channel1 - 30112007 1330 169 MP2192g MP2192e 1###2 AC320384g)_+0+0+160.mpg.abc

Things change with .+?. See below ...

%f(s/(.+?)(\s)/\1 1###2 \2/g)
11;14 (Channel1 - 30112007 1330 169 MP2192g MP2192e AC320384g)_+0+0+160.mpg.abc
11;14 1###2 (Channel1 ...
Stefan wrote:

...
If you want to find more than ONE, you search for ANY character/sign and use a quantifier like * or +
So .+ finds ANY ... not just ONE item but as many as possible ... until the RegEx engine finds the next expression you search for.
As I said, most RegEx engines are greedy and find more than you want.
So use the ?-modifier to search "lazily"

I don't get this "greedy" stuff. In fact, the .+ only works as expected if I use .+?. \s means "up to the next space", which is a clear definition. So why doesn't (.+)(\s) simply stop at the first space found?! In other words ... would you mind explaining "greedy" in simple words? Thanks,

Stefan wrote:

g means global: search not only for the first occurrence in a string but for all of them. See the Siren help too.

Search within Siren help didn't reveal any "global". Sorry!

Stefan wrote:

Good work so far Georges. Do you want me to present the solution or do you want "learning by doing"?   You choose.

Let's continue ...

%f(s/(.+?)(_.+)(\..+?)/\1 1###2 \2 2###3 \3 3###4 \4/g)
11;14 (Channel1 - 30112007 1330 169 MP2192g MP2192e AC320384g)_+0+0+160.mpg.abc
11;14 (Channel1 - 30112007 1330 169 MP2192g MP2192e AC320384g) 1###2 _+0+0+160.mpg 2###3 .abc 3###4

To me, this should work. But why doesn't \..+? stop at the . of .mpg? I mean something like ...
- take everything before the _ (excluding it)
- take the _ and everything after it up to the next . (excluding it)
- take the . and everything after it.

Where's the mistake?

Thanks,

11 (edited by Stefan 2007-12-24 23:10:39)

Re: Delete "_expr3"

Huh O.O

Let's see if I can handle this ;-)

We know a . (dot) finds one single character/sign.
So .+ finds one, or as many as possible, up to the last character/sign in the string/filename. That's how regex engines work.
A \s (if you use .+\s) finds one single whitespace/blank, but only after the .+ is satisfied.
And the .+ is satisfied only after it has grabbed all it could get: all letters, all digits, all blanks, all signs. Then it gives back one character at a time until the \s can match, which is why .+\s ends at the LAST blank.
Yes, please note that the . (dot) finds ANY character or sign, even whitespace blanks! That's a fact! Think like that! big_smile It's the way it works.



> would you mind explaining "greedy" in simple words?
What is simpler than "get as much as you can get"? big_smile
Maybe:
'Greedy' makes the match reach as far to the right as possible.
'Lazy' (with the ?-sign) stops the match at the leftmost possible point.
I will post a link to a tool so you can watch how the regex engine works.



Since your real filenames are different from your example filenames, you saw results other than the ones I expected.
You're right that .+\s finds the very last blank on the right.
So use .+?\s to make the engine match the first blank from the left.


----------------------------------------------

Note:
I don't talk about groups () here because they don't matter for matching.
Groups are only useful to give you something you can refer back to later, if you need the content of that match.
And I won't tell you that there are regex engines that work differently from the way we discuss here big_smile That's for a later lesson big_smile (or read Mastering Regular Expressions big_smile)

----------------------------------------------


You have
11;14 (Channel1 - 30112007 1330 169 MP2192g MP2192e AC320384g)_+0+0+160.mpg.abc

You want to have
11;14 (Channel1 - 30112007 1330 169 MP2192g MP2192e AC320384g).mpg.abc


1.) We have to search for everything up to an underscore   ===> .+_
2.) We have to search for everything up to the first dot    ===> .+?\.
3.) We have to catch the rest                               ===> .+



It also works if we move the dot from part 2 to part 3 (I think you want to keep the dot?)
1.) We have to search for everything up to an underscore   ===> .+_
2.) We have to search for everything                        ===> .+?
3.) up to the first dot, and catch the rest                 ===> \..+


I think you don't want to keep the underscore but drop it?
1.) We have to search for everything                        ===> .+
2.) up to an underscore, and search for everything          ===> _.+?
3.) up to the first dot, and catch the rest                 ===> \..+


So I try an expression like => (.+)_.+?(\..+) and replace with \1\2        ===>   %f(s/(.+)_.+?(\..+)/\1\2/g)
We don't have to group the _.+? because we don't need it later.
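A sketch of this expression in Python, again assuming re behaves like Siren's engine for these constructs. Note that the greedy (.+) in group 1 reaches the last underscore; that is fine here because the underscore is unique in the filename base.

```python
import re

# Stefan's expression: keep group 1 (before "_") and group 2 (from the first dot on)
pattern = r"(.+)_.+?(\..+)"

names = [
    "expr1 (expr2)_expr3.ext1",
    "expr1 (expr2)_expr3.ext1.ext2",
    "11;14 (Channel1 - 30112007 1330 169 MP2192g MP2192e AC320384g)_+0+0+160.mpg.abc",
]
results = [re.sub(pattern, r"\1\2", n) for n in names]
for r in results:
    print(r)
# expr1 (expr2).ext1
# expr1 (expr2).ext1.ext2
# 11;14 (Channel1 - 30112007 1330 169 MP2192g MP2192e AC320384g).mpg.abc
```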


-----------------------------------------------------------------------------------

Test it

Try

(.+)_.+(\..+)       ===>   %f(s/(.+)_.+(\..+)/\1\2/)

instead of
(.+)_.+?(\..+)      ===>   %f(s/(.+)_.+?(\..+)/\1\2/)

to see the greedy effect.

And we don't need the global 'g' flag, because our match occurs only once.
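The same test in Python (re as a stand-in for Siren). count=1 mirrors leaving out the g flag. The greedy .+ after the underscore runs on to the last dot, so ".mpg" is swallowed.

```python
import re

name = "11;14 (Channel1 - 30112007 1330 169 MP2192g MP2192e AC320384g)_+0+0+160.mpg.abc"

greedy = re.sub(r"(.+)_.+(\..+)", r"\1\2", name, count=1)   # no g: one replacement
lazy = re.sub(r"(.+)_.+?(\..+)", r"\1\2", name, count=1)

print(greedy)  # ends with "AC320384g).abc": ".mpg" was swallowed
print(lazy)    # ends with "AC320384g).mpg.abc"
```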

-----------------------------------------------------------------------------------




> Search within Siren help didn't reveal any "global". Sorry!
OK. big_smile Sorry.
Try:
1.) open Siren help with the F1 key
2.) press Ctrl+F to search
3.) search for "regu"
There you are.
I know that 'g' means global from reading Perl RegEx manuals on the net.


-------------------------------------------------


> But why doesn't \..+? stop at the . of .mpg?

You have to put this ?-sign on the last .+ group (-what do we call it?-), the one just left of the \..+ group,
because that group should be lazy and stop before the dot we want to catch with the \.
So try  .+?\..+
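A small Python check of this point, with re as a stand-in for Siren's engine. With the greedy _.+ in group 2, ".mpg" ends up in group 2; adding the ? moves it into group 3. Group 3 is written (\..+) here rather than (\..+?), because in Python's re a lazy .+? at the very end of the pattern matches just one character after the dot.

```python
import re

name = "11;14 (Channel1 - 30112007 1330 169 MP2192g MP2192e AC320384g)_+0+0+160.mpg.abc"

# Georges' grouping with a greedy group 2, and the same with group 2 made lazy
greedy_g2 = re.match(r"(.+?)(_.+)(\..+)", name)
lazy_g2 = re.match(r"(.+?)(_.+?)(\..+)", name)

print(greedy_g2.group(2), greedy_g2.group(3))  # _+0+0+160.mpg .abc
print(lazy_g2.group(2), lazy_g2.group(3))      # _+0+0+160 .mpg.abc
```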

--------------------------------

I am afraid I didn't find the right words sad
Maybe you should join a real regex forum?

----------------

I hope I have covered all your questions; if not, please nudge me ;-)

If you have more questions, please ask.

Merry X-mas, or whatever you celebrate, to all.



I hope no regex pro will ever read this stammering

Re: Delete "_expr3"

The Regex Coach is the tool I had in mind.

With this tool you can enter your file name and a regular expression
and then watch step by step how the engine works and matches parts.

The Regex Coach - interactive regular expressions
http://www.weitz.de/regex-coach/

http://www.weitz.de/regex-coach/shot.png




The parse tree
If you select the "Tree" tab you'll see a (simplified) graphical representation of the parse tree of the regular expression.
This is how the regex engine "sees" the expression and it might help you to understand what's going on (or why the regular expression isn't interpreted as you intended it to be).

Single-stepping through the matching process
Finally, the "Step" tab will lead you to two panes which have the same content as the two main panes.
However, here you can watch the regex engine "at work". This is best explained with an example, so see the corresponding part of the tutorial. http://www.weitz.de/regex-coach/tutorial18.html

Re: Delete "_expr3"

Some words on regular expressions:

Yes, there are pros you can wake up in the middle of the night and they'll tell you a correct 255-character regex ;-)

But we have to find our regex by trial and error :-(

1.) First we have to split our string/filename into parts: what we want to match, what is in front of this match and what is behind it.
2.) Then we have to find the corresponding regex metacharacters to match these parts.
3.) Then, mostly, we have to correct these metacharacters ;-)
That's the normal way, and it only gets easier if you spend time every day on regex questions, so that you have the right metacharacter at hand. For everyone else
it's best to download a one-sheet regex syntax PDF   ==>  http://www.ilovejackdaniels.com/cheat-s … eat-sheet/
print it out and put it under the keyboard ;-)

Mostly you need only a few metacharacters:
.     the dot finds one character of ANY kind
\s    finds a whitespace / space / blank
\d    finds one digit
+     finds one or more of the metacharacter to its left (i.e. there MUST be at least one)
*     finds zero or more of the metacharacter to its left (i.e. there MAY be none at all)
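These metacharacters can be tried out quickly in Python (re as a stand-in for Siren's engine); re.fullmatch requires the whole test string to match. The last two lines show the difference between + and *.

```python
import re

one_any = re.fullmatch(r".", "a") is not None          # one character of any kind
one_blank = re.fullmatch(r"\s", " ") is not None       # one whitespace character
digits = re.fullmatch(r"\d+", "2007") is not None      # one or more digits
plus_empty = re.fullmatch(r"\d+", "") is not None      # + needs at least one digit
star_empty = re.fullmatch(r"\d*", "") is not None      # * also accepts zero digits

print(one_any, one_blank, digits, plus_empty, star_empty)  # True True True False True
```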

And with every use you will learn a bit more about regex, and you may even be able to help others *sic*

Just my $0.02 of the day.

Re: Delete "_expr3"

Hi Stefan.

I have hardly ever received support in as much detail as you gave in this thread!

Well ... thanks a lot. I read all 3 posted articles thoroughly and could reproduce exactly what you wanted me to test. The RegEx works fine for me now, and I got some deep insight into this subject.

Thanks a lot again and merry x-mas!

Bye,

Re: Delete "_expr3"

Hallelujah, great it works big_smile

Good job Georges.

Re: Delete "_expr3"

Well, it's more something like "Good job Stefan" big_smile